Main

We examine the distribution of fatalities from major pandemics in history (spanning about 2,500 years), and build a statistical picture of their tail properties. Using tools from extreme value theory (EVT), we show that the distribution of the victims of infectious diseases is extremely fat-tailed, more so than one might be led to believe at the outset. Our goal is not, at this stage, to explain the emergence of these tail properties, but several plausible explanations are already available1,2,3.

A non-negative continuous random variable X is ‘fat-tailed’ if its survival function \(S(x) = P(X > x)\) decays as a power law \(x^{-1/\xi}\) the more we move into the tail, that is, for x growing towards the right endpoint of X. More technically, X has a regularly varying survival function, that is, \(S(x) = L(x)x^{-1/\xi}\), where L(x) is a slowly varying function, such that \(\lim_{x \to \infty} \frac{L(cx)}{L(x)} = 1\) for c > 0 (refs. 2,3). The parameter ξ > 0 is known as the tail parameter, and it governs the ‘fatness’ of the tail: the larger ξ, the fatter the tail. Moreover, ξ determines the existence of moments, which include quantities such as the mean, the variance or the skewness. In fact, the moment of order p exists, that is \(E[X^p] < \infty\), if and only if ξ < 1/p. For example, when ξ ≥ 1/2, the second moment \(E[X^2]\) is not finite, and so the variance, which is nothing more than the second central moment, does not exist; just recall that \({\mathrm{Var}}(X) = E[X^2] - E[X]^2\). In some literature, for example, ref. 4, the tail index is re-parameterised as α = 1/ξ, and its interpretation is naturally reversed.
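To make the link between ξ and moments concrete, here is a minimal sketch in Python (using NumPy; the parameter values are illustrative, not taken from our data). It simulates a Pareto sample with ξ = 0.7, for which the mean exists (ξ < 1) but the variance does not (ξ > 1/2), and shows how the sample variance refuses to stabilize as n grows.

```python
import numpy as np

# Pareto samples with tail index alpha = 1/xi.
# xi = 0.7: finite mean (xi < 1), infinite variance (xi > 1/2).
xi = 0.7
alpha = 1 / xi
rng = np.random.default_rng(42)
x = 1 + rng.pareto(alpha, size=10**6)  # Pareto-I samples with minimum 1

for n in (10**3, 10**4, 10**5, 10**6):
    s = x[:n]
    print(f"n={n:>7}: mean={s.mean():7.2f}  var={s.var():12.1f}")
# The sample mean settles slowly; the sample variance keeps jumping around,
# because the theoretical second moment is infinite.
```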

In EVT, fat-tailed distributions are said to be in the maximum domain of attraction of the Fréchet distribution2,3, representing the potentially most dangerous generators of extreme risks. In this class, we naturally find all the different versions of the Pareto distribution, as well as the Cauchy, the log-gamma and all stable distributions with stability parameter α < 2, to name a few. Well-known distributions such as the Gaussian and the exponential do not belong to this group, and are therefore considered thin-tailed. Then there are distributions, such as the log-normal, that are extremely tricky. Despite theoretically belonging to the same EVT class as the Gaussian distribution, the so-called Gumbel class, the log-normal distribution can be compared to a wolf in sheep’s clothing, showing erratic behaviour depending on its parameters5,6.

While it is known that fat tails represent a common—yet often ignored5—regularity in many fields of science and knowledge4, to the best of our knowledge, only war casualties and operational risk losses show a behaviour7,8,9 as erratic and wild as the one we observe for pandemic fatalities.

The core of the problem is shown in Fig. 1, with the maximum-to-sum plot9 of the number of pandemic fatalities in history, using the data collected in Table 1. Such a plot relies on a simple consequence of the law of large numbers: for a sequence X1, X2, …, Xn of non-negative independent and identically distributed random variables, if \(E[X^p] < \infty\) for p = 1, 2, 3, …, then \(R_n^p = M_n^p/S_n^p \to 0\) almost surely as \(n \to \infty\), where \(S_n^p = \sum_{i = 1}^n X_i^p\) is the partial sum of order p, and \(M_n^p = \max(X_1^p, \ldots, X_n^p)\) the corresponding partial maximum. Figure 1 clearly shows that no finite moment is likely to exist for the number of victims in pandemics, as the \(R_n^p\) ratio does not converge to 0 for p = 1, 2, 3, 4, no matter how many data points we use. Such a behaviour suggests that the victims’ distribution has such a fat right tail that not even the first theoretical moment is finite. We are therefore dealing with a phenomenon for which observed quantities such as the sample average and standard deviation are meaningless for inference.
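The MS plot is straightforward to construct; a minimal sketch in Python follows (NumPy only; the Pareto sample is a stand-in for the fatality numbers of Table 1, and all names are ours).

```python
import numpy as np

def ms_ratios(x: np.ndarray, p: int) -> np.ndarray:
    """Maximum-to-sum ratios R_n^p = M_n^p / S_n^p for a sample x."""
    xp = x.astype(float) ** p
    return np.maximum.accumulate(xp) / np.cumsum(xp)

# Stand-in data: 72 Pareto observations with tail index alpha = 0.6,
# that is, xi = 1/alpha ~ 1.67 > 1 (apparently infinite mean).
rng = np.random.default_rng(0)
fatalities = 1 + rng.pareto(0.6, size=72)

for p in (1, 2, 3, 4):
    print(f"p={p}: final R_n^p = {ms_ratios(fatalities, p)[-1]:.3f}")
# For thin tails these ratios drift towards 0 as n grows; here they do not.
```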

Fig. 1: Maximum to sum plot.

Maximum to sum plot (MS plot) of the average death numbers in pandemic events in history, based on the data presented in Table 1. The quantities \(M_n^p\) and \(S_n^p\) are the partial maximum and the partial sum of order p, respectively. The plot suggests that all moments, including the mean, could be infinite.

Table 1 The data set used for the analysis

However, this does not mean that pandemic risk is infinite and there is nothing we can do or model to get a handle on the problem. Using the methodology we have developed to study war casualties7,10, it is possible to extract useful information from the data, and quantify the large (yet finite) risk of pandemic diseases. In fact, our method provides rough estimates for quantities not immediately observable from the data.

The tail-wags-the-dog effect

The central point we wish to convey is the following: the more fat-tailed a statistical distribution, the more the ‘tail wags the dog’. That is to say, more statistical information resides in the extremes and less in the ‘bulk’—the events of high frequency—where it becomes almost noise. Under fat tails, the law of large numbers works slowly, and moments—even when they exist—may become uninformative and unreliable5. All this makes EVT the most effective and robust approach for risk management purposes, even with relatively small data sets like ours (see Table 1).

The presence of a fat right tail in the distribution of pandemic fatalities has the following policy implications, useful in the wake of the COVID-19 pandemic.

First, it should be evident that it is not appropriate to compare fatalities from multiplicative infectious diseases (fat-tailed, like a Pareto distribution) to those from car accidents, heart attacks or falls from ladders (thin-tailed, like a Gaussian). This remains a common (and costly) error in policy making, and in both the decision sciences and the journalistic literature. Some research papers even criticise the wider public’s ‘paranoia’ with respect to pandemics, not appreciating that such a paranoia is merely responsible (and realistic) risk management in the face of potentially destructive events. The main problem is that those articles—often relied upon for policy making—consistently use the wrong thin-tailed distributions, underestimating tail risk, so that every conservative or preventative reaction is bound to be judged an overreaction.

Second, epidemiological models such as the susceptible–infectious–recovered (SIR) differential equations11, sometimes supplemented with simulation experiments12, are useful for scientific discussion of the bulk of the distributions of infections and deaths, or for understanding the dynamics of events after they have happened, but they should not be used for precautionary risk management, which should focus on maxima and tail exposures instead. It is not rigorous to use naive (but reassuring) statistics, such as the expected average outcome of compartmental models, or one or more point estimates, as a motivation for policies. Owing to the compounding effect of parameter uncertainty, the tail-wags-the-dog effect easily invalidates both point estimates and scenario analyses. However, it is encouraging to note that the impact of parameter uncertainty on the scenarios generated by epidemiological models has recently started to be examined more carefully13.

Extreme value theory is a natural candidate to handle pandemics. It was developed as a means to cope with maxima14, and it has subsequently evolved to deal with tail risk in a robust way, even with a limited number of observations and their associated uncertainty3. In the Netherlands, for example, EVT has been used to get a handle on the distribution of the maxima—and not the average—of sea levels in order to build dams and dykes high and strong enough for the safety of its citizens2.

Finally, EVT-based risk management is compatible with the (non-naive) precautionary principle15, which should be the leading driver for policy decisions under jointly systemic and extreme risks.

Data and descriptive statistics

We investigate the distribution of deaths from the major epidemic and pandemic diseases of history, from 429 BC until the present. The data are available in Table 1, together with their sources, and only refer to events with more than 1,000 estimated victims, for a total of 72 observations. As a consequence, potentially high-risk diseases such as the Middle East respiratory syndrome (MERS) do not appear in our collection. All diseases whose end year is 2020 are to be considered ongoing, as is the case for the current COVID-19 pandemic.

We use three estimates of the reported cumulative death toll: minimum, average and maximum. When the three numbers coincide in Table 1, our sources simply do not provide intervals for the estimates. Since we are well aware of the volatility and possible unreliability of historical data10,16, in the section ‘Data reliability issues’ we deal with the problem by perturbing and omitting observations.

In order to compare fatalities with respect to the coeval population (that is, the relative impact of pandemics), the ‘Rescaled’ column in Table 1 provides the rescaled version of the ‘Average estimate’ column, using the information in the ‘Population’ column17,18,19. For example, the Antonine plague of 165–180 killed an average of 7.5 million people, that is to say 3.7% of the coeval world population of 202 million people. Using today’s population, such a number would correspond to about 283 million deaths, a terrible hecatomb, killing more people than World War II.
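The rescaling arithmetic is elementary; a short sketch in Python (the figures below are the ones quoted in the text):

```python
# Rescaling a historical death toll to today's population.
deaths_antonine = 7.5e6    # average estimate, Antonine plague (165-180)
population_then = 202e6    # coeval world population ('Population' column)
population_today = 7.7e9   # 2020 estimate

share = deaths_antonine / population_then
print(f"share of coeval population: {share:.1%}")   # ~3.7%
rescaled = share * population_today
print(f"rescaled to today: {rescaled:.3g} deaths")  # ~2.9e8, close to the
# ~283 million quoted in the text (which uses the Table 1 population figures).
```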

We restrict our attention to the average estimates in Table 1, but all our findings and conclusions hold for the lower, the upper and the rescaled estimates as well.

Figure 2a shows the histogram of the average numbers of deaths in the 72 large contagious events. The distribution appears highly skewed and possibly fat-tailed. The numbers are as follows: the sample average is 4.9 million, while the median is 76 thousand, compatible with the skewness observable in Fig. 2. The 90% quantile is 6.5 million and the 99% quantile is 137.5 million. The sample standard deviation is 19 million.

Fig. 2: Graphical analyses of the average number of deaths.

a, Histogram of the average number of deaths in the 72 contagious diseases of Table 1. b, Log–log plot of the empirical survival function (Zipf plot) of the actual average death numbers in Table 1. The red line represents a naive linear fit of the decaying tail. c, Mean excess function plot (meplot) of the average death numbers in Table 1. The plot excludes three points in the top right corner, consistent with the suggestions in ref. 9 about the exclusion of the more volatile observations. d, Hill plot of the average death numbers in Table 1, with 95% confidence intervals. Clearly ξ > 1, suggesting the non-existence of moments.

Using common graphical tools for fat tails3,6, in Fig. 2b we show the log–log plot (also known as the Zipf plot) of the empirical survival function of the average numbers of victims over the diverse contagious events. In such a plot, possible fat tails can be identified in the presence of a linearly decreasing behaviour of the plotted curve. To improve interpretability, a naive linear fit is also proposed. Figure 2b indeed suggests the presence of fat tails.

The Zipf plot shows a necessary but not sufficient condition for fat tails6. Therefore, in Fig. 2c we complement the analysis with a mean excess function plot (meplot). If a random variable X is possibly fat tailed, its mean excess function \(e_X\left( u \right) = E[X - u|X \ge u]\) should grow linearly in the threshold u, at least above a certain value identifying the actual power law tail3. In a meplot, where the empirical eX(u) is plotted against the different values of u, one thus looks for some (more or less) increasing trend, such as the one we observe in Fig. 2c.
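Both diagnostics are easy to reproduce; here is a minimal sketch in Python (assuming Matplotlib and NumPy; the Pareto sample again stands in for the Table 1 numbers).

```python
import numpy as np
import matplotlib.pyplot as plt

def zipf_plot(x, ax):
    """Log-log plot of the empirical survival function."""
    xs = np.sort(x)
    surv = 1.0 - np.arange(1, len(xs) + 1) / (len(xs) + 1)  # empirical S(x)
    ax.loglog(xs, surv, "o")
    ax.set(xlabel="x", ylabel="S(x)", title="Zipf plot")

def me_plot(x, ax, drop=3):
    """Empirical mean excess e(u) = E[X - u | X > u] against thresholds u."""
    xs = np.sort(x)
    us = xs[:-1]                              # thresholds: all but the maximum
    me = np.array([np.mean(xs[xs > u] - u) for u in us])
    ax.plot(us[:-drop], me[:-drop], "o")      # drop the most volatile top points
    ax.set(xlabel="u", ylabel="e(u)", title="Mean excess plot")

rng = np.random.default_rng(1)
fatalities = 1 + rng.pareto(0.6, size=72)     # stand-in for the Table 1 data
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 4))
zipf_plot(fatalities, ax1)
me_plot(fatalities, ax2)
plt.show()
```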

A useful tool for the analysis of tails—when one suspects them of being fat—is the non-parametric Hill estimator2,3. For a collection \(X_1, \ldots, X_n\), let \(X_{n,n} \le \ldots \le X_{1,n}\) be the corresponding order statistics. We can then estimate the tail parameter ξ as

$$\hat{\xi} = \frac{1}{k}\sum_{i = 1}^{k} \left( \log X_{i,n} - \log X_{k,n} \right), \quad 2 \le k \le n.$$

In Fig. 2d, the estimate \(\hat \xi\) is plotted against different values of k, creating the so-called Hill plot3. The plot suggests ξ > 1, in line with Fig. 1, further supporting the evidence of infinite moments.
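The estimator itself is a few lines of code; a minimal sketch in Python (our own implementation of the formula above, not the original analysis code):

```python
import numpy as np

def hill_estimates(x: np.ndarray) -> np.ndarray:
    """Hill estimates of xi for k = 2, ..., n-1, from descending order statistics."""
    logs = np.log(np.sort(x)[::-1])    # log X_{1,n} >= log X_{2,n} >= ...
    return np.array([logs[:k].mean() - logs[k - 1]
                     for k in range(2, len(logs))])

rng = np.random.default_rng(2)
fatalities = 1 + rng.pareto(0.6, size=72)  # stand-in data with true xi ~ 1.67
xi_hat = hill_estimates(fatalities)
print(xi_hat[:10])  # plotting xi_hat against k yields a Hill plot like Fig. 2d
```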

Other graphical tools could be used and they would all confirm our basic point: we are in the presence of a fat right tail in the distribution of the victims of pandemic diseases. Furthermore, it is possibly a distribution with no finite moment (not even the mean).

The dual distribution

As we observed for war casualties7, the apparent lack of moments for the distribution of pandemic victims cannot be taken at face value. Since the distribution of victims is naturally bounded by the coeval world population, no disease can kill more people than those living on the planet at any given time. We are indeed looking at an apparently infinite-mean phenomenon, as in the case of war casualties7,10 and operational risk8.

Let [L, H] be the support of the distribution of pandemic victims today (that is, the range of variation for the number of fatalities), with L >> 0 to exclude small events not officially definable as a pandemic20. As for H, its value cannot be larger than the world population, and we can safely take today’s estimate as the historical upper bound, that is, 7.7 billion people in 202019. Evidently, H is so large that the probability of observing values in its vicinity is in practice zero, and one always finds observations below a given M << H < ∞ (something like 150 million deaths using actual data). Thus, one could be fooled by the data into ignoring H and taking it as infinite, up to the point of believing in an infinite-mean phenomenon, as Fig. 1 suggests. However, notice that a finite upper bound H—no matter how large it is—is not compatible with infinite moments3, hence Fig. 1 risks being misleading.

In Fig. 3, the real tail of the random variable Y with remote upper bound H is represented by the dashed line. If one only observes values up to M << H, and more or less consciously ignores the existence of H, one could be fooled by the data into believing that the tail is actually the continuous one, the so-called apparent tail8. The two tails are indeed indistinguishable in most cases, in virtually all finite samples, as the divergence only becomes clear in the vicinity of H. A bounded tail with a very large upper limit is therefore mistakenly confused with an unbounded one, and no model will be able to tell the difference, even if epistemologically we are in two extremely different situations. This is the typical case in which critical reasoning, and the a priori analysis of the characteristics of the phenomenon under scrutiny, should precede any instinctive and uncritical fitting of the data.

Fig. 3: The apparent and real tail.

Graphical representation (log–log plot) of what may happen if one ignores the existence of the finite upper bound H, since only M is observed.

A solution to this quandary is provided by the approach of refs. 7,8, which introduces the concept of dual data via a special log transformation. The basic idea is to find a way of matching naive extrapolations (apparently infinite moments) with appropriate modelling.

Let L and H be, respectively, the finite lower and upper bounds of a random variable Y, and define the function

$$\varphi \left( Y \right) = L - H\log \left( {\frac{{H - Y}}{{H - L}}} \right),$$
(1)

from which it follows that

$$\varphi \in C^\infty,$$
$$\varphi ^{ - 1}\left( \infty \right) = H,$$
$$\varphi ^{ - 1}\left( L \right) = \varphi \left( L \right) = L.$$

We then define \(Z = \varphi \left( Y \right)\) as a new random variable with lower bound L and an infinite upper bound. The transformation induced by \(\varphi\) does not depend on any of the parameters of the distribution of Y, and \(\varphi\) is monotonic. We call the distributions of Y and Z the real and the dual distribution, respectively. It can be verified that for values smaller than M << H, Y and Z are, in practice, indistinguishable (and so are their quantiles8).
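A minimal sketch of the dual transformation and its inverse in Python (our own implementation of equation (1); the bounds below are illustrative, not fitted):

```python
import numpy as np

def phi(y, L, H):
    """Dual transformation of equation (1): maps [L, H) onto [L, infinity)."""
    return L - H * np.log((H - y) / (H - L))

def phi_inv(z, L, H):
    """Inverse transformation: maps a dual value z back onto [L, H)."""
    return H - (H - L) * np.exp((L - z) / H)

L_, H_ = 1e3, 7.7e9                  # lower bound; world population as upper bound
y = np.array([1e4, 1e6, 1e8, 5e9])   # 'real' observations
z = phi(y, L_, H_)
print(z)                             # for y << H, the dual z is almost identical to y
print(phi_inv(z, L_, H_))            # recovers y up to floating-point error
```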

As described in refs. 7,8, we take the observations in the ‘Average estimate’ column of Table 1, our Y values, and transform them into their dual Z values. We then study the unbounded duals using EVT, to find out that the naive observation of infinite moments can make sense in such a framework (but not for the bounded world population). Finally, by reverting to the real distribution, we compute the so-called shadow mean8 of pandemics, equal to

$$E\left[ Y \right] = \left( H - L \right){\mathrm{e}}^{\frac{\sigma }{H\xi }}\left( \frac{\sigma }{H\xi } \right)^{\frac{1}{\xi }}\varGamma \left( 1 - \frac{1}{\xi },\frac{\sigma }{H\xi } \right) + L,$$
(2)

where \(\varGamma(\cdot ,\cdot )\) is the upper incomplete gamma function.

Notice that the random quantity Y is defined above L, therefore its expectation corresponds to a tail expectation with respect to the random variable Z, an ‘expected shortfall’ in financial jargon7. All moments of the random variable Y are called shadow moments in ref. 8, as they are not immediately visible from the data, but from plug-in estimation.

The dual tail via EVT and the shadow mean

Take the dual random variable Z whose distribution function G is unknown. Given a finite threshold \(u\), we can define the exceedance distribution of Z as

$$G_u\left( z \right) = P\left( {Z \le z{\mathrm{|}}Z > u} \right) = \frac{{G\left( z \right) - G(u)}}{{1 - G(u)}}$$
(3)

for \(z \ge u.\)

For a large class of distributions G, and for high thresholds \(u \to \infty\), \(G_u\) can be approximated by a three-parameter generalized Pareto distribution (GPD)2, that is,

$$G_u\left( z \right) \approx {\mathrm{GPD}}\left( z;\xi ,\beta ,u \right) = \begin{cases} 1 - \left( 1 + \xi \frac{z - u}{\beta } \right)^{ - \frac{1}{\xi }} & \xi \ne 0, \\ 1 - {\mathrm{e}}^{ - \frac{z - u}{\beta }} & \xi = 0, \end{cases}$$
(4)

where \(z \ge u\) for \(\xi \ge 0\), \(u \le z \le u - \beta /\xi\) for \(\xi < 0\), \(u \in {\Bbb R}\), \(\xi \in {\Bbb R}\) and \(\beta > 0\).
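The GPD of equation (4) is available in standard statistical libraries; as a sanity check, here is a minimal sketch using SciPy’s genpareto (a stand-in for whatever software one prefers; the parameter values are arbitrary):

```python
import numpy as np
from scipy.stats import genpareto

xi, beta, u = 0.5, 2.0, 10.0   # illustrative parameters, not fitted values
z = np.linspace(u, u + 50, 5)

# SciPy parameterization: c = xi, loc = u, scale = beta.
lib = genpareto.cdf(z, c=xi, loc=u, scale=beta)
manual = 1 - (1 + xi * (z - u) / beta) ** (-1 / xi)  # equation (4), xi != 0
print(np.allclose(lib, manual))                      # True: the forms coincide
```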

Let us just consider \(\xi > 0\), which is the case for fat tails3. From equation (3), we see that \(G\left( z \right) = \left( {1 - G\left( u \right)} \right)G_u\left( z \right) + G(u)\), hence we obtain

$$G\left( z \right) \approx \left( {1 - G\left( u \right)} \right){\mathrm{GPD}}\left( {z;\xi ,\beta ,u} \right) + G\left( u \right) = 1 - \bar G(u)\left( {1 + \xi \frac{{z - u}}{\beta }} \right)^{ - 1/\xi },$$

with \(\bar G\left( x \right) = 1 - G(x)\). The tail of Z is therefore

$$\bar G\left( z \right) = \bar G\left( u \right)\left( {1 + \xi \frac{{z - u}}{\beta }} \right)^{ - \frac{1}{\xi }}.$$
(5)

Equation (5) is called the tail estimator of \(\bar G(z)\) for \(z \ge u\). Given that G is, in principle, unknown, one usually substitutes \(\bar G(u)\) with its empirical estimator nu/n, where n is the total number of observations in the sample, and nu is the number of exceedances above u.

Equation (5) then becomes

$$\bar G\left( z \right) = \frac{n_u}{n}\left( 1 + \xi \frac{z - u}{\beta } \right)^{ - 1/\xi } \approx 1 - {\mathrm{GPD}}\left( z^\ast ;\xi ,\sigma ,\mu \right),$$
(6)

where \(\sigma = \beta \left( \frac{n_u}{n} \right)^\xi\), \(\mu = u - \frac{\beta }{\xi }\left( 1 - \left( \frac{n_u}{n} \right)^\xi \right)\) and \(z^\ast \ge \mu\) is an auxiliary variable. Both σ and μ can be estimated semi-parametrically, starting from the estimates of ξ and β in equation (4). Since we are considering the case ξ > 0, the preferred estimation method is maximum likelihood2,3. For both the exceedance distribution and the recovered tail, the parameter ξ is the same, and it also coincides with the tail parameter we have used to define fat tails.
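A minimal sketch of this fitting step in Python (maximum-likelihood GPD fit via SciPy on the dual exceedances; the data below are placeholders, since we do not reproduce Table 1 here):

```python
import numpy as np
from scipy.stats import genpareto

def fit_gpd_tail(z: np.ndarray, u: float):
    """Fit a GPD to the exceedances of the dual data z above threshold u (MLE)."""
    exceedances = z[z > u] - u
    xi, _, beta = genpareto.fit(exceedances, floc=0)  # location fixed at 0
    sigma = beta * (len(exceedances) / len(z)) ** xi  # rescaled sigma, equation (6)
    return xi, beta, sigma, len(exceedances)

# Placeholder duals; in the paper, z = phi(average estimates of Table 1).
rng = np.random.default_rng(3)
z = 1e3 * (1 + rng.pareto(0.62, size=72))
xi, beta, sigma, n_u = fit_gpd_tail(z, u=np.quantile(z, 0.65))
print(f"xi={xi:.2f}, beta={beta:.3g}, sigma={sigma:.3g}, exceedances={n_u}")
```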

One can thus study the tail of Z without worrying too much about the rest of the distribution, that is, the part below u. All in all, the most destructive risks come from the right tail, and not from the first quantiles or even the bulk of the distribution. The identification of the correct u is a relevant question in extreme value statistics2,3. One can rely on heuristic graphical tools6, like the Zipf plot and the meplot we plotted above, or on statistical tests for extreme value conditions14 and GPD goodness-of-fit21.

What is important to stress—once again—is that the GPD fit needs to be performed on the dual quantities, to be statistically and epistemologically correct. One could in fact work with the raw observations directly, without the log transformation of equation (1), surely ending up with ξ > 1, in line with Figs. 1 and 2d. But such an approach would be incorrect because only the dual observations are actually unbounded.

Working with the dual observations, we find that the best GPD fit threshold is around 200,000 victims, with 34.7% of the observations lying above this value. For the GPD parameters, we estimate ξ = 1.62 (standard error 0.52) and \(\beta = 1.1747 \times 10^6\) (standard error \(5.365 \times 10^5\)). As expected, ξ > 1, once again supporting the idea of an infinite first moment. Visual inspections and statistical tests14,21 support the goodness of fit for the exceedance distribution and the tail.

Looking at the standard error of ξ, one could also argue that, with more data from the upper tail, the first moment could possibly become finite. Yet there is no doubt about the non-existence of the second moment, and thus about the unreliability of the sample mean5, which remains too volatile to be safely used. Pandemic fatalities would still be an extremely erratic phenomenon, with substantial tail risk. In any case, Figs. 1 and 2d lead us to consider the first moment as infinite, and not to trust sample averages.

Given ξ and β, we can use equations (2) and (6) to compute the shadow mean of the numbers of victims in pandemics. For actual data we get a shadow mean of 20.1 million, which is definitely larger (almost 1.5 times) than the corresponding sample tail mean of 13.9 million (this is the mean of all the actual numbers above the 200,000 threshold). Combining the shadow mean with the sample mean below the 200,000 threshold, we get an overall mean of 7 million instead of the value 4.9 million we have computed initially. It is therefore important to stress that a naive use of the sample mean would be misleading, inducing a substantial underestimation of risk. Under fat tails, averages are very often tricky objects.
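The shadow mean of equation (2) is straightforward to compute numerically; a minimal sketch in Python, plugging in the estimates reported above (SciPy provides the regularized upper incomplete gamma function; this is our reconstruction, not the original code, so small numerical discrepancies with the reported 20.1 million are possible):

```python
import numpy as np
from scipy.special import gamma, gammaincc

def shadow_mean(xi, sigma, L, H):
    """Shadow mean of equation (2); Gamma(a, x) = gammaincc(a, x) * gamma(a)."""
    a = 1 - 1 / xi                 # lies in (0, 1) whenever xi > 1
    x = sigma / (H * xi)
    upper_inc_gamma = gammaincc(a, x) * gamma(a)
    return (H - L) * np.exp(x) * x ** (1 / xi) * upper_inc_gamma + L

xi = 1.62                          # reported GPD shape
beta = 1.1747e6                    # reported GPD scale
sigma = beta * 0.347 ** xi         # rescaling of equation (6), n_u / n = 34.7%
L, H = 200_000, 7.7e9              # threshold and world-population bound
print(f"shadow mean: {shadow_mean(xi, sigma, L, H):.3g}")  # ~2e7, about 20 million
```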

Other sample quantities and moments could be computed via the dual approach8, as the large (yet finite) upper bound guarantees their existence. Our argument is that it is these quantities, rather than the naive sample equivalents, or the reassuring point estimates of models underestimating tail risk, that should be the starting point of policy discussions.

Data reliability issues

Clearly, estimates of the number of victims in past pandemics are neither unique nor precise. Figures are very often anecdotal, based on citations and vague reports, and usually dependent on the source of the estimate. In Table 1, it is evident that the estimates for some events vary considerably.

Natural questions thus arise: are the tail risk estimates we have presented robust? What happens if some of the casualty estimates change? What is the impact of ignoring some events in our collection? The use of extreme value statistics in studying tail risk already guarantees the robustness of our estimates to changes in the underlying data, when these lie below the threshold u. However, to verify robustness more rigorously and thoroughly, we have decided to stress the data, to study how the tail may potentially vary.

First of all, we have generated 10,000 distorted copies of our dual data. Each copy contains exactly the same number of observations as Table 1, but every data point has been allowed to vary between 80% and 120% of its recorded value before imposing the log transformation of equation (1). In other words, each of the 10,000 new samples contains 72 observations, and each observation is a (dual) perturbation (±20%) of the corresponding observation in Table 1.

Figure 4a contains the histogram of the ξ parameter over the 10,000 distorted copies of the dual numbers. The values are always above 1, indicating an apparently infinite mean, and the average value is 1.62 (standard deviation 0.10), in line with our previous findings. Our tail estimates are thus robust to imprecise observations. We find consistent results also for the β parameter.

Fig. 4: Historical imprecision and missing observations.

a, Values of the shape parameter ξ over 10,000 distorted copies of the dual versions of the average deaths in Table 1, allowing for a random variation of ±20% for each single observation, to account for the likely imprecisions in the historical data. The ξ parameter consistently indicates an apparently infinite-mean phenomenon. b, Values of the shape parameter ξ over 10,000 jack-knifed versions of the dual versions of the actual average numbers in Table 1, when allowing at least 1% and up to about 10% of the observations to be missing or redundant. The ξ parameter consistently indicates an apparently infinite-mean phenomenon.

It is also true that our data set is likely to be incomplete, in that it may not contain all epidemics and pandemics with more than 1,000 victims, or that some of the events we have collected are too biased to be reliable and should be discarded anyway. To account for this, we have once again generated 10,000 copies of our sample, using a jack-knife resampling technique. Each new dual sample is obtained by removing from 1 to 7 observations at random, so that one sample could, for example, not contain the Spanish flu, while another could ignore yellow fever and AIDS. In Fig. 4b we show the impact of such a procedure on the ξ parameter. Once again, the estimate is clearly robust to these significant changes in the underlying data set.
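Both stress tests are simple resampling loops; a minimal sketch in Python (our own reconstruction, with placeholder data standing in for the Table 1 averages, and 1,000 rather than 10,000 replications for speed):

```python
import numpy as np
from scipy.stats import genpareto

def phi(y, L, H):                          # dual transformation of equation (1)
    return L - H * np.log((H - y) / (H - L))

def fit_xi(z, u):
    """Maximum-likelihood GPD shape for the exceedances of z above u."""
    c, _, _ = genpareto.fit(z[z > u] - u, floc=0)
    return c

rng = np.random.default_rng(4)
y = 1e3 * (1 + rng.pareto(0.62, size=72))  # placeholder raw averages
L_, H_ = 1e3, 7.7e9
y = np.minimum(y, 0.5 * H_)                # guard: keep placeholders below H
u = np.quantile(phi(y, L_, H_), 0.65)

# Stress test 1: +/-20% noise on each raw observation, then dualize and refit.
xi_noise = [fit_xi(phi(y * rng.uniform(0.8, 1.2, 72), L_, H_), u)
            for _ in range(1_000)]

# Stress test 2: jack-knife, dropping 1 to 7 observations at random, then refit.
xi_jack = [fit_xi(phi(y[rng.choice(72, size=72 - rng.integers(1, 8),
                                   replace=False)], L_, H_), u)
           for _ in range(1_000)]

print(np.mean(xi_noise), np.std(xi_noise))  # histograms of these give Fig. 4a,b
print(np.mean(xi_jack), np.std(xi_jack))
```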

In conclusion, the central message based on this work is that pandemics are a fat-tailed phenomenon, with an extremely large tail risk and potentially destructive consequences. These should not be downplayed in any serious policy discussion.