
Abstract

Since the beginning of the COVID-19 pandemic, the reproduction number R has become a popular epidemiological metric used to communicate the state of the epidemic. At its most basic, R is defined as the average number of secondary infections caused by one primary infected individual. R seems convenient, because the epidemic is expanding if R > 1 and contracting if R < 1 . The magnitude of R indicates by how much transmission needs to be reduced to control the epidemic. Using R in a naïve way can cause new problems. The reasons for this are threefold: (1) There is not just one definition of R but many, and the precise definition of R affects both its estimated value and how it should be interpreted. (2) Even with a particular clearly defined R , there may be different statistical methods used to estimate its value, and the choice of method will affect the estimate. (3) The availability and type of data used to estimate R vary, and it is not always clear what data should be included in the estimation. In this review, we discuss when R is useful, when it may be of use but needs to be interpreted with care, and when it may be an inappropriate indicator of the progress of the epidemic. We also argue that careful definition of R , and the data and methods used to estimate it, can make R a more useful metric for future management of the epidemic.

What is the reproduction number R ?

Since the start of the novel coronavirus (severe acute respiratory syndrome-coronavirus-2 (SARS-CoV-2)) pandemic, the reproduction number R has become a popular summary statistic, used by policymakers to assess the state of the epidemic and the efficacy of interventions, and by the media to communicate the progress of the epidemic to the general public. The primary appeal of R is that it offers a single number that indicates whether the transmission of the pathogen is increasing or decreasing, depending on whether R is above or below one. Early R estimates for SARS-CoV-2 in different countries1,2 were in the range of 2.0–6.5. However, the use of R can be problematic in terms of both its definition and estimation. Its usefulness stems precisely from its being a summary statistic rather than a basic parameter describing the dynamic processes of infection, transmission and recovery. To understand how R is calculated and how it can be affected by interventions, the epidemic process needs to be considered in more detail. When epidemic numbers are small or concentrated in possibly atypical parts of a population, R may be an unreliable descriptor of the state of the outbreak.
In this paper, we discuss these issues and determine the situations when the reproduction number R is most useful for assessing and communicating the state of an outbreak (see Figure 1). We focus on the definitions of different types of R, for example, the basic reproduction number and the effective reproduction number, which can be considered different quantities, and on their applicability to different phases of an epidemic. However, as we explain below, care must also be taken with subtly different definitions of the same type of R, for example, when using different models to analyse the progress of an epidemic.
Figure 1. Flowchart summarising the main points explained in the main text depending on the state of the epidemic.

The beginning of a pandemic – R 0

In the early stages of a new outbreak of an infectious disease, we can define an initial R value, known as the basic reproduction number R 0 , that is the average number of individuals infected by each infectious individual in a fully susceptible population3–5. An outbreak resulting from one infected individual may die out within a few infection generations by chance6,7. Otherwise, if R 0 > 1 , the incidence of cases will grow exponentially, with on average $R_0^n$ cases in the $n$th generation. Already, this simple description introduces a number of concepts and assumptions. An individual’s infection generation specifies their position in the chain of infections: the $(n-1)$th generation infects the $n$th generation, and so on. It also assumes an underlying scenario (model) in which the average number of susceptibles infected by each infective stays the same over successive infection generations, and ignores the depletion of susceptibles. (We refer to those members of the population who are uninfected and susceptible to infection as susceptibles, and those that are infected and infectious as infectives.) The potential importance of these assumptions depends on the contact structure of the population, to which we return below.
Thus, R 0 (and other R values to be defined later) is not just a property of the infectious agent (pathogen). It depends on demography, and whatever human behaviour is associated with the possibility of infectious contact (an effective contact is one that results in transmission if made with a susceptible, while a contact in the common sense of the word has a certain probability of transmission). For the simplest models, R 0 > 1 implies that an introduction of infection will result in an epidemic. Furthermore, if there were no interventions or changes in behaviour, then the proportion of the population infected during the entire course of an epidemic would be approximately the non-zero solution of the equation $P = 1 - e^{-R_0 P}$ (e.g. if $R_0 = 2$, then $P \approx 0.8$). This result is referred to as the final size equation and underscores the fact that during an epidemic it is not generally true that everybody will be infected at some point.
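The final size equation, $P = 1 - e^{-R_0 P}$, has no closed-form solution, but it is easy to solve numerically. The following is a minimal sketch using fixed-point iteration; the function name `final_size`, the tolerance and the iteration cap are illustrative choices rather than part of any cited method.

```python
import math


def final_size(r0: float, tol: float = 1e-12, max_iter: int = 10_000) -> float:
    """Non-zero solution P of P = 1 - exp(-r0 * P), found by fixed-point iteration.

    Returns 0.0 when r0 <= 1, where only the trivial solution P = 0 exists.
    """
    if r0 <= 1.0:
        return 0.0
    p = 1.0  # start away from the trivial root at P = 0
    for _ in range(max_iter):
        p_next = 1.0 - math.exp(-r0 * p)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    return p  # best available approximation if the tolerance was not reached


if __name__ == "__main__":
    for r0 in (1.5, 2.0, 3.0):
        print(f"R0 = {r0}: final size is about {final_size(r0):.3f}")
```

For R 0 = 2 the iteration converges to approximately 0.797, in line with the value of about 0.8 quoted in the text. Convergence becomes slow as R 0 approaches one from above.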
Individuals may vary considerably in their susceptibility to infection and in their propensity to pass it on through their biology or behaviour. Age is often an important determinant. If the population is grouped in some way, so that for instance some groups have higher R values than others, then the overall outbreak is expected to grow as described by an R 0 that depends on all of these values, and also depends on how each group infects the others, i.e. on the R values between groups as well as within them ( R 0 is then the dominant eigenvalue of the matrix of R values3,5,8). The first few stages of the outbreak may be atypical, depending on which group is first infected.
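To make the eigenvalue statement concrete, the sketch below computes the dominant eigenvalue of a small matrix of between-group R values by power iteration. The convention that entry `K[i][j]` is the expected number of infections in group i caused by one infective in group j, and the numerical values in the example, are hypothetical and chosen only for illustration.

```python
def dominant_eigenvalue(K, max_iter=1000, tol=1e-12):
    """Dominant eigenvalue of a non-negative square matrix K by power iteration.

    K is a list of rows. The iterate is rescaled by its largest entry at each
    step, so the scaling factor converges to the dominant eigenvalue.
    """
    n = len(K)
    v = [1.0] * n
    lam = 0.0
    for _ in range(max_iter):
        w = [sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_lam = max(abs(x) for x in w)
        if new_lam == 0.0:
            return 0.0
        v = [x / new_lam for x in w]
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam


if __name__ == "__main__":
    # Hypothetical two-group example: group 0 transmits more than group 1.
    K = [[1.5, 0.5],
         [0.3, 0.8]]
    print(f"Overall R0 is about {dominant_eigenvalue(K):.3f}")
```

In this example the overall value (about 1.67) exceeds both within-group values, because between-group transmission adds to the growth of the outbreak.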
For the simplest mathematical model of the beginning of an outbreak, it is assumed that because only a small fraction of the population has been infected, all potential contacts are with susceptibles. This may be an unrealistic assumption because human interaction networks tend to be clustered (e.g. through households, workplaces or schools). Growth through successive generations of infection, which is the basis for defining R 0 , does not translate simply into time because the generation interval of an infection (the time interval back from the instant when a susceptible is infected, to that when their infector was infected) is variable, and infection generations may overlap temporally. Typically, growth in the early stages is faster than the simple assumption of a fixed average generation time would suggest and this is a major problem in estimating R 0 from early outbreak data. In addition, the implicit assumption is that all infectives are identifiable as such. If there is a significant proportion of asymptomatic cases, an estimate of R 0 may be affected by the time from when an asymptomatic infective has become infected to when he/she is expected to infect susceptibles. If this timing is the same for asymptomatic and symptomatic cases, then the estimate for R 0 will be unaffected.

The second simplest case: where an outbreak is widespread – R t

When the pandemic is well-established in a country (or region), with large numbers of cases most of which are internal to the country, an ‘effective reproduction number’ at time t , R t (sometimes denoted R e or R eff ), is a useful descriptor of the progress of the outbreak (Figure 1). Again, the concept is of an average of how many new cases each infectious case causes. The value of R t may be affected by interventions: typically the aim is to reduce R t below one and to as small a value as possible. For models including detailed, and therefore complex, contact networks there may be more than one way of defining R t ; however, definitions should always agree that the value of R t is 1 when the expected number of new infections is constant.
The relevance of the assumptions here (large numbers of cases, mostly internal to the region) is that in such circumstances we expect R t to have a fairly stable value that changes substantially over time only when interventions are introduced or cease. The definition of R t here is in terms of actual new infectious cases, i.e. excluding potentially infectious contacts with individuals who have been infected and are immune to reinfection. As the number of immune individuals grows large compared to the entire population, the spread of infections will gradually slow, because many contacts will be with immune individuals, and hence the value of R t will be reduced. The level of immunity at which R t = 1 is the herd immunity threshold (see the section on vaccination and herd immunity below).

When the outbreak is at a low level or fragmented the concept of R may be less useful

If the outbreak is at a low level either because it has run its course or because of successful interventions, the definition and the use of an R value are problematic (Figure 1). At low levels of prevalence, there will (as in the early stages of the outbreak) be greater statistical variability. Additionally, there are likely to be heterogeneities associated with the infection being unevenly spread among different subgroups of the population (possibly dependent on age, behaviour or geographical location9), with some parts of the population having had more exposure than others. There may also be local variability in interventions, and it may not be easy to allow for the effect of some cases being introductions from outside the population under consideration. If the outbreak is fragmented, particularly when close to elimination, it will make more sense to think of it as composed of separate local outbreaks, which can be modelled separately, rather than trying to specify an average R value overall.

Relating R to details of the infection process

If the population is heterogeneous or structured, defining a reproduction number needs care, as the number of new cases an infective is expected to cause will depend on both their infectiousness and how well connected they are. It has been shown that in the early stages of an epidemic, when the relevant contact structures of a population are not known and interventions are not targeted, assuming a homogeneous contact structure results in conservative estimates of R 0 and the required control effort. However, designing targeted intervention strategies requires reliable information on infectious contact structures10. There are several basic ways to use structured population models to capture departures from the simplest epidemic models. The four most common are (i) household models, (ii) multi-type models, (iii) network models, and (iv) spatial models.
In a household model, every person in the population is assumed to be part of a single household, which is typically small, and may even be of size one. Those in the same household have a higher probability of infecting each other than is the case for two people chosen randomly from the population. In this model, reproduction numbers can still be defined11,12. The most commonly used is the household reproduction number R * , which is the expected number of members of other households that are infected by people from a primary infected household. It is still possible to consider the average number of susceptibles infected by a single infectious person. However, for this to be useful, the average has to be computed in a sophisticated way, because the number of people a person can infect will depend on how many members of the same household are still susceptible when s/he becomes infectious13.
A second way of modelling heterogeneity in the population is to assume that the population can be subdivided into groups. The groups may be defined through age bands, social activity levels, health status, type of job, place of residence and so on. Characteristics such as susceptibility, infectivity and frequency of contact may depend on an individual’s group, but all those in a single group have the same characteristics. It is often assumed that all these groups are large. If there are regular inter-group contacts then the largest eigenvalue of the so-called next-generation matrix5,8 has many similar properties to those of R 0 for an epidemic spreading in a homogeneously mixing population, although the final size equation is generally not satisfied.
A third way of introducing heterogeneity is to represent the population by a network, where transmission is only possible between people sharing a link in the network. For many network models, it is still possible to define a reproduction number14. It is important to note that the person initially infected in a population is often atypical and should be ignored in computing or estimating the reproduction number. A useful extension is a mixture of a network model and a homogeneous mixing model, in which both regular and casual contacts are captured. In this extension, a reproduction number with the desired threshold properties can be defined15.
Sometimes most transmission is restricted to people living close to each other, and spatial models are useful when physical location should be incorporated. For these, it is often difficult to define a reproduction number because there is no phase in which the number of infected is growing exponentially16,17. If standard estimation methods are used where there is a considerable spatial component then the estimates will be close to one, even when the spread is highly supercritical and transmission needs to be much reduced to control the epidemic.

R , vaccination and herd immunity

As immunity builds up in a population through infection during the course of an epidemic, even when the contact rate between individuals remains the same (assuming no change in interventions), both the chance that a contact is susceptible to infection and the effective reproduction number, R t , will decrease. Herd immunity is achieved when enough individuals have become immune so that R t falls below the value 1 without the need to reduce contacts among individuals by non-pharmaceutical interventions.
Vaccination provides another means of building up immunity in a population. Depending on the coverage, it can slow or halt the spread of an epidemic, preventing individual infection or limiting experiences of the disease. All vaccination programmes aim to achieve sufficient immunity in the population that R t < 1 without modifying contact patterns among individuals. In this situation, there are insufficient susceptibles in the population for sustained transmission. The susceptible proportion of a population for which R t = 1 is known as the critical vaccination threshold (CVT). When the susceptible proportion is below this threshold, there is herd immunity, which means that the population is protected from a major outbreak even though not everyone is vaccinated or otherwise immune.
In simple mathematical models (e.g. models in which the population is only subdivided into susceptible, infected and recovered individuals), the CVT is determined by the basic reproduction number R 0 . Specifically, vaccination of a uniform randomly chosen proportion $1 - 1/R_0$ of the population is sufficient to create herd immunity and prevent an epidemic, as long as the vaccine-induced immunity is sufficiently long-lasting18. As a simple example, if R 0 = 2 then 50% of a population would need to be vaccinated or otherwise immune to prevent outbreaks. If R 0 = 3 , as is approximately the case for COVID-19, then 67% of a population would need to be vaccinated or immune. When setting such vaccination targets, waning immunity needs to be taken into account. The implementation and impact of a vaccination programme depend on whether vaccination is performed before or during an outbreak19,20.
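The threshold arithmetic above can be reproduced in a few lines. This sketch assumes the simple homogeneous-mixing setting described in the text and a vaccine that confers complete, lasting immunity; the function name is hypothetical.

```python
def critical_vaccination_coverage(r0: float) -> float:
    """Proportion of a homogeneously mixing population that must be immune
    for the effective reproduction number to fall to 1, i.e. 1 - 1/r0.

    Assumes uniformly random vaccination and complete, lasting immunity.
    Returns 0.0 when r0 <= 1, since no epidemic can take off in that case.
    """
    if r0 <= 1.0:
        return 0.0
    return 1.0 - 1.0 / r0


if __name__ == "__main__":
    for r0 in (2.0, 3.0):
        coverage = critical_vaccination_coverage(r0)
        print(f"R0 = {r0}: about {coverage:.0%} must be immune")
```

With R 0 = 2 the function returns 0.5, and with R 0 = 3 it returns approximately 0.667, matching the 50% and 67% figures in the text.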
As outlined above, the population structure affects the reproduction numbers R 0 and R t as well as the probability that an epidemic will spread. Therefore, it has important effects on the threshold for herd immunity and the optimal vaccination strategy. For models with small mixing groups such as households, the basic reproduction number R 0 , as defined in the ‘The beginning of a pandemic – R 0 ’ section, does not provide a good indicator of whether or not an epidemic can take off because repeated contacts within households are likely even in the early stages of an outbreak. However, in the early stages of an epidemic, between-household contacts are likely to be with individuals in otherwise fully susceptible households, so the reproduction number R * , which is given by the average number of between-household contacts that emanate from a typical within-household epidemic21,22, can be used instead. For household models, herd immunity is achieved if a uniform randomly chosen proportion $1 - 1/R_*$ of all households in a population is fully vaccinated.
For COVID-19, a toy model has been used to illustrate the effect of population heterogeneity on herd immunity. It showed23 that age structure and variation in social contacts among individuals could reduce the herd immunity threshold to 43%, almost a third less than that for a homogeneous population. Assuming a more extreme variation in social contact rates and that the most exposed individuals become infected first, another study24 estimates that the herd immunity threshold in some populations could be as low as 20%. In addition, there is some indication that immunity gained from infection by some common cold coronavirus strains may provide cross-immunity25,26 to SARS-CoV-2. There have also been reports that immunity gained from COVID-19 infection may wane, reducing individual and population levels of immunity over time. If these observations are indeed applicable here, the herd immunity threshold could be further modified26.
One important difference between immunisation by vaccination and by infection is that, during an epidemic, individuals with higher susceptibilities and/or larger numbers of contacts are likely to be infected earlier. If herd immunity is to be achieved by vaccination, optimal planning can reduce the coverage required to achieve herd immunity. For example, in an illustrative households model for variola minor infections in Brazil, it is shown that under the optimal vaccination strategy the proportion of the population that needs to be vaccinated is a third less than under a strategy that fully vaccinates randomly chosen households27. Although several COVID-19 vaccines have been developed, global demand in the early phases of vaccine roll-out still exceeds supply. Designing optimal vaccination strategies for different settings that take into account population structure alongside other public health concerns, e.g. protecting the vulnerable, could greatly enhance the chances of achieving herd immunity and the cost-effectiveness of vaccination as an intervention.

How can R be estimated?

Before estimating R , the purpose of the estimation needs to be clarified. Is it intended simply to track the changes in the trajectory of case numbers over time? Or is it intended to assess the potential for pathogen transmission in a specific population, perhaps in the context of considering interventions? If the latter, the relevant population needs to be defined. Depending on the purpose, different data sets and statistical methods can be used.
There are several approaches to estimating R t from epidemiological data. In the most direct method, high-quality contact tracing data can be used, in theory at least, to estimate both R t and the generation time interval, and this has been attempted28 for COVID-19. However, contact tracing of SARS-CoV-2 infections is notoriously difficult because of the high proportion of asymptomatic infections. Moreover, effective contact tracing reduces the number of contacts of traced individuals so that the corresponding estimates are biased.
More commonly, R t can be estimated by inferring the rate of infection transmission within a dynamical model fitted to observed cases, hospitalisations, deaths or a combination of those29,30. Dynamical models have been used widely to forecast the spread of COVID-19 and the effect of interventions. These models allow the impact of assumed changes in specific interventions on R t to be explored, so estimating R t in this way can be convenient. Dynamical models can be described by systems of differential equations and assume very large to infinite population sizes. In completely deterministic dynamical models, the uncertainty in estimated R t values depends only on data and parameter uncertainty, and not on stochastic uncertainty. However, if the number of new infections is small, the value of R t is strongly affected by chance events, which increases the uncertainty in the estimate. This situation can be addressed by the use of stochastic models or incorporating stochastic assumptions in otherwise deterministic model frameworks.
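As a rough illustration of how R t arises inside a deterministic dynamical model, the sketch below integrates the simplest susceptible-infected-recovered (SIR) model with a fixed Euler step and reports R t as R 0 multiplied by the susceptible fraction. The SIR structure, the integration scheme and all parameter values are assumptions made for illustration; the models cited above are considerably richer and are fitted to data, which this sketch does not attempt.

```python
def simulate_sir(r0, infectious_period_days, population, initial_infected,
                 days, dt=0.01):
    """Euler integration of a deterministic SIR model.

    Returns (rt_by_day, attack_rate), where rt_by_day[d] is
    r0 * S / N at the start of day d and attack_rate is the
    proportion of the population ever infected by the end of the run.
    """
    gamma = 1.0 / infectious_period_days   # recovery rate per day
    beta = r0 * gamma                      # transmission rate per day
    s = float(population - initial_infected)
    i = float(initial_infected)
    steps_per_day = round(1.0 / dt)
    rt_by_day = []
    for _ in range(days):
        rt_by_day.append(r0 * s / population)
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
    attack_rate = (population - s) / population
    return rt_by_day, attack_rate


if __name__ == "__main__":
    rt, attack = simulate_sir(r0=2.0, infectious_period_days=5.0,
                              population=1_000_000, initial_infected=10,
                              days=365)
    print(f"R_t on day 0: {rt[0]:.2f}, on the last day: {rt[-1]:.2f}")
    print(f"Proportion ever infected: {attack:.3f}")
```

With no interventions, R t in this toy model falls only through depletion of susceptibles, and the proportion ever infected approaches the solution of the final size equation.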
But this approach is not without drawbacks. Not least, R t estimates from dynamical models depend critically on assumptions (e.g. model structure and which parameter values are estimated), and on data quality. Another potential drawback is that many parameters of dynamical models are often assumed to be fixed over time. These approaches are therefore less suited to capture the effects of gradual, continuous changes in behaviour, mobility or social network structure. However, gradual changes in dynamic models can be incorporated by assuming that transmission parameters change over given intervals, while at the same time the possible amount of change is constrained to avoid big jumps caused by a small number of noisy data points31. In this way, models that include change-points in the rate of infection near specific interventions can infer the impact of control policies, as well as the effect of susceptible depletion.
There is also a difference in how R t is estimated between compartmental and agent- or individual-based models. In an agent-based model, it is possible simply to count exactly how many secondary infections are caused by each primary infection. Thus, all details of the epidemic including time-varying viral loads, population-level and localised immunity, interventions, network factors, and other effects are automatically incorporated and do not need to be considered separately32. As agent-based models explicitly include stochastic effects, the uncertainty in R t estimates can be greater than for those derived from deterministic dynamical models. Because of the greater number of parameters included in dynamical and particularly agent-based models, they require more data and more different types of data than the simpler statistical models described below to identify estimates for all parameters.
A third approach uses statistical models to estimate R t , and continuous changes in it, empirically from case notification data. These methods make minimal structural assumptions about epidemic dynamics, and only require users to specify the distribution of the generation interval. They are agnostic to population susceptibility or epidemic phase, but as we discuss below, care must still be taken to avoid quantitative and temporal biases. The most common empirical methods are the Cori method33,34 and the Wallinga–Teunis method35. A drawback of some statistical models is that they cannot combine different data streams into a coherent picture.
Where genome sequences from viral samples taken from infected patients are available and the date of sampling is known, R t can also be estimated using phylogenetic methods. An evolutionary model is fitted that best explains the patterns of nucleotide substitution in the dated samples. The fitted model parameters include the nucleotide substitution rate and the population size of the virus at a given time in the past. Using a metapopulation analogy, the effective population size of a pathogen has been shown to be proportional to the number of infected individuals and inversely proportional to the transmission rate from which the reproduction number can be determined36.

Statistical methods to estimate R

In this section, we discuss two frequently used simple statistical methods to estimate R and common issues associated with them. The Cori and Wallinga–Teunis methods estimate subtly different versions of R t ; the Cori method generates estimates of the instantaneous reproduction number and the Wallinga–Teunis method generates estimates of the case reproduction number33,37. The key difference is that the instantaneous reproduction number gives an average R t for a homogeneous population at a single point in time, whereas the case reproduction number can accommodate individual heterogeneity, but blurs over several dates of transmission. Furthermore, the case reproduction number is a leading estimator of the instantaneous reproduction number, i.e. it depends on data from after the time for which the reproduction number is to be estimated, and must be adjusted if the impact of time-specific interventions is to be inferred accurately38.
The instantaneous reproduction number represents the expected number of infections generated at time t by currently infectious individuals33. For real-time analysis, one of the benefits of estimating the instantaneous reproduction number is that it does not require information about future changes in transmissibility, and it reflects the effectiveness of control measures in place at time t . But as an aggregate measure of transmission by all individuals infected in the past (who may now be shedding virus), it does not easily consider heterogeneity in transmission. In contrast, the case reproduction number represents the expected number of infections generated by an individual who is first infected at time t and has yet to progress through the full course of viral shedding. This leads to ‘right censoring’ when the case reproduction number is estimated in real-time; if all infections generated by individuals who were infected at time t have not yet been observed, then the data must be adjusted39–41 or the case reproduction number will be underestimated.
The Cori method and the Wallinga–Teunis method involve inferring the values of R t that are most consistent with observed incidence data (for a review, see Gostic et al.38). In the Cori method, typically this inference is carried out by assuming that R t is constant over fixed time windows. Smoothing windows are used to avoid spurious fluctuations in estimates of R t . These can occur if imperfect observation and reporting effects, rather than actual bursts in transmission, are the main source of noise in the data. Cross-validation and proper scoring rules can be used to avoid under- or oversmoothing R t estimates42.
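The idea of holding R t constant over a fixed window can be sketched as a simple ratio: new infections in the window divided by the total infectiousness of earlier cases over the same window, where infectiousness is past incidence weighted by the generation interval distribution. This is a simplified point estimate in the spirit of the instantaneous reproduction number, not an implementation of the published Cori method, and it omits any quantification of uncertainty. The function names, the seven-day default window and the example generation interval are hypothetical.

```python
def total_infectiousness(incidence, gen_pmf):
    """Lambda_t = sum over s >= 1 of gen_pmf[s - 1] * incidence[t - s].

    gen_pmf[k] is the probability that the generation interval is k + 1 days.
    """
    return [sum(w * incidence[t - s]
                for s, w in enumerate(gen_pmf, start=1) if t - s >= 0)
            for t in range(len(incidence))]


def windowed_rt(incidence, gen_pmf, window=7):
    """Point estimate of R_t assuming R is constant over the trailing window.

    Returns None for days whose window reaches back into the period where
    the infectiousness sum is truncated by the start of the series.
    """
    lam = total_infectiousness(incidence, gen_pmf)
    estimates = []
    for t in range(len(incidence)):
        start = t - window + 1
        if start < len(gen_pmf):
            estimates.append(None)
            continue
        new_cases = sum(incidence[start:t + 1])
        pressure = sum(lam[start:t + 1])
        estimates.append(new_cases / pressure if pressure > 0 else None)
    return estimates


def simulate_renewal(r, gen_pmf, seed_cases, days):
    """Noise-free incidence from the renewal equation with constant R."""
    incidence = [float(seed_cases)]
    for t in range(1, days):
        lam_t = sum(w * incidence[t - s]
                    for s, w in enumerate(gen_pmf, start=1) if t - s >= 0)
        incidence.append(r * lam_t)
    return incidence


if __name__ == "__main__":
    gen_pmf = [0.2, 0.5, 0.3]   # hypothetical generation interval, 1 to 3 days
    cases = simulate_renewal(1.5, gen_pmf, seed_cases=10, days=40)
    print(windowed_rt(cases, gen_pmf)[-1])
```

On noise-free synthetic data generated with a constant R, the estimate recovers that value. With real, noisy data, a longer window smooths the estimate at the cost of responding more slowly to genuine changes in transmission.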
An important concept, basic to both methods, is the intrinsic generation time, also referred to as the infectiousness profile. The intrinsic generation interval is a theoretical quantity derived from the renewal equation of Lotka and Euler30,43. It describes the time distribution of potentially infectious contacts made by an index case and is independent of population susceptibility44. In practice, the intrinsic generation interval is not observable, and it must be estimated carefully from observed serial intervals within contact tracing or household data44–47. The serial interval is generally defined as the duration of time between the onset of symptoms in an index case and in a secondary case48. In the early stages of an outbreak, accurate estimation should adjust for right truncation of observations, for changes over time in population susceptibility, and for interventions such as case isolation, which may shorten the generation interval by limiting transmission events late in the course of infectiousness44,45,49.
Both the Cori and Wallinga–Teunis methods are conceptually based on separating the infectiousness of an infective into two components, total amount and timing. The timing is expressed by the generation time distribution while the total amount is expressed by R t . The variation of (average) infectivity over time is ascribed, at least in practical implementations of the methods, to changes in R t , while the intrinsic generation time is assumed to remain fixed. This is a simplification that may lead to inaccurate estimation of R t , since, in reality, the observed generation time distribution varies over time for three reasons: the epidemic dynamics48,50,51; the epidemic affecting different subgroups of the population over time, with possibly different generation time distributions52,53; and, more importantly, interventions that affect the length or efficacy of the infectious period49. An additional complication is that the ‘intrinsic’ generation interval of the Cori and Wallinga–Teunis estimators includes potentially infectious contacts with both susceptible and immune individuals, whereas only contacts with susceptible individuals cause new infections, and are observed in contact tracing44,45. Even when using an accurately estimated fixed generation time distribution, both R t estimators are numerically sensitive to the specified mean and variance of the intrinsic generation interval54.

Data used to estimate R

Fundamentally, R t is a measure of transmission. Ideally, it would be estimated from data on the total number of incident infections (i.e. transmission events) occurring each day. But in practice, only a small fraction of infections are observed, and notifications do not occur until days or weeks after the moment of infection. Temporally accurate R t estimation requires adjusting for lags to observation, which can be estimated as the sum of the incubation period and delays from symptom onset to case observation54,55. Delays not only shift observations into the future, but they also blur infections incident on a particular date across many dates of observation. This blurring can be particularly problematic when working with long and variable delays (e.g. from infection to death), and when R t is changing. Deconvolution56–59, or R t estimation models that include forward delays60, can be used to adjust lagged observations. Simpler approaches may be justifiable under some circumstances. If observation delays are relatively short and not highly variable, and if R t is not rapidly changing, simply shifting unadjusted R t estimates back in time by the mean delay can provide a reasonable approximation to the true value (see Challen et al.54, in this volume, for an in-depth discussion). The advantages and disadvantages of each approach are reviewed in Gostic et al.38. Changes over time in case ascertainment can also bias R t estimates, so ideally data should be drawn from structured surveillance (see, e.g., the REACT study61) or adjusted for known changes in testing or reporting effort61,62.
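The shifting and blurring effect of observation delays, and the simple correction of re-dating a series by the mean delay, can be illustrated with a toy example. The delay distribution and the single-day burst of infections below are invented for illustration, and the helper names are hypothetical; the sketch does not implement deconvolution.

```python
def observe_with_delay(infections, delay_pmf):
    """Expected observations per day when each infection is observed d days
    later with probability delay_pmf[d]. Observations falling beyond the end
    of the series are dropped, as they would not yet have been reported."""
    n = len(infections)
    observed = [0.0] * n
    for t, count in enumerate(infections):
        for d, p in enumerate(delay_pmf):
            if t + d < n:
                observed[t + d] += count * p
    return observed


def mean_delay(delay_pmf):
    """Mean of the delay distribution, in days."""
    return sum(d * p for d, p in enumerate(delay_pmf))


def shift_back(series, lag_days):
    """Re-date a series lag_days earlier. The most recent lag_days values
    are unknown and are returned as None."""
    return series[lag_days:] + [None] * lag_days


if __name__ == "__main__":
    infections = [0, 0, 100, 0, 0, 0, 0, 0, 0, 0]   # burst on day 2
    delay_pmf = [0.0, 0.25, 0.5, 0.25]              # delays of 1 to 3 days
    observed = observe_with_delay(infections, delay_pmf)
    shifted = shift_back(observed, round(mean_delay(delay_pmf)))
    print(observed)
    print(shifted)
```

Shifting restores the peak to the correct day, but the burst remains spread over three days. This residual blurring is why shifting is adequate only when delays are short and not highly variable.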
In practice, R t can be estimated from a time series of new symptom onset reports, cases, hospitalisations or deaths. Choosing an appropriate data stream involves weighing representativeness, timeliness of reporting, consistency of ascertainment, and length of lag. For example, reported deaths may be reasonably unaffected by changes over time in ascertainment, but adjusting for long lags to observation can be challenging, and deaths may not be representative of overall transmission (e.g. if the epidemic shifts towards younger age groups)63,64. Extensions of existing statistical models for R t estimation could potentially integrate multiple kinds of data, by assuming that, for example, cases, hospitalisations and deaths, arise from a shared, latent infection process, with different delays38. A mechanistic model can also pull multiple data streams together by modelling the different processes underlying each data stream. Problems can arise if different data streams disagree on the progress of the pandemic. However, if the disagreement is caused by a shift in delays from events to reporting in different data streams, a mechanistic model can highlight these changes. Sometimes different data streams can be used for model validation.
All methods used to estimate R t must decide on the length of the time window over which it is to be estimated. All data used to estimate R t are noisy. The shorter the time window used for estimation, the higher the noise-to-signal ratio and, therefore, the greater the uncertainty in the estimate of R t . In contrast, longer time windows will produce estimates with lower uncertainty, but sudden changes in transmission may not be detected if the time window is too long.
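The trade-off between window length and responsiveness can be sketched in Python with a simplified windowed ratio estimator: cases in the window divided by the total infectiousness in the window. This is a simplification in the spirit of renewal-equation methods such as Cori et al.33, not a reimplementation of any published software. The generation-time probabilities, seed counts and function names are hypothetical.

```python
import numpy as np

def infectiousness(inc, gen_pmf, t):
    """Past incidence weighted by the generation-time pmf (lags 1..g days)."""
    g = len(gen_pmf)
    return np.dot(inc[t - g:t][::-1], gen_pmf)

def simulate_incidence(r_series, gen_pmf, seed_cases, rng=None):
    """Renewal process; Poisson noise if rng is given, expected values if not."""
    n, g = len(r_series), len(gen_pmf)
    inc = np.zeros(n)
    inc[:g] = seed_cases
    for t in range(g, n):
        mean = r_series[t] * infectiousness(inc, gen_pmf, t)
        inc[t] = rng.poisson(mean) if rng is not None else mean
    return inc

def windowed_rt(inc, gen_pmf, window):
    """Ratio of summed incidence to summed infectiousness over a trailing window."""
    n, g = len(inc), len(gen_pmf)
    lam = np.full(n, np.nan)
    for t in range(g, n):
        lam[t] = infectiousness(inc, gen_pmf, t)
    rt = np.full(n, np.nan)
    for t in range(g + window - 1, n):
        days = slice(t - window + 1, t + 1)
        rt[t] = inc[days].sum() / lam[days].sum()
    return rt

gen_pmf = np.array([0.1, 0.2, 0.3, 0.25, 0.15])  # hypothetical, lags 1..5 days

# Noise-free epidemic in which R t drops from 1.5 to 0.8 on day 30.
r_step = np.where(np.arange(60) < 30, 1.5, 0.8)
inc_step = simulate_incidence(r_step, gen_pmf, seed_cases=100.0)
rt_short = windowed_rt(inc_step, gen_pmf, window=3)
rt_long = windowed_rt(inc_step, gen_pmf, window=14)
print(rt_short[32], rt_long[32], rt_long[43])

# Noisy epidemic with constant R t = 1: compare the spread of the estimates.
rng = np.random.default_rng(1)
inc_noisy = simulate_incidence(np.full(120, 1.0), gen_pmf, 200.0, rng)
sd_short = np.nanstd(windowed_rt(inc_noisy, gen_pmf, 3)[20:])
sd_long = np.nanstd(windowed_rt(inc_noisy, gen_pmf, 14)[20:])
print(sd_short, sd_long)
```

In the noise-free run the 3-day window reports the new value of R t within three days of the change, whereas the 14-day window still mixes pre-change and post-change days and needs two weeks to settle. In the noisy run the short-window estimates would be expected to fluctuate more than the long-window ones.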

Summary: cautions and recommendations

During the early phase of the epidemic:
R 0 estimates in the early phase may not be representative for the population as a whole if the group of initial transmitters is atypical.
R 0 may be incorrectly estimated in the early phase if infected but asymptomatic individuals are not counted or recognised, and their epidemiologically relevant behaviour differs from that of symptomatic individuals.
When the epidemic is established in the population:
R t can differ for different population groups, and the value of R t is dominated by the group in which most transmission occurs. To improve targeted containment measures, where possible additional information should be reported alongside case data, such as demographic, socio-economic and occupational information.
The estimated value of R t and its associated uncertainty depend on the data stream(s) used and the time window over which R t was estimated, and these should be reported alongside the estimates. This will make it possible to draw more robust conclusions when considering results from different models.
Model components that are likely to change over the time course of the epidemic (e.g. the generation time distribution) should be updated regularly, and sensitivity to changing assumptions should be kept under consideration.
When the ongoing epidemic is fragmented:
If local outbreaks can be contained, R t estimates from them are not informative about the progress of the epidemic or the efficacy of interventions at the national level, although they may inform local interventions. Other descriptors should be considered to assess the progress of the epidemic, such as the number of new cases per capita per day in a defined area, the number of hospitalisations and the spare hospital and intensive care capacity.
Imported cases that are effectively quarantined should not be counted towards R t estimates as they do not contribute to the local transmission potential in the community.
Vaccination and herd immunity:
If the available vaccine supply is limited, optimal vaccination strategies should be designed that take into account population structure, the transmission potential within different groups, and other public health priorities, e.g. protection of vulnerable groups.
In conclusion, estimated R values do not exactly correspond to the theoretically defined quantities. In statistical terms, model uncertainty, sampling variability, and data accuracy affect the estimates. Nevertheless, R 0 and R t are useful quantities to assess the potential and progress of an epidemic. Their usefulness for decision making varies depending on the phase of the epidemic (early, established and fragmented). Clearly defining the context, the data streams and the statistical methods used to estimate R can improve its value for the management of an epidemic.

Acknowledgements

The authors would like to thank the Isaac Newton Institute for Mathematical Sciences, Cambridge, for support and hospitality during the programme Infectious Dynamics of Pandemics where work on this paper was undertaken. This work was supported by EPSRC grant no EP/R014604/1. EBP acknowledges funding from the Medical Research Council.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research and/or authorship of this article: This work was supported by EPSRC grant no EP/R014604/1. EBP acknowledges funding from the Medical Research Council (MRC) (MC/PC/19067) and the NIHR Health Protection Research Unit in Behavioural Science and Evaluation at the University of Bristol. Support for RC’s research is provided by the EPSRC via grant EP/N014391/1. RC is also funded by the NHS Global Digital Exemplar programme (GDE). JMH acknowledges funding from the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Canadian Institutes of Health Research (CIHR). MGR is supported by the Marsden Fund under contract MAU1718. GPST acknowledges the MIUR Excellence Department Project awarded to the Department of Mathematics, University of Rome Tor Vergata, CUP E83C18000100006. LP acknowledges the Wellcome Trust and the Royal Society (grant 202562/Z/16/Z) for funding. EK was supported by the National Institute of Allergy and Infectious Diseases (NIAID) grant R01 AI116770. PT acknowledges Vetenskapsrådet (Swedish Research Council), grant 2016-04566. KMG acknowledges fellowship support from the James S. McDonnell Foundation. The contents are solely the responsibility of the authors and do not necessarily represent the official views of NIAID or the US National Institutes of Health.

References

1. Tang B, Wang X, Li Q, et al. Estimation of the transmission risk of the 2019-nCoV and its implication for public health interventions. J Clin Med 2020; 9: 462.
2. Jit M, Jombart T, Nightingale ES, et al. Estimating number of cases and spread of coronavirus disease (COVID-19) using critical care admissions, United Kingdom, February to March 2020. Eurosurveillance 2020; 25: 2000632.
3. Heesterbeek J. R 0 . PhD Thesis, University of Leiden, 1992.
4. Heesterbeek J. A brief history of R 0 and a recipe for its calculation. Acta Biotheor 2002; 50: 189–204.
5. Diekmann O, Heesterbeek J, Britton T. Mathematical tools for understanding infectious disease dynamics. Princeton, Oxford: Princeton University Press, 2013.
6. Kendall DG. Deterministic and stochastic epidemics in closed populations. In: Proceedings of the third Berkeley symposium on mathematical statistics and probability, Statistical Laboratory, University of California, Berkeley and Los Angeles: University of California Press, 1956, vol. 4, pp. 149–165.
7. Thompson RN. Novel coronavirus outbreak in Wuhan, China, 2020: Intense surveillance is vital for preventing sustained transmission in new locations. J Clin Med 2020; 9: 498.
8. Diekmann O, Heesterbeek J, Roberts M. The construction of next-generation matrices for compartmental epidemic models. J R Soc Interface 2010; 7: 873–885.
9. Thompson RN, Hollingsworth TD, Isham V, et al. Key questions for modelling COVID-19 exit strategies. Proc R Soc B 2020; 287: 20201405.
10. Trapman P, Ball F, Dhersin JS, et al. Inferring R 0 in emerging epidemics—the effect of common population structure is small. J R Soc Interface 2016; 13: 20160288.
11. Ball F, Pellis L, Trapman P. Reproduction numbers for epidemic models with households and other social structures. II. Comparisons and implications for vaccination. Math Biosci 2016; 274: 108–139.
12. Goldstein E, Paur K, Fraser C, et al. Reproductive numbers, epidemic spread and control in a community of households. Math Biosci 2009; 221: 11–25.
13. Pellis L, Ball F, Trapman P. Reproduction numbers for epidemic models with households and other social structures. I. Definition and calculation of R 0 . Math Biosci 2012; 235: 85–97.
14. Kiss I, Miller J, Simon P. Mathematics of epidemics on networks. Cham: Springer, 2017. ISBN 978-3-319-50806-1.
15. Ball F, Neal P. Network epidemic models with two levels of mixing. Math Biosci 2008; 212: 69–87.
16. Davis S, Trapman P, Leirs H, et al. The abundance threshold for plague as a critical percolation phenomenon. Nature 2008; 454: 634–637.
17. Riley S, Eames K, Isham V, et al. Five challenges for spatial epidemic models. Epidemics 2015; 10: 68–71.
18. Smith C. Factors in the transmission of virus infections from animal to man. Sci Basis Med Annual Rev 1964; 125–150.
19. Heffernan J, Keeling M. Implications of vaccination and waning immunity. Proc R Soc B 2009; 276: 2071–2080.
20. Carlsson RM, Childs LM, Feng Z, et al. Modeling the waning and boosting of immunity from infection or vaccination. J Theor Biol 2020; 497: 110265.
21. Ball F, Mollison D, Scalia Tomba G. Epidemics with two levels of mixing. Ann Appl Probab 1997; 7: 46–89.
22. Becker N, Dietz K. The effect of household distribution on transmission and control of highly infectious diseases. Math Biosci 1995; 127: 207–219.
23. Britton T, Ball F, Trapman P. A mathematical model reveals the influence of population heterogeneity on herd immunity to SARS-CoV-2. Science 2020; 369: 846–849.
24. Gomes MGM, Corder RM, King JG, et al. Individual variation in susceptibility or exposure to SARS-CoV-2 lowers the herd immunity threshold. medRxiv 2020; https://www.medrxiv.org/content/early/2020/05/21/2020.04.27.20081893.
25. Yaqinuddin A. Cross-immunity between respiratory coronaviruses may limit COVID-19 fatalities. Med Hypotheses 2020; 144: 110049.
26. Sariol A, Perlman S. Lessons for COVID-19 immunity from other coronavirus infections. Immunity 2020; 53: 248–263.
27. Ball F, Lyne O. Optimal vaccination schemes for epidemics among a population of households, with application to variola minor in Brazil. Stat Methods Med Res 2006; 15: 481–497.
28. Ferretti L, Wymant C, Kendall M, et al. Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing. Science 2020; 368: eabb6936.
29. Roberts M, Heesterbeek J. Model-consistent estimation of the basic reproduction number from the incidence of an emerging infection. J Math Biol 2007; 55: 803–816.
30. Wallinga J, Lipsitch M. How generation intervals shape the relationship between growth rates and reproductive numbers. Proc R Soc B 2007; 274: 599–604.
31. Birrell P, Blake J, Van Leeuwen E, et al. Real-time nowcasting and forecasting of COVID-19 dynamics in England: the first wave. Philos Trans R Soc B Biol Sci 2021; 376: 20200279.
32. Panovska-Griffiths J, Kerr CC, Stuart RM, et al. Determining the optimal strategy for reopening schools, the impact of test and trace interventions, and the risk of occurrence of a second COVID-19 epidemic wave in the UK: a modelling study. Lancet Child Adolescent Health 2020; 4: P817–P827.
33. Cori A, Ferguson NM, Fraser C, et al. A new framework and software to estimate time-varying reproduction numbers during epidemics. Am J Epidemiol 2013; 178: 1505–1512.
34. Thompson RN, Stockwin JE, van Gaalen RD, et al. Improved inference of time-varying reproduction numbers during infectious disease outbreaks. Epidemics 2019; 29: 100356.
35. Wallinga J, Teunis P. Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures. Am J Epidemiol 2004; 160: 509–516.
36. Lai A, Bergna A, Acciarri C, et al. Early phylogenetic estimate of the effective reproduction number of SARS-CoV-2. J Med Virol 2020; 92: 675–679.
37. Fraser C. Estimating individual and household reproduction numbers in an emerging epidemic. PLoS ONE 2007; 2: e758.
38. Gostic KM, McGough L, Baskerville EB, et al. Practical considerations for measuring the effective reproductive number, R t . PLoS Comput Biol 2020; 16: 1–21.
39. Cauchemez S, Boëlle PY, Thomas G, et al. Estimating in real time the efficacy of measures to control emerging communicable diseases. Am J Epidemiol 2006; 164: 591–597.
40. Cauchemez S, Boëlle PY, Donnelly CA, et al. Real-time estimates in early detection of SARS. Emerg Infect Dis 2006; 12: 110.
41. Overton CE, Stage HB, Ahmad S, et al. Using statistics and mathematical modelling to understand infectious disease outbreaks: COVID-19 as an example. Infect Dis Modell 2020; 5: 409–441.
42. Gasser T, Rosenblatt M. Smoothing techniques for curve estimation. Berlin, Heidelberg: Springer, 1979.
43. Kot M. The Lotka integral equation. In: Elements of mathematical ecology. Cambridge: Cambridge University Press, 2001. pp.353–364.
44. Champredon D, Dushoff J. Intrinsic and realized generation intervals in infectious-disease transmission. Proc R Soc B 2015; 282: 20152026.
45. Park SW, Champredon D, Dushoff J. Inferring generation-interval distributions from contact-tracing data. J R Soc Interface 2020; 17: 20190719.
46. Hart WS, Maini PK, Thompson RN. High infectiousness immediately before COVID-19 symptom onset highlights the importance of continued contact tracing. eLife 2021; 10: e65534.
47. Hart WS, Endo A, Hellewell J, et al. Inference of SARS-CoV-2 generation times using UK household data. medRxiv 2021.
48. Svensson Å. A note on generation times in epidemic models. Math Biosci 2007; 208: 300–311.
49. Ali ST, Wang L, Lau EHY, et al. Serial interval of SARS-CoV-2 was shortened over time by non-pharmaceutical interventions. Science 2020; 369: 1106–1109.
50. Torneri A, Azmon A, Faes C, et al. Realized generation times: contraction and impact of infectious period, reproduction number and population size. bioRxiv 2019; 568485. https://www.biorxiv.org/content/early/2019/03/08/568485
51. Britton T, Scalia Tomba G. Estimation in emerging epidemics: biases and remedies. J R Soc Interface 2018; 16: 20180670.
52. Kenah E, Lipsitch M, Robins JM. Generation interval contraction and epidemic data analysis. Math Biosci 2008; 213: 71–79.
53. Liu QH, Ajelli M, Aleta A, et al. Measurability of the epidemic reproduction number in data-driven contact networks. Proc Natl Acad Sci USA 2018; 115: 12680–12685.
54. Challen R, Brooks-Pollock E, Danon L, et al. Impact of uncertainty in serial interval, generation interval, incubation period and delayed observations in estimating the reproduction number for COVID-19. Stat Methods Med Res 2020; this volume.
55. Bi Q, Wu Y, Mei S, et al. Epidemiology and transmission of COVID-19 in 391 cases and 1286 of their close contacts in Shenzhen, China: a retrospective cohort study. Lancet Infect Dis 2020; 20: 911–919.
56. Marschner I. Back-projection of COVID-19 diagnosis counts to assess infection incidence and control measures: analysis of Australian data. Epidemiol Infect 2020; 148: e97.
57. Goldstein E, Dushoff J, Ma J, et al. Reconstructing influenza incidence by deconvolution of daily mortality time series. Proc Natl Acad Sci USA 2009; 106: 21825–21829.
58. Becker NG, Watson LF, Carlin JB. A method of non-parametric back-projection and its application to AIDS data. Stat Med 1991; 10: 1527–1542.
59. Huisman JS, Scire J, Angst DC, et al. Estimation and worldwide monitoring of the effective reproductive number of SARS-CoV-2. medRxiv 2020; https://doi.org/10.1101/2020.11.26.20239368. https://www.medrxiv.org/content/early/2020/11/30/2020.11.26.20239368
60. Abbott S, Hellewell J, Thompson RN, et al. Estimating the time-varying reproduction number of SARS-CoV-2 using national and subnational case counts [version 2; peer review: 1 approved with reservations]. Wellcome Open Res 2020; 5: 112. https://doi.org/10.12688/wellcomeopenres.16006.2
61. Riley S, Ainslie KEC, Eales O, et al. Resurgence of SARS-CoV-2: detection by community viral surveillance. Science 2021; 372: 990–995.
62. Omori R, Mizumoto K, Chowell G. Changes in testing rates could mask the novel coronavirus disease (COVID-19) growth rate. Int J Infect Dis 2020; 94: 116–118.
63. Malmgren J, Guo B, Kaplan HG. Continued proportional age shift of confirmed positive COVID-19 incidence over time to children and young adults: Washington State March–August 2020. PLoS ONE 2021; 16: 1–12.
64. Seaman S, De Angelis D. Update on estimates of numbers of COVID-19 deaths accounting for reporting delay. https://www.mrc-bsu.cam.ac.uk/wp-content/uploads/2020/06/Adjusting-COVID-19-deaths-to-account-for-reporting-delay-.pdf (2020, accessed 28 September 2020).

Information, rights and permissions

Information

Published In

Article first published online: September 27, 2021
Issue published: September 2022

Keywords

  1. Reproduction number
  2. COVID-19 pandemic

Rights and permissions

© The Author(s) 2021.
Creative Commons License (CC BY 4.0)
This article is distributed under the terms of the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).
PubMed: 34569883

Authors

Affiliations

Carolin Vegvari
Medical Research Council Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, UK
Sam Abbott
Center for the Mathematical Modelling of Infectious Diseases, London School of Hygiene & Tropical Medicine, UK
Frank Ball
School of Mathematical Sciences, University of Nottingham, UK
Ellen Brooks-Pollock
Bristol Veterinary School, University of Bristol, UK
NIHR Health Protection Research Unit in Behavioural Science and Evaluation at the University of Bristol, UK
Robert Challen
EPSRC Centre for Predictive Modelling in Healthcare, University of Exeter, UK
Somerset NHS Foundation Trust, UK
Benjamin S Collyer
Medical Research Council Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, UK
Ciara Dangerfield
Isaac Newton Institute for Mathematical Sciences, UK
Julia R Gog
Department of Applied Mathematics and Theoretical Physics, University of Cambridge, UK
Katelyn M Gostic
Department of Ecology and Evolution, University of Chicago, USA
Jane M Heffernan
Centre for Disease Modelling, Mathematics & Statistics, York University, Canada
COVID Modelling Task-Force, The Fields Institute, Canada
T Déirdre Hollingsworth
Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, UK
Valerie Isham
Department of Statistical Science, University College London, UK
Eben Kenah
Division of Biostatistics, College of Public Health, The Ohio State University, USA
Denis Mollison
Department of Actuarial Mathematics and Statistics, Heriot-Watt University, UK
Jasmina Panovska-Griffiths
The Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK
Wolfson Centre for Mathematical Biology, Mathematical Institute and The Queen's College, University of Oxford, Oxford, UK
Lorenzo Pellis
Department of Mathematics, The University of Manchester, UK
The Alan Turing Institute, UK
Michael G Roberts
School of Natural and Computational Sciences and New Zealand Institute for Advanced Study, Massey University, New Zealand
Gianpaolo Scalia Tomba
Department of Mathematics, University of Rome Tor Vergata, Italy
Robin N Thompson
Mathematics Institute, University of Warwick, Coventry, UK
Zeeman Institute for Systems Biology and Infectious Disease Epidemiology Research, University of Warwick, Coventry, UK
Pieter Trapman
Department of Mathematics, Stockholm University, Sweden

Notes

Ellen Brooks-Pollock, Robert Challen, Ciara Dangerfield, Julia Gog, T Déirdre Hollingsworth, Lorenzo Pellis and Robin Thompson are affiliated to JUNIPER – Joint UNIversities Pandemic and Epidemiological Research, UK.
Carolin Vegvari, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, Norfolk Place, London W2 1PG, UK. Email: [email protected]

Metrics and citations

This article was published in Statistical Methods in Medical Research.
