Statistical Methods in Medical Research

Open access

Research article

First published online September 27, 2021

Commentary on the use of the reproduction number R during the COVID-19 pandemic

https://doi.org/10.1177/09622802211037079

Abstract

Since the beginning of the COVID-19 pandemic, the reproduction number $R$ has become a popular epidemiological metric used to communicate the state of the epidemic. At its most basic, $R$ is defined as the average number of secondary infections caused by one primary infected individual. $R$ seems convenient, because the epidemic is expanding if $R > 1$ and contracting if $R < 1$. The magnitude of $R$ indicates by how much transmission needs to be reduced to control the epidemic. Using $R$ in a naïve way can cause new problems. The reasons for this are threefold: (1) There is not just one definition of $R$ but many, and the precise definition of $R$ affects both its estimated value and how it should be interpreted. (2) Even with a particular clearly defined $R$, there may be different statistical methods used to estimate its value, and the choice of method will affect the estimate. (3) The availability and type of data used to estimate $R$ vary, and it is not always clear what data should be included in the estimation. In this review, we discuss when $R$ is useful, when it may be of use but needs to be interpreted with care, and when it may be an inappropriate indicator of the progress of the epidemic. We also argue that careful definition of $R$, and the data and methods used to estimate it, can make $R$ a more useful metric for future management of the epidemic.

What is the reproduction number $R$?

Since the start of the novel coronavirus (severe acute respiratory syndrome-coronavirus-2 (SARS-CoV-2)) pandemic, the reproduction number $R$ has become a popular summary statistic, used by policymakers to assess the state of the epidemic and the efficacy of interventions, and by the media to communicate the progress of the epidemic to the general public. The primary appeal of $R$ is that it offers a single number that indicates whether the transmission of the pathogen is increasing or decreasing, depending on whether $R$ is above or below one. Early $R$ estimates for SARS-CoV-2 in different countries were in the range of 2.0–6.5^1,2. However, the use of $R$ can be problematic in terms of both its definition and estimation. It is useful precisely because it is a summary statistic rather than a basic parameter describing the dynamic processes of infection, transmission and recovery. To understand how $R$ is calculated and how it can be affected by interventions, the epidemic process needs to be considered in more detail. When epidemic numbers are small or concentrated in possibly atypical parts of a population, $R$ may be an unreliable descriptor of the state of the outbreak.

In this paper, we discuss these issues and determine the situations when the reproduction number $R$ is most useful for assessing and communicating the state of an outbreak (see Figure 1). We focus on the definitions of different types of $R$, for example, the basic reproduction number or the effective reproduction number, which can be considered different quantities, and on their applicability to different phases of an epidemic. However, as we explain below, care must also be taken with subtly different definitions of the same type of $R$, for example, when using different models to analyse the progress of an epidemic.

Figure 1. Flowchart summarising the main points explained in the main text depending on the state of the epidemic.

The beginning of a pandemic – $R_{0}$

In the early stages of a new outbreak of an infectious disease, we can define an initial $R$ value, known as the basic reproduction number $R_{0}$, that is the average number of individuals infected by each infectious individual in a fully susceptible population^3–5. An outbreak resulting from one infected individual may die out within a few infection generations by chance^6,7. Otherwise, if $R_{0} > 1$, the incidence of cases will grow exponentially, with on average $R_{0}^{n}$ cases in the $n$th generation. Already, this simple description introduces a number of concepts and assumptions. An individual’s infection generation specifies their position in the chain of infections: the $(n - 1)$th generation infects the $n$th generation, and so on. It also assumes an underlying scenario (model) in which the average number of susceptibles infected by each infective stays the same over successive infection generations, and ignores the depletion of susceptibles. (We refer to those members of the population who are uninfected and susceptible to infection as susceptibles, and those that are infected and infectious as infectives.) The potential importance of these assumptions depends on the contact structure of the population, to which we return below.
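
The generation-by-generation growth described above can be illustrated with a minimal sketch. It assumes exactly the idealised scenario in the text (constant average offspring number, no depletion of susceptibles); the function name is hypothetical and generation 0 is taken to be the single index case.

```python
def expected_cases_by_generation(r0, generations):
    """Expected number of cases in infection generations 0..generations.

    Assumes each infective infects r0 others on average in every
    generation and ignores depletion of susceptibles.
    """
    if r0 < 0 or generations < 0:
        raise ValueError("r0 and generations must be non-negative")
    # Generation n contains on average r0**n cases (generation 0 = index case).
    return [r0 ** n for n in range(generations + 1)]


if __name__ == "__main__":
    for n, cases in enumerate(expected_cases_by_generation(2.0, 4)):
        print(f"generation {n}: {cases:.0f} expected cases")
```

These are averages only: as noted above, a real outbreak seeded by one case may still die out by chance.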

Thus, $R_{0}$ (and other $R$ values to be defined later) is not just a property of the infectious agent (pathogen). It depends on demography, and whatever human behaviour is associated with the possibility of infectious contact (an effective contact is one that results in transmission if made with a susceptible, while a contact in the common sense of the word has a certain probability of transmission). For the simplest models, $R_{0} > 1$ implies that an introduction of infection will result in an epidemic. Furthermore, if there were no interventions or changes in behaviour, then the proportion of the population infected during the entire course of an epidemic would be approximately the non-zero solution of the equation $P = 1 - e^{-R_{0} P}$ (e.g. if $R_{0} = 2$, then $P \approx 0.8$). This result is referred to as the final size equation and underscores the fact that during an epidemic it is not generally true that everybody will be infected at some point.
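
As an illustration, the final size equation $P = 1 - e^{-R_{0} P}$ can be solved numerically. The sketch below uses simple fixed-point iteration; the function name, starting value and tolerance are assumptions made for this example, not part of the original analysis.

```python
import math


def final_size(r0, tol=1e-12, max_iter=10_000):
    """Non-zero solution P of P = 1 - exp(-r0 * P), or 0.0 if r0 <= 1."""
    if r0 <= 1.0:
        # For r0 <= 1 the only solution in [0, 1] is P = 0.
        return 0.0
    p = 0.5  # any start in (0, 1] converges to the non-zero root when r0 > 1
    for _ in range(max_iter):
        p_next = 1.0 - math.exp(-r0 * p)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    return p


if __name__ == "__main__":
    print(f"R0 = 2: final size is about {final_size(2.0):.2f}")
```

For $R_{0} = 2$ the iteration settles near 0.8, matching the example in the text.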

Individuals may vary considerably in their susceptibility to infection and in their propensity to pass it on through their biology or behaviour. Age is often an important determinant. If the population is grouped in some way, so that for instance some groups have higher $R$ values than others, then the overall outbreak is expected to grow as described by an $R_{0}$ that depends on all of these values, and also depends on how each group infects the others, i.e. on the $R$ values between groups as well as within them ($R_{0}$ is then the dominant eigenvalue of the matrix of $R$ values^3,5,8). The first few stages of the outbreak may be atypical, depending on which group is first infected.
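
To make the eigenvalue statement concrete, the sketch below computes the dominant eigenvalue of a small matrix of between-group $R$ values by power iteration. The matrix entries are invented for illustration, and the convention that `matrix[i][j]` is the expected number of infections in group `i` caused by one infective in group `j` is an assumption of this example.

```python
def dominant_eigenvalue(matrix, iterations=1000, tol=1e-12):
    """Dominant eigenvalue of a non-negative square matrix by power iteration."""
    n = len(matrix)
    v = [1.0 / n] * n  # start from an even spread of cases across groups
    eigenvalue = 0.0
    for _ in range(iterations):
        # One infection generation: w = matrix @ v
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        if total == 0.0:
            return 0.0
        # Because v sums to one, the growth factor per generation is sum(w).
        v = [x / total for x in w]
        if abs(total - eigenvalue) < tol:
            return total
        eigenvalue = total
    return eigenvalue


if __name__ == "__main__":
    # Hypothetical two-group example (e.g. two age bands).
    r_values = [[1.5, 0.5],
                [0.25, 0.75]]
    print(f"Overall R0 is about {dominant_eigenvalue(r_values):.3f}")
```

In this invented example the overall value (about 1.64) exceeds both within-group values, because infections passed between groups also contribute to growth.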

For the simplest mathematical model of the beginning of an outbreak, it is assumed that because only a small fraction of the population has been infected, all potential contacts are with susceptibles. This may be an unrealistic assumption because human interaction networks tend to be clustered (e.g. through households, workplaces or schools). Growth through successive generations of infection, which is the basis for defining $R_{0}$, does not translate simply into time because the generation interval of an infection (the time interval back from the instant when a susceptible is infected, to that when their infector was infected) is variable, and infection generations may overlap temporally. Typically, growth in the early stages is faster than the simple assumption of a fixed average generation time would suggest and this is a major problem in estimating $R_{0}$ from early outbreak data. In addition, the implicit assumption is that all infectives are identifiable as such. If there is a significant proportion of asymptomatic cases, an estimate of $R_{0}$ may be affected by the time from when an asymptomatic infective has become infected to when he/she is expected to infect susceptibles. If this timing is the same for asymptomatic and symptomatic cases, then the estimate for $R_{0}$ will be unaffected.

The second simplest case: where an outbreak is widespread – $R_{t}$

When the pandemic is well-established in a country (or region), with large numbers of cases most of which are internal to the country, an ‘effective reproduction number’ at time $t$, $R_{t}$ (sometimes denoted $R_{e}$ or $R_{\mathrm{eff}}$), is a useful descriptor of the progress of the outbreak (Figure 1). Again, the concept is of an average of how many new cases each infectious case causes. The value of $R_{t}$ may be affected by interventions: typically the aim is to reduce $R_{t}$ below one and to as small a value as possible. For models including detailed, and therefore complex, contact networks there may be more than one way of defining $R_{t}$; however, definitions should always agree that the value of $R_{t}$ is 1 when the expected number of new infections is constant.

The relevance of the assumptions here (large numbers of cases, mostly internal to the region) is that in such circumstances we expect $R_{t}$ to have a fairly stable value that changes substantially over time only when interventions are introduced or cease. The definition of $R_{t}$ here is in terms of actual new infectious cases, i.e. excluding potentially infectious contacts with individuals who have been infected and are immune to reinfection. As the number of immune individuals grows large compared to the entire population, the spread of infections will gradually slow, because many contacts will be with immune individuals, and hence the value of $R_{t}$ will be reduced. The level of immunity at which $R_{t} = 1$ is the herd immunity threshold (see the section ‘$R$, vaccination and herd immunity’ below).

When the outbreak is at a low level or fragmented the concept of $R$ may be less useful

If the outbreak is at a low level either because it has run its course or because of successful interventions, the definition and the use of an $R$ value are problematic (Figure 1). At low levels of prevalence, there will (as in the early stages of the outbreak) be greater statistical variability. Additionally, there are likely to be heterogeneities associated with the infection being unevenly spread among different subgroups of the population (possibly dependent on age, behaviour or geographical location⁹), with some parts of the population having had more exposure than others. There may also be local variability in interventions, and it may not be easy to allow for the effect of some cases being introductions from outside the population under consideration. If the outbreak is fragmented, particularly when close to elimination, it will make more sense to think of it as composed of separate local outbreaks, which can be modelled separately, rather than trying to specify an average $R$ value overall.

Relating $R$ to details of the infection process

If the population is heterogeneous or structured, defining a reproduction number needs care, as the number of new cases an infective is expected to cause will depend on both their infectiousness and how well connected they are. It has been shown that in the early stages of an epidemic, when the relevant contact structures of a population are not known and interventions are not targeted, assuming a homogeneous contact structure results in conservative estimates of $R_{0}$ and the required control effort. However, designing targeted intervention strategies requires reliable information on infectious contact structures¹⁰. There are several basic ways to use structured population models to capture departures from the simplest epidemic models. The four most common are (i) household models, (ii) multi-type models, (iii) network models, and (iv) spatial models.

In a household model, every person in the population is assumed to be part of a single household, which is typically small, and may even be of size one. Those in the same household have a higher probability of infecting each other than is the case for two people chosen randomly from the population. In this model, reproduction numbers can still be defined^11,12. The most commonly used is the household reproduction number $R_{*}$, which is the expected number of members of other households that are infected by people from a primary infected household. It is still possible to consider the average number of susceptibles infected by a single infectious person. However, for this to be useful, the average has to be computed in a sophisticated way, because the number of people a person can infect will depend on how many members of the same household are still susceptible when s/he becomes infectious¹³.

A second way of modelling heterogeneity in the population is to assume that the population can be subdivided into groups. The groups may be defined through age bands, social activity levels, health status, type of job, place of residence and so on. Characteristics such as susceptibility, infectivity and frequency of contact may depend on an individual’s group, but all those in a single group have the same characteristics. It is often assumed that all these groups are large. If there are regular inter-group contacts then the largest eigenvalue of the so-called next-generation matrix^5,8 has many similar properties to those of $R_{0}$ for an epidemic spreading in a homogeneously mixing population, although the final size equation is generally not satisfied.

A third way of introducing heterogeneity is to represent the population by a network, where transmission is only possible between people sharing a link in the network. For many network models, it is still possible to define a reproduction number¹⁴. It is important to note that the person initially infected in a population is often atypical and should be ignored in computing or estimating the reproduction number. A useful extension is a mixture of a network model and a homogeneous mixing model, in which both regular and casual contacts are captured. In this extension, a reproduction number with the desired threshold properties can be defined¹⁵.

Sometimes most transmission is restricted to people living close to each other, and spatial models are useful when physical location should be incorporated. For these, it is often difficult to define a reproduction number because there is no phase in which the number of infected is growing exponentially^16,17. If standard estimation methods are used where there is a considerable spatial component then the estimates will be close to one, even when the spread is highly supercritical and transmission needs to be much reduced to control the epidemic.

$R$, vaccination and herd immunity

As immunity builds up in a population through infection during the course of an epidemic, even when the contact rate between individuals remains the same (assuming no change in interventions), both the chance that a contact is susceptible to infection and the effective reproduction number, $R_{t}$, will decrease. Herd immunity is achieved when enough individuals have become immune so that $R_{t}$ falls below the value $1$ without the need to reduce contacts among individuals by non-pharmaceutical interventions.

Vaccination provides another means of building up immunity in a population. Depending on the coverage, it can slow or halt the spread of an epidemic, preventing individual infection or limiting experiences of the disease. All vaccination programmes aim to achieve sufficient immunity in the population that $R_{t} < 1$ without modifying contact patterns among individuals. In this situation, there are insufficient susceptibles in the population for sustained transmission. The susceptible proportion of a population for which $R_{t} = 1$ is known as the critical vaccination threshold (CVT). When the susceptible proportion is below this threshold, there is herd immunity, which means that the population is protected from a major outbreak even though not everyone is vaccinated or otherwise immune.

In simple mathematical models (e.g. models in which the population is only subdivided into susceptible, infected and recovered individuals), the CVT is determined by the basic reproduction number $R_{0}$. Specifically, vaccination of a uniform randomly chosen proportion $1 - (1 / R_{0})$ of the population is sufficient to create herd immunity and prevent an epidemic, as long as the vaccine-induced immunity is sufficiently long-lasting¹⁸. As a simple example, if $R_{0} = 2$ then 50% of a population would need to be vaccinated or otherwise immune to prevent outbreaks. If $R_{0} = 3$, as is approximately the case for COVID-19, then 67% of a population would need to be vaccinated or immune. When setting such vaccination targets, waning immunity needs to be taken into account. The implementation and impact of a vaccination programme depend on whether vaccination is performed before or during an outbreak^19,20.
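
The threshold arithmetic above can be checked with a short sketch. It assumes the simple homogeneous-mixing setting of the text plus fully protective, non-waning immunity, under which $R_{t} = R_{0} \times$ (susceptible fraction); both function names are hypothetical.

```python
def critical_immune_fraction(r0):
    """Fraction 1 - 1/R0 that must be immune for herd immunity (0 if R0 <= 1)."""
    if r0 <= 1.0:
        return 0.0
    return 1.0 - 1.0 / r0


def effective_r(r0, immune_fraction):
    """R_t under homogeneous mixing with fully protective immunity (assumption)."""
    if not 0.0 <= immune_fraction <= 1.0:
        raise ValueError("immune_fraction must lie in [0, 1]")
    return r0 * (1.0 - immune_fraction)


if __name__ == "__main__":
    for r0 in (2.0, 3.0):
        threshold = critical_immune_fraction(r0)
        print(f"R0 = {r0}: threshold {threshold:.0%}, "
              f"R_t at threshold {effective_r(r0, threshold):.2f}")
```

The population-structure effects discussed in the following paragraphs are not captured by this formula.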

As outlined above, the population structure affects the reproduction numbers $R_{0}$ and $R_{t}$ as well as the probability that an epidemic will spread. Therefore, it has important effects on the threshold for herd immunity and the optimal vaccination strategy. For models with small mixing groups such as households, the basic reproduction number $R_{0}$, as defined in the ‘The beginning of a pandemic – $R_{0}$’ section, does not provide a good indicator of whether or not an epidemic can take off because repeated contacts within households are likely even in the early stages of an outbreak. However, in the early stages of an epidemic, between-household contacts are likely to be with individuals in otherwise fully susceptible households, so the reproduction number $R_{*}$, which is given by the average number of between-household contacts that emanate from a typical within-household epidemic^21,22, can be used instead. For household models, herd immunity is achieved if a uniform randomly chosen proportion $1 - (1 / R_{*})$ of all households in a population is fully vaccinated.

For COVID-19, a toy model has been used to illustrate the effect of population heterogeneity on herd immunity. It showed²³ that age structure and variation in social contacts among individuals could reduce the herd immunity threshold to 43%, almost a third less than that for a homogeneous population. Assuming a more extreme variation in social contact rates and that the most exposed individuals become infected first, another study estimates that the herd immunity threshold in some populations could be as low as 20%²⁴. In addition, there is some indication that immunity gained from infection by some common cold coronavirus strains may provide cross-immunity to SARS-CoV-2^25,26. There have also been reports that immunity gained from COVID-19 infection may wane, reducing individual and population levels of immunity over time. If these observations are indeed applicable here, the herd immunity threshold could be further modified²⁶.

One important difference between immunisation by vaccination and by infection is that, during an epidemic, individuals with higher susceptibilities and/or larger numbers of contacts are likely to be infected earlier. If herd immunity is to be achieved by vaccination, optimal planning can reduce the coverage required to achieve herd immunity. For example, in an illustrative households model for variola minor infections in Brazil, it is shown that under the optimal vaccination strategy the proportion of the population that needs to be vaccinated is a third less than under a strategy that fully vaccinates randomly chosen households²⁷. Although several COVID-19 vaccines have been developed, global demand in the early phases of vaccine roll-out still exceeds supply. Designing optimal vaccination strategies for different settings that take into account population structure alongside other public health concerns, e.g. protecting the vulnerable, could greatly enhance the chances of achieving herd immunity and the cost-effectiveness of vaccination as an intervention.

How can $R$ be estimated?

Before estimating $R$, the purpose of the estimation needs to be clarified. Is it intended simply to track the changes in the trajectory of case numbers over time? Or is it intended to assess the potential for pathogen transmission in a specific population, perhaps in the context of considering interventions? If the latter, the relevant population needs to be defined. Depending on the purpose, different data sets and statistical methods can be used.

There are several approaches to estimating $R_{t}$ from epidemiological data. In the most direct method, high-quality contact tracing data can be used, in theory at least, to estimate both $R_{t}$ and the generation time interval, and this has been attempted for COVID-19²⁸. However, contact tracing of SARS-CoV-2 infections is notoriously difficult because of the high proportion of asymptomatic infections. Moreover, effective contact tracing reduces the number of contacts of traced individuals so that the corresponding estimates are biased.

More commonly, $R_{t}$ can be estimated by inferring the rate of infection transmission within a dynamical model fitted to observed cases, hospitalisations, deaths or a combination of those^29,30. Dynamical models have been used widely to forecast the spread of COVID-19 and the effect of interventions. These models allow the impact of assumed changes in specific interventions on $R_{t}$ to be explored, so estimating $R_{t}$ in this way can be convenient. Dynamical models can be described by systems of differential equations and assume very large to infinite population sizes. In completely deterministic dynamical models, the uncertainty in estimated $R_{t}$ values depends only on data and parameter uncertainty, and not on stochastic uncertainty. However, if the number of new infections is small, the value of $R_{t}$ is strongly affected by chance events, which increases the uncertainty in the estimate. This situation can be addressed by the use of stochastic models or incorporating stochastic assumptions in otherwise deterministic model frameworks.
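
To illustrate what a deterministic dynamical model looks like, the sketch below integrates a basic susceptible-infected-recovered (SIR) system with forward Euler steps and tracks $R_{t} = R_{0} \times$ (susceptible fraction). It is not one of the fitted models cited above: the parameter values, the Euler scheme and the function name are assumptions for illustration, and no data fitting is attempted.

```python
def simulate_sir(r0, infectious_days, initial_infected=1e-6, dt=0.1, days=400.0):
    """Forward-Euler SIR model on population fractions.

    Returns a list of (time, s, i, r, r_t) tuples, where r_t = r0 * s.
    """
    gamma = 1.0 / infectious_days  # recovery rate per day
    beta = r0 * gamma              # transmission rate per day
    s, i, r = 1.0 - initial_infected, initial_infected, 0.0
    trajectory = []
    for step in range(int(days / dt)):
        trajectory.append((step * dt, s, i, r, r0 * s))
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
    return trajectory


if __name__ == "__main__":
    run = simulate_sir(r0=2.0, infectious_days=5.0)
    peak = max(run, key=lambda row: row[2])
    print(f"peak prevalence on day {peak[0]:.0f}, R_t there is {peak[4]:.2f}")
    print(f"fraction ever infected by the end: {run[-1][3]:.2f}")
```

In this sketch prevalence peaks when $R_{t}$ passes through one, and the fraction ever infected ends close to the final size value for $R_{0} = 2$. Because the model is deterministic, repeated runs are identical, which is why it cannot represent the chance effects that matter when case numbers are small.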

But this approach is not without drawbacks. Not least, $R_{t}$ estimates from dynamical models depend critically on assumptions (e.g. model structure and which parameter values are estimated), and on data quality. Another potential drawback is that many parameters of dynamical models are often assumed to be fixed over time. These approaches are therefore less suited to capture the effects of gradual, continuous changes in behaviour, mobility or social network structure. However, gradual changes in dynamic models can be incorporated by assuming that transmission parameters change over given intervals, while at the same time the possible amount of change is constrained to avoid big jumps caused by a small number of noisy data points³¹. In this way, models that include change-points in the rate of infection near specific interventions can infer the impact of control policies, as well as the effect of susceptible depletion.

There is also a difference in how $R_{t}$ is estimated between compartmental and agent- or individual-based models. In an agent-based model, it is possible simply to count exactly how many secondary infections are caused by each primary infection. Thus, all details of the epidemic including time-varying viral loads, population-level and localised immunity, interventions, network factors, and other effects are automatically incorporated and do not need to be considered separately³². As agent-based models explicitly include stochastic effects, the uncertainty in $R_{t}$ estimates can be greater than for those derived from deterministic dynamical models. Because of the greater number of parameters included in dynamical and particularly agent-based models, they require more data and more different types of data than the simpler statistical models described below to identify estimates for all parameters.

A third approach uses statistical models to estimate $R_{t}$, and continuous changes in it, empirically from case notification data. These methods make minimal structural assumptions about epidemic dynamics, and only require users to specify the distribution of the generation interval. They are agnostic to population susceptibility or epidemic phase, but as we discuss below, care must still be taken to avoid quantitative and temporal biases. The most common empirical methods are the Cori method^33,34 and the Wallinga–Teunis method³⁵. The drawbacks of some statistical models include that they cannot be used to combine different data streams into a coherent picture.

Where genome sequences from viral samples taken from infected patients are available and the date of sampling is known, $R_{t}$ can also be estimated using phylogenetic methods. An evolutionary model is fitted that best explains the patterns of nucleotide substitution in the dated samples. The fitted model parameters include the nucleotide substitution rate and the population size of the virus at a given time in the past. Using a metapopulation analogy, the effective population size of a pathogen has been shown to be proportional to the number of infected individuals and inversely proportional to the transmission rate from which the reproduction number can be determined³⁶.

Statistical methods to estimate $R$

In this section, we discuss two frequently used simple statistical methods to estimate $R$ and common issues associated with them. The Cori and Wallinga–Teunis methods estimate subtly different versions of $R_{t}$; the Cori method generates estimates of the instantaneous reproduction number and the Wallinga–Teunis method generates estimates of the case reproduction number^33,37. The key difference is that the instantaneous reproduction number gives an average $R_{t}$ for a homogeneous population at a single point in time, whereas the case reproduction number can accommodate individual heterogeneity, but blurs over several dates of transmission. Furthermore, the case reproduction number is a leading estimator of the instantaneous reproduction number, i.e. it depends on data from after the time for which the reproduction number is to be estimated, and must be adjusted accurately to infer the impact of time-specific interventions³⁸.

The instantaneous reproduction number represents the expected number of infections generated at time $t$ by currently infectious individuals³³. For real-time analysis, one of the benefits of estimating the instantaneous reproduction number is that it does not require information about future changes in transmissibility, and it reflects the effectiveness of control measures in place at time $t$. But as an aggregate measure of transmission by all individuals infected in the past (who may now be shedding virus), it does not easily consider heterogeneity in transmission. In contrast, the case reproduction number represents the expected number of infections generated by an individual who is first infected at time $t$ and has yet to progress through the full course of viral shedding. This leads to ‘right censoring’ when the case reproduction number is estimated in real-time; if not all infections generated by individuals who were infected at time $t$ have yet been observed, then the data must be adjusted^39–41 or the case reproduction number will be underestimated.
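
The right-censoring effect can be seen in a minimal sketch of a Wallinga–Teunis-style calculation on daily counts. This is a simplified illustration, not the published implementation: it assumes infection dates are known, uses a discrete generation interval where `generation_pmf[k]` is the probability of an interval of `k + 1` days, and applies no censoring adjustment. The function name and toy data are hypothetical.

```python
def case_reproduction_numbers(incidence, generation_pmf):
    """Case reproduction number for the cohort infected on each day.

    Each case on day t is attributed to earlier days in proportion to
    incidence[u] * w(t - u); the estimate for day u is the expected number
    of later cases attributed to one case infected on day u.
    """
    n = len(incidence)

    def w(lag):
        # Probability that the generation interval equals `lag` days.
        return generation_pmf[lag - 1] if 1 <= lag <= len(generation_pmf) else 0.0

    # Total infectious pressure on day t from all earlier cases.
    pressure = [sum(incidence[u] * w(t - u) for u in range(t)) for t in range(n)]

    estimates = []
    for u in range(n):
        expected_secondary = 0.0
        for t in range(u + 1, n):
            if pressure[t] > 0:
                expected_secondary += incidence[t] * w(t - u) / pressure[t]
        estimates.append(expected_secondary)
    return estimates


if __name__ == "__main__":
    # Toy data: cases double daily and every generation interval is one day.
    print(case_reproduction_numbers([1, 2, 4, 8], [1.0]))
```

In the toy example the estimate for the final day is zero, not because transmission stopped but because the secondary cases of that cohort lie beyond the end of the data.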

The Cori method and the Wallinga–Teunis method involve inferring the values of $R_{t}$ that are most consistent with observed incidence data (for a review, see Gostic et al.³⁸). In the Cori method, typically this inference is carried out by assuming that $R_{t}$ is constant over fixed time windows. Smoothing windows are used to avoid spurious fluctuations in estimates of $R_{t}$. These can occur if imperfect observation and reporting effects, rather than actual bursts in transmission, are the main source of noise in the data. Cross-validation and proper scoring rules can be used to avoid under- or oversmoothing $R_{t}$ estimates⁴².
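
The windowed estimate described above can be sketched as a ratio of observed cases to the infectious pressure exerted by earlier cases. This is a simplified point estimate only: the published Cori method is Bayesian and also yields credible intervals, which are omitted here. The function name, the convention that `generation_pmf[k]` is the probability of a `k + 1` day generation interval, and the toy data are assumptions of this example.

```python
def instantaneous_r(incidence, generation_pmf, window=1):
    """Ratio-based estimate of R_t, assumed constant over `window` days.

    Returns None where the window would include day 0 (the seed cases)
    or where no infectious pressure has accumulated yet.
    """
    def infection_pressure(t):
        # Sum over past days of incidence weighted by the generation interval.
        return sum(
            incidence[t - k - 1] * p
            for k, p in enumerate(generation_pmf)
            if t - k - 1 >= 0
        )

    estimates = []
    for t in range(len(incidence)):
        start = t - window + 1
        if start < 1:
            estimates.append(None)
            continue
        cases = sum(incidence[s] for s in range(start, t + 1))
        pressure = sum(infection_pressure(s) for s in range(start, t + 1))
        estimates.append(cases / pressure if pressure > 0 else None)
    return estimates


if __name__ == "__main__":
    # Toy incidence generated with a constant R of 2 and intervals of 1 or 2 days.
    toy_incidence = [1, 1, 2, 3, 5, 8]
    print(instantaneous_r(toy_incidence, [0.5, 0.5], window=3))
```

Longer windows pool more cases and so smooth out reporting noise, at the cost of responding more slowly to genuine changes in transmission.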

An important concept, basic to both methods, is the intrinsic generation time, also referred to as the infectiousness profile. The intrinsic generation interval is a theoretical quantity derived from the renewal equation of Lotka and Euler^30,43. It describes the time distribution of potentially infectious contacts made by an index case and is independent of population susceptibility⁴⁴. In practice, the intrinsic generation interval is not observable, and it must be estimated carefully from observed serial intervals within contact tracing or household data^44–47. The serial interval is generally defined as the duration of time between the onset of symptoms in an index case and in a secondary case⁴⁸. In the early stages of an outbreak, accurate estimation should adjust for right truncation of observations, for changes over time in population susceptibility, and for interventions such as case isolation, which may shorten the generation interval by limiting transmission events late in the course of infectiousness^44,45,49.

Both the Cori and Wallinga–Teunis methods are conceptually based on separating the infectiousness of an infective into two components, total amount and timing. The timing is expressed by the generation time distribution while the total amount is expressed by $R_{t}$. The variation of (average) infectivity over time is ascribed, at least in practical implementations of the methods, to changes in $R_{t}$, while the intrinsic generation time is assumed to remain fixed. This is a simplification that may lead to inaccurate estimation of $R_{t}$, since, in reality, the observed generation time distribution varies over time because of the epidemic dynamics^48,50,51, because of the epidemic affecting different subgroups of the population, with possibly different generation time distributions over time^52,53, and, more importantly, because of interventions that affect the length or efficacy of the infectious period⁴⁹. An additional complication is that the ‘intrinsic’ generation interval of the Cori and Wallinga–Teunis estimators includes potentially infectious contacts with both susceptible and immune individuals, whereas only contacts with susceptible individuals cause new infections, and are observed in contact tracing^44,45. Even when using an accurately estimated fixed generation time distribution, both $R_{t}$ estimators are numerically sensitive to the specified mean and variance of the intrinsic generation interval⁵⁴.

Data used to estimate $R$

Fundamentally, $R_{t}$ is a measure of transmission. Ideally, it would be estimated from data on the total number of incident infections (i.e. transmission events) occurring each day. But in practice, only a small fraction of infections are observed, and notifications do not occur until days or weeks after the moment of infection. Temporally accurate $R_{t}$ estimation requires adjusting for lags to observation, which can be estimated as the sum of the incubation period and delays from symptom onset to case observation^54,55. Delays not only shift observations into the future, but they also blur infections incident on a particular date across many dates of observation. This blurring can be particularly problematic when working with long and variable delays (e.g. from infection to death), and when $R_{t}$ is changing. Deconvolution^56–59, or $R_{t}$ estimation models that include forward delays⁶⁰ can be used to adjust lagged observations. Simpler approaches may be justifiable under some circumstances. If observation delays are relatively short and not highly variable, and if $R_{t}$ is not rapidly changing, simply shifting unadjusted $R_{t}$ estimates back in time by the mean delay can provide a reasonable approximation to the true value (see Challen et al.,⁵⁴ in this volume, for an in-depth discussion). The advantages and disadvantages of each approach are reviewed in Gostic et al.³⁸. Changes over time in case ascertainment can also bias $R_{t}$ estimates, so ideally data should be drawn from structured surveillance (see, e.g., the REACT study⁶¹) or adjusted for known changes in testing or reporting effort^61,62.
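
The shifting and blurring caused by observation delays can be illustrated with a small sketch. It convolves a series of infections with a discrete delay distribution, where `delay_pmf[d]` is the assumed probability that an infection is observed `d` days later, and then applies the crude "shift back by the mean delay" adjustment. Function names and numbers are hypothetical, and complete ascertainment is assumed.

```python
def observed_from_infections(infections, delay_pmf):
    """Expected observations per day given infections per day and a delay pmf."""
    n = len(infections)
    observed = [0.0] * n
    for t, count in enumerate(infections):
        for d, p in enumerate(delay_pmf):
            if t + d < n:  # observations beyond the series end are not yet seen
                observed[t + d] += count * p
    return observed


def shift_back(series, delay_days):
    """Crude adjustment: move each value `delay_days` earlier in time."""
    if delay_days == 0:
        return list(series)
    return list(series[delay_days:]) + [None] * delay_days


if __name__ == "__main__":
    infections = [10, 0, 0, 0]       # a single-day burst of infections
    delay_pmf = [0.0, 0.5, 0.5]      # observed one or two days later
    seen = observed_from_infections(infections, delay_pmf)
    print(seen)
    print(shift_back(seen, 1))
```

The burst on a single day is spread over two days of observations, and shifting the series back realigns its timing without undoing that blurring, which is why deconvolution or explicit delay models are preferred when delays are long or variable.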

In practice, $R_{t}$ can be estimated from a time series of new symptom onset reports, cases, hospitalisations or deaths. Choosing an appropriate data stream involves weighing representativeness, timeliness of reporting, consistency of ascertainment, and length of lag. For example, reported deaths may be relatively unaffected by changes over time in ascertainment, but adjusting for long lags to observation can be challenging, and deaths may not be representative of overall transmission (e.g. if the epidemic shifts towards younger age groups)^63,64. Extensions of existing statistical models for $R_{t}$ estimation could potentially integrate multiple kinds of data by assuming that, for example, cases, hospitalisations and deaths arise from a shared, latent infection process with different delays³⁸. A mechanistic model can also pull multiple data streams together by modelling the different processes underlying each data stream. Problems can arise if different data streams disagree on the progress of the pandemic. However, if the disagreement is caused by a shift in delays from events to reporting in different data streams, a mechanistic model can highlight these changes. Sometimes different data streams can be used for model validation.

All methods used to estimate $R_{t}$ require a choice of the length of the time window over which it is to be estimated. All data used to estimate $R_{t}$ are noisy. The shorter the time window used for estimation, the higher the noise-to-signal ratio and, therefore, the uncertainty in the estimate of $R_{t}$. In contrast, longer time windows will produce estimates with lower uncertainty, but sudden changes in transmission may not be detected if the time window is too long.
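This trade-off can be sketched with a simplified renewal-equation estimator. The code below is not the implementation of any of the cited software packages: it uses a plain ratio of observed cases to infection pressure over a trailing window, without a Bayesian prior, and the generation interval, seeding and transmission scenarios are hypothetical. Part (1) uses noise-free incidence to show the lag of a long window after a step change in transmission; part (2) uses repeated Poisson simulations with constant transmission to compare the spread of short- and long-window estimates.

```python
import numpy as np


def pressure_at(incidence, gi_pmf, t):
    """Infection pressure on day t: sum over k of incidence[t-k] * P(GI = k days)."""
    k_max = min(t, len(gi_pmf))
    recent = incidence[t - k_max:t][::-1]  # incidence[t-1], ..., incidence[t-k_max]
    return float(np.dot(recent, gi_pmf[:k_max]))


def simulate(r_series, gi_pmf, seed_cases, rng=None):
    """Renewal-process incidence; noise-free if rng is None, Poisson otherwise."""
    incidence = np.zeros(len(r_series))
    incidence[0] = seed_cases
    for t in range(1, len(r_series)):
        mean = r_series[t] * pressure_at(incidence, gi_pmf, t)
        incidence[t] = mean if rng is None else rng.poisson(mean)
    return incidence


def windowed_rt(incidence, gi_pmf, window):
    """Ratio estimate over a trailing window: sum of cases / sum of pressure."""
    n = len(incidence)
    pressure = np.array([pressure_at(incidence, gi_pmf, t) for t in range(n)])
    rt = np.full(n, np.nan)
    for t in range(window, n):
        lo = t - window + 1
        denom = pressure[lo:t + 1].sum()
        if denom > 0:
            rt[t] = incidence[lo:t + 1].sum() / denom
    return rt


# All values below are hypothetical.
gi_pmf = np.array([0.1, 0.2, 0.3, 0.2, 0.1, 0.1])  # P(GI = 1..6 days)

# (1) Noise-free epidemic with a step change in transmission on day 30.
r_step = np.where(np.arange(50) < 30, 1.5, 0.8)
clean = simulate(r_step, gi_pmf, seed_cases=10.0)
rt_short = windowed_rt(clean, gi_pmf, window=3)
rt_long = windowed_rt(clean, gi_pmf, window=14)
print("day 32, 3-day window:", rt_short[32])
print("day 32, 14-day window:", rt_long[32])

# (2) Noisy epidemics with constant transmission: spread of the final-day estimate.
rng = np.random.default_rng(2021)
r_const = np.full(40, 1.2)
finals = {3: [], 14: []}
for _ in range(200):
    noisy = simulate(r_const, gi_pmf, seed_cases=100.0, rng=rng)
    for window in finals:
        finals[window].append(windowed_rt(noisy, gi_pmf, window)[-1])
sd_short, sd_long = np.std(finals[3]), np.std(finals[14])
print("sd of estimates, 3-day vs 14-day window:", sd_short, sd_long)
```

In the noise-free scenario the 3-day window recovers the new value within three days of the change, whereas the 14-day window still averages over the period before it. In the noisy scenario the ordering reverses: the longer window gives the less variable estimate.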

Summary: cautions and recommendations

During the early phase of the epidemic:

• $R_{0}$ estimates in the early phase may not be representative of the population as a whole if the group of initial transmitters is atypical.

• $R_{0}$ may be incorrectly estimated in the early phase if infected but asymptomatic individuals are not counted or recognised, and their epidemiologically relevant behaviour differs from that of symptomatic individuals.

When the epidemic is established in the population:

• $R_{t}$ can differ for different population groups, and the value of $R_{t}$ is dominated by the group in which most transmission occurs. To improve targeted containment measures, additional information, such as demographic, socio-economic and occupational information, should be reported alongside case data where possible.

• The estimated value of $R_{t}$ and its associated uncertainty depend on the data stream(s) used and the time window over which $R_{t}$ was estimated, and these should be reported alongside the estimates. This will make it possible to draw more robust conclusions when considering results from different models.

• Model components that are likely to change over the time course of the epidemic (e.g. the generation time distribution) should be updated regularly, and sensitivity to changing assumptions should be kept under consideration.

When the ongoing epidemic is fragmented:

• $R_{t}$ estimates from local outbreaks, if they can be contained, are not informative about the progress of the epidemic or the efficacy of interventions at the national level. They may inform local interventions. Other descriptors should be considered to assess the progress of the epidemic, such as the number of new cases per capita per day in a defined area, the number of hospitalisations and the spare hospital and intensive care capacity.

• Imported cases that are effectively quarantined should not be counted towards $R_{t}$ estimates, as they do not contribute to the local transmission potential in the community.

Vaccination and herd immunity:

• If the available vaccine supply is limited, vaccination strategies should be designed to take into account population structure, the transmission potential within different groups, and other public health priorities, e.g. protection of vulnerable groups.

In conclusion, estimated $R$ values do not exactly correspond to the theoretically defined quantities. In statistical terms, model uncertainty, sampling variability, and data accuracy affect the estimates. Nevertheless, $R_{0}$ and $R_{t}$ are useful quantities to assess the potential and progress of an epidemic. Their usefulness for decision making varies depending on the phase of the epidemic (early, established and fragmented). Clearly defining the context, the data streams and the statistical methods used to estimate $R$ can improve its value for the management of an epidemic.

Acknowledgements

The authors would like to thank the Isaac Newton Institute for Mathematical Sciences, Cambridge, for support and hospitality during the programme Infectious Dynamics of Pandemics where work on this paper was undertaken. This work was supported by EPSRC grant no EP/R014604/1. EBP acknowledges funding from the Medical Research Council.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research and/or authorship of this article: This work was supported by EPSRC grant no EP/R014604/1. EBP acknowledges funding from the Medical Research Council (MRC) (MC/PC/19067) and the NIHR Health Protection Research Unit in Behavioural Science and Evaluation at the University of Bristol. Support for RC’s research is provided by the EPSRC via grant EP/N014391/1; RC is also funded by the NHS Global Digital Exemplar programme (GDE). JMH acknowledges funding from the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Canadian Institutes of Health Research (CIHR). MGR is supported by the Marsden Fund under contract MAU1718. GPST acknowledges the MIUR Excellence Department Project awarded to the Department of Mathematics, University of Rome Tor Vergata, CUP E83C18000100006. LP acknowledges the Wellcome Trust and the Royal Society (grant 202562/Z/16/Z) for funding. EK was supported by the National Institute of Allergy and Infectious Diseases (NIAID) grant R01 AI116770. PT acknowledges Vetenskapsrådet (Swedish Research Council), grant 2016-04566. KMG acknowledges fellowship support from the James S. McDonnell Foundation. The contents are solely the responsibility of the authors and do not necessarily represent the official views of NIAID or the US National Institutes of Health.

ORCID iDs

Carolin Vegvari https://orcid.org/0000-0003-2018-011X

Robert Challen https://orcid.org/0000-0002-5504-7768

Robin N Thompson https://orcid.org/0000-0001-8545-5212

References

1. Tang B, Wang X, Li Q, et al. Estimation of the transmission risk of the 2019-nCoV and its implication for public health interventions. J Clin Med 2020; 9: 462.

2. Jit M, Jombart T, Nightingale ES, et al. Estimating number of cases and spread of coronavirus disease (COVID-19) using critical care admissions, United Kingdom, February to March 2020. Eurosurveillance 2020; 25: 2000632.

3. Heesterbeek J. $R_{0}$. PhD Thesis, University of Leiden, 1992.

4. Heesterbeek J. A brief history of $R_{0}$ and a recipe for its calculation. Acta Biotheor 2002; 50: 189–204.

5. Diekmann O, Heesterbeek J, Britton T. Mathematical tools for understanding infectious disease dynamics. Princeton, Oxford: Princeton University Press, 2013.

6. Kendall DG. Deterministic and stochastic epidemics in closed populations. In: Proceedings of the third Berkeley symposium on mathematical statistics and probability, Statistical Laboratory, University of California, Berkeley and Los Angeles: University of California Press, 1956, vol. 4, pp. 149–165.

7. Thompson RN. Novel coronavirus outbreak in Wuhan, China, 2020: Intense surveillance is vital for preventing sustained transmission in new locations. J Clin Med 2020; 9: 498.

8. Diekmann O, Heesterbeek J, Roberts M. The construction of next-generation matrices for compartmental epidemic models. J R Soc Interface 2010; 7: 873–885.

9. Thompson RN, Hollingsworth TD, Isham V, et al. Key questions for modelling COVID-19 exit strategies. Proc R Soc B 2020; 287: 20201405.

10. Trapman P, Ball F, Dhersin JS, et al. Inferring $R_{0}$ in emerging epidemics – the effect of common population structure is small. J R Soc Interface 2016; 13: 20160288.

11. Ball F, Pellis L, Trapman P. Reproduction numbers for epidemic models with households and other social structures. II. Comparisons and implications for vaccination. Math Biosci 2016; 274: 108–139.

12. Goldstein E, Paur K, Fraser C, et al. Reproductive numbers, epidemic spread and control in a community of households. Math Biosci 2009; 221: 11–25.

13. Pellis L, Ball F, Trapman P. Reproduction numbers for epidemic models with households and other social structures. I. Definition and calculation of $R_{0}$. Math Biosci 2012; 235: 85–97.

14. Kiss I, Miller J, Simon P. Mathematics of epidemics on networks. Cham: Springer, 2017. ISBN 978-3-319-50806-1.

15. Ball F, Neal P. Network epidemic models with two levels of mixing. Math Biosci 2008; 212: 69–87.

16. Davis S, Trapman P, Leirs H, et al. The abundance threshold for plague as a critical percolation phenomenon. Nature 2008; 454: 634–637.

17. Riley S, Eames K, Isham V, et al. Five challenges for spatial epidemic models. Epidemics 2015; 10: 68–71.

18. Smith C. Factors in the transmission of virus infections from animal to man. Sci Basis Med Annual Rev 1964; 125–150.

19. Heffernan J, Keeling M. Implications of vaccination and waning immunity. Proc R Soc B 2009; 276: 2071–2080.

20. Carlsson RM, Childs LM, Feng Z, et al. Modeling the waning and boosting of immunity from infection or vaccination. J Theor Biol 2020; 497: 110265.

21. Ball F, Mollison D, Scalia Tomba G. Epidemics with two levels of mixing. Ann Probab 1997; 7: 46–89.

22. Becker N, Dietz K. The effect of household distribution on transmission and control of highly infectious diseases. Math Biosci 1995; 127: 207–219.

23. Britton T, Ball F, Trapman P. A mathematical model reveals the influence of population heterogeneity on herd immunity to SARS-CoV-2. Science 2020; 369: 846–849.

24. Gomes MGM, Corder RM, King JG, et al. Individual variation in susceptibility or exposure to SARS-CoV-2 lowers the herd immunity threshold. medRxiv 2020; https://www.medrxiv.org/content/early/2020/05/21/2020.04.27.20081893.

25. Yaqinuddin A. Cross-immunity between respiratory coronaviruses may limit COVID-19 fatalities. Med Hypotheses 2020; 144: 110049.

26. Sariol A, Perlman S. Lessons for COVID-19 immunity from other coronavirus infections. Immunity 2020; 53: 248–263.

27. Ball F, Lyne O. Optimal vaccination schemes for epidemics among a population of households, with application to variola minor in Brazil. Stat Methods Med Res 2006; 15: 481–497.

28. Ferretti L, Wymant C, Kendall M, et al. Quantifying SARS-CoV-2 transmission suggests epidemic control with digital contact tracing. Science 2020; 368: eabb6936.

29. Roberts M, Heesterbeek J. Model-consistent estimation of the basic reproduction number from the incidence of an emerging infection. J Math Biol 2007; 55: 803–816.

30. Wallinga J, Lipsitch M. How generation intervals shape the relationship between growth rates and reproductive numbers. Proc R Soc B 2007; 274: 599–604.

31. Birrell P, Blake J, Van Leeuwen E, et al. Real-time nowcasting and forecasting of COVID-19 dynamics in England: the first wave. Philos Trans R Soc B Biol Sci 2021; 376: 20200279.

32. Panovska-Griffiths J, Kerr CC, Stuart RM, et al. Determining the optimal strategy for reopening schools, the impact of test and trace interventions, and the risk of occurrence of a second COVID-19 epidemic wave in the UK: a modelling study. Lancet Child Adolescent Health 2020; 4: P817–P827.

33. Cori A, Ferguson NM, Fraser C, et al. A new framework and software to estimate time-varying reproduction numbers during epidemics. Am J Epidemiol 2013; 178: 1505–1512.

34. Thompson RN, Stockwin JE, van Gaalen RD, et al. Improved inference of time-varying reproduction numbers during infectious disease outbreaks. Epidemics 2019; 29: 100356.

35. Wallinga J, Teunis P. Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures. Am J Epidemiol 2004; 160: 509–516.

36. Lai A, Bergna A, Acciarri C, et al. Early phylogenetic estimate of the effective reproduction number of SARS-CoV-2. J Med Virol 2020; 92: 675–679.

37. Fraser C. Estimating individual and household reproduction numbers in an emerging epidemic. PLoS ONE 2007; 2: e758.

38. Gostic KM, McGough L, Baskerville EB, et al. Practical considerations for measuring the effective reproductive number, $R_{t}$. PLoS Comput Biol 2020; 16: 1–21.

39. Cauchemez S, Boëlle PY, Thomas G, et al. Estimating in real time the efficacy of measures to control emerging communicable diseases. Am J Epidemiol 2006; 164: 591–597.

40. Cauchemez S, Boëlle PY, Donnelly CA, et al. Real-time estimates in early detection of SARS. Emerg Infect Dis 2006; 12: 110.

41. Overton CE, Stage HB, Ahmad S, et al. Using statistics and mathematical modelling to understand infectious disease outbreaks: COVID-19 as an example. Infect Dis Modell 2020; 5: 409–441.

42. Gasser T, Rosenblatt M. Smoothing techniques for curve estimation. Berlin, Heidelberg: Springer, 1979.

43. Kot M. The Lotka integral equation. In: Elements of mathematical ecology. Cambridge: Cambridge University Press, 2001, pp. 353–364.

44. Champredon D, Dushoff J. Intrinsic and realized generation intervals in infectious-disease transmission. Proc R Soc B 2015; 282: 20152026.

45. Park SW, Champredon D, Dushoff J. Inferring generation-interval distributions from contact-tracing data. J R Soc Interface 2020; 17: 20190719.

46. Hart WS, Maini PK, Thompson RN. High infectiousness immediately before COVID-19 symptom onset highlights the importance of continued contact tracing. eLife 2021; 10: e65534.

47. Hart WS, Endo A, Hellewell J, et al. Inference of SARS-CoV-2 generation times using UK household data. medRxiv 2021.

48. Svensson Å. A note on generation times in epidemic models. Math Biosci 2007; 208: 300–311.

49. Ali ST, Wang L, Lau EHY, et al. Serial interval of SARS-CoV-2 was shortened over time by non-pharmaceutical interventions. Science 2020; 369: 1106–1109.

50. Torneri A, Azmon A, Faes C, et al. Realized generation times: contraction and impact of infectious period, reproduction number and population size. bioRxiv 2019; 568485. https://www.biorxiv.org/content/early/2019/03/08/568485

51. Britton T, Scalia Tomba G. Estimation in emerging epidemics: biases and remedies. J R Soc Interface 2018; 16: 20180670.

52. Kenah E, Lipsitch M, Robins JM. Generation interval contraction and epidemic data analysis. Math Biosci 2008; 213: 71–79.

53. Liu QH, Ajelli M, Aleta A, et al. Measurability of the epidemic reproduction number in data-driven contact networks. Proc Natl Acad Sci USA 2018; 115: 12680–12685.

54. Challen R, Brooks-Pollock E, Danon L, et al. Impact of uncertainty in serial interval, generation interval, incubation period and delayed observations in estimating the reproduction number for COVID-19. Stat Methods Med Res 2020; this volume.

55. Bi Q, Wu Y, Mei S. Epidemiology and transmission of COVID-19 in 391 cases and 1286 of their close contacts in Shenzhen, China: a retrospective cohort study. Lancet Infect Dis 2020; 20: 911–919.

56. Marschner I. Back-projection of COVID-19 diagnosis counts to assess infection incidence and control measures: analysis of Australian data. Epidemiol Infect 2020; 148: e97.

57. Goldstein E, Dushoff J, Ma J, et al. Reconstructing influenza incidence by deconvolution of daily mortality time series. Proc Natl Acad Sci USA 2009; 106: 21825–21829.

58. Becker NG, Watson LF, Carlin JB. A method of non-parametric back-projection and its application to AIDS data. Stat Med 1991; 10: 1527–1542.

59. Huisman JS, Scire J, Angst DC, et al. Estimation and worldwide monitoring of the effective reproductive number of SARS-CoV-2. medRxiv 2020; https://doi.org/10.1101/2020.11.26.20239368. https://www.medrxiv.org/content/early/2020/11/30/2020.11.26.20239368

60. Abbott S, Hellewell J, Thompson RN, et al. Estimating the time-varying reproduction number of SARS-CoV-2 using national and subnational case counts [version 2; peer review: 1 approved with reservations]. Wellcome Open Res 2020; 5: 112. https://doi.org/10.12688/wellcomeopenres.16006.2

61. Riley S, Ainslie KEC, Eales O, et al. Resurgence of SARS-CoV-2: detection by community viral surveillance. Science 2021; 372: 990–995.

62. Omori R, Mizumoto K, Chowell G. Changes in testing rates could mask the novel coronavirus disease (COVID-19) growth rate. Int J Infect Dis 2020; 94: 116–118.

63. Malmgren J, Guo B, Kaplan HG. Continued proportional age shift of confirmed positive COVID-19 incidence over time to children and young adults: Washington State March–August 2020. PLoS ONE 2021; 16: 1–12.

64. Seaman S, De Angelis D. Update on estimates of numbers of COVID-19 deaths accounting for reporting delay. https://www.mrc-bsu.cam.ac.uk/wp-content/uploads/2020/06/Adjusting-COVID-19-deaths-to-account-for-reporting-delay-.pdf (2020, accessed 28 September 2020).

Published In

Statistical Methods in Medical Research

Volume 31, Issue 9

Pages: 1675 - 1685

Article first published online: September 27, 2021

Issue published: September 2022

Rights and permissions

This article is distributed under the terms of the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).

PubMed: 34569883

Authors

Affiliations

Carolin Vegvari

Medical Research Council Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, UK

https://orcid.org/0000-0003-2018-011X

Sam Abbott

Center for the Mathematical Modelling of Infectious Diseases, London School of Hygiene & Tropical Medicine, UK

Frank Ball

School of Mathematical Sciences, University of Nottingham, UK

Ellen Brooks-Pollock

Bristol Veterinary School, University of Bristol, UK

NIHR Health Protection Research Unit in Behavioural Science and Evaluation at the University of Bristol, UK

Robert Challen

EPSRC Centre for Predictive Modelling in Healthcare, University of Exeter, UK

Somerset NHS Foundation Trust, UK

https://orcid.org/0000-0002-5504-7768

Benjamin S Collyer

Medical Research Council Centre for Global Infectious Disease Analysis, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, London, UK

Ciara Dangerfield

Isaac Newton Institute for Mathematical Sciences, UK

Julia R Gog

Department of Applied Mathematics and Theoretical Physics, University of Cambridge, UK

Katelyn M Gostic

Department of Ecology and Evolution, University of Chicago, USA

Jane M Heffernan

Centre for Disease Modelling, Mathematics & Statistics, York University, Canada

COVID Modelling Task-Force, The Fields Institute, Canada

T Déirdre Hollingsworth

Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, UK

Valerie Isham

Department of Statistical Science, University College London, UK

Eben Kenah

Division of Biostatistics, College of Public Health, The Ohio State University, USA

Denis Mollison

Department of Actuarial Mathematics and Statistics, Heriot-Watt University, UK

Jasmina Panovska-Griffiths

The Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK

Wolfson Centre for Mathematical Biology, Mathematical Institute and The Queen's College, University of Oxford, Oxford, UK

Lorenzo Pellis

Department of Mathematics, The University of Manchester, UK

The Alan Turing Institute, UK

Michael G Roberts

School of Natural and Computational Sciences and New Zealand Institute for Advanced Study, Massey University, New Zealand

Gianpaolo Scalia Tomba

Department of Mathematics, University of Rome Tor Vergata, Italy

Robin N Thompson

Mathematics Institute, University of Warwick, Coventry, UK

Zeeman Institute for Systems Biology and Infectious Disease Epidemiology Research, University of Warwick, Coventry, UK

https://orcid.org/0000-0001-8545-5212

Pieter Trapman

Department of Mathematics, Stockholm University, Sweden

Notes

Ellen Brooks-Pollock, Robert Challen, Ciara Dangerfield, Julia Gog, T Déirdre Hollingsworth, Lorenzo Pellis and Robin Thompson are affiliated with JUNIPER – Joint UNIversities Pandemic and Epidemiological Research, UK.

Carolin Vegvari, Department of Infectious Disease Epidemiology, School of Public Health, Imperial College London, Norfolk Place, London W2 1PG, UK.
