Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences
Open Access Research articles

Stochastic compartmental models of the COVID-19 pandemic must have temporally correlated uncertainties

Konstantinos Mamis

Konstantinos Mamis

Department of Mathematics, North Carolina State University, 2311 Stinson Drive, Raleigh, NC 27695-8205, USA

Contribution: Data curation, Formal analysis, Investigation, Software, Visualization, Writing – original draft

Google Scholar

Find this author on PubMed

and
Mohammad Farazmand

Mohammad Farazmand

Department of Mathematics, North Carolina State University, 2311 Stinson Drive, Raleigh, NC 27695-8205, USA

[email protected]

Contribution: Conceptualization, Supervision, Validation, Writing – review & editing

Google Scholar

Find this author on PubMed

Published:https://doi.org/10.1098/rspa.2022.0568

    Abstract

    Compartmental models are an important quantitative tool in epidemiology, enabling us to forecast the course of a communicable disease. However, the model parameters, such as the infectivity rate of the disease, are riddled with uncertainties, which has motivated the development and use of stochastic compartmental models. Here, we first show that a common stochastic model, which treats the uncertainties as white noise, is fundamentally flawed since it erroneously implies that greater parameter uncertainties will lead to the eradication of the disease. Then, we present a principled modelling of the uncertainties based on reasonable assumptions on the contacts of each individual. Using the central limit theorem and Doob’s theorem on Gaussian Markov processes, we prove that the correlated Ornstein–Uhlenbeck (OU) process is the appropriate tool for modelling uncertainties in the infectivity rate. We demonstrate our results using a compartmental model of the COVID-19 pandemic and the available US data from the Johns Hopkins University COVID-19 database. In particular, we show that the white noise stochastic model systematically underestimates the severity of the Omicron variant of COVID-19, whereas the OU model correctly forecasts the course of this variant. Moreover, using an SIS model of sexually transmitted disease, we derive an exact closed-form solution for the final distribution of infected individuals. This analytical result shows that the white noise model underestimates the severity of the pandemic because of unrealistic noise-induced transitions. Our results strongly support the need for temporal correlations in modelling of uncertainties in compartmental models of infectious disease.

    1. Introduction

    Quantitative models for forecasting the spread of communicable disease are a valuable tool for policymaking during pandemics, such as the ongoing COVID-19, and endemic diseases, such as gonorrhoea. Compartmental models are simple, widely used, quantitative tools for studying the spread of such infectious disease [16]. For instance, an appropriate compartmental model for the COVID-19 pandemic is the susceptible-exposed-infected-removed (SEIR) model [715], depicted in figure 1a. As shown in figure 1c, if the model parameters are chosen properly, the deterministic SEIR model accurately forecasts the spread of the Omicron variant in the USA.

    Figure 1.

    Figure 1. Summary of stochastic modelling and results for the Omicron variant of the COVID-19 pandemic in the USA. (a) SEIRS deterministic compartmental model suitable for airborne diseases, such as COVID-19. The parameters in the SEIRS model are: (i) the average adequate contact rate λ of each individual (also called the infectivity rate), (ii) the average incubation rate α , which is the inverse of the average incubation (or latency) period, during which the individual has contracted the virus but is not infectious yet, (iii) the average curing rate γ , which is the inverse of the average time each individual needs to recover, (iv) 1 / ρ is the average period after which immunity wanes off. Links between compartments shown in dashed lines are those omitted in the SEIR model of equation (3.2) we use here. (b) Cumulative number of COVID-19 cases in the USA. Light red frame highlights the Omicron variant wave (December 2021–April 2022). (c) COVID-19 forecast obtained by least-squares fitting of the deterministic SEIR model to the data for Omicron cases. Contact rate λ of fitted SEIR model is then stochastically perturbed by noise ξ ( t ) with intensity σ , around its mean value λ ¯ . (d) Results of white noise stochastic SEIR models for increasing noise intensity σ , obtained from 50 000 Monte Carlo simulations. Mean trajectories of COVID-19 cases are shown, equipped with error bars, inside which the 50% of trajectories lie. (e) Same as (d), except that ξ ( t ) is the Ornstein–Uhlenbeck noise with correlation time τ = 1 week.

    However, the model parameters, such as the infectivity rate λ , the incubation rate α and the curing rate γ , are a priori unknown. These parameters can be estimated by collecting and averaging data over the population. Such data are inevitably riddled with uncertainties that have motivated the development and use of stochastic compartmental models.

    In stochastic compartmental models, one assumes that the model parameters are either random variables or stochastic processes [9,16]. The former assumes that parameter values are uncertain, but otherwise constant over time. In the present work, we model the parameters as stochastic processes, in order to account for random changes in parameters that are time-varying, and possibly correlated in time. Note that this time-varying uncertainty in model parameters is different from models with deterministic time-varying coefficients that account for the seasonal variability [17]. The most common approach in models with stochastic processes as parameters is to assume that the parameter comprises a constant mean perturbed with white noise [9,16,18,19]. For instance, the infectivity or contact rate is often modelled as λ ( t ) = λ ¯ + σ ξ ( t ) , where λ ¯ is the mean, ξ ( t ) is the standard white noise and σ is a constant controlling the noise intensity [16,2025]. Here, we first show that, although this assumption seems reasonable, it leads to systematic underestimation of the disease spread. For instance, figure 1d shows the SEIR prediction of the Omicron wave in the USA with the infectivity rate λ modelled as white noise. The resulting stochastic SEIR model consistently underestimates the true cumulative number of COVID-19 cases. Making matters worse, as the noise intensity σ increases, the white noise stochastic model further underestimates the number of COVID-19 cases. The mean trajectories of COVID-19 cases drop significantly compared with the predictions of the deterministic model. For σ = 0.6 λ ¯ , the deterministic forecast is even outside the region where the 50% of trajectories around the mean trajectory lie (figure 1d).

    As we discuss in §3, this behaviour is not specific to the SEIR model, but is also observed in SIR and SIS models. This indicates that the stochastic compartmental models with white noise have a fundamental flaw: they imply that greater uncertainty in the model parameters leads to a less severe spread of the disease. This model behaviour is certainly dubious since greater ignorance about the disease, e.g. its infectivity rate, does not in reality make a pandemic less severe. The counterintuitive and unrealistic implications of modelling parameter uncertainties with white noise had also been noted in oncology models, where increasing the noise intensity leads to tumour eradication, a behaviour that is not replicated in actual cases [26].

    By contrast, as shown in figure 1e, modelling uncertainties by correlated noise alleviates such contradictory behaviour. With correlated noise, stochastic compartmental models exhibit the behaviour one would expect: their mean trajectories are in line with the deterministic model for all values of σ , and higher noise intensity leads to higher variance around the mean trajectory.

    Here, for the first time, we present a principled modelling of uncertainties in the infectivity rate of epidemiological models. Starting from reasonable assumptions on the contact rate of each individual, and using the central limit theorem and Doob’s theorem on Gaussian Markov processes, we prove that the only admissible model for such uncertainties is the mean-reverting Ornstein–Uhlenbeck (OU) process. We demonstrate the implications of our results on two examples, an SIS model of sexually transmitted disease and SEIR model of COVID-19. For the SIS model, we derive a closed-form solution for the probability distribution of the infected population and show that the OU-based model always has a peak near the deterministic equilibrium. By contrast, modelling uncertainties with white noise erroneously predicts a less severe disease spread, or even its eradication, as the uncertainties grow. Similar observations are made for the SEIR model of COVID-19 using Monte Carlo simulations and the Johns Hopkins University database of COVID-19 cases.

    We note that many epidemiological models have already incorporated stochasticity [27]. A straightforward approach is the stochastic perturbation of the model parameters [16,2025]. A more principled approach is to model the number of susceptible and infected individuals as continuous-time Markov chain birth–death processes or as branching processes [2834]. Under the usual assumption that their state variables are continuous, Markov chain models lead to a system of stochastic differential equations whose noise covariance matrix is different from the parametric perturbation models [28,3537]. Nonetheless, these equations use white noise to account for uncertainties, overlooking the temporal correlations present in human social activities [38].

    The OU process has also been proposed for perturbation of parameters in compartmental epidemiological models [28,3941]. So far, the justification for the use of OU noise has not been based on first principles. Rather, OU has been chosen for its mean-reverting property; on average, OU noise stays closer to its mean value compared with the white noise. Thus, by choosing OU instead of white noise, unrealistic values of model parameters, though not eliminated, are less frequent.

    The first contribution of the present work is the observation that compartmental models with white noise fluctuations in contact rate systematically underestimate the severity of disease spread in the population. Secondly, we justify the use of OU noise fluctuations from first principles. More specifically, we prove that the OU process is the only admissible noise that takes into account the temporal correlations present in the social behaviour of individuals. Finally, we show that the predictions of compartmental models with OU noise are drastically different from the predictions of white noise models; OU model predictions are always in agreement with the actual data, and do not underestimate the disease spread.

    2. Stochastic modelling of the average contact rate

    There are several parameters in compartmental models of infectious disease. For instance, in the SEIR model, these are the infectivity or average contact rate λ , the incubation rate α and the curing rate γ . The source of their uncertainty is the underlying dynamical interplay between biological, social and environmental factors. Incubation and curing rates depend mainly on the biology of the virus. However, the contact rate λ , defined as the average number of adequate contacts per individual per unit time ([42], Sec. 2.7.1), [43], depends on both biological factors, e.g. how easily the virus transmits, and social factors, e.g. how many individuals each person comes in adequate contact with. As a result, the contact rate is the main source of uncertainty in epidemiological models. Therefore, for the sake of simplicity, we assume that the average incubation and curing rates are deterministic constants and that the contact rate λ is the only stochastic parameter. This is a common choice in the literature on stochastic compartmental models [10,16,44].

    We denote the total number of adequate contacts of the n th individual, up to time t , by C n ( t ) . A contact is adequate if it is sufficient for the transmission of the virus [45]. The contact rate of the n th individual is then defined as the rate of change of C n , i.e. λ n = d C n / d t . Since C n ( t ) is not necessarily differentiable, it is more convenient to work with its increments, Δ C n ( t ) , that is, the number of contacts of the n th individual over the time interval [ t , t + Δ t ] , where Δ t is a small time increment. The contact rate is then approximated by λ n ( t ) Δ C n ( t ) / Δ t .

    We treat the incremental contacts Δ C n ( t ) of each individual as a stochastic process satisfying the following assumptions:

    (1)

    There is no dependence between the incremental contacts of different individuals. In other words, we assume { Δ C n ( t ) } n = 1 N is a collection of N independent random processes.

    (2)

    The number of contacts each individual makes in [ t , t + Δ t ] does not depend on the history of their contacts in previous time instances; thus Δ C n ( t ) is a Markov process, where N is the population size.

    (3)

    The mean and covariance of the incremental contacts satisfy

    E [ Δ C n ( t ) ] = μ Δ t 2.1a
    and
    Cov [ Δ C n ( t ) , Δ C n ( s ) ] = σ 0 2 2 τ exp ( | t s | τ ) Δ t 2 , 2.1b
    where [ t , t + Δ t ] and [ s , s + Δ t ] are disjoint time intervals, and τ is the correlation time.

    Equation (2.1) models the assumption that the incremental contacts of every individual are proportional to the time interval Δ t . Equation (2.1a), where μ is a constant, models the assumption of no seasonal variability in contacts. The exponential decay in equation (2.1b) constitutes the simplest model for temporally correlated incremental contacts. The correlation time τ can be interpreted as a characteristic time scale for the social patterns of every individual (e.g. one day or one week). Note that, although the incremental constants Δ C n ( t ) have the same mean and covariance, we do not assume that they are identically distributed.

    Assumptions (1)–(3) determine the properties of the contact rate λ n ( t ) . In particular, λ n ( t ) inherits the Markovian property of Δ C n ( t ) . Also, the mean value and covariance of λ n ( t ) are determined via equation (2.1),

    E [ λ n ( t ) ] = μ 2.2a
    and
    Cov [ λ n ( t ) , λ n ( s ) ] = σ 0 2 2 τ exp ( | t s | τ ) . 2.2b
    The exponentially decaying correlation of equation (2.2b) for the contact rate of each individual was also assumed in the agent-based model of Ariel & Louzun [39]. Having defined the contact rate λ n ( t ) for every individual, we now introduce the contact rate averaged over the whole population as λ ( t ) = ( 1 / N ) n = 1 N λ n ( t ) .

    Now, we consider K time instances t 1 < < t k < < t K and define the vectors λ n = [ λ n ( t 1 ) , , λ n ( t k ) , , λ n ( t K ) ] T , which samples the stochastic process λ n ( t ) at the said time instances. The random vector λ n has mean value μ with μ k = μ , and covariance matrix Σ with Σ k = Cov [ λ n ( t k ) , λ n ( t ) ] , k , = 1 , , K . For the average vector λ = ( 1 / N ) n = 1 N λ λ n , the multi-dimensional central limit theorem [46] implies

    N ( λ μ ) N K ( 0 , Σ ) , 2.3
    where N is the total population, and N K is the K -variate normal distribution. Note that λ can be viewed as the discrete samples of the average contact rate λ ( t ) , so that λ = [ λ ( t 1 ) , , λ ( t k ) , , λ ( t K ) ] T .

    Thus, the stochastic process λ ( t ) of the average contact rate has the following properties:

    (i)

    It is temporally homogeneous, since for every sequence t 1 < < t k < < t K , all the K -variate joint distributions of λ ( t 1 ) , , λ ( t K ) are invariant to translations in time.

    (ii)

    It is a Markov process, since it is the average of independent Markov processes [47,48].

    (iii)

    For every pair of two time instances, t and s , its values λ ( t ) and λ ( s ) follow a bivariate Gaussian distribution.

    Doob’s theorem on Gaussian Markov processes [49] implies that the only two possible stochastic processes with properties (i)–(iii) are the uncorrelated white noise and the correlated OU noise. White noise is also ruled out since we have already proven by the central limit theorem that λ ( t ) and λ ( s ) are not independent. Therefore, the only choice for λ ( t ) is to be an OU process, with mean value
    E [ λ ( t ) ] = μ . 2.4
    and autocovariance
    Cov [ λ ( t ) , λ ( s ) ] = σ 2 2 τ exp ( | t s | τ ) , 2.5
    with σ = σ 0 / N .

    It is well known that the OU process, with mean (2.4) and stationary autocovariance (2.5), is generated by the stochastic differential equation,

    d λ = 1 τ ( μ λ ) d t + σ τ d W , 2.6
    where W ( t ) is the standard Wiener process [50]. We note that, as the correlation time τ tends to zero, the uncorrelated white noise is retrieved ([51], Sec. 6.6). In §3, we use this limiting relationship to compare results from stochastic compartmental models under white and OU noise.

    For notational simplicity, we write λ ( t ) = λ ¯ + σ ξ ( t ) , where λ ¯ = μ is its mean value, σ is its noise intensity and ξ ( t ) is a standard OU process with zero mean and autocovariance given by equation (2.5) with σ = 1 .

    3. Results

    (a) Stochastic SIS models

    Before discussing the COVID-19 data, we examine the simpler SIS model where we derive the probability distribution of the infected individuals in closed form. This allows us to highlight the differences between the white noise and OU noise with no ambiguity. The SIS model is governed by the equations

    d S d t = λ N S I + γ I 3.1a
    and
    d I d t = λ N S I γ I , 3.1b
    with the susceptible (S) and infected (I) compartments where N = S + I is the total population. We assume that the curing rate γ is a deterministic constant and the contact rate λ ( t ) = λ ¯ + σ ξ ( t ) is a stochastic process. We consider two types of noise ξ ( t ) : white noise and the standard OU noise. It is customary to express the results in terms of the infected fraction X = I / N . Note that since the total population N is conserved, we have S = N ( 1 X ) . Throughout the paper, the noise intensity σ is reported as a percentage of the mean value λ ¯ , in order to measure the extent of variability in relation to the mean. The parameter σ / λ ¯ is known as the standard deviation to mean value ratio or coefficient of variance.

    For stochastic SIS models, we are able to obtain the final distribution of the infected fraction X of the population in closed form (see §S2 of the electronic supplementary material). As seen in figure 2, the theoretical final distributions are always in excellent agreement with Monte Carlo simulations.

    Figure 2.

    Figure 2. (central figure) Bifurcation diagram for the SIS model under white noise perturbation of its contact rate, depending on two-dimensionless parameters, the basic reproduction number R 0 = λ ¯ / γ and the relative noise variance σ 2 / λ ¯ . The four regions shown depict different shapes for the distribution of the final number of infected X as a fraction of the population. I: Unimodal with mode at a non-zero X . II: Bimodal with one mode at X = 0 . III: Unimodal with mode at X = 0 . IV: Delta function at X = 0 . (peripheral figure ad) Distributions for the final infected fraction for gonorrhoea ( R 0 = λ ¯ / γ = 1.4 , γ = 1 / 20 days 1 [45]) for increasing noise intensity: σ = 0.2 λ ¯ for (a), σ = 0.5 λ ¯ for (b), σ = 0.8 λ ¯ for (c), σ = 1.2 λ ¯ for (d). In each figure, the stable equilibrium of the original deterministic model is marked by a red dashed line. The points corresponding to (ad) cases are also marked on the central figure. The additional case (e), not corresponding to a disease, is also shown as a representative of region II of the bifurcation diagram. In peripheral figures, we also depict the distribution under Ornstein–Uhlenbeck noise with correlation time half the characteristic time scale of the system (see appendix A). Open circles correspond to distributions obtained from 50 000 samples of direct Monte Carlo simulations, while solid lines mark the exact distributions. In (d), the distribution corresponding to white noise perturbation is a delta function at X = 0 , and thus it is depicted as a vertical arrow.

    The final distribution depends on two-dimensionless parameters (see §S2 of the electronic supplementary material), the deterministic basic reproduction number R 0 = λ ¯ / γ and the relative variance σ 2 / λ ¯ of the noise. Thus, we are able to produce the bifurcation diagram shown in figure 2, where the possible final forms of the distribution are depicted. For R 0 > 1 , the deterministic SIS model ( σ = 0 ), has one stable equilibrium at X = ( λ ¯ γ ) / λ ¯ , and one unstable equilibrium at X = 0 . However, the stochastic SIS model with white noise exhibits a richer asymptotic behaviour that includes parameter regions of bistability and regions where X = 0 is stable.

    We first focus on R 0 = 1.4 , which is the relevant basic reproduction number for gonorrhoea [45]. As shown in figure 2, by increasing the white noise intensity, the infected fraction distribution undergoes a transition. For low noise intensity, the distribution exhibits one mode (region I of figure 2), close to the stable deterministic equilibrium (figure 2a). As the noise intensity increases, the distribution mode shifts towards zero (figure 2b). For more noise intensity, the distribution exhibits its mode at zero (region III), see figure 2c. Increasing the white noise intensity further results in the eradication of the disease from the population (region IV), since the distribution of the infected fraction X becomes a delta function at zero (figure 2d). We emphasize that this is an unrealistic model behaviour since it implies that greater ignorance about the contact rate λ will lead to disease eradication.

    For more contagious diseases with basic reproduction number higher that 2, the final distribution undergoes a different noise-induced transition. At first, as the noise intensity increases, the mode of the distribution (region I) shifts towards values that are larger than the deterministic equilibrium. For even higher noise intensity (figure 2e), an additional mode at zero appears (region II), whose magnitude increases by further increase in noise intensity. This increase eventually results in the eradication of the disease from the population, since the distribution becomes a delta function centred at zero (region IV).

    In figure 2ad, we also plot the distribution obtained under an OU perturbation of the contact rate in the SIS model. As correlation time of the OU noise, we choose τ = 0.5 ( λ ¯ γ ) 1 , where ( λ ¯ γ ) 1 is the characteristic time scale of the deterministic SIS model (see appendix A).

    We observe that, compared with white noise perturbation, correlation in noise has the effect of suppressing transition away from the deterministic equilibrium. For noise levels as high as 80%, the mode of the distribution remains close to the stable deterministic equilibrium. For unrealistically large noise levels (such as σ = 1.2 λ ¯ of figure 2d), the distribution under OU perturbation also exhibits an additional mode at 0. However, most of distribution mass is still located around the deterministic equilibrium, whereas, for the same noise levels, the SIS model with white noise perturbation predicts the eradication of the disease from the population.

    (b) Omicron variant in the USA

    Next, we study the Omicron variant of COVID-19 in the USA, using the SEIR model,

    d S d t = λ N S I , 3.2a
    d E d t = λ N S I α E , 3.2b
    d I d t = α E γ I , 3.2c
    d R d t = γ I , 3.2d
    with the susceptible (S), exposed (E), infected (I) and removed (R) compartments. It shares the same set of parameters with the SIS model as well as the additional parameter α denoting the average incubation rate. We note that more complex models which include a vaccinated compartment, e.g. the susceptible-vaccinated-exposed-infected-removed (SVEIR) model [52], can also be used. Following earlier studies [715], here we treat individuals that are effectively vaccinated as part of the removed compartment and therefore use the simpler SEIR model.

    Our main COVID-19 results are shown in figure 1. First, we determine the parameters of the deterministic SEIR model by fitting its trajectory to the data from the cumulative COVID-19 cases in the USA [53] for the period of December 2021–April 2022, during the Omicron wave (figure 1b,c). The fitting procedure is described in appendix A. Then we consider the stochastic contact rate λ ( t ) = λ ¯ + σ ξ ( t ) , which is perturbed around its deterministic value λ ¯ = 1.54 days 1 . For the perturbation ξ ( t ) , we consider both white and OU processes, and perform Monte Carlo simulations of the respective stochastic SEIR models (figure 1d,e). The noise intensity is varied between 10% and 60% of the average contact rate λ ¯ . For the OU noise, we also have to choose the value of the correlation time τ . Unfortunately, empirical data on the temporal correlation of contact rates are scarce [38]. For the results shown in figure 1e, we choose one week as the correlation time, motivated by the weekly pattern in human activity [5457]. The results are similar for shorter and longer correlation times (see §S3 of the electronic supplementary material).

    As shown in figure 1d, increasing the intensity of both white and OU noise results in an expected increase in variance of the forecasted Omicron cases. However, we observe that, in the white noise case, increasing the noise intensity results in the mean to underestimate the final number of cases, predicting significantly less spread of the disease in the population. For instance, when the noise intensity is at 60%, the forecasted cumulative number of COVID-19 cases is underestimated by approximately 20%. This result is unrealistic, since higher uncertainty in the contact rate does not in reality result in a less severe pandemic.

    On the other hand, the results in figure 1e for the correlated OU noise are reliable. The mean trajectories stay close to the deterministic prediction and the data. As the noise intensity increases, the predictions attain slightly higher mean values of the final number of cases, compared with the value predicted by the deterministic model. For instance, at 60% noise intensity, the forecasted cumulative number of COVID-19 cases increases by approximately 4%.

    The mechanism by which the SEIR model under white noise underestimates the size of the COVID-19 pandemic is better understood by inspecting the noise-induced qualitative changes in the distribution of the final number of cases; see figure 3. In absence of a closed-form solution, we use Monte Carlo simulations to determine the distribution corresponding to the stochastic SEIR model.

    Figure 3.

    Figure 3. Distributions of the cumulative COVID-19 cases in the USA (total population 329.5 million), as determined from SEIR models with stochastically perturbed contact rate. Model parameters are the same as in figure 1. Stable equilibrium point of the deterministic SEIR is shown as a red dashed line. (a) SEIR with contact rate perturbed by white noise. (b) SEIR with contact rate perturbed by Ornstein–Uhlenbeck noise with correlation time τ = 1 week.

    As shown in figure 3, for small noise intensities, the distributions of both white and OU stochastic models are unimodal and narrow, with their modes approximately coinciding with the deterministic equilibrium, marked by a dashed line. The distribution under OU noise is slightly less diffusive than the one under the white noise perturbation, which is consistent with the sharpening effect of correlated noise [50,58]. As expected, increase in noise intensity results in distributions that are more diffusive, and in distribution modes that progressively move away from the deterministic equilibrium, a phenomenon called the peak drift [50,58,59]. However, the trends in peak drift are opposite in the two perturbations; the distribution mode under white noise moves towards lower numbers of cases, while under OU noise, it shifts towards higher values.

    The main difference between the two stochastic perturbations, as shown in figure 3, is the following. Whereas the distribution corresponding to OU noise stays always unimodal, the distribution corresponding to white noise undergoes a bifurcation at σ 0.5 λ ¯ , where an additional mode in the low COVID-19 cases regime ( 5 × 10 7 cases ) emerges, and the distribution becomes bimodal. This bimodality enables noise-induced transitions towards the lower number of cases, which in turn explains why the white noise model underestimates the severity of the pandemic (figure 1d).

    4. Conclusion

    Our study highlights the challenges of incorporating parameter uncertainties in compartmental epidemiological models. We showed that, although common, modelling these uncertainties with white noise is fundamentally flawed, since the corresponding stochastic models systematically underestimate the number of infected individuals.

    We then proposed a principled modelling of the average contact rate, which accounts for the uncertainties in human social behaviour. By considering temporal correlations in each individual’s behaviour, we showed that the OU process is the correct model for uncertainties in the contact rate. We demonstrated the efficacy of our proposed stochastic model on the SIS model of sexually transmitted disease and the SEIR model of the COVID-19 pandemic.

    The OU model has one unfavourable characteristic: similar to white noise, it allows for rare instances where the contact rate becomes negative, which is not epidemiologically allowed (see §S3 of the electronic supplementary material). Nonetheless, the resulting model is amenable to mathematical analysis and produces accurate forecasts that do not suffer from the idiosyncrasies of white noise. As such, the OU process serves as a reliable minimal model of the average contact rate.

    Finally, the OU process has a time correlation parameter whose value needs to be derived from empirical data; such data are currently very scarce [38,60,61]. Hence, our work indicates the need for more empirical studies that quantify the correlation time in close proximity interactions.

    Data accessibility

    The data used in this work are available from the COVID-19 Dashboard by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University, https://github.com/CSSEGISandData/COVID-19.

    The data are provided in the electronic supplementary material [62].

    Authors' contributions

    K.M.: data curation, formal analysis, investigation, software, visualization, writing—original draft; M.F.: conceptualization, supervision, validation, writing—review and editing.

    Both authors gave final approval for publication and agreed to be held accountable for the work performed therein.

    Conflict of interest declaration

    We declare we have no competing interests.

    Funding

    No funding has been received for this article.

    Acknowledgements

    We are grateful to Prof. Alun Lloyd (North Carolina State University) for fruitful discussions.

    Appendix A Material and methods

    (a) SIS model

    In the SIS model (3.1), N denotes the total population. The term ( λ / N ) S I is the simplest form for the disease transmission, and it is based on the assumption of homogeneous mixing of population [43]. Transmission term without the division with N is sometimes used [16]; however, this choice is not supported by empirical evidence [63]. Under the constant population assumption S + I = N , the SIS model (3.1) can be reduced to one scalar ordinary differential equation. Consider the infected as a fraction of the population by defining X = I / N for the state variable. Then the scalar equation for the SIS model reads

    d X d t = λ X ( 1 X ) γ X . A 1
    For R 0 > 1 , equation (A 1) has an unstable equilibrium at X = 0 and a stable equilibrium at X = ( λ γ ) / λ . The characteristic Lyapunov time of the system is ( λ γ ) 1 ; see §S1 of the electronic supplementary material. We obtain the stochastic SIS model by perturbing λ in equation (A 1) by noise: λ = λ ¯ + σ ξ ( t ) . For the case of white noise perturbation, the evolution of distribution of X is governed by the classical Fokker–Planck equation [64]. Recently, we have also formulated an approximate nonlinear Fokker–Planck equation for the case of correlated noise perturbations [58,59]. Since equation (A 1) is scalar, the stationary solutions of the Fokker–Planck equations are available in closed form in both cases of white noise and correlated noise (see of §S2 of the electronic supplementary material). These closed-form stationary solutions are used in figure 2. For the Monte Carlo simulations, numerical solutions of stochastic SIS and SEIR models are generated by a predictor-corrector scheme [65].

    (b) SEIR model and fit to COVID-19 data

    For the SEIR model (3.2), the total population N is assumed constant and equal to the US population of 329.5 million. The data are obtained from the COVID-19 Dashboard by the Center for Systems Science and Engineering at Johns Hopkins University [53] and were analysed using a publicly available Matlab code [66]. 3 December 2021 is considered as the initial time t 0 , i.e. the start of the Omicron wave in the USA. The parameter values are determined by least-squares fitting, resulting in the basic reproduction number R 0 = λ / γ = 1.85 , incubation rate α = 1 / 3.5 days 1 and curing rate γ = 1 / 1.2 days 1 . The initial values of the exposed E ( t 0 ) , the infected I ( t 0 ) and the removed R ( t 0 ) at the beginning of the Omicron wave were chosen consistent with the Johns Hopkins database to be 0.14%, 0.18% and 14.88% of the total population, respectively. To obtain the cumulative COVID-19 cases from the SEIR model, we use the relation t 0 t I ( s ) d s = ( R ( t ) R ( t 0 ) ) / γ from equation (3.2d), which measures the total number of cases over [ t 0 , t ] .

    Footnotes

    Electronic supplementary material is available online at https://doi.org/10.6084/m9.figshare.c.6360034.

    Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.

    References