Mutation rates per cell infection.
Table 2 shows mutation rates, defined as the probability of a nucleotide substitution per nucleotide per cell infection (μs/n/c), obtained from equation 2. For each of 37 studies, we provide the value of μs/n/c and information about the mutational target size (Ts for substitutions), the number of cell infection cycles (c), and the selection correction factor (α). Details of the calculations are in “Calculation of mutation rates per nucleotide per cell infection” in the Appendix. Most of these studies were originally designed to control for selection, such that all mutations were neutral (α = 1) or lethal (cα = 1). In 6 of these 37 studies selection was not controlled for, so we corrected for its effect. The reliability of the mutation rate estimates increases as Ts increases, since mutation sampling becomes more representative. It also increases as c decreases, because there is less time for selection to act, and when α is known. Estimates based on a low Ts, a large c, or an undetermined α, or suffering from other problems, are shown within parentheses and should be taken with caution. It is also clearly desirable to have several independent estimates for each virus, so we present average values where possible. Since mutation rates vary by orders of magnitude, we used geometric means (i.e., we averaged on a log scale). Finally, although the mutation rates in Table 2 refer to nucleotide substitutions, in some cases the mutation rate to indels has also been measured, and we show its contribution to the total rate (δ).
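As a minimal sketch of the log-scale averaging described above (not the authors' code), the geometric mean of several estimates can be computed by averaging their base-10 logarithms. The function name is illustrative; the two example values are the poliovirus 1 estimates worked out in the Appendix.

```python
import math


def geometric_mean(rates):
    """Average positive mutation rates on a log10 scale (geometric mean)."""
    if not rates or any(r <= 0 for r in rates):
        raise ValueError("rates must be a non-empty sequence of positive numbers")
    mean_log = sum(math.log10(r) for r in rates) / len(rates)
    return 10 ** mean_log


if __name__ == "__main__":
    # Poliovirus 1, experiments 1 and 2 (mu_s/n/c values from the Appendix)
    print(f"{geometric_mean([3.3e-5, 1.5e-5]):.1e}")  # prints 2.2e-05
```

An arithmetic mean of the same two values (2.4 × 10⁻⁵) would be pulled toward the larger estimate, which is why log-scale averaging suits quantities spanning orders of magnitude.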
Mutation rates per strand copying and inference of replication modes.
Mutation rates defined as the probability of a nucleotide substitution per nucleotide per strand copying (μs/n/r) are shown in Table 3. Few such estimates are available because the replication modes of many viruses are unknown. The values shown were derived using the Luria-Delbrück fluctuation test null-class method (equation 10; details of each calculation can be found in “Calculation of mutation rates per nucleotide per strand copying” in the Appendix). The mutation rates per strand copying are consistently lower than those per cell infection. For some viruses both kinds of estimate are available, and we can thus calculate the number of copying cycles per infected cell as rc = μs/n/c/μs/n/r (Table 4). Further, by comparing the observed rc value with its minimum and maximum possible values, we can infer the likely mode of replication. In double-stranded DNA (dsDNA) viruses, a replication event requires one cycle of strand copying, and hence min[rc] = 1, which corresponds to the pure stamping-machine (linear) replication mode. Single-stranded RNA (ssRNA) viruses produce an intermediate strand of opposite polarity, most dsRNA viruses produce a single positive-sense strand that is later copied to re-form dsRNA, and ssDNA viruses are first copied to form dsDNA. Therefore, min[rc] = 2 for all these virus types, which implies that, strictly speaking, fully linear replication is not possible in these viruses. By definition, the number of copying cycles per infected cell is maximal under binary replication, where it equals max[rc] = log B/log 2 (where B is the burst size). This holds for dsDNA viruses, but for ssRNA, dsRNA, and ssDNA viruses we must use max[rc] = log B/log 2 + 1, since there is an additional strand copying from a single strand. Previous work has shown that the mode of replication is close to linear in bacteriophages φX174 (19) and φ6 (9), binary in bacteriophage T2 (55), and probably binary in bacteriophage λ during its lytic phase (26). Comparison of rc with max[rc] and min[rc] (Table 4) confirms these results and also suggests that replication is close to linear for influenza A virus (FLUVA), intermediate for vesicular stomatitis virus (VSV), and close to binary for poliovirus 1 (PV-1). For φX174 and VSV, we repeated this analysis using only mutation rates per cell infection and per strand copying obtained from the same study (i.e., without comparing across studies). This gives results consistent with those shown in Table 4 (rc = 1.0 for φX174 and rc = 3.0 for VSV). Finally, in retroviruses, the genomic positive-sense RNA is reverse transcribed to obtain ssDNA, which is copied to form dsDNA, and hence rc = 2 for virus-mediated replication. This is followed by integration into the host chromosome, transcription, and possibly a variable number of host cell replications, implying that rc ≥ 3. However, since proofreading and repair systems act in these additional host-mediated copying processes (50, 87), they probably contribute little to the overall mutation rate.
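The comparison of rc with its bounds can be sketched as follows. This is an illustrative helper, not the procedure used to build Table 4: the `position` score (0 at the linear bound, 1 at the binary bound) is a hypothetical summary introduced here, and the input values in the example are made up. Note that `math.log2(B)` equals log B/log 2 in the text.

```python
import math


def rc_bounds(burst_size, extra_strand_copy):
    """Return (min[rc], max[rc]) copying cycles per infected cell.

    dsDNA viruses: min = 1, max = log B / log 2.
    ssRNA, dsRNA and ssDNA viruses need one extra copying step,
    so min = 2 and max = log B / log 2 + 1.
    """
    extra = 1 if extra_strand_copy else 0
    return 1 + extra, math.log2(burst_size) + extra


def copying_cycles(mu_per_cell, mu_per_strand):
    """rc = mu_s/n/c divided by mu_s/n/r."""
    return mu_per_cell / mu_per_strand


def position(rc, burst_size, extra_strand_copy):
    """Hypothetical score: 0 at min[rc] (linear), 1 at max[rc] (binary)."""
    lo, hi = rc_bounds(burst_size, extra_strand_copy)
    return (rc - lo) / (hi - lo)


if __name__ == "__main__":
    # Made-up inputs for an ssRNA virus with a burst size of 1,024
    rc = copying_cycles(6.0e-5, 1.0e-5)
    print(rc_bounds(1024, True), round(position(rc, 1024, True), 2))
```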
Analysis of general mutation rate patterns.
One of the most general results concerning mutation rates is Drake's rule (26), which states that the mutation rate per genome per strand copying is roughly constant across DNA-based microorganisms, including DNA viruses. This implies an inverse relationship between genome size and the mutation rate per nucleotide. However, this rule has not previously been tested using mutation rates expressed per cell infection. This test is necessary because DNA viruses with large genomes show the lowest mutation rates per strand copying but also tend to use binary replication, which produces more mutations per cell and might compensate for the lower rate per strand copying. The plot of mutation rates per cell infection against genome size (Fig. 2) indicates that Drake's rule is robust to the choice of units.
Another general observation is that RNA viruses have higher mutation rates than DNA viruses (29, 45). However, based on observations that some ssDNA viruses can evolve rapidly, it was recently suggested that these may have mutation rates close to those of RNA viruses (32). We find that the lowest μs/n/c estimate among RNA viruses is 1.6 × 10⁻⁶ and the highest among DNA viruses is 1.1 × 10⁻⁶. Hence, although our data show no overlap between RNA and DNA virus mutation rates, the gap may be narrower than is often thought. Indeed, the transition between DNA and RNA viruses appears relatively smooth in Fig. 2 and can be partially explained by differences in genome size.
Another question is whether the negative correlation between mutation rate and genome size observed for DNA-based microorganisms also holds for RNA viruses. The main difficulty in testing this is that genome sizes among RNA viruses span only about 1 order of magnitude and the mutation rate estimates have large errors. Despite this limitation, we found a significant negative correlation between mutation rate and genome size among the combined ssRNA(+), ssRNA(−), and dsRNA viruses (n = 11; Spearman correlation, ρ = −0.618 and P = 0.043; Pearson correlation on log₁₀ scales, r = −0.718 and P = 0.013). However, the mutation rate estimates for the viruses with the smallest and largest genomes (bacteriophage Qβ and murine hepatitis virus [MHV]), which are key for testing this correlation, have problems such as small mutational targets or difficulties in accounting for selection bias (see “Calculation of mutation rates per nucleotide per cell infection” in the Appendix). Moreover, the correlation loses statistical significance when the less reliable estimates (shown in parentheses in Table 2) are removed (n = 8; ρ = −0.429, P = 0.289; r = −0.583, P = 0.129). Inclusion of retroviruses also slightly weakens the correlation (n = 18; ρ = −0.418, P = 0.084; r = −0.550, P = 0.018). Therefore, the current data are consistent with a negative relationship between mutation rate and genome size among RNA viruses, but they do not strongly support it.
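The two statistics reported above can be sketched in plain Python, with Spearman's ρ computed as the Pearson correlation of tie-averaged ranks and Pearson's r computed on log₁₀-transformed values. This is an illustrative reimplementation, not the authors' analysis code, and the function names are our own. It omits P values, which in practice could be obtained from `scipy.stats.spearmanr` and `scipy.stats.pearsonr`. The per-virus data are in Table 2 and are not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    return pearson(average_ranks(x), average_ranks(y))


def log_pearson(genome_sizes, rates):
    """Pearson correlation on log10 scales (genome size vs mutation rate)."""
    return pearson([math.log10(s) for s in genome_sizes],
                   [math.log10(r) for r in rates])
```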
In previous reviews, it has been proposed that the mutation rates of retroviruses tend to be lower than those of other RNA viruses (29, 59), despite HIV-1 being perhaps the prototypic fast-evolving RNA virus (4, 92). However, there is no evidence for a lower mutation rate in retroviruses (geometric mean μs/n/c = 3.0 × 10⁻⁵) than in other RNA viruses (μs/n/c = 2.2 × 10⁻⁵). Therefore, there is currently no reason to attribute differences between the evolutionary rates of retroviruses and other RNA viruses to differences in mutation rates. It is also worth mentioning that the rate of evolution of HIV-1, although high, is not very different from that of other RNA viruses such as FLUVA or foot-and-mouth disease virus (48).
Concerning the fraction of indels among total mutations, we observe δ = 0.10 to 0.40 (Table 2), with a mean of 0.24 and a median of 0.20. This is very similar to the estimate of δ = 0.21 for a single viroid (37) and is also consistent with the value obtained for several DNA microbes (28). Although it has been suggested that indels are particularly frequent in some RNA viruses (58, 71), our review of the literature confirms that nucleotide substitutions are the most frequent type of spontaneous mutation, being roughly four times more frequent than indels.
Conclusions and outstanding questions.
Whether mutation rates should be expressed per infected cell or per strand copying depends on the questions being addressed and the estimation method. In general, we see several advantages to using cell infection units. First, the cell infection cycle is a natural definition of a viral generation, making comparative analyses across different types of virus, or between viruses and other organisms, more meaningful. Second, most theoretical models in viral population dynamics use this unit. For instance, the mutation rate per cell infection, together with the rate of infection of new cells, is used to calculate the probability of specific mutations occurring within an infected individual (72) or to predict the outcome of lethal mutagenesis treatments (5). Third, it is more inclusive than the rate per strand copying, since it accounts for other sources of mutation, such as host-mediated editing and copying, or spontaneous damage to the viral nucleic acid. Fourth, it facilitates a clear conceptual separation between the error rate of a viral polymerase and the mutation rate experienced by the virus. On the other hand, although counting cell infection cycles may be easy for lytic animal and bacterial viruses, it is more difficult for persistent viruses, plant viruses, and viruses that integrate into the host genome (e.g., retroviruses and prophages). Also, the complete cell infection cycle includes the extracellular stage, whose duration can be extremely variable and often indeterminate.
As we have shown, selection can bias mutation rate estimates. Ideally, mutations in the target under study should be strictly neutral or lethal, so that the conversion from mutation frequencies to mutation rates is straightforward (equations 5 and 6). The lethality-based method, for instance, can be implemented by scoring substitutions that produce premature stop codons, provided that no genetic complementation or stop codon suppression occurs (13), although mutations introduced during PCR amplification need to be taken into account. Another possibility is to use drug dependence, a form of drug resistance in which the ability to grow in the absence of the drug is lost. Such mutants can be identified by isolating drug-resistant mutants and assaying them for growth in the absence of the drug. We have also addressed the problem of correcting for selection bias when lethality or neutrality is not guaranteed. The selection correction method proposed here can be used in general for converting mutation frequencies into mutation rates, but the calculation of the correction factor α depends on whether the sampling method is selection free (Fig. 1). For instance, in molecular clone sequencing experiments, the efficiency with which mutants are PCR amplified, cloned, and sequenced does not depend on their fitness, and hence mutation sampling is nonselective. In contrast, in experiments where plaques are sequenced directly (i.e., without molecular cloning), highly deleterious or lethal mutations are not observable, and hence mutation sampling is selective. Our choice of parameters for computing α is based on previous experimental work with a rhabdovirus (80), a potyvirus (6), a levivirus (23), a microvirus (23), and an inovirus (73). Importantly, mutational fitness effects are well conserved across these viruses [E(sv) = 0.10 to 0.13, pL = 0.2 to 0.4] (78), and, given the diversity of this group, we can be relatively confident that the model is realistic for most ssRNA(+), ssRNA(−), and ssDNA viruses infecting animals, plants, or bacteria. In contrast, these parameter values are probably not accurate for large ssRNA(+) viruses (e.g., coronaviruses) and dsDNA viruses, and their validity for retroviruses remains to be determined.
There are several outstanding questions regarding the main evolutionary determinants of virus mutation rates. For instance, biochemical restrictions might not be sufficient to explain the error-prone nature of RNA virus replication, since fidelity can be increased through single amino acid replacements in the RNA polymerase (74). Also, to investigate the differences between DNA virus and RNA virus mutation rates, more estimates for small DNA viruses are needed, particularly for eukaryotic ssDNA viruses, which are the DNA viruses known to evolve fastest as measured by the number of substitutions fixed per year (46). The role of error-prone host DNA polymerases in determining the mutation rate of DNA viruses is another interesting research avenue. For RNA viruses, it is still unclear whether there is a negative relationship between mutation rate and genome size analogous to Drake's rule. As we have shown, the current data suggest a correlation, but more estimates for the largest and smallest RNA viruses are needed to test this hypothesis properly. The possibility that the largest RNA viruses, namely, coronaviruses, have low mutation rates is supported by evidence of 3′ exonuclease proofreading activity in their replicases (65). Further, the RNA genome with the highest mutation rate, a hammerhead viroid (37), is 1 order of magnitude smaller than the smallest RNA virus genomes. However, although all viroids have very small genomes, variability studies suggest that not all of them show extremely high mutation rates (33, 35). It is also unclear whether genome properties other than size, such as genome polarity or structure, influence the viral mutation rate. dsRNA is less exposed to chemical damage than ssRNA, and ssRNA(−) viruses pack their genetic material densely with nucleoproteins, which might confer protection against mutation.
Finally, we suggest that future mutation rate studies should fulfill the following criteria: the number of cell infection cycles should be as low as possible, the mutational target should be large, and mutations should be neutral or lethal or a correction should be made for selection bias. Adhering to these criteria will help us to get a clearer picture of virus mutation patterns.
(iv) Poliovirus 1.
(a) A plaque-purified thermosensitive mutant (C5310U) was plated at 33°C and 39°C to obtain the frequency of revertants to the wild type (17). This mutation was approximately neutral at 33°C (α ≈ 1), and only the C-to-U reversion restored growth at 39°C (T = Ts = 1). The average revertant frequency in three isolated plaques was fs = 3.1 × 10⁻⁵ and c = 2.8 (30). Hence, μs/n/c = 3 × 3.1 × 10⁻⁵/2.8 = 3.3 × 10⁻⁵. In a second experiment, the isolated plaques were passaged once in liquid culture, and the observed revertant frequency was fs = 2.3 × 10⁻⁵. In the first experiment, the total number of viruses was 1.1 × 10⁹, and therefore, using equation 1 with N₀ = 1, we obtain B = (1.1 × 10⁹)^(1/2.8) = 1,694. In the second experiment, a 10⁻⁵ dilution was used to inoculate the liquid culture, and the average number of viruses after growth was 8.4 × 10⁹. Hence, the amplification factor was (8.4 × 10⁹)/(1.1 × 10⁹) × 10⁵ = 7.6 × 10⁵, which corresponds to log (7.6 × 10⁵)/log 1,694 = 1.8 additional infection cycles. Hence, μs/n/c = 3 × 2.3 × 10⁻⁵/(2.8 + 1.8) = 1.5 × 10⁻⁵. Taking the geometric mean of the estimates from the two experiments, μs/n/c = 2.2 × 10⁻⁵.
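The arithmetic of this estimate can be reproduced with the short sketch below. It assumes, as its use here implies, that equation 1 has the form N = N₀B^c (N, total virus yield; N₀, initial number of viruses), and that for a revertant assay with α ≈ 1 the rate is μs/n/c = 3fs/(cTs). The function names are illustrative.

```python
import math


def burst_size(total_yield, cycles, n0=1.0):
    """Invert N = N0 * B**c for the burst size B."""
    return (total_yield / n0) ** (1.0 / cycles)


def cycles_from_amplification(amplification, burst):
    """Number of infection cycles needed for a given fold amplification."""
    return math.log(amplification) / math.log(burst)


def mu_per_cell(fs, cycles, ts):
    """mu_s/n/c = 3 * fs / (c * Ts), assuming neutral mutations (alpha = 1)."""
    return 3 * fs / (cycles * ts)


# Experiment 1: revertant frequency after plaque formation (c = 2.8, Ts = 1)
mu1 = mu_per_cell(3.1e-5, 2.8, 1)
b = burst_size(1.1e9, 2.8)                            # about 1,694
# Experiment 2: one extra liquid passage inoculated with a 1e-5 dilution
amplification = 8.4e9 / 1.1e9 * 1e5                   # about 7.6e5
extra = cycles_from_amplification(amplification, b)   # about 1.8
mu2 = mu_per_cell(2.3e-5, 2.8 + extra, 1)
mu_mean = math.sqrt(mu1 * mu2)                        # geometric mean of two values
print(f"B={b:.0f} extra={extra:.1f} mu1={mu1:.1e} mu2={mu2:.1e} mean={mu_mean:.1e}")
```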
(b) The frequency of guanidine-resistant mutants arising from a guanidine-dependent mutant was measured by plating the virus in the presence and absence of the drug (18). Approximately 2.0 × 10⁶ cells were inoculated with ca. 200 PFU, yielding an average titer of 3.2 × 10⁹ PFU/ml in a total volume of 4 ml after completion of the cytopathic effect (18, 27). Hence, the burst size is B = (3.2 × 10⁹ × 4)/(2.0 × 10⁶) = 6,400, and using equation 1 we obtain c = log (4 × 3.2 × 10⁹/200)/log 6,400 = 2.1. Drake and Holland (30) gave a similar value (c = 2.5). Sequencing showed that the loss of guanidine dependence could be conferred by each of the three possible nucleotide substitutions at position G4804 or by an A → G substitution at position 4802 (T = Ts = 4). In two experiments, fs = 1.1 × 10⁻⁴ and fs = 5.4 × 10⁻⁴. Pooling all data, fs = 3.2 × 10⁻⁴. Considering that the mutations were probably neutral, i.e., α ≈ 1 (although this was not demonstrated), μs/n/c = 3 × 3.2 × 10⁻⁴/2.1/4 = 1.1 × 10⁻⁴.
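A sketch of this calculation, under the same assumed form of equation 1 (N = N₀B^c): the burst size is total yield divided by the number of cells, the number of cycles is log(N/N₀)/log B, and the rate is 3fs/(cTs) with α ≈ 1. Following the text, c is rounded to one decimal before use; without that rounding the estimate would be about 1.2 × 10⁻⁴.

```python
import math

cells = 2.0e6                  # cells inoculated
inoculum = 200                 # PFU used to inoculate
titer, volume = 3.2e9, 4       # PFU/ml and ml after the cytopathic effect
total_yield = titer * volume

burst = total_yield / cells                                        # B = 6,400
cycles = round(math.log(total_yield / inoculum) / math.log(burst), 1)  # c = 2.1

fs = 3.2e-4    # pooled frequency of mutants that lost guanidine dependence
ts = 4         # three substitutions at G4804 plus A -> G at 4802
mu = 3 * fs / cycles / ts      # alpha assumed to be about 1
print(f"B={burst:.0f} c={cycles} mu={mu:.1e}")
```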
(c) Viruses from transfection of cDNA transcripts were passaged three times at an MOI of 1.0 (c ≈ 3.0), and individual plaques were isolated (90). The 5′ noncoding region and capsid gene (L = 2,821) were sequenced directly from reverse transcription-PCR (RT-PCR) products (i.e., without molecular cloning). For the wild-type virus, 13 mutations were observed in 18 plaque-derived viruses, after sequencing 50,700 nucleotides in total. Hence, using equation 5, we obtain min[μs/n/c] = 13/50,700/3 = 8.5 × 10⁻⁵. No max[μs/n/c] can be obtained since sampling was selective (i.e., the assumption that all mutations are lethal is incompatible with plaque sequencing). The selection correction factor with selective sampling, assuming the same burst size as above (B = 1,694), is α = 0.28 for pL = 0.3 and E(sv) = 0.12. Thus, the corrected estimate is μs/n/c = min[μs/n/c]/α = 8.5 × 10⁻⁵/0.28 = 3.0 × 10⁻⁴.
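The minimum estimate and its correction can be written out as below. The sketch takes the reported α = 0.28 as given rather than recomputing it, since the exponential plus lethal class model is specified elsewhere in the paper; `min_rate` is a hypothetical helper name. Following the text, the minimum is rounded to two significant figures before dividing by α; carrying full precision gives about 3.1 × 10⁻⁴.

```python
def min_rate(mutations, nucleotides_sequenced, cycles):
    """Equation 5 as applied here: observed mutations per nucleotide per cycle."""
    return mutations / nucleotides_sequenced / cycles


lower = min_rate(13, 50_700, 3.0)   # about 8.5e-5
alpha = 0.28                        # reported correction factor (selective sampling)
lower_rounded = float(f"{lower:.2g}")
corrected = lower_rounded / alpha   # about 3.0e-4
print(f"min={lower:.1e} corrected={corrected:.1e}")
```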
(v) Tobacco etch virus (TEV).
(a) Viruses isolated from single necrotic lesions in Chenopodium quinoa were used to infect tobacco plants, and virions were extracted after symptoms appeared (79). A region encompassing genome positions 7808 to 9437 was amplified by high-fidelity RT-PCR, and 83 molecular clones were sequenced (Ts = 4,890). Four substitutions were observed. Using equation 6, max[μs/n/c] = 3 × 4/83/4,890 = 3.0 × 10⁻⁵. Another reason to treat this estimate as an upper limit is that the observed rate was close to the rate of RT-PCR errors.
(b) Tobacco plants constitutively expressing the TEV polymerase gene NIb were inoculated with TEV (88). Since the viral NIb gene was complemented in trans, selection on this gene was probably absent or weak (α ≈ 1). Samples from 20 plants were taken at time points ranging from 5 to 60 days postinoculation and used for RT-PCR, cloning, and sequencing. In total, 42 mutations (36 substitutions and 6 indels) were identified in 472 NIb clones (L = 1,536). Since the viral genomic RNA is translated as a polyprotein, indels that modify the reading frame or nonsense mutations in the NIb gene prevent the correct expression of downstream genes (here, the capsid gene). As a first approach, we can focus on these presumably lethal mutations. Of the 36 substitutions, two produced premature stop codons. The number of possible such mutations in the NIb gene is Ts = 251. Hence, μs/n/c = 2/251/472 = 1.7 × 10⁻⁵. For indels, μi/n/c = 6/1,536/472 = 8.2 × 10⁻⁶, and thus δ = 0.32. Immediately after a stop codon mutant appears in a cell, it can be replicated, transcribed, and packaged normally by the nonmutant proteins present in the cell, but it should be unable to initiate a second infection cycle. Hence, the estimate is in per cell infection units. However, stop codon suppression or complementation between viruses at a high MOI could allow a subset of mutants to complete several infection cycles, leading to an overestimation of the mutation rate. RT-PCR errors are another source of overestimation. As an alternative approach, we can focus on presumably neutral mutations, which include all mutations except nonsense mutations and indels because NIb was trans complemented (Ts = 1,536 × 3 − 251 = 4,357). The viral yield per cell was B = 1,555, as determined in vitro using transfected protoplasts, and it was estimated that c = 3.16 per day; hence, c varied from 16 to 190 (5 to 60 days). According to a regression of the number of mutations on the number of cell infection cycles performed in the original publication, μs/n/c = 4.8 × 10⁻⁶. We use this latter value. Given that the first approach was expected to overestimate the rate, the two estimates are reasonably consistent.
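The stop-codon and indel arithmetic, together with the range of c, can be reproduced as below; this sketch only restates the reported counts. The regression-based estimate of 4.8 × 10⁻⁶ comes from the original publication and is not recomputed, because the per-sample mutation counts are not given in this section. Exact arithmetic gives μi/n/c ≈ 8.3 × 10⁻⁶ and δ ≈ 0.33, slightly above the rounded values quoted in the text.

```python
clones = 472
nib_length = 1_536
stop_targets = 251          # possible nonsense substitutions in NIb (Ts)

# Presumably lethal changes: premature stop codons and indels
mu_s = 2 / stop_targets / clones          # about 1.7e-5
mu_i = 6 / nib_length / clones            # about 8.3e-6
delta = mu_i / (mu_s + mu_i)              # indel share of the total rate

# Presumably neutral target and range of cell infection cycles
neutral_ts = nib_length * 3 - stop_targets             # 4,357
cycles_per_day = 3.16
c_range = (cycles_per_day * 5, cycles_per_day * 60)    # about 16 to 190
print(f"mu_s={mu_s:.1e} mu_i={mu_i:.1e} delta={delta:.2f} Ts={neutral_ts}")
```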
(vii) Murine hepatitis virus.
Viruses were recovered from a cDNA clone by transfection, seeded into fresh cells, passaged once in standard liquid culture at an MOI of approximately 0.01, plaque purified, and passaged twice plaque to plaque (34). Six plaques were picked, amplified by infecting liquid cultures, and used for direct sequencing (i.e., without molecular cloning). It was estimated that one infection cycle was equivalent to 8 h of growth and, based on this, that the total number of cell infection cycles was c = 13. For the wild-type virus, three mutations were found after sequencing 120,978 nucleotides in total. Hence, using equation 5, we obtain min[μs/n/c] = 3/120,978/13 = 1.9 × 10⁻⁶, whereas no max[μs/n/c] can be obtained because mutation sampling was selective. To provide a more accurate estimate, we can use the selection correction method. Plaque-to-plaque passages constituted approximately two-thirds of the total passage time (c₁ = 13 × 2/3 = 8.7), although the exact fraction was not provided. Selection is typically relaxed under this passage regimen, and assuming that all mutations except lethal ones accumulated neutrally, we have μs/n/c = min[μs/n/c]/(1 − pL). This defines a correction factor α₁ = 1 − pL = 0.7 for this phase (with pL = 0.3). For the standard liquid culture phase (c₂ = 4.3 cycles), the correction factor with selective sampling, assuming B = 600 to 700 (42), pL = 0.3, and E(sv) = 0.12, is α₂ ≈ 0.26. Combining α₁ and α₂ as an average weighted by the number of cycles in each phase, we obtain α = (0.7 × 8.7 + 0.26 × 4.3)/(8.7 + 4.3) = 0.55. Therefore, the corrected estimate of the mutation rate is μs/n/c = 1.9 × 10⁻⁶/0.55 = 3.5 × 10⁻⁶. Note, however, that our parameterization of the distribution of mutational fitness effects was based on viruses with smaller genomes than those of coronaviruses and thus might not be appropriate here. There is also some uncertainty in the number of cell infection cycles elapsed. For these reasons, the estimate should be taken with caution.
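The two-phase correction amounts to a cycle-weighted average of phase-specific correction factors. The sketch below takes α₂ ≈ 0.26 as reported rather than recomputing it, and `weighted_alpha` is an illustrative helper name. Carried at full precision, the final value is about 3.4 × 10⁻⁶, which the text rounds to 3.5 × 10⁻⁶ after rounding intermediate values.

```python
def weighted_alpha(phases):
    """Average phase-specific correction factors, weighted by cycles.

    phases: list of (alpha_i, cycles_i) pairs.
    """
    total_cycles = sum(cycles for _, cycles in phases)
    return sum(alpha * cycles for alpha, cycles in phases) / total_cycles


c_total = 13
p_lethal = 0.3
c1 = c_total * 2 / 3            # plaque-to-plaque passages
c2 = c_total - c1               # standard liquid culture
alpha1 = 1 - p_lethal           # only lethal mutations are purged
alpha2 = 0.26                   # reported value for selective sampling

alpha = weighted_alpha([(alpha1, c1), (alpha2, c2)])   # about 0.55
mu_min = 3 / 120_978 / c_total                         # about 1.9e-6
mu = mu_min / alpha
print(f"alpha={alpha:.2f} mu={mu:.1e}")
```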
(viii) Vesicular stomatitis virus.
(a) The frequency of resistance to a monoclonal antibody was measured by plating clonal viral pools or viruses resuspended from plaques in the presence of the antibody (43). Mutations were assumed to be neutral. Virus titers averaged 4.2 × 10¹¹ PFU/ml and 3.7 × 10⁷ PFU/ml for clonal pools and resuspended plaques, respectively (27). Plating 0.1 ml of these stocks yielded f = 1.7 × 10⁻⁴ and f = 2.3 × 10⁻⁴, respectively. Sequencing showed that there were two possible G → A transitions conferring resistance (T = Ts = 2). Under conditions that restrict viral diffusion, B = 166 for this cell type (14), and B = 1,250 in liquid medium under standard conditions (36). Hence, the formation of a plaque would require approximately log (3.7 × 10⁷ × 0.1)/log 166 = 3.0 cell infection cycles, and an additional log (4.2 × 10¹¹ × 0.1)/log 1,250 = 3.4 cycles would have taken place in clonal pools. Accordingly, μs/n/c = 3 × 2.3 × 10⁻⁴/3/2 = 1.2 × 10⁻⁴ and μs/n/c = 3 × 1.7 × 10⁻⁴/(3 + 3.4)/2 = 4.0 × 10⁻⁵ for the resuspended plaques and the clonal pools, respectively, giving a geometric mean of μs/n/c = 6.9 × 10⁻⁵. Since the only mutations scored were transitions, which are more likely than transversions, and since neutrality was not guaranteed, this value might be an overestimate.
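A sketch of the cycle counts and rates for the two kinds of stock: burst sizes, titers, and frequencies are the reported values, and cycle numbers are rounded to one decimal as in the text. The `cycles` helper is illustrative. The geometric mean comes out at about 6.8 × 10⁻⁵ here; the text's 6.9 × 10⁻⁵ uses the two rates after rounding.

```python
import math


def cycles(pfu, burst):
    """Cell infection cycles needed to produce `pfu` from one infected cell."""
    return round(math.log(pfu) / math.log(burst), 1)


ts = 2                                            # two G -> A transitions confer resistance
c_plaque = cycles(3.7e7 * 0.1, 166)               # diffusion-restricted burst size
c_pool = c_plaque + cycles(4.2e11 * 0.1, 1_250)   # extra growth in liquid medium

mu_plaque = 3 * 2.3e-4 / c_plaque / ts
mu_pool = 3 * 1.7e-4 / c_pool / ts
mu_mean = math.sqrt(mu_plaque * mu_pool)
print(f"c={c_plaque}, {c_pool:.1f}; mu={mu_plaque:.1e}, {mu_pool:.1e}; mean={mu_mean:.1e}")
```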
(b) The average monoclonal antibody resistance frequency in many small cultures undergoing one cell infection cycle or fewer was determined (36), giving f/c = 3.5 × 10⁻⁵. It was assumed that T = Ts = 6, based on references cited in reference 36. These substitutions were probably close to neutral (α ≈ 1), although this was not directly shown. Under this assumption, μs/n/c = 3 × 3.5 × 10⁻⁵/6 = 1.8 × 10⁻⁵. The data from this experiment can also be used to estimate the mutation rate per strand copying using the fluctuation test null-class method (see below).
(ix) Influenza virus A.
(a) A single viral plaque was isolated and replated to isolate new plaques (70). The consensus sequence of the NS gene (L = 849, Ts = 849 × 3 = 2,547) was obtained for the parental and derived plaques after two amplification passages by direct sequencing of the purified RNA: 3fs/Ts = 7.6 × 10⁻⁵, and c = 5. Hence, from equation 5, min[μs/n/c] = 7.6 × 10⁻⁵/5 = 1.5 × 10⁻⁵, whereas no max[μs/n/c] can be obtained because mutation sampling was selective. The estimated selection correction factor using the exponential plus lethal class model with E(sv) = 0.12, pL = 0.3, c = 5, selective sampling, and B ≈ 50 as estimated in another work (68) is α = 0.33. Thus, μs/n/c = 1.5 × 10⁻⁵/0.33 = 4.5 × 10⁻⁵.
(b) The same method as in the study described above (70) was used, giving 3fs/Ts = 1.4 × 10⁻⁵ and c = 7 (48 h postinoculation with a generation time of 7 h, as estimated from one-step growth curves; note that the estimated burst size is B = 50, as shown in Fig. 2 of the original publication) (68). Hence, min[μs/n/c] = 2.0 × 10⁻⁶. The estimated selection correction factor using the exponential plus lethal class model with E(sv) = 0.12, pL = 0.3, c = 7, selective sampling, and B = 50 is α = 0.28. Thus, μs/n/c = 2.0 × 10⁻⁶/0.28 = 7.1 × 10⁻⁶.
(c) Single plaques were isolated after 3 days of growth in cell cultures and used to infect the allantoic cavities of chicken eggs (84). Viruses were harvested after 2 days and plated in the presence and absence of amantadine to score resistant viruses. From 10 independent experiments, the average f values were 4.2 × 10⁻⁴ and 1.8 × 10⁻³ for the H1N1 and H2N2 genotypes, respectively. Amantadine resistance was conferred by four different nucleotide substitutions (T = Ts = 4), which were probably neutral (α ≈ 1). In the original publication, the mutation rate per strand copying was estimated by assuming binary replication, but this assumption does not seem to be justified. According to other authors (68), the virus completes a cell infection cycle in ca. 7 h. Thus, after 5 days of growth, c = 17, and μs/n/c = 3 × 4.2 × 10⁻⁴/17/4 = 1.9 × 10⁻⁵ for H1N1 and μs/n/c = 7.9 × 10⁻⁵ for H2N2, with a geometric mean of 3.9 × 10⁻⁵.
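The conversion from elapsed time to c, and from the two resistance frequencies to rates, can be sketched as follows, assuming (as the text does) a 7-h cell infection cycle and rounding c to a whole number. At full precision the geometric mean is about 3.8 × 10⁻⁵; the 3.9 × 10⁻⁵ in the text results from averaging the rounded rates.

```python
import math

hours = (3 + 2) * 24                 # 3 days in cell culture + 2 days in eggs
c = round(hours / 7)                 # about 17 cell infection cycles
ts = 4                               # four amantadine-resistance substitutions

freqs = {"H1N1": 4.2e-4, "H2N2": 1.8e-3}
rates = {name: 3 * f / c / ts for name, f in freqs.items()}
mean_rate = math.sqrt(rates["H1N1"] * rates["H2N2"])
print(c, {k: f"{v:.1e}" for k, v in rates.items()}, f"{mean_rate:.1e}")
```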
(xiii) Spleen necrosis virus (SNV).
(a) A retroviral vector containing the lacZ α-complementation gene region as a neutral mutational target was used to score null mutations arising during a single infection cycle (71). Of 16,867 clones, 37 carried null mutations in the lacZ α-complementation gene region, based on the white/blue assay of transformed Escherichia coli colonies. Sequencing showed that 11 were nucleotide substitutions (including two nonsense mutations), 24 were indels (5 frameshifts), and 2 were G → A hypermutations affecting 15 bases each. The coding region of the lacZ α-complementation region is 258 bases long (280 bases including the promoter region), but the mutational target is smaller because many mutations do not lead to the null phenotype. In a previous study, it was determined that Ts = 219 (3). Hence, for substitutions not caused by host-mediated hypermutation, μs/n/c = 3 × 11/16,867/219 = 8.9 × 10⁻⁶. Alternatively, we used the method based on scoring stop codons. Since there are 20 possible nonsense substitutions in the lacZ α-complementation sequence and all should lead to the null phenotype, the mutation rate is μs/n/c = 3 × 2/16,867/20 = 1.8 × 10⁻⁵. We used the latter value. Considering that all G → A hypermutations should lead to loss of function and including the promoter in the mutational target (Ts = 280), the G → A mutation rate due to host-mediated hypermutation is 2 × 15/16,867/280 = 6.3 × 10⁻⁶. Hence, the total mutation rate to substitutions is μs/n/c = 6.3 × 10⁻⁶ + 1.8 × 10⁻⁵ = 2.4 × 10⁻⁵. For indels, it was determined that Ti = 150 for frameshifts and Ti = 280 for the other indels. Hence, μi/n/c = 5/16,867/150 + 19/16,867/280 = 6.0 × 10⁻⁶. The indel ratio is δ = 0.25.
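The SNV calculation combines several partial rates, restated in the sketch below. It also shows that the quoted δ = 0.25 is recovered when the denominator uses the stop-codon substitution rate without host-mediated hypermutation (δ ≈ 0.20 if hypermutation is included); this is our reading of the arithmetic rather than something stated in the source.

```python
clones = 16_867

# Stop-codon method: 2 nonsense mutants, 20 possible nonsense substitutions
mu_nonsense = 3 * 2 / clones / 20                 # about 1.8e-5
# Host-mediated G -> A hypermutation: 2 clones x 15 changes, target 280 bases
mu_hyper = 2 * 15 / clones / 280                  # about 6.3e-6
mu_substitutions = mu_nonsense + mu_hyper         # about 2.4e-5

# Indels: 5 frameshifts (Ti = 150) and 19 other indels (Ti = 280)
mu_indels = 5 / clones / 150 + 19 / clones / 280  # about 6.0e-6

delta_without_hyper = mu_indels / (mu_indels + mu_nonsense)
delta_with_hyper = mu_indels / (mu_indels + mu_substitutions)
print(f"{mu_substitutions:.1e} {mu_indels:.1e} "
      f"{delta_without_hyper:.2f} {delta_with_hyper:.2f}")
```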
(b) A retroviral vector containing a neo gene with an amber codon (a neutral mutational target) and the hygro gene was used to score mutations arising during a single cell infection cycle by selecting clones with G418 resistance (cells containing proviruses that reverted to a functional neo gene) and hygromycin resistance (all provirus-containing cells) (24). The amber reversion frequency per cycle was f/c = 2.2 × 10⁻⁵. Of 17 revertants, 15 were to the wild type, whereas the other two were a four-nucleotide insertion and an unidentified mutation. Using reversions to the wild type only, μs/n/c = 3 × 2.2 × 10⁻⁵ × 15/17 = 5.8 × 10⁻⁵.
(xiv) Murine leukemia virus (MLV).
(a) Using the same method as for SNV estimate b above, the amber reversion frequency per infection cycle was f/c = 4.0 × 10⁻⁶ (89). Of 14 revertants, 7 were to the wild type, whereas the other 7 were of unidentified nature. Using reversions to the wild type only, μs/n/c = 3 × 4.0 × 10⁻⁶ × 7/14 = 6.0 × 10⁻⁶.
(b) The viral progeny released by a single transformant colony was used to infect fresh cells at a low MOI, which were plated onto solid medium before the release of new viral progeny (66). The resulting infected colonies were analyzed by T₁ RNase digestion, covering a target of 1,380 nucleotides (Ts = 3 × 1,380 = 4,140). Three substitutions were detected and confirmed by sequencing after screening ca. 151,000 nucleotides in total, giving 3fs/c/Ts = 3 × 3/(3 × 151,000) = 2.0 × 10⁻⁵. However, selection was not completely absent: since lethal or strongly deleterious mutations were probably missed, this should be considered a lower-limit estimate. The selection correction factor given by the exponential plus lethal class model using c = 1, pL = 0.3, E(sv) = 0.12, selective sampling, and B = 50 from the literature (67) is α = 0.48. Therefore, μs/n/c = 2.0 × 10⁻⁵/0.48 = 4.2 × 10⁻⁵.
(c) A retroviral vector containing the herpes simplex virus tk gene (a neutral mutational target) and the neo gene was used to score mutations arising during a single cell infection cycle by selecting tk null mutants with bromouridine and total virus-carrying cells with G418 (69), giving f/c = 0.088. According to Drake et al. (29), of 244 tk⁻ mutants, 114 were gross rearrangements and arose in a mutational target of 2,620 bases. Assuming that all gross rearrangements inactivated the tk gene, the mutation rate to these changes was 0.088 × 114/244/2,620 = 1.6 × 10⁻⁵ rearrangements/s/c. The remaining 130 changes were small mutations and arose in a target of 1,128 bases. Among 49 small mutants sequenced, 28 were indels. Hence, μi/n/c = 0.088 × 130/244 × 28/49/1,128 = 2.4 × 10⁻⁵. Among nucleotide substitutions, three were nonsense mutations. Given that Ts = 76 for nonsense substitutions in this gene, μs/n/c = 3 × 0.088 × 130/244 × 3/49/76 = 1.1 × 10⁻⁴. Counting gross rearrangements together with indels, the indel fraction is thus δ = (1.6 × 10⁻⁵ + 2.4 × 10⁻⁵)/(1.1 × 10⁻⁴ + 1.6 × 10⁻⁵ + 2.4 × 10⁻⁵) = 0.27.
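The same bookkeeping for the tk target is sketched below, using only the counts and target sizes reported above. Carried at full precision, δ comes out at about 0.26; the text's 0.27 follows from the rounded intermediate rates.

```python
f_per_cycle = 0.088           # frequency of tk null mutants per cycle
n_mutants = 244
gross, small = 114, 130       # gross rearrangements vs small mutations
sequenced, indels, nonsense = 49, 28, 3

rearrangements = f_per_cycle * gross / n_mutants / 2_620   # about 1.6e-5
small_share = f_per_cycle * small / n_mutants
mu_indels = small_share * indels / sequenced / 1_128        # about 2.4e-5
mu_subs = 3 * small_share * nonsense / sequenced / 76       # about 1.1e-4

# Gross rearrangements are counted together with indels in delta
delta = (rearrangements + mu_indels) / (mu_subs + rearrangements + mu_indels)
print(f"{rearrangements:.1e} {mu_indels:.1e} {mu_subs:.1e} delta={delta:.2f}")
```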
(xvi) Human T-cell leukemia virus type 1.
A retroviral vector containing the lacZ α-complementation sequence (neutral mutational target) was used to score mutations appearing during a single cell infection cycle (60). Of 36,561 clones analyzed, 33 carried null mutations in the target (white E. coli colonies), of which 19 were single-nucleotide substitutions (four nonsense mutations), 1 was a double mutation (referred to as a hypermutation in the original study), and 13 were indels (including seven frameshifts). Assuming $T_s = 219$ (see SNV estimate a), $\mu_{s/n/c} = 3 \times 19/36{,}561/219 = 7.1 \times 10^{-6}$. Using nonsense mutations only, $T_s = 20$ (see SNV estimate a), and thus $\mu_{s/n/c} = 3 \times 4/36{,}561/20 = 1.6 \times 10^{-5}$; we use the latter. The rate of hypermutation appears to be low, and we did not attempt to calculate it, since it was based on a single observation and corresponded to a double mutant, for which $T_s$ was undetermined (the assumption that all double mutants inactivate the target cannot be made). For indels, $T_i = 150$ for frameshifts and $T_i = 280$ for other indels (see SNV estimate a), and thus $\mu_{i/n/c} = 7/36{,}561/150 + 6/36{,}561/280 = 1.9 \times 10^{-6}$ and $\delta = 0.10$.
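As a check on the HTLV-1 numbers, the sketch below computes both substitution estimates and the indel fraction. It follows the text in applying no factor of 3 to indel rates; the variable names are illustrative.

```python
n_clones = 36_561                    # clones analyzed
t_s_all, t_s_nonsense = 219, 20      # T_s for all substitutions / nonsense only
t_i_frameshift, t_i_other = 150, 280

mu_s_all = 3 * 19 / n_clones / t_s_all           # all 19 substitutions
mu_s_nonsense = 3 * 4 / n_clones / t_s_nonsense  # 4 nonsense substitutions (value used)
mu_i = 7 / n_clones / t_i_frameshift + 6 / n_clones / t_i_other
delta = mu_i / (mu_s_nonsense + mu_i)            # indel share of the total rate
print(f"{mu_s_all:.2e} {mu_s_nonsense:.2e} {mu_i:.2e} delta={delta:.3f}")
```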
(xvii) Human immunodeficiency virus type 1.
(a) A retroviral vector containing the lacZ α-complementation sequence (neutral mutational target) was used to score mutations appearing during a single cell infection cycle in several related studies (61, 62, 64). In the first study (64), $f/c = 70/15{,}424$ (66 mutant clones, 4 of which carried two mutations). The mutational spectrum consisted of 46 nucleotide substitutions (six nonsense mutations [29]) and 24 indels (17 frameshifts). Given that $T_s = 219$ (see SNV estimate a), $\mu_{s/n/c} = 3 \times 46/15{,}424/219 = 4.1 \times 10^{-5}$. Using nonsense mutations only (see SNV estimate a), $T_s = 20$ and $\mu_{s/n/c} = 3 \times 6/15{,}424/20 = 5.8 \times 10^{-5}$ (the latter value is used). For indels, using the $T_i$ values given in SNV estimate a, $\mu_{i/n/c} = 17/15{,}424/150 + 7/15{,}424/280 = 9.0 \times 10^{-6}$ and $\delta = 0.13$. In a second study (62), the same method was used to score mutations in vpr null mutants and in vpr null mutants complemented in trans by virus producer cells. This showed that vpr reduces the viral mutation rate approximately 3-fold. In the presence of a functional vpr protein provided in trans, $f/c = 0.006$. The mutational spectrum was unknown, but assuming that it was similar to the one reported in the first study (64), nucleotide substitutions should constitute approximately two-thirds of all observed mutations. Given that $T_s = 219$, $\mu_{s/n/c} = 3 \times 0.006 \times 2/3/219 = 5.5 \times 10^{-5}$. In a third study (61), the same method was used to score mutations in the absence or presence of the antiretroviral drugs zidovudine (AZT) and lamivudine (3TC), as well as in viruses encoding reverse transcriptase variants resistant to these drugs. The average mutation frequency per cycle from three independent experiments for the wild-type virus was $f/c = 0.005$ (0.004, 0.005, and 0.006) in the absence of drugs. Sequencing of 40 mutant clones showed that 22 carried nucleotide substitutions (there were three additional G → A hypermutants, but these are not counted here because the number of substitutions carried by each hypermutant was not provided), 6 carried frameshifts, and 2 carried other indels. Taking $T_s = 219$ for substitutions, $T_i = 150$ for frameshifts, and $T_i = 280$ for other indels, $\mu_{s/n/c} = 3 \times 0.005 \times 22/40/219 = 3.7 \times 10^{-5}$, $\mu_{i/n/c} = 0.005 \times 6/20/150 + 0.005 \times 2/20/280 = 1.2 \times 10^{-5}$, and $\delta = 0.22$. The geometric mean of the three $\mu_{s/n/c}$ values is $4.9 \times 10^{-5}$. The average of the two indel fractions is $\delta = 0.18$.
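Because the rate estimates are averaged in log scale while the indel fractions are averaged arithmetically, a short sketch makes the two operations explicit. It uses Python's `statistics.geometric_mean` (available since Python 3.8); the variable names are illustrative.

```python
from statistics import geometric_mean

mu_s_estimates = [5.8e-5, 5.5e-5, 3.7e-5]   # first, second, and third study
delta_estimates = [0.13, 0.22]              # studies with a reported spectrum

mu_s_hiv = geometric_mean(mu_s_estimates)   # equivalent to averaging log values
delta_hiv = sum(delta_estimates) / len(delta_estimates)
print(f"mu_s/n/c = {mu_s_hiv:.2e}, delta = {delta_hiv:.3f}")
```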
(b) To score mutations appearing during a single cell infection cycle, pseudotyped viruses were obtained by cotransfecting 293T cells with a viral vector defective for the env gene and a helper plasmid (38). HeLa cells were infected with these viruses, selected for antibiotic resistance, cloned, and used for DNA amplification and subcloning using a phage λ library. Sequencing of six nearly full-length viral genomes (9,072 nucleotides on average, $T_s = 27{,}216$) yielded four nucleotide substitutions and no indels. Hence, $\mu_{s/n/c} = 3 \times 4/6/27{,}216 = 7.3 \times 10^{-5}$. In another assay, lymphocytes were infected with the pseudotyped viruses, sorted by fluorescence using flow cytometry, cloned, and used for DNA amplification by long-range PCR and direct sequencing of PCR products. Sequencing of seven large portions of viral genomes (7,791 nucleotides on average, $T_s = 23{,}373$) yielded eight nucleotide substitutions and three indels. Hence, $\mu_{s/n/c} = 3 \times 8/7/23{,}373 = 1.5 \times 10^{-4}$. Taking the geometric mean of the two estimates, $\mu_{s/n/c} = 1.0 \times 10^{-4}$. The average $T_s$ is 25,295. For indels, $\mu_{i/n/c} = 3/7/7{,}791 = 5.5 \times 10^{-5}$, and $\delta = 0.35$. The fraction of stop codon mutations was unusually high (4/12), and no synonymous substitutions were observed, suggesting that cell clones receiving inactive viruses were favored and that selection acting on the virus during the cell infection cycle was not controlled for. Also, the fraction of mutations resulting from transfection was unknown. Finally, the pseudotyped viruses lacked vpr, which may affect the mutation rate (62). These factors could have led to a high number of false positives, and thus this estimate should be taken with caution.
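For whole-genome sequencing, $T_s$ is three times the sequenced length, so the estimate reduces to substitutions per sequenced nucleotide. A minimal sketch of estimate (b), with illustrative variable names, averaging the two assays in log scale:

```python
import math

# (substitutions observed, genomes sequenced, mean sequenced length in nt)
assays = [(4, 6, 9_072), (8, 7, 7_791)]

estimates = []
for n_subs, n_genomes, length in assays:
    t_s = 3 * length                       # 3 possible substitutions per site
    estimates.append(3 * n_subs / n_genomes / t_s)

# Geometric mean computed explicitly as the mean of natural logs.
mu_gm = math.exp(sum(math.log(x) for x in estimates) / len(estimates))
mu_i = 3 / 7 / 7_791                       # 3 indels in the second assay
delta = mu_i / (mu_gm + mu_i)
print(f"{estimates[0]:.2e} {estimates[1]:.2e} {mu_gm:.2e} delta={delta:.2f}")
```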
(c) A retroviral vector that contained all cis elements required for replication, the regulatory and accessory virus genes, and the reporter genes tk and hygro, but which lacked the gag, pol, and env genes, was constructed and used to score mutations appearing during one cell infection cycle by cotransfecting the vector with helper plasmids carrying the missing genes (47). The hygro gene confers resistance to hygromycin, whereas the tk gene confers sensitivity to bromouridine. Hygromycin was used to select cells carrying the vector, and bromouridine was used to score null mutations in the tk gene (996 nucleotides). In total, 349/15,930 clones were mutant, giving $f/c = 0.022$. Sequencing of 43 mutants indicated that 13/43 mutations were indels. Hence, $\mu_{i/n/c} = 0.022 \times 13/43/996 = 6.7 \times 10^{-6}$. The fraction of nucleotide substitutions that produced the tk null phenotype was unknown, and it was not indicated which mutations produced stop codons. The number of possible mutations to stop codons in this gene is 76, and, using information from previous studies (29, 69), approximately 1/7 of the observed nucleotide substitutions are expected to have produced such mutations (see MLV estimate c). Hence, $\mu_{s/n/c} = 3 \times 0.022 \times 30/43/7/76 = 8.7 \times 10^{-5}$, and $\delta = 0.072$. Mutations arising during transfection were not controlled for, potentially introducing false positives.
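The sketch below reproduces this estimate step by step. The 1/7 nonsense share is the assumption stated in the text; it matches the 3 nonsense mutations among the 21 (49 − 28) sequenced substitutions of the tk-based estimate (c) above. Variable names are illustrative.

```python
f_c = 0.022                  # 349/15,930, rounded as in the text
n_seq, n_indel = 43, 13      # sequenced mutants and indels among them
n_subs = n_seq - n_indel     # 30 nucleotide substitutions
target_nt = 996              # tk gene length used as the indel target
t_s_nonsense = 76            # possible mutations to stop codons in tk
nonsense_share = 1 / 7       # assumed share of substitutions creating stop codons

mu_i = f_c * n_indel / n_seq / target_nt
mu_s = 3 * f_c * n_subs / n_seq * nonsense_share / t_s_nonsense
delta = mu_i / (mu_s + mu_i)
print(f"mu_i={mu_i:.2e} mu_s={mu_s:.2e} delta={delta:.3f}")
```

The indel rate works out to about 6.7 × 10^−6, which is the value consistent with $\delta = 0.072$.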
(d) A retroviral vector containing the gag and pol genes and two reporter genes, bsd and eYFP, but defective for env and the regulatory and accessory genes, was used to score mutations appearing during one cell infection cycle (51). The eYFP gene encodes the yellow fluorescent protein and was used to count the total number of cells carrying the vector. 293T cells stably expressing the vector were transfected with a helper plasmid to yield pseudotyped viruses, which were used to transduce fresh cells. The bsd gene encoded resistance to blasticidin but carried a premature ochre stop codon, such that only cells receiving a revertant virus would be resistant to blasticidin. This gave $f/c = (2.0\text{ to }4.0) \times 10^{-6}$, and sequencing of 16 revertants showed nine single-nucleotide substitutions (to the wild type or three other codons), five apparent G → A hypermutations, and two deletions. Here, $T$ is unknown for indels and hypermutations, but for single-nucleotide substitutions, $T_s = 7$. Using the latter and taking $f/c = 3.0 \times 10^{-6}$, $\mu_{s/n/c} = 3 \times 3.0 \times 10^{-6} \times 9/16/7 = 7.3 \times 10^{-7}$. Additional assays were carried out with HIV-1 and other retroviruses by directly cotransfecting cells with the vector and the helper plasmid instead of using stable eYFP producers, but transfection was shown to be a significant source of mutation, and hence these data did not provide a reliable estimate of the mutation rate.
(xix) Bacteriophage φX174.
(a) Approximately 340 independent wells were infected with an average of 2 PFU each and incubated overnight (77). Lysates were plated onto the selective strain E. coli gro89, a defective mutant carrying a mutation in the rep gene, which encodes a DNA helicase required for particle maturation, to score phages able to infect this strain. From 12 wells, it was determined that $f = 1.7 \times 10^{-5}$. The average final number of PFU per well was $6.6 \times 10^{7}$, and $B = 180$ as estimated in another study (20). Thus, using equation 1, $c = \log(6.6 \times 10^{7}/2)/\log 180 = 3.3$. Sequencing of 156 independent mutants showed that $T = T_s = 12$. Since $T_s$ was determined, there is no selection bias due to lethal mutations. Some bias could exist due to nonlethal fitness effects. However, since $c$ was small, this should not produce a large deviation in the estimate (probably less than 2-fold). Neglecting this effect, $\mu_{s/n/c} = 3 \times 1.7 \times 10^{-5}/3.3/12 = 1.3 \times 10^{-6}$.
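Equation 1, as applied here, estimates the number of cell infection cycles from the overall population expansion and the burst size. A minimal sketch, with a hypothetical helper name; the logarithm base cancels in the ratio, so natural logs are used:

```python
import math

def infection_cycles(final_pfu: float, initial_pfu: float, burst_size: float) -> float:
    """Hypothetical helper: c = log(final / initial) / log(B)."""
    return math.log(final_pfu / initial_pfu) / math.log(burst_size)

c = infection_cycles(final_pfu=6.6e7, initial_pfu=2, burst_size=180)
mu_s = 3 * 1.7e-5 / c / 12   # f = 1.7e-5, T_s = 12, nonlethal bias neglected
print(f"c = {c:.2f}, mu_s/n/c = {mu_s:.2e}")
```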
(b) A plaque-purified virus was used to infect 216 independent cultures with an average of 231 PFU each (12). Cultures were incubated until an average of $3.3 \times 10^{5}$ PFU per culture was produced and then plated onto E. coli gro87, a rep gene mutant similar to the one used in phage φX174 estimate a. A total of 239 mutants were scored, and thus $f = 239/(3.3 \times 10^{5})/216 = 3.4 \times 10^{-6}$. Sequencing of 47 clones showed that $T = T_s = 7$. Taking $B = 180$ (20), $c = \log(3.3 \times 10^{5}/231)/\log 180 = 1.4$. The fact that $T_s$ was determined implies that there was no selection bias due to lethal mutations. Also, the bias due to nonlethal effects should be small, because $c$ was close to 1.0. Thus, $\mu_{s/n/c} = 3 \times 3.4 \times 10^{-6}/7/1.4 = 1.0 \times 10^{-6}$. Data from this experiment can also be used to obtain an estimate of the mutation rate per strand copying free of selection bias (see below).
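In estimate (b), the mutant frequency is obtained by pooling all cultures before dividing by the per-culture yield. A short sketch of that bookkeeping, with illustrative names:

```python
import math

n_cultures, initial_pfu, final_pfu = 216, 231, 3.3e5   # PFU per culture
n_mutants, t_s, burst_size = 239, 7, 180

f = n_mutants / final_pfu / n_cultures   # mutants over all PFU plated
c = math.log(final_pfu / initial_pfu) / math.log(burst_size)
mu_s = 3 * f / t_s / c
print(f"f = {f:.2e}, c = {c:.2f}, mu_s/n/c = {mu_s:.2e}")
```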
(xx) Bacteriophage M13.
A single plaque of a recombinant virus carrying the lacZ α-complementation sequence (258 bases) as a neutral mutational target ($\alpha = 1$) was used to inoculate a large E. coli culture, which was incubated overnight (49). Viral DNA was extracted and transfected to score null mutations in the lacZ sequence (based on the blue/white colony assay). After discarding 11 false positives, $f = 117/199{,}655$, with 67 plaques containing single-nucleotide substitutions and 50 containing indels (11 frameshifts and 39 deletions or rearrangements). In a previous study (3), it was determined that $T_s = 219$. Thus, $c \times \mu_{s/n/c} = 3 \times 67/199{,}655/219 = 4.6 \times 10^{-6}$. For indels, assuming $T_i = 150$ for frameshifts and $T_i = 280$ for other indels (see SNV estimate a), $c \times \mu_{i/n/c} = 11/150/199{,}655 + 39/280/199{,}655 = 1.1 \times 10^{-6}$. At the very least, $c = 3$ (two cycles during the formation of the plaque and another one during the infection of the liquid culture). According to Drake (26), the initial and final viral population sizes were 1 and ca. $1.0 \times 10^{15}$, respectively. Our own unpublished data suggest that under relatively optimal conditions, the exponential growth rate of the virus is ca. 4.0 h$^{-1}$ and the duration of the cell infection cycle is ca. 1 h. According to this and assuming exponential growth, an increase in population size by a factor of $1.0 \times 10^{15}$ would take $\ln(10^{15})/4.0 \approx 8.6$ h, that is, approximately $c = 8.6$. Taking the average of 3 and 8.6, $c = 5.8$. This gives $\mu_{s/n/c} = 4.6 \times 10^{-6}/5.8 = 7.9 \times 10^{-7}$, $\mu_{i/n/c} = 1.1 \times 10^{-6}/5.8 = 1.9 \times 10^{-7}$, and $\delta = 0.19$. The most evident source of error in this estimate is the undetermined $c$ value. This could lead to a maximal underestimation of 1.9-fold and a maximal overestimation that, although not determined, probably does not exceed 1.5-fold.
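The number of cycles for M13 is bracketed between a lower bound and an exponential-growth estimate. The sketch below reproduces that reasoning; the growth rate and cycle duration are the values quoted in the text, and the variable names are illustrative.

```python
import math

growth_rate_per_h = 4.0   # exponential growth rate quoted in the text
cycle_h = 1.0             # duration of one cell infection cycle (h)
fold_increase = 1.0e15    # final / initial population size (26)

c_max = math.log(fold_increase) / growth_rate_per_h / cycle_h  # cycles at full growth
c_min = 3                                                      # lower bound from the protocol
c = (c_min + c_max) / 2

mu_s = 4.6e-6 / c         # from c * mu_s/n/c = 4.6e-6
mu_i = 1.1e-6 / c         # from c * mu_i/n/c = 1.1e-6
delta = mu_i / (mu_s + mu_i)
max_underestimate = (4.6e-6 / c_min) / mu_s   # equals c / c_min
print(f"c_max={c_max:.2f} c={c:.2f} mu_s={mu_s:.2e} delta={delta:.2f}")
```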
(xxiii) Bacteriophage T2.
Mutations at the rII locus ($L = 3{,}136$) producing rapid plaque growth (phenotype r) were scored in single bursts (55). After discarding cases in which the mutant was probably present in the inoculum, 420 mutants were scored in 22,615 bursts ($c = 1$), and it was determined that $B = 82$. This gives $f/c = 420/22{,}615/82 = 2.3 \times 10^{-4}$. Mutations were probably close to neutral ($\alpha \approx 1$), and deviations from neutrality should not produce a strong bias since $c = 1$. The mutational spectrum of this gene was analyzed for the closely related bacteriophage T4 (26). Fifteen nonsense mutations (all of which should produce the phenotype) and 21 missense mutations were identified in a 435-base region of the locus, and nonsense mutations were expected to represent ca. 0.073 of all random substitutions. Hence, the expected number of substitutions (including those that did not produce the r phenotype) is $15/0.073 \approx 205$, indicating that $21/205 = 0.102$ of missense mutations produced the r phenotype. In another assay, it was shown that among 121 observed rII mutants, 27 were single-nucleotide substitutions and 94 were indels. $T_s$ is not simply $3L$, because many mutations were not observable. Since nonsense mutations represented approximately 0.073 of all mutations and 0.102 of missense mutations were observable, $T_s = 3L \times [0.073 + 0.102 \times (1 - 0.073)] = 1{,}576$. Thus, $\mu_{s/n/c} = 3 \times 2.3 \times 10^{-4} \times 27/121/1{,}576 = 9.8 \times 10^{-8}$, $\mu_{i/n/c} = 2.3 \times 10^{-4} \times 94/121/3{,}136 = 5.7 \times 10^{-8}$, and $\delta = 0.37$.
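The effective target for the rII locus discounts substitutions that do not produce the r phenotype. A sketch of that correction, with illustrative names; the observable missense fraction is derived from the T4 counts rather than typed in.

```python
L = 3_136                # rII locus length (nt)
p_nonsense = 0.073       # expected share of random substitutions that are nonsense
n_nonsense, n_missense = 15, 21

expected_subs = n_nonsense / p_nonsense          # all substitutions in the T4 sample
p_missense_obs = n_missense / expected_subs      # ~0.102 of missense are observable
t_s = 3 * L * (p_nonsense + p_missense_obs * (1 - p_nonsense))

f_c = 420 / 22_615 / 82                          # mutants per burst per progeny
mu_s = 3 * f_c * 27 / 121 / t_s
mu_i = f_c * 94 / 121 / L
delta = mu_i / (mu_s + mu_i)
print(f"T_s={t_s:.0f} mu_s={mu_s:.2e} mu_i={mu_i:.2e} delta={delta:.2f}")
```

Without intermediate rounding, $\mu_{s/n/c}$ comes out near 9.6 × 10^−8 rather than 9.8 × 10^−8, because the text rounds $f/c$ to 2.3 × 10^−4; $\delta$ still rounds to 0.37.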
(vi) Bacteriophage φX174.
(a) A fluctuation test was carried out using the reversion of an amber mutation as the selectable phenotype (19). In each individual culture, the number of initial PFU was high, but the virus underwent a single cell infection cycle. For each of three amber mutants, $P_0 = 646/740$, $P_0 = 679/778$, and $P_0 = 510/602$. The corresponding burst sizes were $B = 167$, $B = 28$, and $B = 51$, and the final numbers of PFU were $9.0 \times 10^{7}$, $5.5 \times 10^{7}$, and $2.6 \times 10^{9}$, respectively. Hence, $N_1 - N_0 = 9.0 \times 10^{7}/740 - 9.0 \times 10^{7}/740/167 = 1.2 \times 10^{5}$ for the first mutant, and analogously, $N_1 - N_0 = 6.8 \times 10^{4}$ and $4.2 \times 10^{6}$ for the second and third mutants, respectively. Using the null-class method, $m = 1.1 \times 10^{-6}$ s/r, $m = 2.0 \times 10^{-6}$ s/r, and $m = 3.9 \times 10^{-8}$ s/r, respectively, with geometric mean $m_s = 4.5 \times 10^{-7}$ s/r. If all amber revertants were to the wild type, $\mu_{s/n/r} = 3 \times 4.5 \times 10^{-7} = 1.4 \times 10^{-6}$. However, there are eight possible single-nucleotide revertants, and if all were viable, the mutation rate would be $\mu_{s/n/r} = 3 \times 4.5 \times 10^{-7}/8 = 1.7 \times 10^{-7}$. For this phage, the estimated lethal fraction is $P = 0.2$ (23), and hence the expected number of viable revertants is $8 \times 0.8 = 6.4$, which gives $\mu_{s/n/r} = 3 \times 4.5 \times 10^{-7}/6.4 = 2.1 \times 10^{-7}$.
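The null-class method uses the fraction of cultures with no mutants, $P_0 = e^{-m(N_1 - N_0)}$, so $m = -\ln P_0/(N_1 - N_0)$. The sketch below applies it to the three amber mutants, following the text in dividing the final PFU by the number of cultures and taking $N_0 = N_1/B$ for a single cycle. The helper name `null_class_rate` is hypothetical.

```python
import math
from statistics import geometric_mean

def null_class_rate(p0: float, n1_minus_n0: float) -> float:
    """Hypothetical helper: m = -ln(P0) / (N1 - N0)."""
    return -math.log(p0) / n1_minus_n0

# (cultures without revertants, total cultures, burst size B, final PFU)
experiments = [
    (646, 740, 167, 9.0e7),
    (679, 778, 28, 5.5e7),
    (510, 602, 51, 2.6e9),
]

rates = []
for n_null, n_cultures, burst, final_pfu in experiments:
    n1 = final_pfu / n_cultures     # final PFU per culture, as in the text
    n0 = n1 / burst                 # initial PFU per culture after one cycle
    rates.append(null_class_rate(n_null / n_cultures, n1 - n0))

m_s = geometric_mean(rates)
viable_revertants = 8 * (1 - 0.2)   # eight revertants, lethal fraction 0.2
mu = 3 * m_s / viable_revertants
print([f"{r:.2e}" for r in rates], f"m_s={m_s:.2e}", f"mu={mu:.2e}")
```

At full precision the geometric mean is about 4.4 × 10^−7 s/r (the text reports 4.5 × 10^−7 after rounding), and the final rate still rounds to 2.1 × 10^−7.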
(b) A plaque-purified virus was used to infect 216 independent cultures with an average of 231 PFU each (12). The mutation rate was calculated using the null-class method: $m = 2.3 \times 10^{-6}$ s/r, and sequencing of 47 clones showed that $T_s = 7$. Hence, $\mu_{s/n/r} = 3 \times 2.3 \times 10^{-6}/7 = 1.0 \times 10^{-6}$.