Variola virus, the causative agent of smallpox, is estimated to have killed between 300 million and 500 million people in the 20th century and was responsible for widespread mortality and suffering for at least several preceding centuries (
1) (see the supplementary text section of the supplementary materials). Smallpox drove the 1796 development of vaccination by Edward Jenner, who later declared that “it now becomes too manifest to admit of controversy, that the annihilation of the Small-pox, the most dreadful scourge of the human species, must be the final result of this practice” (
2). His prophecy was fulfilled in 1980, when the disease was declared eradicated by the World Health Organization, following a worldwide vaccination campaign. The virus is now exclusively stored in laboratories in the United States and Russia. Despite the eradication of smallpox, there are ongoing concerns regarding the reemergence of a smallpox-like disease via accidental or deliberate reintroduction of variola virus, adaptation of monkeypox virus to humans, or zoonosis or genetic engineering of another orthopoxvirus (
3–
5). Specific information regarding the evolutionary history of variola virus is therefore of substantial interest.
Orthopoxviruses are a genus of the
Poxviridae and have large, linear, double-stranded DNA genomes (
6). They differ in the range of mammalian host species they infect and the severity of the disease they cause: variola virus (VARV) and camelpox virus (CMLV) have a narrow host range and can be highly virulent, whereas taterapox virus (TATV), which infects gerbils, is only known to cause unapparent infections (
7,
8). Cowpox virus (CPXV), monkeypox virus (MPXV), and vaccinia virus (VACV) infect several host species, with varying severity of disease. VACV, whose origin and reservoir host are unknown (
9), generally has low virulence in humans, elicits an immune response that is protective against VARV, and was used as the vaccine during the smallpox eradication campaign (
8). VACV infection of mice is a useful model for studying orthopoxvirus gene function in vivo. Orthopoxvirus genomes are typically between ~186,000 and ~228,000 nucleotides (nt) long (
6). Conserved genes, necessary for virus transcription and replication, are located in the central ~100,000-nt region of the genome and are commonly used in phylogenetic inference (
10). In vitro and in vivo studies show that orthopoxvirus genomes also contain many genes that are important for modulating host innate immunity and determining host range but whose inactivation does not prevent virus replication (
11,
12). Those genes are predominantly located near the genome termini and vary between orthopoxvirus species. The factors governing host range and virulence of orthopoxviruses are complex and not fully understood. For example, some genes that promote VACV virulence are inactivated in the more virulent VARV (
13–
15), whereas in other cases the loss or inactivation of host immune system–modulating genes in VACV can result in increased virulence (
12). In general, orthopoxvirus species with a narrow host range have fewer genes than those with broad host ranges, for example, compare VARV (~162 functional genes) with viruses of the CPXV species (~209 functional genes) (
6). No orthopoxvirus species has a gene exclusive to that species, and all genes present across all orthopoxvirus species are also present in CPXV. It has therefore been proposed that, by a process of gene inactivation and loss, the different orthopoxvirus species arose from a common ancestor containing a full set of CPXV genes (
6).
The timeline of the emergence of smallpox in humans is unclear. Analysis of sequences from a 17th-century Lithuanian mummy (VARV-VD21) (
16), two Czech museum specimens from the 19th and 20th centuries (
17), and VARV from the 20th century sampled during the eradication campaign dates their most recent common ancestor (MRCA) to the 16th or 17th century (
10,
16–
18). However, written records of possible smallpox infections date back to at least 3000 years ago (ya), and mummified remains suggestive of smallpox date to 3570 ya (
1,
19). Ancient virus sequences recovered from archaeological remains provide direct molecular evidence for past infections and can reconcile the discrepancy between the written historical record of possible early infections and the time to the oldest available genetic sequences.
Recovering ancient variola virus DNA
To investigate the evolutionary history of VARV, we screened high-throughput shotgun sequencing data from skeletal and dental remains of 1867 humans living in Eurasia and the Americas between ~31,630 and ~150 ya. Using Kraken (
20) on the microbial, protozoan, and viral genomes from the National Center for Biotechnology Information Reference Sequence (NCBI RefSeq) databases (
21–
23), we identified a total of 26 samples with between 1 and 730 reads assigned to VARV (fig. S1 and table S1). On the basis of the availability of additional sampling material, 13 of those 26 samples were then enriched for viral sequences using a targeted capture approach, as described in (
24). Eleven of the 13 individuals are dated to 603–1050 CE, overlapping the Viking Age (793–1066 CE), and are from northern Europe, western Russia, or the United Kingdom, and two are from the 19th century from western Russia (
Fig. 1A,
Table 1, and supplementary text). All 13 samples showed evidence of VARV infection after capture, with the following characteristics supporting authenticity of the recovered reads: elevated rates of mismatching nucleotides (due to postmortem DNA damage) toward the read termini when aligned against a reference sequence (
25), consistent with their respective ages and sequencing depth (fig. S2); mapped reads distributed evenly across the reference genome (fig. S3); lowest edit distances (the number of nucleotide differences in a read alignment) to VARV (19th-century samples) or to either CMLV or TATV (Viking Age samples) when comparing read mapping against eight diverse orthopoxvirus reference genomes (fig. S4); and 100% orthopoxvirus specificity when mapping reads against the NCBI nucleotide database (table S2 and supplementary text).
In four samples (VK281, VK382, VK388, and VK470; hereafter referred to as higher-coverage samples), alignment against VARV (accession no. NC_001611.1) resulted in a mean coverage depth of 5.01× to 45.19× (
Table 1 and fig. S3) and consensus sequences covering 95.89 to 99.56% of the VARV genome, with the sequence for VK382 being 99.98% complete (
Table 1 and table S3). On the basis of nucleotide sequence identity, the consensus sequences for the four higher-coverage samples were collectively most similar first to each other and then to TATV (table S4). For the remaining seven “lower-coverage” Viking Age samples, read edit distances were lowest against the assembled VK382 consensus sequence, further supporting their authenticity (fig. S5).
Phylogenetic analysis reveals an unknown variola virus clade
Before inferring phylogenetic trees on the basis of the conserved central region of the genomes from the four higher-coverage sequences, we tested whether modern VARVs show evidence of genomic regions that descend from a virus population containing sequences resembling these four higher-coverage ancient VARV (aVARV) sequences and whether those sequences are themselves recombinants. We examined the higher-coverage sequences and seven representative sequences from related modern orthopoxvirus species for evidence of recombination using seven algorithms from the RDP4 toolset (
26). A 125-nt noncoding region of aVARV-VK470 was flagged as possibly attributable to recombination with an unknown sequence, but the evidence of recombination was uncertain. No other recombination event was suggested in the higher-coverage aVARV sequences, and no modern VARV sequence in the selection was suggested to be a recombinant derived from any aVARV sequence (Materials and methods and table S5).
To establish the phylogenetic placement of the four sequences from the higher-coverage samples, we inferred a maximum likelihood (ML) tree including 84 sequences chosen to represent the full orthopoxvirus diversity. In the resulting tree (
Fig. 1B and fig. S6), the sequences form a now-extinct monophyletic clade in sister relationship with the clade consisting of all modern VARV and VARV-VD21 (collectively referred to as mVARV). We then placed partial consensus sequences of the lower-coverage samples onto the inferred tree using the evolutionary placement algorithm EPA-ng (
27). Four of the seven Viking Age samples (VK108, VK168, VK515, and VK533) are confidently placed within the aVARV clade (
Fig. 1B and figs. S7 to S12). Their placement position suggests a common ancestor with the more recent VK281 and VK470 lineages, supporting the existence of distinct Viking Age lineages (fig. S8). The placements of the other samples show some uncertainty but are consistent with their respective sample ages: the Viking Age samples (VK138, VK255, and VK443) in the vicinity of the aVARV clade, and the 19th-century samples (FIN1 and KHA1) within the mVARV clade. The placement of the lower-coverage sequences provides further evidence of their authenticity and independently confirms the presence of a separate aVARV clade, as already shown by the higher-coverage samples alone.
The existence of a temporal signal in the sequence data is a precondition for inferring dated phylogenetic trees (
28). A linear regression of sample ages and root-to-tip distances from an ML tree that includes sequences from the four higher-coverage samples and mVARV (fig. S13) shows clear evidence of clocklike evolution (fig. S14), in agreement with earlier results (
16,
18). Consequently, we proceeded to infer dated coalescent trees using BEAST2 (
29) under six different evolutionary models (table S6). With the best-fitting model, the MRCA of the combined mVARV and aVARV sequences is dated to 1.7 thousand years ago (kya) [95% highest priority density interval (HPD95): 2.2 to 1.4 kya], and the MRCA of the aVARV clade is dated to 1.5 kya (HPD95: 1.7 to 1.4 kya). The MRCA of the mVARV clade is dated to 0.44 kya (HPD95: 0.6 to 0.35 kya), and to 0.31 kya (HPD95: 0.43 to 0.21 kya) for the mVARV clade without VARV-VD21, both of which are in line with previous molecular dating analyses (
10,
16–
18) (table S7 and fig. S15).
The ancient variola viruses have a diverse pattern of gene inactivation
Many genes that are active in other orthopoxviruses are inactivated (truncated, fragmented, or completely absent) in mVARV, possibly owing to host adaptation (
6,
13). We therefore compared and contrasted the gene status of sequences from the four higher-coverage aVARV samples, mVARV, plus CMLV and TATV (the orthopoxviruses most closely related to aVARV) (
Fig. 2, fig. S16, table S8, and supplementary text). Gene status analysis allows a detailed comparison of interclade and intraclade variation at the level of possible phenotypic differences. Gene differences are of interest individually and in combination, and the overall pattern of differences in gene status complements the whole-genome diversity summarized by phylogenetic analysis.
Interclade differences in gene status can be logically organized into four groups (fig. S17): group A, inactive in mVARV and aVARV clades; group B, inactive in mVARV but active in at least one aVARV sequence; group C, active in mVARV but inactive in at least one aVARV sequence; and group D, active in both clades.
Figure 2 shows details of groups A to C, highlighting gene differences between clades.
In group A (genes inactive in both clades), comparing the exact position of gene-inactivating mutations allows us to infer the likely status of the genes in the common ancestors of mVARV and aVARV. Five genes in group A [
B23R;B24R/C17L;C18L, 26765,
A25L,
A37R, and
B16R; nonitalic numeric values correspond to unnamed genes with offsets as given in (
6)] have identical inactivating mutations in both clades (
Fig. 2 and supplementary text). Identical inactivating mutations among orthopoxviruses not in sister relationship are rare: The 17 representative orthopoxviruse sequences included in our gene-loss analysis (Materials and methods) have 53 genes with gene-inactivating mutations in viruses not in sister relationship, but only two of these have the mutation in the same location. This suggests that the five genes listed above were already inactivated in the common ancestor of mVARV and aVARV (
Fig. 1B). An additional 10 genes (20088,
A39R,
A44L,
A53R,
B2R;B3R, 215231,
C9L,
A9L,
A52R, and 202711) are inactivated in both clades, although in different ways, and are thus projected to be active in the ancestor of mVARV and aVARV (fig. 2). This suggests parallel virus evolution after the divergence of the clades ~1.7 kya (fig. S15).
A gene inactivated in both mVARV and aVARV may have induced a similar outcome after infection of humans. The VACV gene encoding a soluble interleukin-1β (IL-1β) receptor is inactive in VACV strain Copenhagen (VACV-Cop) (gene
B16R) but functional in VACV strain Western Reserve (VACV-WR; where it is denoted
B15R) (
30). Infection of mice with either VACV-Cop or VACV-WR in which
B16R was inactivated resulted in a febrile response (
15). Conversely, infection with either virus expressing a functional IL-1β receptor prevented the induction of fever (
15). Human infections with VARV, which all have a disrupted
B16R gene, are accompanied by high fever (
8). The identical inactivation of
B16R in the aVARV clade may suggest that the symptoms of the disease caused by those viruses also included high fever and that the inactivation occurred before the divergence of the clades.
Group B (genes inactivated in mVARV but present or with undetermined status in some or all of the aVARV sequences) contains 14 genes. Eight of these (
B7R,
CrmE,
C2L,
K1L,
F3L,
A35R,
A40R, and
A55R) encode known virulence factors or immunomodulators in VACV, three of which (
C2L,
F3L, and
A55R) are members of the kelch-like family (supplementary text) (
12). Deletion of the kelch-like proteins
C2L or
A55R in VACV-WR leads to increased lesion size in intradermally infected mice (
31,
32), while the virulence of VACV lacking
F3L is reduced in the same model (
33). Deletion of kelch-like proteins has also been proposed to reduce the host range and virulence of CPXV-Gri in vitro (
34). Also in contrast with mVARV, aVARV-VK382 and aVARV-VK388 encode a predicted functional guanylate kinase (
A57R). This gene is otherwise only active in CPXV and appears to have been lost independently in CMLV, TATV, mVARV, aVARV-VK281, and aVARV-VK470 (supplementary text), suggesting that the gene was functional in their common ancestors.
Group C (genes inactivated in at least one aVARV but not in any mVARV sequence) contains seven genes (
CrmB,
C1L,
E5R,
F10L,
I8R,
B20R, and 210863, although only the first three have unequivocal gene-inactivating mutations). The effect of the inactivation of
C1L and
E5R in VACV is unknown. CrmB is a tumor necrosis factor receptor homolog and chemokine binding protein (
35). Interestingly, mVARVs have a functional CrmB, whereas it is identically inactivated in aVARV-VK281 and -VK470, and the start codon in aVARV-VK388 is replaced by a valine. However, unlike in mVARV, CrmE, a tumor necrosis factor receptor homolog, and the chemokine binding protein B7R (
35) are functional in aVARV-VK281, -VK470, and -VK388 and together may provide the same functionality as CrmB. The older aVARV-VK382 has functional versions of
CrmB,
CrmE, and
B7R. The seven group C genes, already inactivated in the aVARV clade, show that these viruses had taken a different genetic (although perhaps not functional) evolutionary path than that of the mVARV sequences, in which none of these genes were inactivated during an additional ~1000 years of evolution.
The gene status analysis and molecular clock dating of the internal nodes allow us to investigate the temporal trend of gene inactivation within the combined mVARV and aVARV clades. We observe an inactivation of genes over time (
Fig. 3), supporting suggestions that orthopoxvirus species may derive from a common ancestor containing all genes found in orthopoxviruses today (
6). A simple linear model of gene loss suggests that an ancestor of aVARV and mVARV with a gene content similar to CPXV would have existed ~4 kya (
Fig. 3). But any such model will necessarily involve unsupported and currently untestable assumptions, some of which will almost certainly be invalid, so caution is warranted (supplementary text). Loss of host-range genes has been correlated with an increase in host specificity (
11), suggesting that the aVARVs and the common ancestor of mVARV and aVARV may have had a broader host range than mVARV. However, host range in poxviruses is likely to be a multigenic trait that depends on the cooperation of several genes, as is the case in myxoma virus (
36) and VACV (
37–
41). The higher number of active genes thus does not necessarily imply that the aVARVs had a host tropism for animals other than humans, although this is a possibility.
Within the aVARV clade, we observe a diverse pattern of gene inactivation (
Figs. 2 and
3). Eleven genes are unequivocally inactivated in some, but not all, of the sequences from the four higher-coverage samples (in group B: 22682,
C2L,
K1L,
F3L,
A35R,
A40R, 168145, and
A57R; in group C:
CrmB,
C1L, and
E5R). An additional four genes are inactivated in all four viruses, but with differing gene-inactivating mutations (in group A:
A39R, 193435
*,
C13L;C14L, and 202711), suggesting that these genes were lost independently (supplementary text). By comparison, among 50 available mVARV sequences, the median number of pairwise differences in gene activation status is zero (mean: 0.6). On the basis of gene status, aVARV-VK281 and aVARV-VK470 can be grouped together, as can aVARV-VK382 and aVARV-VK388, differing unequivocally in status in one (
A35R) and five (
A35R,
A40R,
CrmB,
C1L, and
E5R) genes, respectively (fig. S18), in agreement with the grouping based on their overall gene inactivation counts (
Fig. 3), sample ages (
Table 1 and
Fig. 3), and phylogenetic positions (
Fig. 1B and figs. S13 to S15). Our data suggest that VARV followed at least two independent trajectories of gene inactivation during its evolution: one leading to aVARV-VK281 and aVARV-VK470 and one to mVARV (
Fig. 3). The geographical and temporal spread of the viruses in the aVARV clade (
Fig. 1A and figs. S8 and S15) indicates that variola viruses with a gene composition pattern different from that of mVARV existed over a period of at least ~450 years and circulated widely among humans during the Viking Age.
Historical perspective
We found evidence of VARV in 11 of 525 individuals (~2%) predating and overlapping the Viking Age. If aVARV, like mVARV, did not lead to persistent infections (
8) and if no viral particles remained in teeth or bone after an infection, these individuals all died during an acute infection. The 11 samples with reads matching VARV are likely not the only infections among those 525 individuals because: (i) The presence of viral DNA in ancient teeth and bones is certainly an imperfect diagnostic test of viral infection, with an elevated false negative rate due to differences in DNA preservation, source tissue, or sequencing depth in the screening; (ii) VARV is not thought to be viremic for the entire duration of the illness (
8); (iii) On the basis of mVARV case fatality rates (up to ~30%), many individuals survived infection; (iv) Five Viking Age samples with reads assigned to VARV by Kraken were not available for capture, because of a lack of sample material. Moreover, the 525 individuals are drawn from an ~450-year period, during which the overall population of Scandinavia rose from ~750 thousand to ~1 million (
42). If the overall population increased linearly during this period, and life expectancy at birth was ~30 years throughout, a total of ~13.1 million people would have lived in Scandinavia during this period, so a sample of 525 individuals is only ~0.004% of the estimated total population. Given this considerable uncertainty, it would not be prudent to use the 2% detection rate to estimate smallpox prevalence or case fatality rates during the Viking Age. The likely inaccurate (low) detection rate and the very small sample size argue against regarding the figure as a prevalence. Even if all other sources of uncertainty were resolved, it would still not be clear how the figure relates to case fatality rates during that period, because we cannot be sure that the individuals died as a result of their infections.
Hypotheses regarding the earliest history of VARV have been derived exclusively from often ambiguous historical accounts and from the visual examination of mummies dating from as early as 3570 ya (
1,
19). The dates of the earliest likely presence of smallpox in specific regions are difficult to determine from written records and have in some cases been a matter of dispute (
1,
8,
19,
43). Ancient DNA sequences can resolve such questions by providing clear evidence of VARV presence in time and space. The dating of the aVARV samples, from as early as 603 CE, matches that of multiple written accounts of likely smallpox infections in southern and western Europe from the late 6th century onward (
1,
8,
19,
44,
45). Our finding of the virus in northern Europe at these times disproves various suggestions of first introductions involving later dates. For example, the introduction (or later establishment) of smallpox into Europe via returning Crusaders (
8,
9) or by way of Spain during the Moorish invasion of 710 CE (
9), or into England by Normans in 1241–1242 CE (
46), as well as the claim that the virus had not reached northern Europe by 1000 CE (
8). We did not find evidence for infections in samples from earlier time periods, despite screening 619 individuals predating the Viking Age (fig. S1). Together, the written record and the confirmed aVARV infections suggest a pan-European presence of the virus by the end of the Viking Age at the latest, as suggested for parts of southern and western Europe based purely on written records (
1,
19,
44).
The Viking Age sequences reported here push the definitive date of the earliest VARV infection in humans back by ~1000 years and reveal the existence of a previously unknown, now-extinct virus clade. The ancient viruses were following a genotypic evolutionary path that differs from modern VARV. This is highlighted by at least three genes inactivated in some aVARV sequences but still active in the mVARV clade, and by 10 genes inactivated in both clades but with gene-inactivating mutations that differ between clades. The ancient viral genomes show reduction of gene content during the evolution of VARV and that multiple combinations of gene inactivations have led to viruses capable of circulating widely within the human population.
Materials and methods
Software package versions
The following software package versions were used, with default arguments unless otherwise noted. BEAST2 2.4.7 (
29), PastML 1.9.20 (
47), BLASTn 2.2.29+ (
48), bModelTest 1.0.4 (
49), Bowtie2 2.3.5.1 (
50), Picard 2.20.2 (
51), BWA 0.7.15-r1140 (
52), DIAMOND 0.9.23 (
53), EPA-ng 0.3.6 (
27), Gappa 0.5.0 (
54), gargammel (released without a version number, October 2017) (
55), Geneious 9 (
56), ggtree 1.16.6 (
57), Kraken 1.0 (
20), MAFFT 7.419 (
58), mapDamage 2.0.8-2 (
59), Mosdepth 0.2.6 (
60), R 3.6.0 (
61), RAxML-NG 0.7.0 BETA (
62), RDP4 4.9.5 (
26), SAMtools 1.9 (
63), SciPy 1.4.1 (
64), TempEst 1.5 (
65), Tracer 1.6.0 (
66), and TreeAnnotator 2.2.4 prerelease (
29).
Screening and sample preparation
Using Kraken on a database of all microbial, protozoan, and viral genomes in the NCBI RefSeq databases (
21–
23), we analyzed whole-genome shotgun sequencing data from 1867 individuals for the presence of reads that best matched VARV. We identified a total of 26 samples with between 1 and 730 reads assigned to VARV (fig. S1) as candidates for enrichment of viral target sequences in library by capture. Of these, sufficient sample materials were available for capture processing for 13 samples. Uniquely dual-indexed sequencing libraries (2 to 6 libraries per sample) were prepared from 1 to 3 ancient DNA (aDNA) extracts per sample. Libraries were pooled into capture reactions containing 1183 to 1880 ng per reaction. Hybridization and sequencing on HiSeq4000 were performed as described previously (
24). An additional round of sequencing was performed on pools of enriched libraries, using the Illumina NovaSeq6000 (S4 flowcell, XP workflow, 2 × 100 cycles). Read length and insert size distributions are shown in fig. S19.
Authenticity
The following points argue for the authenticity of the recovered DNA sequences.
1) Standard precautions (
67) and recommended methodologies (
68) for working with ancient DNA were applied.
2) Clear damage patterns characteristic of DNA sequences recovered from archaeological remains are present in all samples with sufficient viral reads available. Note that the absence of a damage pattern in a sample with few reads does not (by itself) indicate that the sample reads are not ancient. The human reads all show clear damage patterns in all samples, showing that the nucleic acids present in the samples are indeed ancient.
3) The ancient samples included in the study have reads matching across the entire TATV reference genome (fig. S3). This is in stark contrast to false positives that regularly occur with human DNA sequences in which there are matches of only one or two poxvirus proteins out of ~160 to 220, in which cases the reads in fact are better matches for similar-but-unrelated sequences, e.g., human mRNA.
4) Reads matching TATV were matched against the entire NCBI nucleotide database using BLASTn. Matches were found in the nucleotide database for 118,049 of 118,118 reads (99.94%), all of which had a poxvirus as the best match, ignoring matches against artificial constructs (table S2). A small number of reads with no BLASTn match in the nucleotide database is expected (see Read poxvirus specificity section).
5) The edit distances of the reads are fully consistent with the placement of (full or partial) consensus sequences in phylogenetic trees. Reads from all nine aVARV clade samples very clearly have lowest edit distances against TATV and CMLV, the intermediate samples (VK138 and VK255) have read edit distances that are closest to VARV, TATV, or CMLV, and the two modern samples (FIN1 and KHA1) have read edit distances very clearly closest to VARV. As would also be expected, edit distances against our aVARV-VK382 sequence are the lowest of all for the Viking Age samples.
6) The fact that the almost-complete genome consensus sequences from the four higher-coverage ancient samples form a self-consistent separate basal clade indicates that these sequences are very unlikely to be due to four independent modern contaminations. If, for example, 20th-century smallpox sampling entirely missed a still-circulating clade that these samples actually came from, those lineages would need to have evolved with a far lower substitution rate than other variola viruses because of the relatively very short branch lengths leading to the ancient sequences in the ML tree (fig. S6).
7) The DNA sequenced in these samples was drilled out of the interior of well-preserved teeth (11 samples) and petrous bone (two samples). Pox virus DNA was therefore very likely circulating in some form in the bloodstream of the individual at the moment of death. Subsequent contamination of any form of virus or viral DNA before aDNA extraction in our dedicated clean laboratory is unlikely.
8) The EPA-ng placements from the lower-coverage samples provide important information via the phylogenetic placement of the (incomplete) consensus sequences from those samples. The fact that all ancient samples are placed consistently (according to sample ages) within or basal to the aVARV or mVARV clades is further evidence of the overall authenticity and consistency of the reads. We also show that reads simulated in silico from a different orthopoxvirus genome that was not included in the tree are consistently placed on their expected branch, not with the aVARV or mVARV clades (fig. S12).
9) Present-day analysis of purported VARV reads has the advantage that contamination by a modern variola virus can be definitively ruled out, given the nature of the associated disease. No member of the analysis team has any poxvirus infection and the facilities where the sample extraction, library preparation, and sequencing was done is not a virology laboratory (rather it is focused on ancient nonmicrobial host genome recovery) and has never been involved in processing modern viral material. The paper author (L.V.) who carried out all the targeted laboratory work received a routine VARV vaccination before 1980.
Read poxvirus specificity
To check that sample reads found to match poxvirus sequences were not in fact better matches for other sequences, reads matching TATV (GenBank accession no. NC_008291.1) using Bowtie2 end-to-end matching and with mapping quality (MAPQ) ≥ 30 were checked for specificity by matching them against the NCBI nucleotide database (downloaded during the week of 13 January 2020) using BLASTn (-task blastn -evalue 0.01). The best-matching database sequence for each read was checked to see if it corresponded to a poxvirus. Matches against bacterial artificial chromosomes and synthetic constructs [matching the case-insensitive regular expression (BAC cloning vector|synthetic construct)] were ignored. Table S2 shows that, apart from 69 reads (from a total of 118,118) that did not match anything in the nucleotide database, all reads had a poxvirus as their best match, as determined by the nucleotide database subject title matching the case-insensitive regular expression (pox|ectromelia|variola|akhmeta|abatino|vaccinia).
The end-to-end option to Bowtie2 and the MAPQ ≥ 30 filtering were applied to examine only reads that mapped in full and with a high confidence regarding their genome match position. This is necessary because orthopox viruses contain repetitive low-complexity regions near their termini, and such sequences often match many hundreds of host chromosome sequences from a diverse range of unrelated organisms. These matches sometimes exceed the default 500 matches returned by BLASTn, meaning no orthopox match may be reported at all, despite a perfect match with an orthopox genome. The stricter matching (in particular, only considering reads matching with MAPQ ≥ 30) eliminates low-complexity viral reads with these ambiguous placements from consideration.
It is expected that some orthopox reads matched by Bowtie2 will not match anything with BLASTn. This is due to search algorithm variations in seeding, match scoring values and thresholds, and reporting thresholds. For example, despite nucleotide identity exceeding 90%, BLASTn (-task blastn) will not match a sequence against a copy of itself with every 11th nucleotide mutated, because a perfectly matching seed region of 11 nucleotides is required before a region is further considered as a potential match. A different search algorithm will identify such matches. These algorithmic idiosyncrasies are the reason we also examined read matches found by DIAMOND and BWA as part of the consensus and gene-level analysis, with both analyses subject to manual visual inspection of aligned reads.
Investigation of gene-inactivating mutations
To assemble the reads used for the generation of consensus sequences and the investigation of gene-inactivating mutations, reads from the capture were mapped against orthopoxvirus reference genomes using DIAMOND, matching against version 13.0 of the RVDB protein database (
69), and BWA [using both the mem and aln (with and without the −l option set to 1000) algorithms, with other arguments left at their defaults], matching against a collection of orthopox genomes from NCBI (GenBank accession nos. AY009089.1, AF438165.1, AF482758.2, X94355.2, AF012825.2, AF380138.1, DQ437594.1, AM501482.1, M35027.1, AY243312.1, L22579.1, X69198.1, Y16780.1, and KY358055.1). Reads from all mapping algorithms were combined and de-duplicated using Picard after mapping to a TATV genome (GenBank accession no. NC_008291.1) using Bowtie2.
To assess gene content of the higher-coverage aVARV samples in the absence of a reference genome, we investigated the presence of gene-inactivating mutations in the ancient samples by comparison of alignments made to 17 orthopoxviruses. Information about the offset locations of 219 homologous genes in 17 orthopoxvirus reference genomes (dataset 3; see supplementary text) is presented in supplementary table 1 of (
6). These offsets were used to extract the sequences of the homologous genes in the 17 reference sequences. Excluding cases where a gene is absent in a reference sequence, we extracted 219 × 17 − 301 = 3422 gene sequences. Supplementary table 1 in (
6) was also used to acquire information on status (presence, absence, fragmentation, and truncation) of genes in the 17 reference sequences. We aligned reads independently at the gene level with sequences from this broad set of references to make it possible to correctly deal with cases where, for example, two genes from an ancient sequence matched one gene best in one modern reference (e.g., CMLV) and another gene best in a different modern reference (e.g., TATV), or where a gene is present in the ancient virus but completely absent in the modern reference. Aligning against just one of the modern full-genome references would necessarily risk potential error. We considered this the only defensible approach, given the fact that the gene composition of the ancient sequences was unknown.
For each gene, all reads from an ancient sample were aligned to one or more of the 17 reference gene sequences using Geneious, medium sensitivity, fast parameters, and then visually inspected for the presence of gene-inactivating mutations. The reference(s) to align against were chosen in the following way: Reads for each ancient sample were matched against each reference gene sequence using BLASTn, and a consensus sequence for each gene was generated using BWA (aln algorithm) and SAMtools. The reference gene sequence with >50% coverage of the gene sequence and the highest median bit score was chosen as the reference. If this was inconclusive, additional references were used, usually one of CPXV-Gri, -Ger, or -BR. For each ancient sample, gene status, coverage, and reference sequence are given in table S8. Gene-inactivating mutations in the ancient sequences were classified as either “certain,” “uncertain,” or “uncertain if functional” according to the following criteria.
A “certain” gene-inactivating mutation: There is a mutation that leads to a stop codon (insertion, deletion, or point mutation) that is covered by at least three reads, and the reference against which the ancient reads are aligned is also truncated or fragmented in an identical fashion. An “uncertain” gene-inactivating mutation: There is a mutation that leads to a stop codon that is covered by less than three reads.
An “uncertain if functional” gene: A gene of intermediate length—shorter than a gene that is functional in some references but substantially longer than inactivated versions of the same gene in others (see gene A9L).
The ancient gene status was further confirmed by mapping the reads, consensus sequences resulting from the Geneious alignments against the gene reference(s), and the contigs from a de novo assembly of the reads (also performed in Geneious, default parameters) against the CPXV-Ger reference genome (GenBank accession no. DQ437593.1) and inspecting the resulting consensus for the expected genes. The status of each gene in the resulting consensus sequence agreed with the analysis performed above.
Ten genes [1393 and 222274 (C23L/B29R), 2286 and 221381 (CrmB), 3420 and 220247 (C19;C20L/B25/B27R), 5392 and 218275 (C17L;C18L/B23R;B24R), 7652 and 216015 (C16L/B22R)] in the orthopoxvirus genome are occasionally diploid. While both versions of these genes are present in CPXV-Ger, -Gri, and -BR, only one version of the gene is present in VARV. Given low genome coverage and short DNA template fragments, it is not clear how to determine whether these genes are present at the 3′ or 5′ (or both) ends of the aVARV genomes. The (potentially) two copies of the gene were thus treated as one in the analyses described above.
To infer the gene status at internal nodes of the phylogeny of mVARV, aVARV, CMLV, and TATV, we used the DELTRAN algorithm implemented in PastML, to perform a maximum parsimony ancestral state reconstruction. Gene-inactivating mutations were determined to be identical in two tips if the same mutation leading to an inactivation of the gene was present, whether or not a different upstream mutation may have previously ablated the gene in one of them.
We compared differences in gene activation status using a larger set of modern VARV sequences. The Virus Orthologous Cluster database contains (among other orthopoxviruses) the gene annotations of 50 modern VARV (
70). Each gene is assigned to a cluster on the basis of BLASTp results and manual curation, genes within each cluster are orthologous. We determined correspondence between the assignments in (
6) and the orthologous clusters on the basis of sequence identity. Only clusters that had corresponding genes assigned by Hendrickson
et al. (
6) were included. Genes in the Virus Orthologous Cluster database were deemed functional if they were at least 80% of the length of the corresponding functional gene in (
6). To compare the gene content of the 50 modern VARV sequences, we counted the number of differences in gene status between two sequences. Gene status was considered different if the gene was present in one sequence and had a gene-inactivating mutation or was absent in the other.
Generation of consensus sequences
Reads from aVARV-VK382, the sample with the highest coverage, were mapped against VARV strain SLN68_258 (GenBank accession no. DQ441437.1) in Geneious using Medium sensitivity/fast and iterate up to five times. The alignment was visually inspected, and sections not present in the reference sequence were determined by iteratively mapping the reads against the new consensus sequence from either side of the insertion, until the consensus sequences from the mapped reads overlapped. The resulting consensus had 41 undetermined positions out of 192,255 nucleotides. Gene status and position in the genome order were both fully consistent with the analysis performed in the previous section. Consensus sequences for the remaining samples were generated by mapping their reads against the aVARV-VK382 consensus using Geneious using Medium sensitivity/fast and iterate up to five times. Manual inspection showed that all expected genes were present in the ancient consensus sequences, consistent with the gene loss analysis (table S3). The consensus sequences were also consistent with contiguous sequences (contigs) resulting from a de novo assembly of the reads (performed in Geneious, using default parameters). We called majority rule consensus sequences at a depth of one read, taking into account residue quality.
Recombination analyses
Seven algorithms, as implemented in the RDP4 package, were used to search for evidence of recombination in a selection of 11 orthopoxvirus sequences. The algorithms were: 3Seq (
71), BootScan (
72), Chimaera (
73), GENECONV (
74), MaxChi (
75), RDP (
76), and SiScan (
77). The sequences (with GenBank accession numbers in parentheses) were: Akhmeta (MH607143), CMLV (AY009089), CPXV (KC813492), CPXV (KC813508), TATV (NC_008291), VARV-VD21 (BK010317), VARV (NC_001611), aVARV-VK281, aVARV-VK382, aVARV-VK388, and aVARV-VK470. A region of 125 nucleotides of aVARV-VK470 was identified as being of possible recombinant origin by three of the seven RDP4 algorithms (3Seq, BootScan, and GENECONV), with aVARV-VK281 as the major parent and the minor contribution from an unknown parent. The RDP4 program warned that “the proposed recombination signal may be attributable to a process other than recombination” and of a “possible misidentification of recombinant.” No recombination signals were detected with aVARV-VK281, aVARV-VK382, or aVARV-VK388 as recombinant sequence. No modern sequence was identified as being the possible descendant of a recombination involving any ancient sequence. Details on the algorithms,
P values, and the possible breakpoints for aVARV-470 are shown in table S5.
Phylogenetic analyses
1) Maximum likelihood trees: Separate trees were made from datasets 1 and 2 (see supplementary text). The sequences were aligned using MAFFT. Maximum likelihood (ML) trees were generated using RAxML-NG with a K81uf+G+I (selected using bModelTest) substitution model and an ML estimate of tree topology, branch lengths, substitution rates, and nucleotide frequencies, with support estimated using 1000 bootstrap replicates.
2) Lower-coverage samples trees: The EPA-ng algorithm was used to infer the positions of consensus sequences from the nine lower-coverage samples (VK108, VK138, VK168, VK255, VK443, VK515, VK533, FIN1, and KHA1) on the ML tree made from dataset 2, including the sequences from the four higher-coverage samples. For all lower-coverage samples, consensus sequences were obtained by extracting the majority allele from BAM files made from reads mapped against the TATV reference genome (GenBank accession no. NC_008291.1) using the local alignment algorithm of Bowtie2, to reduce the impact of genotyping errors due to postmortem DNA damage (fig. S2). The resulting consensus sequence was aligned to the multisequence alignment of dataset 2 from which the original ML tree was inferred (see Maximum likelihood trees paragraph, above) using MAFFT (version 7.407) with the “–add” and “–keeplength” options. EPA-ng was run using the same model parameters that were used for the original ML tree (K81uf+G+I). Placements of lower-coverage samples were further analyzed using Gappa (
54), and the uncertainty in placements for each sample was visualized by plotting the inferred likelihood weight ratios (LWRs) for each branch on the reference tree, using the ggtree package in R (figs. S7 and S8).
To assess the EPA-ng placements of the lower-coverage samples, BAM files with mapped reads from the higher-coverage sample VK388 (aVARV) and the published sample VD21 (mVARV) were randomly subsampled to sets containing 30, 50, 100, 500, 1000, and 2000 reads, with 100 replicates each. Consensus sequences for each sample and replicate were made and placed back on the reference tree, using the same approach as for the lower-coverage samples (described above). Placement uncertainty was visualized using the normalized placement edge mass (
78) across all 100 replicates, for each branch on the reference tree (fig. S9).
To further assess the reliability of the placements, two additional metrics were used (
79). First, the distributions of LWRs for the first, second, and third most-likely placement for each subsample set were inferred. If many replicates of the placements for a particular subsample set have high confidence, a large fraction of the most-likely placements will show high LWRs (close to the maximum LWR of 1), and if second or third most-likely placements are observed they will have very low LWRs (fig. S11A). Second, the expected distance between placement locations (EDPL) was calculated. EDPL is defined as the average distance on the reference tree between different placements for a query sequence, weighted by their respective LWR. A particular query sequence can have low confidence in the actual placement (i.e., many placements with low LWR) but high confidence for a local neighborhood of the reference tree, for example, in the case of sets of highly similar sequences within a subclade of the reference tree. The EDPL thus gives a measure of the locality of the query placements across the reference tree, with low values indicating high locality (fig. S11B).
We also tested the ability of EPA-ng to correctly place a sequence not currently in the tree. For that, we chose the Abatino orthopoxvirus (GenBank accession no. MH816996.1) sequence, which diverges basal to Ectromelia virus (
80). We simulated a total of 1,000,000 sequencing reads from the Abatino reference genome using gargammel. Sequencing reads were simulated from HiSeq 2500 Illumina single-end runs of length 81 base pairs (bp), using an empirical read length distribution derived from sample VK388. After removal of sequencing adapters, read mapping, consensus sequence building, and phylogenetic placement was carried out, as previously described, for random subsamples of 30, 50, 100, 500, 1000, and 2000 reads (fig. S12). As expected, the placement edge masses were concentrated on the branch leading to the Ectromelia virus clade, even for as few as 30 simulated reads.
To complement and confirm the EPA-ng placements, branch-defining single-nucleotide polymorphisms (SNP) were inferred from the reference alignment. A SNP is considered branch-defining if all sequences descending from the branch share the same nucleotide at that location and all other sequences (on the alternate branches) do not. For each lower-coverage sample and each branch of the reference tree, the number and fraction of SNPs where the allele observed in the sample matched the branch-defining allele is shown in fig. S20.
3) Dated coalescent trees using BEAST2: We performed a linear regression of root-to-tip dates against sampling dates. Root-to-tip distances were extracted using TempEst from the ML tree inferred from dataset 1 and the sequences from the four higher-coverage samples in fig. S6 (dataset 1). The regression analysis was performed using SciPy. Dated coalescent trees were inferred using BEAST2. Forty-eight modern VARV sequences were used, as well as two published ancient sequences from the Czech Republic, and VARV-VD21 (dataset 1). Using bModelTest, we selected a TPM1 substitution model with unequal base frequencies, invariant sites, and gamma distributed rate heterogeneity among sites. The clock rate was constrained using a uniform [1 × 10
−9 to 1 × 10
−3 substitutions per site per year (s/s/y)] prior. Proper priors were used throughout. Trees were inferred using strict and relaxed log-normal molecular clocks, as well as constant and exponential coalescent and Bayesian skyline population priors. The Markov chain Monte Carlo analysis was run for 100M (for the strict clock model and coalescent constant and exponential population priors) or 200M (for the strict clock model and Bayesian skyline population prior, and the log-normal relaxed clock model and coalescent constant, coalescent exponential, and Bayesian skyline population priors) generations, sampling every 2000 generations. Convergence and mixing were assessed using Tracer, and an effective sample size >200 was reached for all relevant parameters. Final tree files were subsampled to contain 10,000 trees, and maximum clade credibility trees were summarized using TreeAnnotator. Path sampling, as implemented in BEAST2, was used to select the best-fitting molecular clock model and population prior. Per path sampling run, 50 steps with a chain length of 1,000,000 generations were run. Likelihood values were compared using a Bayes factor test (
81).
Acknowledgments
E.W. thanks St. John’s College, Cambridge, for providing an excellent environment for scientific discussions. We thank S. Rankin and the staff of the University of Cambridge High Performance Computing service, the National High-throughput DNA Sequencing Centre (Copenhagen;
seqcenter.ku.dk), L. Drenzel (curator at the Historical Museum in Stockholm, Sweden), and S. Thelin and C. Ödman (archaeologists and curators, Malmö Museum, Sweden). We thank the three reviewers for their suggestions and constructive criticism.
Funding: Funding was provided by the Danish National Research Foundation, the Danish National Advanced Technology Foundation (Genome Denmark platform, grant 019-2011-2), the Lundbeck Foundation, the Novo Nordisk Foundation, the Carlsberg Foundation, the Villum Kann Rasmussen Foundation (grant KU2016), and Deutsches Zentrum für Infektionsforschung. L.M.S. is funded by the Norwegian University of Science and Technology, the Royal Norwegian Society of Sciences and Letters, and Sparebank 1 SMN. G.L.S. is a Wellcome Trust Principal Research Fellow. E.W. was supported by the Lundbeck Foundation, the Danish National Research Foundation, the Wellcome Trust (214300/B/18/Z), and the Villum Kann Rasmussen Foundation (grant KU2016). B.M., C.D., R.A.M.F., D.J.S., and T.C.J. were supported by the European Union Horizon 2020 Research and Innovation Programme, COMPARE (grant agreement 643476). T.C.J. was supported by the Wellcome Trust (214300/B/18/Z).
Author contributions: B.M.: Initiation of the study, sample identification and data processing, phylogenetic analysis, gene inactivation analysis, visualizations, consensus sequences, manuscript preparation (initial draft, review, and editing). L.V.: Design of capture array, all virus sequence–targeted data production, and contribution to data analysis and writing of the manuscript. A.M., C.d.l.F.C., P.d.B.D., and H.S.: Sampling, sequence data generation, and manuscript review and editing. H.W., A.B., T.P., C.F., V.K., V.M., M.L.S.J., and P.Ø.S.: Osteological analysis, manuscript review and editing, and writing of site descriptions. M.E.A.: Supervision of data production and manuscript review and editing. A.J.H.: Design of capture array. S.H.N.: Initial data collection, processing, and identification of positive samples. L.M.S.: Osteological analysis and writing of site descriptions. J.B.: Selection of material, manuscript review and editing, and writing of site descriptions. Y.M. and I.G.: Contribution to site descriptions. G.S., G.L.S., C.D. R.A.M.F., and D.J.S.: Analysis of results and manuscript review and editing. E.W.: Initiation of the study; leading of efforts related to sampling and data production; and participation in discussion, interpretation of results, and writing of the manuscript. T.C.J.: Initiation of the study, sample identification and data processing, phylogenetic analysis, recombination analysis, read specificity analysis, and manuscript preparation (initial draft, review, and editing). M.S.: Initiation of the study, sample identification and data processing, ancient DNA data authentication, lower-coverage sample phylogenetic analyses, edit distance analysis, coverage, damage figures, and manuscript review and editing.
Competing interests: E.W. is involved in scientific collaboration with Illumina, Inc. All other authors declare no competing interests.
Data and materials availability: Sequencing data and consensus genomes are available in the European Nucleotide Archive under project number PRJEB38129. Data for sequence alignments, BEAST2 XML files, RAxML-NG output for the ML tree of the higher-coverage sequences, consensus sequences of the lower-coverage samples used in EPA analysis, Python code and data for producing figures and gene loss analysis, and SNP tables for each lower-coverage sample, showing status (derived, ancestral, or missing) at all branch informative SNPs are available via Figshare (
83).
RE: Diverse variola virus (smallpox) strains were widespread in northern Europe in the Viking Age
Mühlemann et al. ("Diverse variola virus (smallpox) strains were widespread in northern Europe in the Viking Age," Science 369 (2020), eaaw8977) report the capture, sequence and reconstruction of 4 Variola virus (VARV) genomes, the oldest yet drafted (600-1050 CE). While advancing markedly our knowledge of the evolutionary history and genomic content of VARV, the explanation and contextualization offered for their findings requires further clarification. The origins of smallpox have been debated for centuries. Before the disease was understood to be pathogenic, physicians and polymaths employed vague descriptions of pustular epidemics and pox-like endemic illness to comment on the location and age of smallpox's cradle.(1,2,3) Like purported soft tissue and skeletal evidence for the disease,(4,5) the interpretation of early 'poxes' is complicated and has precluded definitive diagnosis. At the same time, recent attempts to use sequence data and a 'molecular clock' to date VARV's emergence,(6,7,8) are subject to ascertainment bias and very distant outgroups (camelpox virus and taterapox virus), complicating VARV's origins.(12) The use of equivocal evidence for historical smallpox as calibration points has likewise confounded emergence estimates.(9) Lack of DNA-based unambiguous VARV detections pre-dating the 17th century has further problematized attempts to better characterize VARV evolution. For this reason, the new study adds considerable time depth and context to the evolutionary history of VARV. The question remains, however, did the VARV strains Mühlemann et al. report cause smallpox?
Although coverage depth of 3 of 4 genomes is low enough (<8x) that identified gene (in)activations remain tentative, the reconstructed VARV strains very clearly sit, as Mühlemann et al. observe, in a previously unknown and extinct sister clade. These VARV strains are not the direct ancestor of the VARV known to have caused smallpox, and importantly their gene content and active/inactive state, as the highest-coverage sample (VK382, 45x) confirms, contrasts sharply with both VARV eradicated in the twentieth century and VARV drafted using 19th- and 17th-century samples.(9,10) Consequently, the disease caused by the newly reconstructed VARV would have differed, perhaps significantly, both clinically and epidemiologically, from the disease we know and recognize as smallpox. The authors' repeated assertion, therefore, that they identified 'smallpox' in Northern Europe, 600-1050, is an overstatement of the evidence they present and regrettably misleading. They do not address the origins of smallpox sensu stricto. The data they present neither put smallpox in Europe more than a millennium ago nor 'disprove' earlier claims regarding when smallpox first appeared in Europe. In fact, the phylogenetic placement of the distinct clade Mühlemann et al. identify, suggests that the clade currently comprising 17th-20th century VARV strains, the only VARV strains unquestionably associated with smallpox, emerged sometime after these two clades diverged. Mühlemann et al. estimate the lower bound of that divergence to date to ca. 1,700-to-1,400 years ago (95% highest priority density). Importantly, VARV that lead to a disease clinically and epidemiologically recognizable as smallpox may have emerged at any point along the branch leading to the 17th-20th century VARV clade. Previous studies have proposed that that emergence took place centuries later.(9,11) Establishing when and where it occurred are critical questions.
Interdisciplinarity will be vital in reconstructing the evolutionary history of VARV as well as the history of smallpox. That the 'Viking' VARV study would have benefitted from more collaboration with historians and archaeologists is apparent in the supposition that the extinct VARV was widespread. Mühlemann et al. impressively recovered VARV sequences from 11 out of 525 individuals spanning 600-1050 and much of northern Europe. False negatives are the norm in ancient DNA and the sample size Mühlemnann et al. describe represents a vast undersampling of the lives lived across those centuries and thousands of kilometres. Yet, with thousands of premodern European skeletons now analyzed for pathogen DNA (12) widespread and persistent VARV would likely have been detected already.
Further, and contrary to what the authors suggest, the written evidence for European pustular epidemics in the 570s and 580s (13,14), which they do not consult, cannot confirm the reconstituted VARV was present in the 6th century or that it was at an any point 'pan-European'. The reported VARV did not originate from known epidemic contexts (Mühlemann et al. "Diverse variola virus (smallpox)", SI: Site Descriptions), its clinical manifestation is completely uncertain, confounding attempts to draw linkages with documented disease, and there were then no settlements in northern Europe with anywhere near the >200,000 inhabitants required to ensure long-term smallpox maintenance.(15) If this VARV sister-clade was indeed widespread and long persistent, and if 6th-century authors did observe the disease that these 4 Northern Europeans would later suffer, there would be, in fact, more reason to argue that the VARV Mühlemann et al. report was epidemiologically distinct from smallpox as we know it. How else to explain the persistence of VARV for centuries in a sparsely populated rural region? That some of the samples the authors present are described as 'predating' the Viking Age, but all are ultimately characterized as 'Viking' is, in this regard, also misleading. There was no singular 'Viking' Age. Bookends of the era vary region-to-region and disciplines delineate the period differently. VK388 died upwards of 190 years, and VK382 possibly 150, before the Viking Age as historians define it. VK281 and VK470 fit the periodization, but identifying these or the other nine VARV sequences as 'Viking' speciously reinforces the conjecture that the drafted VARV was widespread. 'Viking' to many conjures up images of frequent raids in foreign lands, long-distance trade, and exploration, but the great majority of northern Europeans then led far less eventful lives and were not superspreaders of exceptional potential.
Crucially, the identification of a pathogen must not be conflated with the identification of a disease and not all VARV would have caused smallpox as we understand it. As a pathogen evolves over centuries so too can the disease it causes, both clinically and epidemiologically. This is particularly true for viral pathogens, including DNA viruses like VARV. With this in mind, it is worth considering whether the novel 'Viking' virus should be labelled VARV at all. How we dub evolutionary relatives of contemporary pathogens from the past requires more reflection. There is no doubting, though, that as remarkable as the VARV Mühlemann et al. identify is, it was not smallpox.
1) J. Hahn, Variolarum antiquitates, nunc primum e Graecis erutae (Brieg, 1733).
2) J.-J. Paulet, Histoire de la petite vérole I (Paris, 1768).
3) A. Hirsch, "Blattern" in A. Hirsch, Handbuch der historisch-geographischen pathologie I (Stuttgart, 1860).
4) A. McCollum et al., "Poxvirus viability and signatures in historical relics" EID 20 (2014), 177-184.
5) Z. Patterson Ross et al., "The paradox of HBV evolution as revealed from a 16th century mummy" PLOS Pathogens 14 (2018), e1006750.
6) Y. Li et al., "On the origins of smallpox: Correlating variola phylogenetics with historical smallpox records" PNAS 104 (2007), 15787-15792.
7) I. Babkin, I. Babkina, "The origin of the variola virus" Viruses 7 (2015), 1100-1112.
8) G. Zehender et al., "Bayesian reconstruction of the evolutionary history and cross-species transition of variola virus and othropoxviruses" J Med Virology 90 (2018), 1134-1141.
9) A. Duggan et al., "17th century variola virus reveals the recent history of smallpox" Current Biology 26 (2016), 3407-3412.
10) A. Porter et al., "Comment: Characterization of Two Historic Smallpox Specimens from a Czech Museum" Viruses 9 (2017), 276.
11) A. Carmichael, A. Silverstein, "Smallpox in Europe before the seventeenth century: Virulent killer or benign disease?" J Hist Medicine Allied Sciences 42 (2987), 147-168.
12) S. Duchêne et al., "The recovery, interpretation and use of ancient pathogen genomes" Current Biology 30 (2020), R1-R17.
13) Gregory of Tours, Libri historiarum decem, eds. B. Krusch and W. Levison Monumenta Germaniae Historica Scriptores rerum merovingicarum I.1 (Hanover, 1851).
14) Marius of Avenches, Chronica, ed. T. Mommsen Monumenta Germaniae Historica Auctores antiquissimi XI (Berlin, 1894).
15) F. Fenner et al., Smallpox and its eradication (Geneva, 1988).