Discovery of marburgvirus HF in West Africa.
In early 2005, reports were received by the CDC and World Health Organization (WHO) of a large HF outbreak in Uige Province in northern Angola, West Africa (Fig. 1). On 15 March, specimens were collected and sent to CDC in Atlanta, Ga., to be tested for evidence of infection with VHF-associated viruses known to be present in West Africa. To be thorough, the VHF-associated Marburg virus was also included, even though all earlier outbreaks had originated in East Africa. Testing included virus isolation, antigen capture ELISA, IgM and IgG ELISA, RT-PCR and Q-RT-PCR assays, and in one case IHC staining. Contrary to expectations, evidence of acute marburgvirus infection was found in 12 of the 15 initial specimens (Table 1), all of which were from fatal cases.
Clearly the most sensitive diagnostic test was the Q-RT-PCR assay, which was designed to detect a conserved sequence of VP40 present in all known species of marburgvirus, including the more distantly related Ravn strain. All positive Q-RT-PCR results were obtained in duplicate, with each set yielding amplification curves within 0.4 CT units. Virus isolation proved to be nearly as sensitive as the Q-RT-PCR assay, although limited clinical material in some cases made virus isolation attempts difficult. The antigen capture assay confirmed the RT-PCR results in four cases yet overall proved less sensitive. None of the initial samples were positive in either the marburgvirus-specific IgM or IgG assays performed in parallel. In those ELISAs, the positive and negative controls in each assay performed as expected, thus suggesting that the negative results with the human specimens were not the result of any systematic errors. In no instances did specimens test positive by antigen capture, IgM, IgG, or virus isolation and not test positive by RT-PCR (data not shown).
Three samples (samples 1411 to 1413 in Table 1) arrived the following week. One specimen was a skin biopsy from a patient in Uige, while the other two samples, 1411 and 1412 (a blood specimen and a serum specimen, respectively, drawn at the same time), were from a physician in Luanda with a recent contact history with the suspected VHF patients in Uige Province. The physician had traveled directly to the capital city of Luanda only days before her death. Both of the physician's samples tested positive by Q-RT-PCR and virus isolation, while the skin biopsy tested positive by IHC staining.
Complete genome sequence determination and phylogenetic analysis.
To establish the genetic relationship of the Angola marburgvirus relative to prior East African viruses, and to determine the extent of sequence variation circulating within the Angola VHF outbreak, complete marburgvirus genomes were sequenced from 11 clinical samples collected from fatal cases throughout a time period of almost 3 months and encompassing the known geographic distribution of the outbreak. There were four municipalities within Uige Province from which virus-positive samples were collected (Fig.
1). The municipalities were Uige (samples 1379 to 1386 and 0998), Songo (sample 0754), Bungo (samples 0181, 0214, and 0215), and Damba (sample 0126). In addition, we included the sample from the Angolan capital, Luanda (sample 1411) (Fig.
1). RNAs were extracted from multiple types of clinical specimens, including oral swabs, which collectively contained a wide range of viral loads. It should be noted that following the initial Marburg VHF diagnosis, field labs were established by CDC and the Public Health Agency of Canada on site in Luanda and Uige, respectively, at the request of the Ministry of Health of Angola and WHO. Samples 0126, 0181, 0214, 0215, 0754, and 0998 (Table
2) were obtained and tested within 24 h of specimen collection and then stored frozen until nucleotide sequence analysis could be initiated.
To provide a more complete representation of the overall potential for genetic diversity within a single marburgvirus outbreak and to provide a basis against which to compare the genetic diversity (or lack thereof) circulating in the Angola outbreak, we determined the complete genome sequences of four marburgvirus isolates obtained from the two most recent marburgvirus outbreaks prior to the one in Angola. Three of the isolates, 05DRC, 07DRC, and 09DRC, were from epidemiologically unlinked cases that occurred at different times during the 1998 outbreak in Durba, DRC, while the fourth isolate, Ravn (Rav), was obtained in 1987 from a 15-year-old Danish boy who had traveled to Kitum Cave in Mount Elgon, Kenya, 9 days prior to disease onset.
All 16 newly determined complete marburgvirus sequences (19,114 nucleotides in length), in addition to those of Popp (Pop), Musoke (Mus), and Ozolin (Ozo), were analyzed to determine their phylogenetic relationship (Fig.
2A) and nucleotide distances (Fig.
2B). A maximum-likelihood analysis placed the Angola (Ang) isolates firmly (100% bootstrap support) within the clade containing the majority of the East African isolates. This result was surprising given the large geographic distance, ∼1,000 miles, of Uige, Angola, from the locations of all other known sources of marburgvirus. The branching pattern of the phylogenetic tree shows five overall branches, with the Rav/09DRC branch being the most divergent, showing nucleotide differences greater than 21% relative to viruses in other lineages, including Ang. 05DRC, 07DRC, and Ozo comprise a second well-defined lineage, differing from viruses in all other lineages by greater than 7%. Finally, using a cutoff of ∼5% nucleotide difference, Ang, Mus, and Pop form three additional lineages, with the Angola sequence diverging by 6.8 and 7.1% from Mus and Pop, respectively, and by greater than 7.4% from the Ozo, 05DRC, and 07DRC lineages. Mus and Pop differ from each other by 5.9%. For comparison, the difference between any of the marburgvirus genomes and those of either Zaire ebolavirus or Sudan ebolavirus is greater than 65%, while the Zaire and Sudan ebolaviruses differ from each other by greater than 40%.
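The percent nucleotide differences quoted above can be illustrated with a small sketch. The source does not state how the distances in Fig. 2B were calculated, so the uncorrected pairwise difference (p-distance) used here is an assumption, and the function names and toy sequences are hypothetical placeholders rather than marburgvirus data.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, gap="-"):
    """Percent of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * differing / compared


def distance_matrix(alignment):
    """Map each unordered pair of sequence names to its percent difference."""
    return {
        (n1, n2): p_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(sorted(alignment), 2)
    }


# Toy alignment (hypothetical sequences, for illustration only).
toy = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAT",
    "C": "ACGAAC-TAT",
}
matrix = distance_matrix(toy)
for pair, dist in matrix.items():
    print(pair, round(dist, 2))
```

With full-length aligned genomes in place of the toy strings, the same matrix could be compared against a chosen cutoff (such as the approximately 5% used above) to group sequences into lineages.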
Among the 11 clinical samples selected from the Angola outbreak, the genetic sequences were well conserved throughout the entire length (19,114 nucleotides) of the genome (Fig. 3). Ten of 11 genomes had five changes or fewer compared to the reference isolate Ang1379c. Remarkably, four of the sequences, obtained from clinical samples collected over a month and a half, showed no nucleotide differences for the entire 19,114-nucleotide genome. This confirmed the high fidelity of the RT-PCR-based sequence analysis performed and demonstrated that human-to-human passage of marburgvirus could occur in the absence of virus evolution. In addition, our reference sequence, Ang1379c, was also 100% identical to the sequence obtained from the corresponding virus isolate, Ang1379v, indicating the lack of selection occurring during culture of the virus from the clinical specimen. The most genetically diverse genome came from a specimen collected in the Songo municipality (specimen 0754), which had a total of 11 nucleotide changes (out of 19,114 bases) relative to the reference isolate (0.07% variation). Each of the unique changes was independently confirmed by generation of small (<1-kb) RT-PCR fragments followed by sequence analysis. In contrast, the analysis of three isolates from within the earlier Durba, DRC, outbreak showed at least 10 times greater sequence diversity, ranging from 0.8 to 21% nucleotide difference. The 05DRC and 07DRC isolates differed by 0.8% (>150 nucleotide differences) and differed by over 21% when each was compared to the 09DRC isolate. The 09DRC isolate is noteworthy because it represents the second member of the most distinct lineage within the marburgviruses, first defined by the Rav isolate (20). The minimal genetic diversity observed in the Angola outbreak (maximum of 0.07% variation) relative to that seen in the Durba outbreak (0.8 to 21% variation) is consistent with the Angola marburgvirus outbreak being the result of a rare introduction of virus into the human population from the unknown reservoir followed by direct human-to-human transmission.
Comparative analysis of marburgvirus genetic elements.
The determination of the Ang, Rav, 05DRC, 07DRC, and 09DRC full-length sequences almost tripled the number of marburgvirus genomes available for analysis. Therefore, using this more extensive database of full-length sequences, a comprehensive effort was undertaken to reexamine many genetic features throughout the genome and determine the degree to which these elements are maintained across all eight marburgvirus strains. An initial macroscopic perspective of genome similarity is shown in Fig.
4, in which full-length genomes were analyzed for similarity using a sliding window of 50 nucleotides. The similarity plot reveals a striking pattern of sequence conservation among the open reading frames flanked by regions of much greater variation in the noncoding sequences. However, the more variable noncoding regions are punctuated with spikes of high identity corresponding to the regions containing transcription start and stop sequences (arrows). Alignments of the individual
cis-acting regulatory features demonstrate that 12 of 14 transcription start and stop sequences are 100% identical (Fig.
5). For the two start/stop sequences that show variation, the differences are merely single-nucleotide transitions and are within the Ravn/09DRC lineages. In all genomic elements examined, the Angola sequence shows 100% identity with the consensus sequence. Well-conserved genomic features include the lengths of the 5′ and 3′ untranslated regions of all seven predicted mRNAs, some of which differ slightly from previous reports (8, 15), as well as the length and composition of the six intergenic region (IR) sequences. A few notable variations within the IRs are a requirement for a purine residue in the first position of the trinucleotide IR between VP24 and L and conserved differences within the Ravn/09DRC lineage at three of seven positions in the IR between NP and VP35.
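The sliding-window similarity analysis behind Fig. 4 can be sketched as follows. This is a minimal illustration under stated assumptions: it compares only two aligned sequences at a time, scores plain percent identity per window, and uses a hypothetical function name; the source specifies only the 50-nucleotide window, not the software or scoring model.

```python
def sliding_window_identity(seq_a, seq_b, window=50, step=1):
    """Return (start, percent identity) for each window along two aligned sequences.

    `start` is the 0-based index of the first column in the window.
    Columns where both sequences carry the same character (including a
    shared gap) count as identical; this is a simplifying assumption.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    matches = [1 if x == y else 0 for x, y in zip(seq_a.upper(), seq_b.upper())]
    results = []
    for start in range(0, len(matches) - window + 1, step):
        identity = 100.0 * sum(matches[start:start + window]) / window
        results.append((start, identity))
    return results


# Toy example with a short window so the drop in identity is easy to see.
profile = sliding_window_identity("AAAAAAAA", "AAAATTTT", window=4, step=2)
print(profile)
```

Plotting the identity values against the window start positions would give a profile in which conserved open reading frames and transcription start/stop signals appear as peaks and variable noncoding regions as troughs.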
An examination of the nucleotide and amino acid distances for each of the seven marburgvirus gene products is shown in Fig.
6A and B, respectively. At the nucleotide level the most conserved genes are, in order, VP40, NP, VP24, and VP35, showing 0.2 to 15.2% variation, closely followed by VP30, showing 0.3 to 17.4% variation. The gene with the greatest nucleotide differences is GP (0.7 to 22.5%), consistent with previous alignments using fewer marburgvirus strains (
8,
20,
45). Surprisingly, the nucleotide difference in the polymerase (0.5 to 21.4%) is almost as much as that seen in the GP region despite having some stretches of very high conservation (Fig.
7G).
At the amino acid level, the percent differences among the marburgvirus strains for each of the seven gene products are quite different from those seen within the same genes at the nucleotide level. The degrees of variation within all the open reading frames, except GP, are decreased by 2- to 10-fold. The most striking example is the matrix protein VP40, in which the decrease is about 10-fold, demonstrating a distinct intolerance for amino acid changes (1.65% maximum variation). This intolerance is suggestive of tight physical constraints for VP40 in the virus assembly process. At the other end of the spectrum, GP showed no decrease at all in the level of amino acid variation (∼23%) relative to that seen at the nucleotide level, suggesting a selective pressure for nonsynonymous changes, most likely exerted by the immune system of the natural reservoir host(s).
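The contrast between nucleotide-level and amino-acid-level variation can be made concrete with a short sketch. It assumes the standard genetic code, in-frame open reading frames of equal length with no gaps or ambiguous bases, and simple uncorrected percent differences; the function names and toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, built in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def normalize(orf):
    """Uppercase the sequence and write RNA (U) as DNA (T)."""
    return orf.upper().replace("U", "T")


def translate(orf):
    """Translate an in-frame coding sequence into one-letter amino acids."""
    orf = normalize(orf)
    if len(orf) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf), 3))


def percent_difference(a, b):
    """Percent of positions that differ between two equal-length strings."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return 100.0 * sum(x != y for x, y in zip(a, b)) / len(a)


def nt_and_aa_difference(orf_a, orf_b):
    """Return (nucleotide % difference, amino acid % difference)."""
    nt = percent_difference(normalize(orf_a), normalize(orf_b))
    aa = percent_difference(translate(orf_a), translate(orf_b))
    return nt, aa


# Toy ORFs: three synonymous changes versus one nonsynonymous change.
reference = "ATGGCTAAAGGT"        # M A K G
synonymous = "ATGGCCAAGGGA"       # M A K G, third-position changes only
nonsynonymous = "ATGACTAAAGGT"    # M T K G
print(nt_and_aa_difference(reference, synonymous))
print(nt_and_aa_difference(reference, nonsynonymous))
```

A gene dominated by synonymous changes, as described for VP40, shows a much lower amino acid difference than nucleotide difference, whereas a gene accumulating nonsynonymous changes, as described for GP, shows no such reduction.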
We next examined known protein domains and motifs in each of the seven open reading frames by comparative alignment of the amino acid sequences (Fig.
7A to G). The analysis of GP (Fig.
7D) shows that the area of greatest diversity is a continuous 300-amino-acid (aa) stretch from residue 201 to 501, a region previously divided into two smaller variable domains (
45). Despite this diversity, a number of previously described features remain well conserved, many of which reside within this central variable domain. These features include 13 of 14 proposed N-glycosylation sites (N-x-T/S) and 12 of 12 cysteines (
45). Elsewhere in GP, the transmembrane domain (aa 649 to 670) and proposed fusion domain (aa 526 to 540) show 100% identity among all eight strains of marburgvirus, as does the furin cleavage site R-X-K/R-R (aa 432 to 435) (
54). Volchkov et al. (
52) proposed the presence of an immunosuppressive domain (ISD) in GP2 of ebolavirus based on analogy to retroviruses. This 26-amino-acid motif is also present in marburgvirus (
55) and shows a single Thr-to-Ala substitution at position 12 of the Angola sequence. Lending importance to the function of the ISD motif, recent studies have shown that 17-residue monomers of filovirus ISDs are capable of suppressing T-cell activation and Th1-related cytokine production in activated human and nonhuman primate peripheral blood mononuclear cells (
59). All other positions of the ISD show complete conservation. Another posttranslational modification ascribed to GP is the potential for phosphorylation at serine residues in two independent motifs between amino acids 260 and 273 (
46). The GP alignments demonstrate that only one phosphorylation site, encompassing amino acids 268 to 273, is conserved among all eight marburgviruses. The other site, a diserine motif at amino acids 260 to 261, is not present in five of eight strains, thus discounting this motif as a general feature of all marburgviruses.
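Sequence motifs such as the N-glycosylation sequon (N-x-T/S) and the furin cleavage site (R-X-K/R-R) mentioned above can be located with regular expressions. The sketch below is illustrative only: the patterns follow the motif definitions exactly as written in the text (many tools additionally exclude proline at the x position, which is not applied here), and the helper name and toy peptide are hypothetical.

```python
import re

# Motif definitions as written in the text; "." stands for any residue.
N_GLYCOSYLATION = r"N.[ST]"
FURIN_SITE = r"R.[KR]R"


def find_motif(pattern, protein):
    """Return (start, end, match) for every occurrence, including overlaps.

    Positions are 1-based and inclusive, matching the residue numbering
    style used in the text.
    """
    regex = re.compile(f"(?=({pattern}))")  # lookahead allows overlapping hits
    return [
        (m.start(1) + 1, m.end(1), m.group(1))
        for m in regex.finditer(protein.upper())
    ]


# Toy peptide (hypothetical, not a marburgvirus GP sequence).
peptide = "MKNNTSARRKRQNGS"
glyco_sites = find_motif(N_GLYCOSYLATION, peptide)
furin_sites = find_motif(FURIN_SITE, peptide)
print(glyco_sites)
print(furin_sites)
```

Running the same search on each aligned GP sequence and comparing the hit positions would show which sites are shared by all strains and which are missing from some.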
Within the alignment of marburgvirus L amino acid sequences (Fig.
7G), the regions showing the greatest variation are clustered into five domains, i.e., amino acids 114 to 135, 262 to 348, 1143 to 1206, 1623 to 1645, and 1677 to 1866. The last domain, encompassing residues 1677 to 1866, may contain a hinge region within the polymerase based on analogy to morbilliviruses, which have been shown to tolerate the insertion of the green fluorescent protein within the proposed hinge (
13, 31). The areas of greatest conservation within the marburgvirus L sequences are in three large blocks, amino acids 349 to 1142, 1207 to 1622, and 1867 to 2322. The first of these three areas, amino acids 349 to 1142, contains the box A, B, and C sequences common to paramyxo- and rhabdovirus L proteins (
3,
36). Boxes A and B share 100% identity among the aligned marburgvirus sequences, while box C has a single K-to-R substitution in the Rav and 09DRC lineages. Two other motifs purported to be present in all polymerases of negative-sense RNA viruses are the diresidue D-D and QGDNQ motifs (
3,
21). These were previously identified in the Mus marburgvirus L sequence (
36) and are thought to be essential components of the polymerase catalytic core. In this alignment, these potential catalytic core motifs are all 100% conserved, with the exception of one diresidue D-D motif at amino acids 91 to 93, which shows degeneracy at two positions in the Ravn and 09DRC lineages. In addition, there is a putative ATP and/or purine ribonucleoside triphosphate binding domain found in all known L proteins of single-strand negative-sense viruses (amino acids 1931 to 1956) (
36), referred to as motif C in an alignment of filovirus L sequences (
53), that shows 100% identity at the core consensus glycine residues and identity at 23 of 26 positions overall within the motif. Three other potential ATP binding motifs, at residues 1325 to 1360, 1390 to 1420, and 1560 to 1593 (
36), show complete identity among all taxa analyzed. Finally, 46 of 52 consensus cysteine residues are completely conserved, including the dicysteine motif at positions 1376 to 1377, as previously noted (
36), which are found in most L proteins at similar locations and are believed to anchor protein secondary structure to maintain the necessary conformation of the putative active sites.
Outside of the polymerase and glycoprotein, a few notable genomic elements reside in the NP, VP35, and VP30 genes. An alignment of NP amino acid sequences (Fig.
7A) shows that nearly all the variability is in the C-terminal half of the protein, similar to that seen in a recent comparison of Zaire and Sudan ebolavirus full-length genome sequences (
44). Within this same half of the protein are seven unique phosphorylation domains (
28), each containing one or more serine/threonine kinase substrate motifs. Yet, despite the overall variation, the majority of individual kinase recognition motifs are conserved to the point that, regardless of the marburgvirus strain examined, at least one motif is present within each of the seven domains. The maintenance of these motifs highlights the potential role for phosphorylation in this region of NP, an area postulated to effect protein-protein interactions.
The VP35 protein of marburgvirus plays an essential role in transcription and replication of viral RNA. A predicted coiled-coil domain spanning residues 70 to 120 may effect VP35 oligomerization, an interaction which in turn may be necessary for VP35 to bridge NP with L to form the active polymerase complex (
35). At the heart of the presumed coiled-coil domain are heptad repeats containing hydrophobic residues at the first and fourth positions. This domain shows variability at nine positions (Fig.
7B). Yet despite this variation, the spacing of the hydrophobicity is strictly maintained among all the aligned sequences. The Angola sequence shows no variation whatsoever from the consensus sequence throughout the 50-amino-acid domain. In addition to its role in RNA replication, VP35 of ebolavirus has recently been shown to contain an 11-amino-acid motif (Zaire ebolavirus residues 304 to 314) that is thought to be essential for type I interferon (IFN) antagonism (
19). This motif, which is possibly involved in RNA binding, is also present in marburgvirus (residues 293 to 303) and has identity at 9 of 11 positions with that of ebolavirus, including the three basic amino acids experimentally demonstrated to be important. Further highlighting the potential importance of this VP35 domain, the alignment (Fig.
7B) reveals that this domain is a general feature of all known marburgviruses.
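The statement that the hydrophobic spacing of the heptad repeats is maintained despite sequence variation can be checked programmatically. The sketch below assumes a heptad register in which the first and fourth positions (commonly labeled a and d) form the hydrophobic core; the set of residues treated as hydrophobic, the function names, and the toy segments are all assumptions made for illustration and are not taken from the source.

```python
# Residues treated as hydrophobic here; this set is an assumption.
HYDROPHOBIC = frozenset("AILMFVWY")


def heptad_core_positions(segment, register_start=0):
    """List (1-based index, residue, label) for heptad positions a and d.

    `register_start` is the 0-based index of a residue known to occupy
    position a; positions repeat every seven residues from there.
    """
    core = []
    for i, residue in enumerate(segment.upper()):
        phase = (i - register_start) % 7
        if phase == 0:
            core.append((i + 1, residue, "a"))
        elif phase == 3:
            core.append((i + 1, residue, "d"))
    return core


def core_is_hydrophobic(segment, register_start=0):
    """True if every a and d position holds a hydrophobic residue."""
    return all(
        residue in HYDROPHOBIC
        for _, residue, _ in heptad_core_positions(segment, register_start)
    )


# Toy segments (hypothetical): the variant changes only non-core residues,
# while the disrupted segment places a charged residue at a d position.
consensus = "LKELEEKLKELEEK"
variant = "LRDLQEKIKELDEK"
disrupted = "LKEDEEKLKELEEK"
print(core_is_hydrophobic(consensus))
print(core_is_hydrophobic(variant))
print(core_is_hydrophobic(disrupted))
```

Applied to each aligned VP35 sequence over the predicted coiled-coil region, a check of this kind would distinguish substitutions that preserve the hydrophobic core from those that would disrupt it.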
VP30 of Zaire ebolavirus has been shown to contain an unconventional Cys3-His zinc binding domain whose integrity has been shown to be required for VP30 function in virus transcription (
33). The alignment in Fig.
7E shows the high conservation of all four zinc-coordinating residues among all eight marburgvirus taxa. Adjacent to the zinc binding domain is a well-conserved tetraleucine motif which could facilitate VP30 oligomerization, similar to that observed with ebolavirus VP30 (
18).
Finally, VP24, whose alignment is shown in Fig.
7F, is highly conserved throughout the protein. Recent studies of marburgvirus VP24 have implicated it in nucleocapsid assembly and in interactions between nucleocapsids and budding sites at the plasma membrane (
2). In addition, ebolavirus VP24 has been shown to bind Karyopherin α1 and to block STAT1 nuclear accumulation, thus implicating VP24 as a virulence determinant that allows ebolavirus to evade antiviral effects of IFNs (
41). Consistent with the idea that VP24 may be a virulence determinant, both mouse- and guinea pig-adapted ebolaviruses, each of which is capable of 100% lethality in its respective animal model, have amino acid changes that map to VP24 (
7,
51).