The Nidovirales order includes mammalian and avian positive-polarity, single-stranded RNA viruses in the arterivirus and coronavirus families (
43). The
Coronaviridae family is further subdivided into the
Coronavirus and
Torovirus genera (
14,
45). Despite remarkable size differences in the genomic RNA (13 to 32 kb), the polycistronic genome organization and regulation of gene expression from a nested set of subgenomic mRNAs are similar for all members of the order (
29,
45). Coronaviruses contain the largest single-stranded, plus-polarity RNA genome in nature, ranging in size from about 27 to 31 kb in length, and are divided into three subgroups based upon antigenic and sequence comparisons (
29,
43). The group I coronaviruses include human coronavirus strain 229E (HCV 229E) and transmissible gastroenteritis virus (TGEV). The group II coronaviruses contain mouse hepatitis virus (MHV), bovine coronavirus, and HCV OC43. Bovine coronavirus is an important pathogen of cattle while HCV infection is associated with a significant percentage of common colds in winter. The group III coronaviruses contain infectious bronchitis virus (IBV).
The MHV-A59 and MHV-JHM strains are the most extensively studied group II coronaviruses, and infection in susceptible mice results in a panencephalitis with acute and chronic demyelination that is histologically similar to multiple sclerosis in humans (
28). MHV-A59 is also hepatotropic, and infection results in hepatitis (
30). The MHV-A59 virion contains a ∼31.5-kb genome that encodes ∼10 large open reading frames (ORFs) (
8). The genomic RNA is packaged by a 50-kDa nucleocapsid protein (N) into a helical nucleocapsid structure and acquires an envelope by budding into intermediate compartments between the endoplasmic reticulum and the Golgi complex (
18,
37,
38,
50). The MHV virion contains three or four virus proteins including a spike glycoprotein of ∼180/90 kDa (S), a 65-kDa hemagglutinin esterase (HE), a 23-kDa membrane glycoprotein (M), and an ∼11-kDa E protein (
33,
34,
53). Note that the HE glycoprotein is encoded as a pseudogene in some MHV strains, including MHV-A59 (
34,
51). Consequently, its function in MHV replication and pathogenesis is not clear. Much of our knowledge concerning the replication strategy of coronaviruses derives from the use of MHV as a model for pathogenesis, docking and entry, receptor usage, transcription, replication, polymerase function, and assembly and release (
4,
5,
6,
9,
10,
12,
15,
21,
22,
23,
40,
41,
42,
54).
The MHV-A59 gene 1 (replicase gene) is about 22 kb in length and contains two large overlapping ORFs (1a and 1b) with 1b in the −1 reading frame with respect to ORF1a (
8,
12). ORF1b expression requires a ribosomal frameshift at a pseudoknot structure in the 1a-1b overlap region (
12). Thus, the replicase gene is capable of expressing two large polyproteins, the ORF1a polyprotein (pp1a, 495 kDa) and the 1a/ab fusion polyprotein (pp1ab, 803 kDa) (
8,
12,
25). ORF1a encodes at least three experimentally confirmed protease activities including two papain-like proteases (PLP1 and PLP2) and a polio 3C-like protease (3CLpro) (
9,
20,
31,
32). Neither pp1a nor pp1ab is detected intact in MHV-infected cells since the proteinases process the polyproteins cotranslationally and posttranslationally into at least 14 mature replicase proteins (
9,
10,
19,
20,
31,
32). The PLP1 cleavage products include the p28 and p65 proteins, both of which are derived from the N terminus of ORF1a (
20). The 3CLpro cleavage products include MP1, 3CLpro, αp10, αp22, αp12, and αp15, all of which are derived from the C terminus of pp1a (
10). ORF1b is cleaved by 3CLpro into at least four mature products, including putative polymerase and helicase polypeptides (
19). The functions of most of the replicase proteins in MHV-A59 replication and pathogenesis are unknown, although genetic complementation analysis with temperature-sensitive mutations suggests at least eight distinct functions that influence RNA synthesis (
46). Many of the replicase proteins are associated with membranes that are also sites of viral RNA synthesis (
19,
44).
Molecular genetic analysis of the structure and function of RNA virus genomes has been profoundly advanced by the availability of full-length cDNA clones, from which infectious RNA transcripts can be derived that replicate efficiently when introduced into permissive cell lines (
1,
2,
11). Until recently, the large size of the coronavirus RNA genome, coupled with regions of chromosomal instability, hindered the development of full-length infectious cDNAs. To circumvent these problems, a novel targeted recombination-based approach was developed, which involved the use of a molecularly cloned defective-interfering-like RNA and took advantage of high RNA recombination frequencies during mixed MHV infection (
26,
27,
35). Targeted recombination has been very useful for studies of the 3′-most ∼9 kb of the MHV genome but has not been used for other regions of the genome, notably the replicase gene (
26,
27).
Two general strategies have recently been developed to assemble full-length infectious cDNAs of coronavirus RNA genomes. One strategy involved the stable cloning of full-length group I coronaviruses (TGEV and HCV 229E) cDNAs into bacterial artificial chromosomes or vaccinia virus (
3,
48). Then, molecularly cloned recombinant virus was recovered following either a DNA- or an RNA-based launch, respectively. A full-length infectious cDNA of IBV has also been assembled in vaccinia virus vectors (
13). In contrast to these approaches, we have assembled a full-length infectious cDNA of TGEV by in vitro ligation from six component subclones and the subsequent in vitro transcription of infectious RNAs (
52). In this system, each of the component clones terminates in a restriction site that leaves a variable 3- or 4-nucleotide end, generated by a
BglI or
BstXI site, respectively. The different ends target assembly with only the appropriate adjacent subclones. A strength of this system is that reverse genetic applications are simplified in the component clones compared with the sequence complexity of full-length cDNAs. This allows for rapid reverse genetic applications within individual segments of the viral genome as well as the ability to rapidly mix and match mutations in different regions of the genome (
17,
52).
In this study we describe a full-length cDNA of MHV-A59, the first group II coronavirus that has been successfully recovered from a genome-length reverse genetic system. The assembly of full-length cDNAs of the group II coronaviruses has been hampered by the presence of numerous toxic regions in the viral ORF1 polymerase, which are unstable in bacteria with high- or low-copy-number plasmid vectors. We have solved this problem by using a strategy that separates the toxic regions during cloning and then regenerates the exact wild-type MHV sequence at the junctions between the component clones. This vector will allow for detailed reverse genetic applications throughout the entire MHV genome and will serve as a model for the assembly of other group II coronavirus full-length cDNAs, as well as other RNA and DNA viruses and microbial genomes.
DISCUSSION
In this study we report the first reverse genetic system for a group II coronavirus, MHV-A59, which allows for the successful recovery of infectious virus following assembly of a genome-length cDNA from a series of contiguous cDNA subclones. The approach was similar to the strategy that was used to assemble a full-length infectious cDNA of TGEV except that it was not usually necessary to introduce mutations and new restriction sites into the wild-type virus genome to direct the assembly cascade (
52). Rather, we have demonstrated that type IIS restriction endonuclease
Esp3I sites can be used to create the unique interconnecting junctions and yet be subsequently removed from the final assembly product, allowing for the reconstruction of an intact wild-type sequence. This approach avoided the introduction of nucleotide changes that are normally associated with building a full-length cDNA product of a viral genome. These nonpalindromic restriction sites will also provide other novel recombinant DNA applications. For example, by PCR, it will be possible to insert
Esp3I or a related nonpalindromic restriction site at any given nucleotide in a viral genome and use the variable domain for simple and rapid site-specific mutagenesis. When the restriction sites are oriented as No See'm sites, they are removed during reassembly, leaving only the desired mutation in the final DNA product. The dual properties of strand specificity and a variable end overhang that can be tailored to match any sequence allow for
Esp3I sites to be engineered as universal connectors that can be joined with any other 4-nucleotide restriction site overhang (e.g.,
EcoRI,
PstXI, and
BamHI). Alternatively, No See'm sites can be used to insert foreign genes into viral, eukaryotic, or microbial genomes or vectors, simultaneously removing all evidence of the restriction sites that were used in the recombinant DNA manipulation.
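The No See'm junction described above can be illustrated with a minimal sketch. It assumes the standard Esp3I cut geometry (recognition site CGTCTC, cleavage 1 nucleotide downstream on the recognition strand and 5 nucleotides downstream on the opposite strand, leaving a 4-nucleotide 5′ overhang). The sequences, the function names, and the single-strand representation of each subclone are hypothetical and chosen only for illustration; they are not the MHV-A59 junctions.

```python
# Toy model of a type IIS "No See'm" junction, top strand only.
# Assumption: Esp3I recognizes CGTCTC and cuts at N1 (same strand) / N5
# (opposite strand), so the 4-nt overhang lies outside the recognition site.

ESP3I_SITE = "CGTCTC"
ESP3I_SITE_RC = "GAGACG"  # the same site read on the opposite strand


def right_end(top):
    """Digest the downstream end of an upstream subclone.

    The site sits after the insert in reverse orientation, so it cuts
    back toward the insert. Returns (duplex_body, overhang).
    """
    p = top.rindex(ESP3I_SITE_RC)
    return top[:p - 5], top[p - 5:p - 1]


def left_end(top):
    """Digest the upstream end of a downstream subclone.

    The site sits before the insert in forward orientation.
    Returns (overhang, duplex_body).
    """
    q = top.index(ESP3I_SITE)
    return top[q + 7:q + 11], top[q + 11:]


def ligate(upstream, downstream):
    """Join two digested subclones if their 4-nt overhangs match."""
    body_a, overhang_a = right_end(upstream)
    overhang_b, body_b = left_end(downstream)
    if overhang_a != overhang_b:
        raise ValueError("overhangs are not compatible")
    return body_a + overhang_a + body_b


# Hypothetical wild-type sequence split at the 4-nt junction TGAC.
wild_type = "ATGGCTAGCT" + "TGAC" + "CGATTACGGA"
clone_a = "ATGGCTAGCT" + "TGAC" + "A" + ESP3I_SITE_RC + "GG"
clone_b = "CC" + ESP3I_SITE + "A" + "TGAC" + "CGATTACGGA"

product = ligate(clone_a, clone_b)
print(product == wild_type)
```

In this toy case the recognition sites and the flanking vector nucleotides are discarded with the cleaved ends, so the ligated product carries no trace of the sites used to build it.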
The MHV-A59 systematic assembly strategy involves seven contiguous cDNAs and offers several unique advantages as a reverse genetic system for coronaviruses. These include (i) reduced sequence complexity and the corresponding increased availability of rare restriction sites that allow for rapid reverse genetic manipulations in individual cassettes compared with a full-length genome clone; (ii) sequence compartmentalization in individual cassettes, minimizing the possibility of spurious second-site mutations and recombination that may occur frequently in large inserts during recombinant DNA manipulations; (iii) the disruption of the MHV toxic regions, which are normally unstable in microbial vectors; (iv) avoidance of the introduction of mutations that create new restriction sites in the viral sequence; (v) compatibility with vaccinia virus or bacterial artificial chromosome vectors; and (vi) the fact that the theoretical limits of the assembly cascade with the rarest cutters like SapI greatly exceed the sizes of most million-base-pair microbial genomes as well as all RNA and DNA viral genomes described to date. As such, many DNA and RNA virus and microbial genomes could also be reassembled from component clones by this approach.
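As a rough illustration of the combinatorics behind point (vi), the sketch below counts how many distinct, directional junctions a given overhang length can support in a single ligation. It is not the authors' calculation: it considers only sticky-end sequences and ignores site frequency, fragment size, and ligation efficiency. It assumes that Esp3I leaves a 4-nucleotide overhang and SapI a 3-nucleotide overhang; the function names are hypothetical.

```python
from itertools import product


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def directional_junction_limit(overhang_len):
    """Upper bound on mutually distinct, directional junctions.

    Palindromic overhangs are excluded because they can self-ligate,
    and each junction consumes an overhang plus its reverse complement.
    """
    overhangs = {"".join(p) for p in product("ACGT", repeat=overhang_len)}
    non_palindromic = {oh for oh in overhangs if oh != revcomp(oh)}
    return len(non_palindromic) // 2


print(directional_junction_limit(4))  # 4-nt overhangs (e.g., Esp3I)
print(directional_junction_limit(3))  # 3-nt overhangs (e.g., SapI)
```

Under these assumptions the count is 120 for 4-nucleotide overhangs and 32 for 3-nucleotide overhangs, far more than the six junctions needed to join seven cassettes.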
We purposely introduced several silent changes to remove preexisting
Esp3I sites that resided within the MHV-A59 genome sequence and to distinguish between molecularly cloned and wild-type viruses. In one instance, the
Esp3I site at position 4875 was removed because it left a palindromic TTAA overhang, which would have compromised the directionality of assembly. The other
Esp3I sites were removed to minimize the total number of MHV-A59 subclones used in the assembly cascade. In two instances, we inserted silent mutations into the
Esp3I overhang to maximize sequence specificity and directionality at a particular junction (Tables
1 and
2), but this could be circumvented by choosing slightly different junction sites. The MHV cDNA cassettes can be ligated either systematically, as described for TGEV, or simultaneously. Although numerous incomplete assembly intermediates were evident, our demonstration that simultaneous ligation of seven cDNAs yields full-length cDNA should simplify the assembly strategy. At this time, there is no evidence that this approach introduces spurious mutations or genome rearrangements from aberrant assembly cascades. However, it is possible that such variants might arise following RNA transfection, as a consequence of high-frequency MHV RNA recombination between incomplete and genome-length transcripts (
35). It is likely that such variants would be replication impaired and rapidly outcompeted by wild-type virus. A second limitation of simultaneous ligation is that the yield of full-length cDNA product is reduced, resulting in less robust transfection efficiencies than those obtained with the more traditional systematic assembly method.
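The requirement that each junction overhang be specific and directional can be checked with a short sketch. The TTAA overhang is the one mentioned above; the other overhangs and the function names are hypothetical and are not taken from Tables 1 and 2. The check flags overhangs that are palindromic (able to ligate to themselves in either orientation) or that duplicate, or are the reverse complement of, an overhang at another junction.

```python
def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))


def is_palindromic(overhang):
    """True if the overhang equals its own reverse complement."""
    return overhang == revcomp(overhang)


def check_junctions(overhangs):
    """List reasons why a set of junction overhangs is not directional."""
    problems = []
    for i, overhang in enumerate(overhangs):
        if is_palindromic(overhang):
            problems.append(f"junction {i}: {overhang} is palindromic")
        for j, other in enumerate(overhangs[:i]):
            if overhang == other or overhang == revcomp(other):
                problems.append(
                    f"junction {i}: {overhang} conflicts with junction {j}"
                )
    return problems


# Hypothetical junction sets, for illustration only.
print(check_junctions(["TGAC", "GCTA", "ACCT"]))
print(check_junctions(["TGAC", "TTAA", "GTCA"]))
```

The first set passes; in the second, TTAA is flagged as palindromic and GTCA is flagged because it is the reverse complement of TGAC.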
The group I and III coronaviruses appear to encode a single toxicity-instability region in ORF1a (
3,
52). In contrast, MHV-A59 contains several regions in ORF1, particularly between nucleotides 9555 and 15754, that are highly unstable and/or toxic in many high- or low-copy-number plasmids in
Escherichia coli. To solve this problem, we took advantage of the observation that toxicity can be significantly reduced if the domains are bisected into two distinct subclones and if the microbial vectors are propagated at 30°C. In MHV-A59, the toxic regions map within the small 3CLpro cleavage products at the C terminus of ORF1a, near the PLP about 5.0 kb into ORF1a, and at the N terminus of ORF1b (
10,
19,
20,
25,
31,
32). Instability appears to be associated with expression, as this entire domain (nucleotides 9555 to 15754) is stable in yeast vectors (pYES2.1 Topo TA cloning kit from Invitrogen) that maintain tight regulation over foreign gene expression (B. Yount et al., unpublished data). Importantly, the MHV B, C, and D clones were most stable in low-copy-number pSMART plasmids, which are transcription- and translation-free cloning vectors. These vectors lack a β-galactosidase promoter, and the MHV inserts are flanked by strong transcriptional stops, providing further support for the hypothesis that toxicity is expression linked. We are determining if the entire C-to-E domain can be stably cloned into pSMART vectors. As an alternative, preliminary data also suggest that the MHV B, C, and D inserts may be more stable in Topo II/pGEM vectors in the presence of glucose, presumably from the induction of catabolite repression.
Coronaviruses have been demonstrated elsewhere to package low concentrations of subgenomic mRNAs, especially N transcripts, and several studies have suggested that N transcripts may function in transcription and replication and are tightly associated with the replication complex (
7,
19,
42,
47). Previous studies in our laboratory with TGEV, and then in other laboratories with IBV, have shown that N transcripts enhance the infectivity of transcripts derived from coronavirus full-length cDNAs (
13,
52). With IBV, but not TGEV or HCV 229E, N transcripts are absolutely essential for full-length transcript infectivity (
13). In this study, N transcripts enhanced the infectivity of full-length MHV-A59 transcripts by 10- to 15-fold as evidenced by increased viral antigen expression and virus titers at 25 h p.i. It is unclear whether MHV N transcripts, N protein, or both are essential for increased virus yields following electroporation or whether this effect would be observed with transcripts of unrelated genes. Previous reports indicate that N transcripts do not appear absolutely necessary for HCV 229E subgenomic RNA transcription (
49). Additional studies are needed to determine exactly how N transcripts enhance infectivity of coronavirus genome-length transcripts in vitro.
MHV-A59 has long served as a model system for studying coronavirus genetics, polymerase function, replication, transcription, receptor usage, assembly and release, and pathogenesis (
29). The availability of a full-length infectious cDNA of MHV-A59 will allow for reverse genetic applications throughout the entire MHV genome. Virus recovered from the infectious construct replicated as efficiently as did wild-type virus, approached titers of 10^8 PFU/ml within 16 h p.i., and, importantly, contained the marker mutations engineered into several of the component clones. Molecularly cloned virus utilized MHVR as a receptor for docking and entry into cells, was sensitive to blockade with monoclonal antibodies against MHVR, and displayed host range restriction similar to that of the wild-type virus. Consequently, the reverse genetic system will prove applicable to studying MHV-A59 receptor interactions as well as elucidating the mechanisms of coronavirus host range expansion and persistence (
5,
6,
15,
21,
22,
23,
41).
Further, the use of a cassette approach to construct infectious cDNAs of MHV-A59 will allow for the precise and rapid introduction of mutations into regions of the genome that are currently inaccessible by targeted RNA recombination approaches, specifically the entire 5′ two-thirds of the genome and notably the MHV replicase gene, gene 1. Although a great deal has been learned concerning the expression, processing, targeting, and interactions of replicase gene proteins, it has not been possible to determine the functions of most of them. Novel functions that must be encoded include proteins that regulate MHV discontinuous transcription, high-frequency RNA recombination, and positive- and negative-strand RNA synthesis. As arteriviruses use a similar transcriptional strategy and yet encode a replicase gene that is one-third to one-half the size of the MHV gene, it seems likely that novel nidovirus gene 1 functions will be encoded in the coronavirus replicase gene that assist in the replication of large RNA genomes. The approaches described here will allow direct reverse genetic studies of individual replicase gene proteins, as well as any combination of proteins because of the ability to mix and match different mutated gene fragments.
In summary, we have established a reverse genetic system for the group II coronavirus MHV-A59. The strategy used in the assembly of this infectious cDNA could be applied to other important group II coronaviruses like bovine coronavirus, an important pathogen of cattle, and HCV OC43, which is an important cause of upper respiratory tract infections in humans. Moreover, it will be possible to target MHV to multiple species by simple replacement of the S glycoprotein gene (
27), allowing for the development of MHV-A59 as a vaccine vector in domesticated animals and humans. Host range variants of MHV may recognize human CEA genes for docking and entry, allowing for virus targeting of unique cell populations in humans (
5,
6,
36). These features, coupled with a transcriptional strategy that will likely allow for regulated expression of multiple genes from the genome, should allow for the use of coronaviruses as heterologous vaccine vectors in humans and animals (
17,
24).