The order
Nidovirales includes mammalian positive-polarity single-stranded RNA viruses in the arterivirus and coronavirus families (
10,
16). The
Coronaviridae family includes the
Coronavirus and
Torovirus genera (
10,
46). Despite significant size differences (∼13 to 32 kb), the polycistronic genome organization and regulation of gene expression from a nested set of subgenomic mRNAs are similar for all members of the order (
16,
46). The family
Coronaviridae contains the largest RNA viral genomes in nature (
26,
44). Transmissible gastroenteritis virus (TGEV), a group I coronavirus, contains a ∼28.5-kb genomic RNA that is packaged into a helical nucleocapsid structure and surrounded by an envelope that contains three virus-specific glycoproteins: the spike glycoprotein (S), the membrane glycoprotein (M), and a small envelope glycoprotein (E) (
17,
18,
33,
36). The TGEV genome is polycistronic and encodes eight large open reading frames (ORFs), which are expressed from full-length or subgenome-length mRNAs during infection (
17,
42,
43). The 5′-most ∼20 kb contain the RNA replicase genes, which are encoded in two large ORFs, designated 1a and 1b, the latter of which is expressed by ribosomal frameshifting (
3,
17). ORF1a encodes at least two viral proteases and several other nonstructural proteins, while ORF1b contains polymerase, helicase, and metal-binding motifs typical of an RNA polymerase (
3,
17,
19). In the 3′-most ∼9 kb of the TGEV genome, each of the downstream ORFs is preceded by a highly conserved intergenic sequence element, which directs the synthesis of each of the six or seven subgenomic RNAs (
11,
17,
18,
52). These subgenomic mRNAs are arranged in a nested set structure from the 3′ end of the genome, and each contains a leader RNA sequence derived from the 5′ end of the genome (
26,
29,
42,
43). Subgenomic mRNAs are generated by a discontinuous transcription mechanism, the details of which are somewhat controversial (
4,
40,
42,
43). In addition to the viral mRNAs, full-length and subgenome-length negative-strand RNAs are implicated in mRNA synthesis (
4,
26,
40,
42,
43). Another distinctive feature of coronavirus replication is the high frequency of RNA recombination associated with infection (
6,
25,
26).
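The nested-set arrangement described above can be illustrated with a minimal Python sketch. It only models the layout of the transcripts (a common 5′ leader joined to a body that always runs to the 3′ end of the genome); it does not model the discontinuous transcription mechanism. The genome length, leader length, and body start positions are hypothetical round numbers chosen for illustration and are not TGEV map coordinates.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SubgenomicRNA:
    name: str
    leader: Tuple[int, int]  # 5' leader segment, 0-based half-open coordinates
    body: Tuple[int, int]    # body segment, always ending at the 3' end

    @property
    def length(self) -> int:
        return (self.leader[1] - self.leader[0]) + (self.body[1] - self.body[0])


def nested_set(genome_len: int, leader_len: int,
               body_starts: List[int]) -> List[SubgenomicRNA]:
    """Return 3'-coterminal mRNAs, each carrying the same 5' leader."""
    rnas = []
    for i, start in enumerate(sorted(body_starts), start=1):
        if not leader_len <= start < genome_len:
            raise ValueError(f"body start {start} lies outside the genome body")
        rnas.append(SubgenomicRNA(f"sgRNA{i}", (0, leader_len), (start, genome_len)))
    return rnas


# Hypothetical coordinates, for illustration only (not TGEV map positions).
GENOME_LEN = 28_500
LEADER_LEN = 100
BODY_STARTS = [20_300, 24_800, 25_800, 26_100, 26_900, 28_000]

for rna in nested_set(GENOME_LEN, LEADER_LEN, BODY_STARTS):
    print(rna.name, rna.body, rna.length)
```

Each successive transcript is shorter than the last, yet all of them share both the leader and the 3′ end, which is the defining property of the nested set.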
DISCUSSION
Complete ∼30-kb nucleotide sequences for a number of coronaviruses have been available for about 10 years (
8,
17,
22,
27), yet until recently no full-length infectious clone had been assembled, for three reasons: size constraints and the instability of regions of the coronavirus genome in bacterial vectors, the need for a vector system that allows simple reverse genetic applications, and the inability to synthesize full-length transcripts in vitro. Each of these restrictions must be circumvented to assemble infectious coronavirus constructs while still allowing easy reverse genetic manipulation. In a landmark achievement, a full-length TGEV infectious clone was recently engineered into bacterial artificial chromosome (BAC) vectors using standard DNA techniques (
3). Following DNA transfection into ST cells, full-length transcripts were initially transcribed from a cytomegalovirus (CMV) promoter and then amplified by virus replication in the cytoplasm of the cell. In this paper, we describe a rapid approach to systematically assembling a full-length infectious TGEV cDNA from a panel of six smaller subclones using in vitro ligation. These methods will provide a powerful complementary approach to systematically assemble new large cDNAs from a variety of microbial pathogens into BAC or other vectors that stably maintain large DNA inserts (
30). Importantly, RNA or DNA genomes which are too large, circular, or unstable in these cloning vectors can still be assembled using this in vitro ligation technique. Because coronaviruses have the largest known RNA genomes, these approaches should, in principle, permit reverse genetic studies of any RNA virus.
Evidence from several experiments demonstrated that transcripts of the TGEV 1000 genomic construct were infectious. Transcripts treated with RNase were not infectious, indicating that infection was likely initiated from the RNA transcripts synthesized in vitro. Medium from transfected cultures could be used to propagate infection, with corresponding cytopathology and viral antigen expression in fresh cultures of cells. Progeny virions formed plaques in monolayers of permissive cells, and plaque-purified molecularly cloned virus grew efficiently to levels equivalent to those of wild-type virus in permissive host cells. The host range phenotypes of molecularly cloned viruses and wild-type virus were similar in vitro, although additional experiments are needed to determine if these viruses utilize the feline aminopeptidase receptor for docking and entry into feline cells (
51). Most importantly, plaque-purified virus contained the expected
BglI and
BstXI marker mutations, providing definitive evidence that transcripts driven from the TGEV 1000 construct were infectious in vitro. The presence of these neutral mutations did not restrict the ability of icTGEV to replicate efficiently in ST cells.
It is remarkable that two entirely different approaches can be exploited to engineer infectious constructs of large RNA and DNA viruses. Our assembly strategy for coronavirus infectious constructs is simple and straightforward and does not depend on the availability of an existing viral defective interfering cDNA clone as a foundation for building the infectious construct (
3). In contrast to infectious clones of other positive-strand RNA viruses (
1,
2,
3,
13,
32,
35,
54), the TGEV 1000 construct must be assembled de novo and does not exist intact in bacterial vectors, which circumvents problems of sequence instability. This did not restrict its use in reverse genetics; rather, it allowed genetic manipulation of independent subclones, which minimizes the introduction of spurious mutations elsewhere in the genome during recombinant DNA manipulation. Another advantage of our approach is that different combinations of restriction sites can be used to generate highly variable 5′ or 3′ overhangs of 1 to 4 nucleotides in length, further increasing the specificity and sensitivity of the assembly cascade (Table
2). Because of insert toxicity in
E. coli, infectious clones of yellow fever virus and Japanese encephalitis virus were assembled by in vitro ligation from two subclones, but with conventional restriction enzymes such as
BamHI,
ApaI, and
AatI (
34,
48). Our strategy, however, prevents spurious self-assembly of subclones and will provide a strong complementary approach to engineering large RNA or DNA genomes into BAC vectors or other vectors that stably maintain large DNA inserts (
3).
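Why unique, non-palindromic overhangs prevent spurious self-assembly can be shown with a small Python sketch. This is a simplified model under stated assumptions, not the published procedure: each fragment end is reduced to its overhang sequence written on the top strand, 5′ and 3′ overhangs are not distinguished, and the overhang sequences (`TTA`, `CGA`, `GATC`) are hypothetical examples. A fragment can join in inverted orientation only if its overhang equals its own reverse complement.

```python
from itertools import permutations, product
from typing import List, NamedTuple, Optional

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


class Fragment(NamedTuple):
    name: str
    left: Optional[str]   # top-strand overhang at the left end (None = no sticky end)
    right: Optional[str]  # top-strand overhang at the right end

    def flipped(self) -> "Fragment":
        """Same fragment in inverted orientation."""
        rc = lambda s: None if s is None else revcomp(s)
        return Fragment(self.name + "'", rc(self.right), rc(self.left))


def joins(a: Fragment, b: Fragment) -> bool:
    """True if the right end of a can anneal to the left end of b."""
    return a.right is not None and a.right == b.left


def full_length_products(fragments: List[Fragment]) -> int:
    """Count distinct molecules that use every fragment exactly once."""
    count = 0
    for order in permutations(fragments):
        for flips in product((False, True), repeat=len(order)):
            chain = [f.flipped() if flip else f for f, flip in zip(order, flips)]
            if all(joins(x, y) for x, y in zip(chain, chain[1:])):
                count += 1
    return count // 2  # each molecule is counted once per strand orientation


# Hypothetical overhangs: unique and non-palindromic versus one shared palindrome.
unique_ends = [Fragment("A", None, "TTA"), Fragment("B", "TTA", "CGA"),
               Fragment("C", "CGA", None)]
shared_palindrome = [Fragment("A", None, "GATC"), Fragment("B", "GATC", "GATC"),
                     Fragment("C", "GATC", None)]

print(full_length_products(unique_ends))
print(full_length_products(shared_palindrome))
```

In this toy model the unique overhangs allow only the intended A-B-C product, whereas the shared palindromic overhang also permits a product with B inverted. The model ignores partial products and concatemers, which would further favor unique ends.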
It is interesting that in both TGEV infectious constructs assembled to date, sequences in or around the TGEV 3CLpro motif were unstable in
E. coli. Our studies, coupled with the findings by Almazan et al. (
3), suggest that the unstable sequences can be disabled by bisecting the sequence between nucleotides 9758 and 9949 in the TGEV genome. This information may permit the isolation of larger TGEV A-B1, B2-C, and DE-F subclones and allow the assembly of infectious cDNAs following a single DNA isolation-ligation step. It is not clear whether similar unstable sequences are located at this position in other group 1 and group 2 coronaviruses.
Synthesizing ∼29-kb transcripts in vitro is problematic and is the greatest impediment to generating infectious RNA from the assembled TGEV 1000 construct. With a DNA launch platform, in which TGEV RNAs are transcribed from a CMV promoter, transfection yielded ∼36 infectious units/10 μg of DNA (
3). Using an RNA launch platform, we obtained similar results in our laboratory. Compared with Sindbis virus replicons encoding GFP, we synthesized ∼100-fold fewer full-length TGEV transcripts in vitro, probably because of the extreme size of the viral genome (data not shown). When transcripts driven from the ∼28.5-kb TGEV full-length construct were used alone, viral structural gene expression was not detected in 10⁵ cells. In BHK cultures cotransfected with TGEV and
N gene transcripts, ∼100 to 500 cells per 10⁵ cells expressed viral structural proteins under identical conditions (data not shown). At 16 h posttransfection, little if any structural protein expression was noted in BHK cells electroporated with
N gene transcripts alone or transcripts treated with RNase A. This compares with transfection efficiencies of greater than 60% using the 11- to 12-kb Sindbis virus noncytopathic replicons encoding GFP. Although less dramatic, similar problems were reported with the ∼13-kb infectious arterivirus cDNA clone (
54). These problems may be circumvented somewhat by constructing BHK cell lines that simultaneously express the swine aminopeptidase N receptor and T7 RNA polymerase, allowing DNA transfection and transcription in vivo, and direct selection of progeny virus amplification in susceptible BHK cell lines (
1,
14,
51). Alternatively, CMV promoters can be inserted at the 5′ end of the TGEV A clone, allowing DNA launch of infectious RNA (
3).
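The size of the efficiency gap reported above is easier to appreciate as percentages. The short Python calculation below uses only the figures quoted in the text (100 to 500 expressing cells per 10⁵ cells with N gene cotransfection, and greater than 60% for the Sindbis replicons); treating 60% as a lower bound makes the resulting fold differences minimum estimates, and the function name is an illustrative choice.

```python
def percent_expressing(positive_cells: int, total_cells: int) -> float:
    """Percentage of cells expressing viral structural proteins."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    return 100.0 * positive_cells / total_cells


TOTAL_CELLS = 100_000            # 10^5 cells, as quoted in the text
tgev_low = percent_expressing(100, TOTAL_CELLS)
tgev_high = percent_expressing(500, TOTAL_CELLS)
SINDBIS_PERCENT = 60.0           # reported as "greater than 60%", so a lower bound

print(f"TGEV + N transcripts: {tgev_low:.1f}% to {tgev_high:.1f}% of cells")
print(f"Shortfall vs. Sindbis replicons: at least "
      f"{SINDBIS_PERCENT / tgev_high:.0f}- to {SINDBIS_PERCENT / tgev_low:.0f}-fold")
```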
In our studies, we could not generate infectious full-length transcripts until the putative T7 polymerase stop signals were removed from the TGEV genome, cytidine was included in agarose gels to reduce UV damage to DNA fragments, and BHK cells were used as recipient hosts (
21). At this time, we have no direct evidence that the T stretches in the TGEV A and C fragments act as T7 termination sites, as the RNA structure in these regions has not been characterized biochemically. Inclusion of capped
N gene transcripts during the transfection process also enhanced the infectivity of the TGEV full-length construct in three separate trials. It is not clear whether these results were serendipitous or whether N transcripts simply protected the full-length transcripts from degradation by competitively interfering with RNase activity in cells or culture medium. The N protein may also protect the genome-length RNA in a ribonucleoprotein structure in the cell, enhance infectivity directly by stabilizing or functioning as part of an intact replication complex (
7,
15,
26), or enhance the expression of viral mRNAs (
49). Interestingly, TGEV engineered into BAC vectors did not require the presence of nucleocapsid protein to enhance transcript infectivity, suggesting an ancillary role for N transcripts in our system (
3).
Prior to these and earlier studies (
3), targeted RNA recombination using defective interfering donor RNAs was the best method for introducing precise alterations into the structural genes of the group II coronavirus mouse hepatitis virus, but this approach has been essentially limited to the 3′-most 9 kb of the mouse hepatitis virus genome (
24,
25,
53). The availability of TGEV infectious constructs will obviously benefit studies of all aspects of TGEV biology and pathogenesis, including analysis of the coronavirus replicase and the somewhat controversial transcription processes which govern expression of the subgenome-length mRNAs (
17,
40,
42,
43). The future development of TGEV vaccines and expression vectors is a particularly intriguing application, as the polycistronic genome organization and synthesis of subgenome-length mRNAs may allow the simultaneous expression of multiple foreign genes (
18). It will also be relatively easy to target TGEV to other species by simple replacements of the
S glycoprotein gene (
14,
25,
51). In contrast to arterivirus expression vectors, the coronavirus intergenic sequences rarely overlap upstream ORFs, simplifying the design and expression of foreign genes from downstream intergenic promoters (
11,
17,
52). Several TGEV downstream ORFs also appear to encode luxury functions that can be deleted from the viral genome without affecting infectivity in vitro (
18,
29,
56,
57). Finally, the helical TGEV nucleocapsid structure may minimize packaging constraints and allow the expression of multiple large genes from a single construct (
18,
26,
36).
The theoretical limits of our technique may approach several million base pairs of DNA and provide a rapid approach for inserting large cDNAs into BAC vectors (
20,
30,
45). The systematic assembly method should be appropriate for constructing full-length infectious constructs of other large RNA viruses, including coronaviruses (27 to 32 kb), toroviruses (24 to 27 kb), and filoviruses like the Ebola and Marburg viruses (19 kb) (
10,
26,
31). Viral genomes which are unstable in prokaryotic vectors might also be successfully cloned using these methods (
9,
34,
48). Moreover, full-length infectious double-stranded DNA genomes of adenoviruses and herpesviruses promise to be a powerful tool in vaccination, gene transfer, and gene therapy (
30,
45,
50,
55). Historically, full-length infectious constructs of these DNA viruses have been generated by ligation of DNA fragments, by homologous recombination (the more widely used method), or as full-length clones in BAC vectors (
30,
38,
45,
50,
55). Direct ligation of DNA fragments has been restricted by the low efficiency of large-fragment ligations and the scarcity of unique restriction sites, which make the approach technically challenging. Systematic and precise assembly using rare cutters (
SfiI and
SapI) that leave variable ends and can be purposely engineered into a sequence should simplify assembly of large double-stranded DNA viruses (Table
2). This will alleviate the difficulties associated with typical restriction enzymes or recombination approaches, which often result in second-site alterations (
38,
45,
50,
55). This method may also circumvent other restrictions inherent in recombination-based methods, which are limited to specific regions of the viral genome, often yield recombinant viruses that are not wild type, and allow the introduction or removal of only a few genes in the virus vectors.
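The value of enzymes that leave variable ends can be made concrete by counting the possible overhang sequences. The Python sketch below is an illustrative enumeration rather than part of the method itself; it assumes overhangs of 1 to 4 nucleotides and treats an overhang as self-compatible when it equals its own reverse complement.

```python
from itertools import product
from typing import Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def overhang_counts(length: int) -> Tuple[int, int]:
    """Return (all overhangs, self-complementary overhangs) of a given length."""
    total = palindromic = 0
    for bases in product("ACGT", repeat=length):
        seq = "".join(bases)
        total += 1
        if seq == revcomp(seq):
            palindromic += 1
    return total, palindromic


for n in range(1, 5):
    total, palindromic = overhang_counts(n)
    print(f"{n}-nt overhangs: {total} total, {palindromic} self-complementary, "
          f"{total - palindromic} usable for directional joining")
```

Overhangs of odd length can never be self-complementary, so every such junction is inherently directional; a conventional enzyme with a fixed palindromic site always produces the same self-compatible end.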
Our systematic assembly approach is not limited to manipulating the chromosomes of large RNA and DNA viruses. Over the past decade, the genome sequences of a large number of prokaryotic and eukaryotic chromosomes have provided significant insight into gene organization, structure, and function and have likely identified the minimal set of genes required for prokaryotic life (
12,
23; TIGR home page, http://www.tigr.org). Reconstruction of a minimal genome from the bottom up is technically challenging and requires systematically assembling large DNA fragments and then inserting the reconstructed genome into an environment that allows metabolic activity and replication (
12). Using a recursive approach, the systematic assembly of large chromosomes or minichromosomes from the bottom up is theoretically feasible (Table
2). Technical challenges will likely include the isolation of large DNA fragments and accompanying assembly intermediates from gels and the introduction of large DNA genomes into environments that permit replication. Our approach, however, may provide a means to address the function of large blocks of DNA, such as pathogenicity islands, or to directly engineer chromosomes that contain large gene cassettes of interest (
12). Additional studies will be needed to test the application of these methods in other viral, prokaryotic, and eukaryotic genomes.
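The recursive, bottom-up assembly idea can be sketched in a few lines of Python. This is a schematic under stated assumptions: fragments are represented only by their names, adjacent fragments are joined strictly in pairs at each round, and every ligation is assumed to succeed. The labels A, B1, B2, C, DE, and F follow the subclone names used in the text; the pairwise schedule itself is an illustrative choice.

```python
from typing import List, Tuple


def assemble(fragments: List[str], rounds: int = 0) -> Tuple[str, int]:
    """Join adjacent fragments pairwise, recursing until one product remains.

    Returns the final product and the number of ligation rounds used.
    """
    if not fragments:
        raise ValueError("at least one fragment is required")
    if len(fragments) == 1:
        return fragments[0], rounds
    # Ligate neighbors (0+1, 2+3, ...); an unpaired last fragment carries over.
    joined = ["-".join(fragments[i:i + 2]) for i in range(0, len(fragments), 2)]
    return assemble(joined, rounds + 1)


product_name, n_rounds = assemble(["A", "B1", "B2", "C", "DE", "F"])
print(product_name, n_rounds)
```

Because the number of intermediates halves at each round, the number of ligation rounds grows only logarithmically with the number of starting fragments, which is what makes very large assemblies conceivable in principle.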