Severe acute respiratory syndrome (SARS) is a recently described disease that has affected approximately 8,500 people worldwide with a mortality rate of approximately 10% (according to the World Health Organization). The causative agent of SARS is a newly identified coronavirus, SARS-CoV, first isolated by propagation on Vero E6 cells (5, 12, 17). The SARS-CoV genome has been sequenced, and the probable coding regions for viral proteins have been deduced. Like other coronaviruses, SARS-CoV is a positive-strand RNA virus that encodes four main structural proteins, M, N, E, and S (20). Genetic analysis of the coding regions has demonstrated that SARS-CoV is distinct from the three known antigenic groups of coronaviruses (5, 12); however, recent data studying the replicase gene suggest that SARS-CoV may be most related to group 2 coronaviruses (21).
The S glycoprotein, a 1,255-amino-acid type I membrane glycoprotein (20), is the prominent protein present in the viral membrane and presents as the typical spike structure found on all coronaviruses. SARS-CoV S glycoprotein domain structure has been deduced from sequence analysis (20). The S glycoprotein consists of a leader (amino acids 1 to 14), an ectodomain represented by amino acids 15 to 1190, a membrane-spanning domain (amino acids 1191 to 1227), and a short intracellular tail (amino acids 1227 to 1255) (20). The full-length SARS-CoV S glycoprotein has 23 potential N-linked glycosylation sites predicted by sequence analysis (20). For group 2 and group 3 coronaviruses, the S glycoprotein is posttranslationally cleaved into two noncovalently associated subunits, S1 and S2 (6, 15, 22, 23). The motif that leads to cleavage of the subunits in these coronaviruses (15) is not present in SARS-CoV, suggesting that cleavage of the SARS-CoV S glycoprotein does not occur (20).
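The domain layout described above can be written down as a small lookup table. The following is a minimal sketch for orientation only; the names `S_DOMAINS` and `domains_at` are hypothetical, and the boundaries are copied as stated in the text, including residue 1227, which the text lists in both the membrane-spanning domain and the intracellular tail.

```python
# Domain boundaries as stated in the text (1-based, inclusive residue numbers).
# Residue 1227 appears in two ranges in the text and is kept that way here.
S_DOMAINS = [
    ("leader", 1, 14),
    ("ectodomain", 15, 1190),
    ("membrane-spanning domain", 1191, 1227),
    ("intracellular tail", 1227, 1255),
]


def domains_at(residue):
    """Return the names of all listed domains that contain the given residue."""
    if not 1 <= residue <= 1255:
        raise ValueError("residue must be between 1 and 1255")
    return [name for name, start, end in S_DOMAINS if start <= residue <= end]


if __name__ == "__main__":
    for position in (1, 510, 1190, 1227):
        print(position, domains_at(position))
```

Because the lookup returns every matching range, a query for residue 1227 reports both domains rather than silently choosing one.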
Although the process by which SARS-CoV penetrates the cellular membrane has not been determined, the mechanism is most likely similar to that described for other coronaviruses. The S glycoprotein interacts with the cellular surface, and for coronaviruses HCoV-229E and mouse hepatitis virus (MHV) amino acids 1 to 547 (2) and 1 to 330 (13), respectively, are required for binding to the cellular receptor. This interaction is predicted to lead to conformational changes in the carboxy-terminal half of the S glycoprotein. This change culminates in fusion of the virus and host cell membranes, allowing for entry of the virus (25-27). Sequence analysis of the SARS-CoV S glycoprotein using the LearnCoil-VMF software has predicted two coiled-coil motifs, at amino acids 900 to 974 and 1148 to 1190. These coiled-coil structures are present in the fusion domain of many varied viruses, including MHV (4, 11, 14) and human immunodeficiency virus type 1 (9), whose entry events are predicted to occur as described above.
Here we describe the construction and expression of a codon-optimized gene encoding the soluble ectodomain (amino acids 1 to 1190) of the SARS-CoV S glycoprotein. Codon-optimized S glycoprotein (S1190) was secreted into the growth medium and purified by affinity chromatography. Expression levels of secreted S1190 glycoprotein were determined to be approximately 5 mg/liter after purification. The S1190 synthetic S glycoprotein was shown to have an apparent molecular mass of 170 kDa, a size similar to that observed for native S protein expressed in SARS-CoV-infected Vero E6 cells. Purified S1190 protein was readily detected by human SARS convalescent-phase serum (provided by Larry Anderson, Centers for Disease Control and Prevention [CDC]) as determined by Western blot analysis. Synthetic S glycoprotein could also bind to the surface of Vero E6 cells, demonstrating that soluble, codon-optimized S glycoprotein retains the biologic activity present in the native molecule. Carboxy-terminal truncations of S1190 were produced, and it was demonstrated that the amino acids 1 to 510 (S510) are required for binding to Vero E6 cell surfaces. Amino-terminal truncations of the S510 glycoprotein demonstrated that amino acids 270 to 510 contain the minimal receptor-binding domain of the SARS-CoV S glycoprotein.
MATERIALS AND METHODS
Construction of a synthetic gene encoding soluble codon-optimized SARS-CoV spike (S) protein and S protein fragments.
The amino acid sequence of the SARS-CoV (Urbani strain) S protein was obtained from the NCBI database (AAP13441). The soluble portion of the protein was determined to be the first 1,190 amino acids (of 1,255) and, as such, only the DNA encoding this sequence was synthesized. The DNA sequence was codon optimized for mammalian cell expression (1, 16), replacing the natural codons with the following optimum codons: alanine (GCC), arginine (CGC), asparagine (AAC), aspartic acid (GAC), cysteine (TGC), glutamic acid (GAG), glutamine (CAG), glycine (GGC), histidine (CAC), isoleucine (ATC), leucine (CTG), lysine (AAG), methionine (ATG), phenylalanine (TTC), proline (CCC), serine (TCC), threonine (ACC), tryptophan (TGG), tyrosine (TAC), and valine (GTG). Runs of Cs and Gs were avoided to simplify both the synthesis of oligonucleotides and the PCR conditions. When these stretches of Gs and Cs occurred, suboptimal codons were used. The 5′ end of the gene was modified to include a restriction site for HindIII and an irrelevant upstream overhang to facilitate cloning. The 3′ end of the synthetic gene was similarly modified to include an XbaI site and overhang sequences.
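The codon substitution described above amounts to a one-codon-per-residue back-translation. The sketch below illustrates that step with the optimum codons listed in the text; it is not the procedure actually used for the synthesis. The function names, the test peptide, and the run-length threshold of six bases are assumptions, since the text does not state how long a G/C run had to be before a suboptimal codon was substituted. The code only flags such runs and leaves the choice of replacement codon to the user.

```python
import re

# Optimum codons as listed in the text, keyed by one-letter amino acid code.
OPTIMUM_CODON = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "E": "GAG", "Q": "CAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCC",
    "S": "TCC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTG",
}


def back_translate(protein):
    """Replace each residue with its listed optimum codon."""
    return "".join(OPTIMUM_CODON[aa] for aa in protein.upper())


def gc_runs(dna, min_len=6):
    """Return (start, run) for each stretch of consecutive G/C bases.

    min_len is an assumed threshold; the text gives no value.
    """
    pattern = r"[GC]{%d,}" % min_len
    return [(m.start(), m.group()) for m in re.finditer(pattern, dna)]


if __name__ == "__main__":
    peptide = "MAGP"  # arbitrary illustrative peptide, not taken from the S protein
    dna = back_translate(peptide)
    print(dna)
    print(gc_runs(dna))
```

A peptide rich in alanine, glycine, or proline produces long G/C stretches under this table, which is the situation in which the text reports switching to suboptimal codons.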
A total of 104 oligonucleotides were obtained (Integrated DNA Technologies; polyacrylamide gel electrophoresis purified) that represented the entire coding region of both the sense and antisense strands of the S protein gene, as well as engineered restriction sites. The most-5′ oligonucleotide of each strand was a 35-mer and all others were 70-mers, resulting in a 35-bp overlap between strands. In essence, the oligonucleotides from the sense strand fully overlapped the oligonucleotides of the antisense strand, leaving no gaps. Construction of the codon-optimized gene was performed as follows. Thirteen groups of oligonucleotides were selected that contained eight oligonucleotides (four sense and four antisense) in each group. PCR was performed on each set in a reaction mixture containing 20 μM deoxynucleoside triphosphates, 30 pmol of end oligonucleotides, 10 pmol of internal oligonucleotides, 1× cloned Pfu reaction buffer (Stratagene), and 1 U of Turbo Pfu (Stratagene). Thirty cycles of thermocycling (95°C for 15 s, 62°C for 30 s, and 68°C for 2 min) were performed, and the PCR products were resolved on 1% agarose gels. Specific products were gel purified (Qiagen) and divided into four separate groups containing either three or four of the first-step PCR products. PCR was again performed on each group, using oligonucleotides corresponding to the most-5′ end of each strand. These four PCR products were resolved on 0.8% agarose gels and gel purified as before. The four PCR products were mixed and amplified using oligonucleotides corresponding to the 5′ end of each strand of the entire synthetic gene. This final amplification yielded the 3,605-bp sequence consisting of the synthetic gene flanked by restriction sites.
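The oligonucleotide layout described above can be checked with a few lines of arithmetic. The sketch below assumes that each strand is tiled from its own 5′ end with one 35-mer followed by 70-mers, as the text states, and that the finished product is 3,605 bp; the function names and the coordinate convention (0-based, end exclusive, sense-strand coordinates) are choices made here for illustration.

```python
GENE_LEN = 3605  # length of the final product stated in the text


def tile_strand(total_len, first_len=35, oligo_len=70):
    """Tile one strand 5' to 3': one short oligo, then fixed-length oligos."""
    intervals = [(0, first_len)]
    pos = first_len
    while pos < total_len:
        end = min(pos + oligo_len, total_len)
        intervals.append((pos, end))
        pos = end
    return intervals


def min_flank(junctions, opposite):
    """Smallest number of bases by which an opposite-strand oligo spans a junction."""
    flanks = []
    for j in junctions:
        for start, end in opposite:
            if start < j < end:
                flanks.append(min(j - start, end - j))
    return min(flanks)


sense = tile_strand(GENE_LEN)
# The antisense strand is tiled from its own 5' end, then mapped onto
# sense-strand coordinates so the two layouts can be compared.
antisense = [(GENE_LEN - e, GENE_LEN - s) for s, e in tile_strand(GENE_LEN)]
sense_junctions = [end for _, end in sense[:-1]]

if __name__ == "__main__":
    print(len(sense), len(antisense), len(sense) + len(antisense))
    print(min_flank(sense_junctions, antisense))
```

Under these assumptions each strand needs 52 oligonucleotides (104 in total, matching thirteen groups of eight), and every junction between adjacent sense oligonucleotides is bridged by an antisense oligonucleotide with 35 bases on each side.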
The final PCR product encoding the SARS-CoV S glycoprotein gene was digested with HindIII and XbaI and cloned into pcDNA3.1 Myc/His (Invitrogen) in frame with the c-myc and His6 epitope tags. The cloned gene was sequenced to confirm that no errors had been accumulated during the PCR process. Of the four clones sequenced, none had sequence errors and no further genetic manipulations were required.
Once the sequence of the full-length soluble SARS-CoV S glycoprotein gene was confirmed, DNA encoding carboxy-terminally truncated soluble S glycoproteins was synthesized by PCR amplifying the desired fragment from the vector containing the full-length, codon-optimized gene encoding the S glycoprotein. Since the codon-optimized S1190 gene was used as a template for PCR, all truncated constructs were also codon optimized. Truncations were then cloned into pcDNA3.1 Myc/His as described above, and the DNA sequence was confirmed.
N-terminal truncations were also synthesized. PCR was used to amplify the leader sequence of the S1190 gene, containing a 3′ overhang corresponding to downstream sequences. The downstream sequences were then amplified and combined with the leader-overhang PCR product. PCR was again performed to synthesize copies of a gene that consisted of the S1190 leader fused immediately 5′ of the downstream coding region. These constructs essentially created deletions between the leader peptide and the desired downstream sequence.
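At the sequence level, the leader-fusion strategy described above joins the leader codons directly to the codons of the desired downstream region. The following sketch shows only that bookkeeping, not the overlap PCR itself. The function name `fuse_leader` is hypothetical, and the leader length of 14 residues is taken from the domain description in the introduction; it is an assumption that the constructs used exactly that boundary.

```python
def fuse_leader(gene, start_aa, end_aa, leader_aa=14):
    """Return leader codons fused directly 5' of codons start_aa..end_aa.

    gene is the coding DNA; residue numbers are 1-based and inclusive.
    leader_aa=14 is assumed from the stated leader (amino acids 1 to 14).
    """
    if len(gene) % 3 != 0:
        raise ValueError("gene length must be a multiple of 3")
    n_codons = len(gene) // 3
    if not leader_aa < start_aa <= end_aa <= n_codons:
        raise ValueError("downstream region must lie after the leader")
    leader = gene[: leader_aa * 3]
    downstream = gene[(start_aa - 1) * 3 : end_aa * 3]
    return leader + downstream


if __name__ == "__main__":
    # Placeholder sequence of the right length (1,190 codons), not the real gene.
    dummy_gene = "A" * (1190 * 3)
    construct = fuse_leader(dummy_gene, 270, 510)
    print(len(construct) // 3, "codons")
```

For a construct spanning residues 270 to 510, this gives 14 leader codons plus 241 downstream codons, so the deletion removes residues 15 to 269.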
Cells and cell culture.
HEK-293T/17 and Vero E6 cells, obtained from the American Type Culture Collection, were grown in Dulbecco's modified Eagle's medium (DMEM) supplemented with 10% fetal bovine serum and 100 IU of penicillin-streptomycin (complete DMEM) at 37°C with 5% CO2. To harvest cells, phosphate-buffered saline (PBS) containing 5 mM EDTA was added to the tissue culture dish and incubated for 5 min at room temperature.
Expression and purification of codon-optimized S glycoproteins.
All constructs were transfected into HEK-293T/17 cells using Lipofectamine 2000 (Invitrogen) as described by the manufacturer. Briefly, cells were grown to 80% confluence in 150-mm tissue culture dishes in 15 ml of DMEM-10% fetal calf serum (FCS). Thirty micrograms of DNA mixed with 75 μl of Lipofectamine 2000 was added to the cells, and plates were incubated overnight at 37°C. Medium was removed and stored, and fresh complete DMEM was added to the cells. Cells were incubated for an additional 24 h, at which time 3 mM sodium butyrate (Sigma) was added to the medium. An additional 24-h incubation was performed, and supernatants were removed from the plate. This supernatant was combined with the transfection supernatant and filtered using a 0.45-μm-pore-size filter apparatus. Filtered supernatants were mixed with Ni-nitrilotriacetic acid-agarose (Invitrogen) at a ratio of 0.5 ml of agarose for 40 ml of culture supernatant. Supernatant-agarose mixtures were incubated for 2 h on a rocking platform at room temperature. Agarose was removed from the supernatant by column filtration. Beads were washed with PBS, and protein was eluted using 250 mM imidazole. Eluted protein was dialyzed against PBS for 2 h at room temperature and concentrated to 2 ml with an Amicon Centriprep YM-10. Sodium dodecyl sulfate-PAGE (SDS-PAGE) and Coomassie blue staining were used to determine purity of isolated proteins.
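The agarose-to-supernatant ratio given above scales linearly with culture volume, as does the expected protein recovery if the approximately 5 mg/liter yield reported for S1190 is assumed to hold. The sketch below is a convenience calculation under those assumptions; the function names are hypothetical, and the yield figure applies only to S1190 and may differ for other constructs.

```python
AGAROSE_ML_PER_ML_SUPERNATANT = 0.5 / 40  # ratio stated in the text
S1190_YIELD_MG_PER_LITER = 5.0            # approximate yield reported for S1190


def agarose_volume_ml(supernatant_ml):
    """Volume of Ni-nitrilotriacetic acid-agarose for a given supernatant volume."""
    return supernatant_ml * AGAROSE_ML_PER_ML_SUPERNATANT


def expected_yield_mg(supernatant_ml, mg_per_liter=S1190_YIELD_MG_PER_LITER):
    """Approximate purified protein, assuming the reported yield scales linearly."""
    return supernatant_ml / 1000.0 * mg_per_liter


if __name__ == "__main__":
    for volume_ml in (40, 400, 2000):
        print(volume_ml, agarose_volume_ml(volume_ml), expected_yield_mg(volume_ml))
```

On these figures, roughly 2 liters of supernatant and 25 ml of agarose would be needed to recover about 10 mg of S1190.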
SDS-PAGE and Western blotting.
Various concentrations of purified S glycoproteins were mixed with 2× reducing Laemmli sample buffer and boiled for 5 min. Samples were resolved using 12% Novex gels (Invitrogen) for 1.5 h at 200 V. Gels were transferred to Immobilon P (Millipore) as described by the manufacturer, and Western blot analysis was performed. Proteins were detected using the anti-c-myc (9E10) antibody (0.1 μg/ml; Sigma), followed by an anti-mouse immunoglobulin G (IgG)-horseradish peroxidase conjugate (1:5,000; Jackson ImmunoResearch). For detection with human convalescent-phase serum (provided by Larry Anderson, CDC), a dilution of 1:2,000 was used followed by detection with anti-human IgG-horseradish peroxidase (Jackson ImmunoResearch). For detection with mouse serum raised against synthetic S glycoproteins, the method was as described for the anti-c-myc antibody. Membranes were incubated with enhanced chemiluminescence reagent for 1 min and exposed to X-Omat-AR film for various periods of time.
S glycoprotein-binding assay.
Vero E6 or HEK-293T/17 cells were harvested with PBS-5 mM EDTA and aliquoted to microcentrifuge tubes (1 × 10^6 to 5 × 10^6 each). Pellets were resuspended in PBS containing 10% fetal bovine serum and various concentrations of the truncated soluble S glycoproteins (0.01 nM to 1 μM). Cells and S glycoprotein were incubated for 1 h at room temperature and washed once in PBS-2% FCS. Pellets were resuspended in 100 μl of PBS-2% FCS containing 10 μg of anti-c-myc (9E10) antibody/ml, incubated for 1 h at 4°C, and washed once in PBS-2% FCS. Pellets were resuspended in 100 μl of PBS-2% FCS containing 5 μl of anti-mouse IgG-phycoerythrin (PE; Jackson ImmunoResearch). Mixtures were incubated at 4°C for 40 min and washed twice, and fluorescence-activated cell sorter (FACS) analysis was performed using a FACScan instrument with CellQuest software (Becton Dickinson).
In order to specifically block S glycoprotein binding to Vero E6 cells, human convalescent-phase serum was incubated with cells and S glycoprotein. Serum concentration never exceeded 10%, and as human serum was diluted, FCS was used to normalize all reaction mixtures to a final concentration of 10% serum. Normal human serum was used as a negative control.
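The serum normalization described above keeps total serum at 10% by topping up with FCS as the human serum is diluted. A minimal sketch of that bookkeeping follows; the function name and the twofold dilution series in the usage example are assumptions, since the text does not list the dilutions tested.

```python
def serum_mix(human_serum_pct, total_serum_pct=10.0):
    """Return the human serum and FCS percentages for one reaction mixture.

    FCS makes up the difference so that total serum stays constant.
    """
    if not 0 <= human_serum_pct <= total_serum_pct:
        raise ValueError("human serum must be between 0 and the total serum percentage")
    return {
        "human_serum_pct": human_serum_pct,
        "fcs_pct": total_serum_pct - human_serum_pct,
    }


if __name__ == "__main__":
    # Hypothetical twofold dilution series starting at the 10% maximum.
    for pct in (10.0, 5.0, 2.5, 1.25):
        print(serum_mix(pct))
```

Holding total serum constant means that any change in S glycoprotein binding across the series can be attributed to the human serum rather than to a change in overall serum concentration.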
DISCUSSION
Understanding the biochemistry by which SARS-CoV infects target cells is of paramount importance in preventing infection and death associated with SARS. The S glycoprotein, which mediates viral entry, is an obvious target for studies aimed at inhibiting viral infection. Here we describe the synthesis and expression of codon-optimized SARS-CoV S glycoprotein. Codon optimization has many benefits over traditional cloning techniques, the most obvious of which is the yield of protein obtained. We have expressed the full-length ectodomain of the S glycoprotein (S1190) at a level of approximately 5 mg/liter. This yield is greater than typically seen for native viral glycoproteins expressed in mammalian cells (8). We have not formally compared the two expression systems, but it is our experience that codon optimizing of viral glycoprotein genes for mammalian cells greatly increases expression levels. At this time, we have the ability to purify >10 mg of S1190 protein at one time, allowing for diverse studies to be undertaken.
Comparisons between S1190 glycoprotein and native SARS-CoV S glycoprotein were performed. The relative molecular weight of the S1190 glycoprotein was essentially identical to that of native S glycoprotein as determined by SDS-PAGE and Western blotting. S1190 protein did, however, demonstrate proteolytic breakdown products not observed in the native protein (Fig. 2). One explanation for this difference is the amount of protein tested in the assay. Significantly more S1190 protein was resolved on the gel than the native S glycoprotein-containing viral lysate. It is possible that these smaller S glycoprotein fragments are present in virally infected cells, but this Western blotting is not sensitive enough to detect them. When quantities of S1190 glycoprotein comparable to that of native glycoprotein in the viral lysate were resolved by SDS-PAGE, we did not see the smaller S glycoprotein fragments (Fig. 1). It is also possible that overexpression of S glycoprotein in mammalian cells leads to degradation of a portion of the expressed S glycoprotein. In any case, the majority of the codon-optimized S1190 has an apparent molecular weight that is equivalent to that of native S glycoprotein.
It has been shown that SARS-CoV can readily infect Vero E6 cells in culture (5, 12, 17). The receptor for the SARS-CoV S glycoprotein has not been identified, but one can assume that it is expressed on the surface of Vero E6 cells. S1190 protein bound to the surface of Vero E6 cells in a dose-dependent manner, and specific antibodies blocked this interaction. These data suggest that soluble S1190 glycoprotein possesses some of the biologic activities present in the native S glycoprotein, specifically receptor binding.
The S glycoprotein of transmissible gastroenteritis virus has been shown to interact not only with the receptor to mediate viral entry but also with sialic acid (18). The latter interaction is not required for fusion but may aid in enteropathogenesis (10). It is a formal possibility that the interaction of soluble SARS-CoV S1190 glycoprotein with Vero E6 cell surfaces is mediated not solely by receptor, but in combination with carbohydrate residues on the Vero E6 cell surface. The interaction of S1190 with ligands other than the cellular receptor could complicate the analysis of S1190 binding to Vero E6 cell surfaces. Identification of the SARS-CoV cellular receptor will allow us to clarify this issue. In any case, the binding of S1190 is specific to the permissive Vero E6 cells.
We have determined that the first 510 amino acids of the SARS-CoV S glycoprotein contain the entire ligand-binding domain. Domain structures of the SARS-CoV S protein can now be deduced. For many coronaviruses, such as MHV, the S protein is cleaved into the ligand-binding subunit (S1) and the membrane fusion subunit (S2) (6, 15, 22, 23). The receptor-binding domain of the MHV spike protein has been mapped to amino acids 1 to 330 (13). These amino acids are contained within the S1 region. The ligand-binding domain of a coronavirus that does not express a cleaved S glycoprotein, HCoV-229E, has also been mapped. The first 547 amino acids of the HCoV-229E S protein are required for binding to the receptor hAPN (2). For this viral S glycoprotein, the first 547 amino acids were termed the S1 domain, the designation based on ligand-binding capability and not evidence of physically distinct subunits. Sequence analysis (20) as well as data described herein (Fig. 2) suggest that, analogous to HCoV-229E, SARS-CoV S glycoprotein is not cleaved into S1 and S2 subunits. Interestingly, a domain nearly identical in size to the HCoV-229E S1 domain contains the ligand-binding domain of SARS-CoV S glycoprotein. Since the first 510 amino acids of SARS-CoV S glycoprotein encompass the entire receptor-binding domain, we propose that amino acids 1 to 510 be termed S1 and amino acids 511 to 1190 be called S2.
N-terminal truncation of the S510 glycoprotein demonstrated that amino acids 270 to 510 represent the minimal receptor-binding domain. S270-510 was the only amino-terminal truncation of the S1 domain that could be expressed in HEK-293T/17 cells. S90-510, S150-510, S210-510, S330-510, and S390-510 expression levels were below our detection limits. It is unclear why these truncated constructs were not expressed. The most likely explanation is that sequences were not present in these glycoproteins to ensure proper folding. This misfolding may have prevented secretion into the medium or resulted in degradation of the various proteins. It is possible that a smaller domain than amino acids 270 to 510 confers the ligand binding capacity of the S glycoprotein, but we believe this is unlikely due to our inability to express smaller fragments. We speculate that S270-510 was expressed and secreted, since it represents an intact receptor-binding domain that possesses the appropriate sequences required for proper protein folding.
Expression and purification of large quantities of S1190, S510, and S270-510 glycoproteins will be important for identifying the SARS-CoV cellular receptor and for crystallization studies of the SARS-CoV S glycoprotein. S1190 crystallization would give a better understanding of the mechanism by which the S glycoprotein binds to and fuses with susceptible cells. Also, the S510 and S270-510 glycoproteins present the opportunity to determine the exact structure of the ligand-binding site of the S glycoprotein.
Finally, for other coronaviruses, such as transmissible gastroenteritis virus, MHV, and HCoV-229E, neutralizing epitopes are typically present in the S glycoprotein (2, 3, 7, 19, 24). Neutralizing antibodies directed against the S glycoprotein are reactive to either the S1 receptor-binding domain or hydrophobic residues located in the S2 region. The antibodies specific for S2 are predicted to interfere with fusion of the viral and host cell envelopes. We suggest that these codon-optimized S glycoprotein domains are appropriate targets for monoclonal antibody development or as vaccine candidates.