INTRODUCTION
Coronaviruses (CoVs) in the subfamily
Coronavirinae are important pathogens of mammals and birds and currently comprise four genera:
Alphacoronavirus,
Betacoronavirus,
Gammacoronavirus, and
Deltacoronavirus (
1). Members of
Alphacoronavirus and
Betacoronavirus are found exclusively in mammals; they include human CoVs 229E, NL63, and OC43, which cause human respiratory diseases (
2). A CoV is also the causative agent of severe acute respiratory syndrome (SARS), the first global human pandemic disease of the 21st century, which spread to 30 countries on five continents, resulting in >8,000 human cases with 774 deaths (
3,
4). SARS CoV is a member of the
Betacoronavirus genus and is largely distinct from previously known human CoVs OC43 and 229E (
5–7). To identify the transmission source of SARS, large-scale animal screening was implemented in May 2003, and several strains of SARS CoVs were isolated from nasal and/or fecal swabs of six masked palm civets (
Paguma larvata) and one raccoon dog (
Nyctereutes procyonoides) collected from a wet market in Shenzhen retailing wild animals for exotic foods (
8). Their full genome sequences were 99.8% identical to that of human SARS CoV, and therefore civets were deemed to be an animal reservoir of this virus (
8). Further serological studies over a larger area revealed that only civets in the market were SARS seropositive, while farmed civets were seronegative, indicating that civets likely became infected from an unknown source in wet markets, not in the farming environment (
9). Moreover, a comprehensive analysis of cross-host evolution between SARS CoVs in civets and humans indicated that civets might be spillover animals rather than the natural hosts of SARS CoV (
10). In 2005, SARS-like CoVs sharing 87 to 92% nucleotide (nt) identity with SARS CoVs were identified in horseshoe bats (
11,
12). These studies provided the first evidence that bats were the natural hosts of SARS CoVs. Since then, more SARS-like CoVs have been reported in several insectivorous bat species in China, Europe, and Africa, but none have genomes identical to SARS CoVs (
13–21). In particular, in these viruses, the key S1 domain of the S gene, responsible for receptor binding and determining host tropisms (
22,
23), shared a sequence identity as low as 76 to 78% with SARS CoVs and had a deletion of 19 amino acids (aa) in the S gene receptor binding domain (RBD), which mediates human infection via binding to human angiotensin-converting enzyme 2 (ACE2) (
11,
12,
24). Such key differences in the S gene between bat SARS-like CoVs and SARS CoVs determined their different host spectrums and made the bat viruses unable to infect humans and civets (
25–27). Clearly, these known bat SARS-like CoVs are not the progenitors of human/civet SARS CoVs, and an intermediate virus bridging bat-to-human/civet transmission remains to be identified (
24,
28). Recently, however, a novel SARS-like CoV (strain Rs3367) has been described that is more closely related to SARS CoVs than any previously reported bat SARS-like CoV. Most importantly, it has been shown to use the ACE2 receptor for cell entry, suggesting that it can cause direct human infection without an intermediate host (
29). Here, we report another novel SARS-like CoV (LYRa11) identified from
Rhinolophus affinis collected in Yunnan Province of China, which has high nucleotide and amino acid identities in its genome, similar to those of Rs3367, particularly in the RBD region. In addition, several clades of new alphacoronaviruses have been identified in
Rhinolophus and
Myotis spp.
MATERIALS AND METHODS
Ethics statement.
The procedures for sampling of bats in this study were reviewed and approved by the Administrative Committee on Animal Welfare of the Institute of Military Veterinary, Academy of Military Medical Sciences, China (Laboratory Animal Care and Use Committee Authorization, permit number JSY-DW-2010-02). All live bats were maintained and handled according to the Principles and Guidelines for Laboratory Animal Medicine (2006), Ministry of Science and Technology, China.
Sample collection and preparation.
In total, 268 adult bats were live captured with nets in 2011 in 4 counties/prefectures of Yunnan Province (
Fig. 1). Within each county there was either a single sampling location or two adjacent sites. Bat details are shown in
Table 1. All specimens were collected rectally using sterile swabs and immediately transferred to viral transport medium (Earle's balanced salt solution, 0.2% sodium bicarbonate, 0.5% bovine serum albumin, 18 μg/liter amikacin, 200 μg/liter vancomycin, 160 U/liter nystatin) and stored in liquid nitrogen prior to transportation to the laboratory, where they were stored at −80°C. All captured bats were released after sample collection.
Metagenomic analysis and RT-PCR screening.
All specimens were pooled and subjected to viral metagenomic analysis as per our published method, using barcode primers for differentiation of sample species and locations (
30). All sequences generated in one lane by Solexa sequencing (BGI) were subjected to BLASTn searches (
http://blast.ncbi.nlm.nih.gov/Blast.cgi) against the nonredundant database of GenBank, and all sequences with an E value of <10⁻⁵ were imported into MetaGenome Analyzer v.4 (MEGAN) to determine their taxonomic classification (
30). Sequences assigned to CoVs were used for further analysis. Nested reverse transcription (RT)-PCR primers targeting a 440-bp fragment of the RNA-dependent RNA polymerase (RdRp) gene were synthesized based on previous publications (
31,
32). Total RNA of each rectal swab was extracted automatically using the RNeasy minikit (Qiagen) in a QIAcube (Qiagen). Reverse transcription was performed with the 1st cDNA synthesis kit (TaKaRa) according to the manufacturer's protocol. The cDNA was amplified using the PCR master mix (Tiangen) with the following PCR programs: 30 cycles (outer PCR) or 35 cycles (inner PCR) of denaturation at 94°C for 30 s, annealing at 54°C for 30 s, and extension at 72°C for 40 s, with double-distilled water (ddH₂O) as a negative control. Positive PCR amplicons were ligated into the pMD18T vector (TaKaRa) and used to transform DH5α competent
Escherichia coli (Tiangen). Six clones of each amplicon were randomly picked for sequencing by the Sanger method in an ABI 3730 sequencer (Invitrogen). All strains in this study were named according to the following rules: the first two letters represent the sampling location, with the remaining letters identifying the host species and numbers referring to the sampling order.
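The E value filtering applied to the BLASTn results before import into MEGAN can be illustrated with a minimal sketch. It assumes that the hits are available as BLAST tabular output (the default 12-column layout, in which column 11 is the E value); the function name and the example hits are hypothetical and are not data from this study.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-5  # threshold stated in the text

def filter_blast_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Keep hits from BLAST tabular output whose E value is below the cutoff.

    Assumes the default 12-column tabular layout (column 11 = E value).
    """
    kept = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        if float(row[10]) < cutoff:
            kept.append(row)
    return kept

# Hypothetical hits for illustration only.
example = (
    "read1\tsubjA\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-40\t180\n"
    "read2\tsubjB\t80.0\t30\t6\t0\t1\t30\t1\t30\t0.5\t25\n"
)
hits = filter_blast_hits(example)
print([h[0] for h in hits])
```

Only the reads that pass the cutoff would be carried forward to taxonomic classification.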
Full genome sequencing.
To obtain the full genome of LYRa11, 16 degenerate PCR primer pairs were designed using GeneFisher, based on human/civet SARS CoV and bat SARS-like CoV sequences available in GenBank, targeting almost the full length of the genome (sequences available upon request). For amplifying the terminal ends, 3′ and 5′ rapid amplification of cDNA ends (RACE) kits (TaKaRa) were employed. Viral cDNA was prepared as described above directly from positive samples and amplified using the Fast HiFidelity PCR kit (Tiangen). The amplicons were sequenced after blunt ligation into pZeroBack vector (Tiangen). Overlapping amplicons were assembled with SeqMan v.7.0 into full genomic sequences. Open reading frames (ORFs) of LYRa11 were determined by Vector NTI v.8, followed by comparison with those of other SARS CoVs and bat SARS-like CoVs.
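The principle of joining overlapping amplicons into a contiguous sequence can be sketched as follows. This is a simplified illustration that requires an exact suffix-prefix match; it is not the algorithm used by SeqMan, which also handles sequencing errors and multiple reads. The function name, the minimum overlap, and the toy sequences are assumptions made for the example.

```python
def merge_overlapping(left, right, min_overlap=20):
    """Merge two sequences when a suffix of `left` exactly matches a prefix of `right`.

    The longest qualifying overlap is used. Returns None if no overlap of at
    least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    for k in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return None

# Toy sequences (hypothetical), sharing the 6-base overlap "GGTTCA".
contig = merge_overlapping("ACGTACGGTTCA", "GGTTCATTAG", min_overlap=4)
print(contig)
```

Applied pairwise along the genome, the same idea extends a contig one amplicon at a time.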
Phylogenetic analysis of amplicons.
All 440-bp-long amplicons were aligned with their closest phylogenetic neighbors in GenBank using ClustalW v.2.0. Representatives of different species in the genera
Alphacoronavirus and
Betacoronavirus as well as some unapproved species were included in the alignment. Phylogenetic trees were constructed by the maximum likelihood method using MEGA v.6 with the Tamura-Nei substitution model and 1,000 bootstrap replicates (
33).
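The resampling step underlying a bootstrap test can be sketched as below: each replicate is built by drawing alignment columns with replacement, and a tree would then be inferred from every replicate. This is a generic illustration, not MEGA's implementation; the function name and the toy alignment are hypothetical.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: alignment columns resampled with replacement."""
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    # The same column indices are applied to every sequence, so columns stay intact.
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}

# Toy alignment with hypothetical names and sequences.
toy = {"seqA": "ACGTACGTAC", "seqB": "ACGTTCGTAC", "seqC": "ACCTTCGAAC"}
rng = random.Random(0)  # fixed seed for reproducibility
replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]
print(len(replicates))
```

The bootstrap support of a branch is the fraction of replicate trees in which that branch is recovered.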
Morphological observation by electron microscopy.
The positive swab was examined for viral particles of LYRa11 as per our previous description (
34). Briefly, 100-μl swab suspensions were centrifuged at 120,000 ×
g for 3 h in an SW55Ti rotor (Beckman), and the resulting pellets were resuspended in 20 μl SM buffer (50 mM Tris, 10 mM MgSO₄, 0.1 M NaCl, pH 7.5) and directly negatively stained with 2% phosphotungstic acid for observation with a JEM-1200 EXII transmission electron microscope (JEOL).
S1 expression and antigenicity assay.
To characterize the antigenic reactivity of S proteins of bat SARS-like CoVs with human SARS CoV antibody, S1 fragments of human SARS CoV BJ01 (AY278488) and bat SARS-like CoVs LYRa11 and Rp3 (DQ071615) were expressed as fusion proteins with enhanced green fluorescent protein (EGFP) in BHK-21 cells and subjected to Western blot analysis using human convalescent-phase serum from a SARS patient in 2003. Briefly, the S1 fragment of SARS CoV BJ01 (nt 3 to 2028 of the S gene) was amplified from pcDNA3.1-S. The corresponding S1 fragments of LYRa11 and Rp3 were amplified from the above-described cDNA and commercially synthesized (GenScript), respectively. The three S1 fragments were inserted into pEGFP-C1 (Clontech) between the XhoI and BamHI restriction sites to construct three S1-expressing plasmids, pEGFP-BJ, pEGFP-LY, and pEGFP-Rp3. These three plasmids, along with pEGFP-C1 (as a control), were transiently expressed in BHK-21 cells using FuGENE HD transfection reagent (Promega). Total proteins were harvested 24 h posttransfection with M-PER mammalian protein extraction reagent (Thermo Scientific), and the concentration was measured with the BCA protein assay kit (Tiandz). A total of 20 μg total protein was boiled in 2× protein loading buffer (Tiangen) for 10 min, separated by 10% SDS-PAGE, and transferred onto a nitrocellulose membrane (Millipore). The blocked membrane was then incubated with a primary antibody mixture (SARS-convalescent human serum, rabbit anti-EGFP antibody [Beyotime], and 5% skimmed milk [vol/vol/vol = 1:1:1,000]) at 4°C overnight, followed by a secondary antibody mixture (peroxidase-conjugated mouse anti-human antibody [ZSGB-Bio], IRDye 800CW goat anti-rabbit secondary antibody [LI-COR Biosciences], and 5% skimmed milk [vol/vol/vol = 3:5:15,000]) at room temperature for 2 h.
The washed membrane was then scanned in an Odyssey infrared imaging system (LI-COR Biosciences) at 700-nm and 800-nm wavelengths to detect EGFP protein and then reacted with SuperSignal West Pico chemiluminescent substrate (Thermo Scientific) and scanned using LAS-4000 Image Reader (Fujifilm) to detect S1 protein.
Recombination analysis.
To detect possible recombination between SARS and SARS-like CoVs, the full-length genomic sequence of LYRa11 was aligned with selected human/civet SARS CoVs (Tor2, AY274119; BJ01, AY278488; SZ3, AY304486) and bat SARS-like CoVs (Rp3, DQ071615; Rf1, DQ412042; Rs672, FJ588686; Rm1, DQ412043; Rs3367, KC881006; B41, DQ084199; B24, DQ022305; Yunnan2011, JX993988; and HKU3, GQ153542) using ClustalW v.2.0. The aligned sequences were initially scanned for recombinational events using the Recombination Detection Program (RDP; version 4) with MaxChi and Chimaera methods using 0.6 and 0.05 fractions of variable sites per window, respectively (
35,
36). The potential recombination events between LYRa11, Rs3367, Yunnan2011, and Rf1 suggested by RDP with strong
P values (<10⁻²⁰) were investigated further by similarity plot and bootscan analyses using SimPlot v.3.5.1 (
35–37). Maximum likelihood trees of four genomic regions generated by four breakpoints were constructed to illustrate the phylogenetic origin of parental regions. The breakpoint nucleotide locations are based on the LYRa11 genome.
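The idea behind a similarity plot can be sketched as follows: percent identity between the query and a reference is computed in sliding windows along the alignment, and an abrupt change in which reference is most similar points to a candidate breakpoint. This is a simplified illustration rather than SimPlot's implementation; the function name and the window and step sizes are assumptions, as the values used in the analysis are not stated here.

```python
def window_identity(query, reference, window=200, step=20):
    """Percent identity between two aligned sequences in sliding windows.

    Returns a list of (window_midpoint, percent_identity). Columns with a gap
    ('-') in either sequence are ignored within each window.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    points = []
    for start in range(0, len(query) - window + 1, step):
        pairs = [
            (q, r)
            for q, r in zip(query[start:start + window], reference[start:start + window])
            if q != "-" and r != "-"
        ]
        if not pairs:
            continue  # window consists entirely of gapped columns
        matches = sum(q == r for q, r in pairs)
        points.append((start + window // 2, 100.0 * matches / len(pairs)))
    return points

# Toy aligned pair (hypothetical): identical first half, divergent second half.
profile = window_identity("A" * 10 + "C" * 10, "A" * 10 + "G" * 10, window=10, step=10)
print(profile)
```

Plotting such profiles for each candidate parental strain against the query gives the similarity plot; bootscanning replaces the identity score with bootstrap support for clustering in each window.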
Nucleotide sequence accession numbers.
The raw data of Solexa sequencing have been deposited in the Sequence Read Archive (SRA) under accession number SRA100822. All amplicon sequences, the S gene of LYRa3, and the full genome of LYRa11 generated in this study have been deposited in GenBank under accession numbers KF569973 to KF569997. All accession numbers of sequences from GenBank used in this study are shown in the figures.
DISCUSSION
Following identification of the first bat CoV in 2005 (
11,
12), further CoVs have been discovered in different bat species within China (summarized in
Table 3 and
Fig. 1). To date, CoVs have been found in 20 bat species within 4 families from 13 provinces and Hong Kong (
11–14,
16,
20,
29,
38,
40). Among these bat species, 10 were in the family
Vespertilionidae, 8 in
Rhinolophidae, with one in each of
Molossidae and
Pteropodidae, suggesting that
Vespertilionidae and
Rhinolophidae comprise the main hosts of CoVs. Within the above-named families, the genera
Miniopterus and
Myotis were found to harbor only alphacoronaviruses, while bats from the genera
Pipistrellus,
Tylonycteris, and
Rhinolophus harbored both alpha- and betacoronaviruses.
Table 3 also shows that alphacoronaviruses have a wider host range and show greater genetic diversity in bats than betacoronaviruses. In addition to China, countries reporting bat alphacoronaviruses include Japan (
46), the United States (
47), Spain (
32), Germany (
48), and Ghana (
21). These studies show that natural infection of diverse bat species with alphacoronaviruses is globally distributed and that bats are susceptible hosts of alphacoronaviruses. In addition, bats can also harbor diverse betacoronaviruses. According to the 9th Report of ICTV, since the first betacoronaviruses, i.e., SARS-like CoVs, were identified in bats, there have been 4 bat betacoronavirus species identified within the
Betacoronavirus genus (
1). More recently, some viruses related to Middle East respiratory syndrome (MERS) CoV have been discovered in different bat species in South Africa, Ghana, and Saudi Arabia (
49–51). It is likely that more betacoronaviruses will be identified in bat populations, although not as abundantly as alphacoronaviruses. All of the above indicates that alpha- and betacoronaviruses have different circulation and transmission dynamics in bat populations. Among the carriers of betacoronaviruses, which are most associated with emerging human infectious diseases,
Rhinolophus spp. have been the main hosts found to harbor SARS-like CoVs in China and therefore have been considered to be the natural hosts of SARS CoVs (
11,
12,
29). With the increasing number of SARS-like CoVs identified in bats since 2005, the host range of SARS-like CoVs has extended from
Rhinolophus spp. to
Chaerephon spp. in China and
Hipposideros and
Chaerephon spp. in Africa (
13–21). Most SARS-like CoVs from non-
Rhinolophus spp. show far greater genetic distance to SARS CoVs than those from
Rhinolophus spp. This is especially true for viruses from Africa, which share less than 83% full genomic identities with SARS CoVs (
17,
19,
21), suggesting that the circulation of SARS-like CoVs is restricted mainly to
Rhinolophus spp. but over a wide geographic range.
Our attempt to amplify the full S gene of SARS-like CoVs from positive samples was successful, but amplification of the full S gene of alphacoronaviruses failed, possibly due to high sequence diversity as well as the limited sample amount. Instead, a 440-bp highly conserved region of the RdRp gene was amplified to construct the phylogenetic tree in the present study. This region is useful for analyzing diversity, although it cannot accurately determine the evolutionary status of CoVs (
20). Using this region, 5 clades of alphacoronavirus were identified from 4 of 5 bat species in 3 of the 4 sampled locations, while betacoronavirus was from only one species in a single location (
Table 1,
Fig. 1), indicating that bats in Yunnan have an abundant diversity of CoVs. In the present study, SARS-like CoV was detected in only 2 of 14 bats in Baoshan. This sample size was too small to permit detection of alphacoronaviruses, but betacoronaviruses were not found in 254 bats from the other three locations, which supports the conclusion that there is a restricted distribution of betacoronaviruses in the bat population. Taken together, these data show that circulation and transmission dynamics of alpha- and betacoronaviruses in bats are different.
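The dependence of detection on sample size can be made explicit with a simple calculation. If a virus circulates at prevalence p and n bats are sampled, the probability of finding at least one positive animal is 1 − (1 − p)^n. The sketch below assumes independent sampling and a perfectly sensitive assay; the 5% prevalence is a hypothetical value chosen only for illustration and is not an estimate from this study.

```python
def detection_probability(prevalence, n_sampled):
    """Probability of at least one positive among n_sampled animals.

    Assumes independent sampling and a perfectly sensitive assay.
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    return 1.0 - (1.0 - prevalence) ** n_sampled

# Hypothetical prevalence of 5%, evaluated at the two sample sizes in the text.
for n in (14, 254):
    print(n, round(detection_probability(0.05, n), 3))
```

Under these assumptions, a sample of 14 bats has a substantial chance of missing a virus present at low prevalence, whereas a sample of 254 bats is far less likely to do so.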
The gene encoding spike protein S is a highly variable region within the CoV genome. The S protein consists mainly of S1 and S2 domains, the former containing the receptor binding motif (RBM; aa 426 to 518) within the RBD (aa 319 to 518). The RBM, which determines the host tropism of CoV by binding cell receptor ACE2, is the most variable region (
2,
24,
52). The RBM of SARS CoVs is a unique element which initiates viral infection by specifically binding to the ACE2 receptor of human and civet cells. In this process, two critical amino acid residues on RBM (479N and 487T) determine the efficiency of receptor binding since substitution of both abolishes viral binding to human ACE2, thereby abrogating the viral infection (
41,
42). Substitution of either residue alone, however, has no significant impact on human ACE2 binding (
24). Of significance is the fact that the S1 domain of bat SARS-like CoVs reported before 2013 has a very low nucleotide similarity to that of SARS CoVs (
Fig. 4A and
B), and there are several key deletions and mutations in their RBM (
Fig. 4C) which distinguish them from SARS CoVs and make them incapable of infecting humans and civets via binding to ACE2 (
11,
12,
24–27). In contrast, the LYRa11 in our study and Rs3367 reported recently (
29) have high sequence identity with the S1 domain of SARS CoVs, showing almost exactly the same RBM sequence, with a single amino acid substitution among the two key sites determining host tropism (
Fig. 4A to
C). This enables Rs3367 to use human ACE2, potentially allowing direct human infection, and to be cross-neutralized by convalescent-phase sera of SARS patients (
29). This property is probably shared by LYRa11 since its S1 domain, in addition to having very high sequence identity with Rs3367, is efficiently recognized by SARS-convalescent human serum (
Fig. 5B). The clear serological and RBM sequence evidence shows that LYRa11 is antigenically very close to SARS CoV. Together, these results strongly suggest that LYRa11 and Rs3367 have the potential to directly infect civets and humans and, as gap-filling viruses between previously reported bat SARS-like and human SARS CoVs, might be deemed progenitors of SARS CoVs. Considering that LYRa11 shares 91% full genomic identity with Rs3367, lacks ORF4, and was detected >350 km from Kunming, where Rs3367 was identified (
Fig. 1B), the two viruses are distinct. It is reasonable to speculate that more LYRa11- or Rs3367-like viruses will be isolated from bats in the future.
Due to their unique mechanism of viral RNA replication, CoVs are prone to recombination during double infections (
43). Previous studies have suggested that SARS CoVs were likely recombinants originating from strains Rp3 and Rf1 (
13,
35), while Rs3367 recombined from lineages that had evolved into human/civet SARS CoV and bat SARS-like CoV Rs672 (
29). Our analysis of recombination events among LYRa11 and other SARS or SARS-like CoVs using RDP and SimPlot suggests that LYRa11 is a recombinant descended from lineages that ultimately evolved into Rs3367 and Yunnan2011, both of which were detected in Yunnan Province (
16,
29). On this basis, it appears that SARS-like CoVs have been circulating in Yunnan bats for a long time, with obvious genetic recombination during virus transmission between bat species.
Our attempts to isolate infectious virus from the bat rectal samples failed, and only a few CoV-like particles were observed directly from rectal samples after ultracentrifugation. Reasons for believing these to be coronaviruses have been provided in Results, although the uncharacteristic morphology of the surface projections remains to be explained. Only a few petal-shaped spikes were observed on the surface of the virions (
Fig. 7). Spikes, however, are composed mainly of S1 and S2 domains, which, respectively, form the globular portion and the stalk (
2). Studies have shown that S1 is not strongly associated with S2 and is easily detached from the virion during excessive freeze-thawing or ultracentrifugation (
53–55); hence, the observation of only a few intact spikes in our preparation might be ascribed to damage or loss of S1.
In conclusion, Yunnan is a region with diverse alpha- and betacoronaviruses. Due to the ease of recombination between different strains, more diverse bat CoVs are likely to be identified in the future in this region, with important public health implications. The identification before 2013 of bat SARS-like CoVs unable to infect humans and civets prompted speculation about the existence of SARS-like CoVs able to directly infect humans and civets via wild animals. This speculation has been resolved by the identification of LYRa11 and Rs3367, which are gap-filling viruses and likely have the ability to directly infect humans. The discovery of LYRa11, together with Rs3367, has provided an important clue to the origin of SARS CoV from bat SARS-like CoVs and presents the strongest evidence so far that bats are the natural hosts of SARS CoVs.