INTRODUCTION
Coronaviruses cause infections in a wide variety of animals, resulting in respiratory, enteric, hepatic, and neurological diseases of various levels of severity. Based on genotypic and serological characterization, coronaviruses traditionally were classified into three distinct groups, groups 1, 2, and 3 (4). Recently, the Coronavirus Study Group of the International Committee for Taxonomy of Viruses has renamed the traditional group 1, 2, and 3 coronaviruses as Alphacoronavirus, Betacoronavirus, and Gammacoronavirus, respectively (http://talk.ictvonline.org/media/p/1230.aspx).
The recent severe acute respiratory syndrome (SARS) epidemic due to SARS coronavirus (SARS-CoV) and the identification of SARS-related coronaviruses (SARSr-CoVs) from Himalayan palm civets and horseshoe bats in mainland China have led to a boost in interest in the study of coronaviruses in both humans and animals (5, 13, 24, 26, 33, 37, 55). Before the SARS epidemic in 2003, there were only 19 known coronaviruses, including 2 human, 13 mammalian, and 4 avian coronaviruses. After the SARS epidemic, more than 20 additional novel coronaviruses have been described with complete genome sequences (9, 24–26, 31, 42, 45, 50, 53, 54, 57). These include 3 human coronaviruses, 15 mammalian coronaviruses, and 4 avian coronaviruses. For human coronaviruses, human coronavirus NL63 (HCoV-NL63) (an alphacoronavirus) and human coronavirus HKU1 (HCoV-HKU1) (a betacoronavirus) have been discovered in addition to the two previously known human coronaviruses, human coronavirus 229E (HCoV-229E) (an alphacoronavirus) and human coronavirus OC43 (HCoV-OC43) (a betacoronavirus), as well as SARS-CoV (a betacoronavirus) (9, 45, 53, 56). While HCoV-229E and HCoV-OC43 were thought to account for 5 to 30% of human respiratory tract infections, HCoV-NL63 and HCoV-HKU1 often were detected in <5% of respiratory tract samples (23, 29, 38). Outbreaks due to HCoV-OC43 also have been reported (3, 32, 44). Nevertheless, the different HCoVs often cocirculate, with one or two HCoVs being predominant depending on the geographical area and year (8, 11, 19, 23).
Coronaviruses are unique in having a high frequency of homologous RNA recombination, which is a result of random template switching during RNA replication that is thought to be mediated by a copy-choice mechanism (28, 46). Their tendency for recombination and high mutation rates may allow them to adapt to new hosts and ecological niches. During our previous investigations on the molecular epidemiology of HCoV-HKU1, we documented the first evidence for natural recombination in a coronavirus associated with human infection, resulting in the generation of different HCoV-HKU1 genotypes (23, 52, 56). Since some strains of HCoV-HKU1 were found to display incongruent phylogenetic relationships upon the analysis of the RNA-dependent RNA polymerase (RdRp), spike (S), and nucleocapsid (N) genes, recombination events were suspected and later confirmed with the complete genome sequencing of 22 strains of HCoV-HKU1 and recombination analysis (52). Although HCoV-OC43 is thought to be the most commonly encountered human coronavirus, no similar molecular epidemiology studies have been performed, and little is known about its evolution among humans. Only five complete genome sequences of HCoV-OC43, two from the same American Type Culture Collection (ATCC) strain, VR759, that was isolated in 1967, one Paris strain that was isolated in 2001, and two Belgium strains detected in 2003 and 2004, were available in GenBank (39, 47, 48). In this study, we investigate the presence of different genotypes among HCoV-OC43 strains and identify potential recombination events that lead to the generation of novel genotypes, a situation analogous to that observed for HCoV-HKU1. HCoV-OC43 strains detected in the nasopharyngeal aspirates (NPAs) of 29 patients with respiratory tract infections from 2004 to 2011 were subjected to complete RdRp, S, and N gene sequencing and analysis. The clinical characteristics of patients also were analyzed in relation to molecular epidemiology results. As initial analyses showed the presence of potential recombination events, two complete genomes of HCoV-OC43 were selected for sequencing and further analysis. The emergence of a novel genotype of HCoV-OC43 through recombination and the evolution of different HCoV-OC43 genotypes also are described.
DISCUSSION
The present study represents the first report of possible natural recombination among HCoV-OC43 strains, which has resulted in the emergence of strains of potentially novel genotypes. Although HCoV-OC43 was first discovered in 1967 (30), genomic studies of HCoV-OC43 have been scarce, with the first complete genomes coming from a laboratory strain from the ATCC and a clinical isolate, designated Paris, reported in 2004 (39). This was followed by genomic studies on two HCoV-OC43 strains detected in 2003 and 2004 in Belgium, showing that the Belgium strains were genetically distinct and that HCoV-OC43 could have originated from recent zoonotic transmission (47, 48). It also was found that the Paris isolate may be cross-contaminated with the ATCC strain, which explains their close genetic relatedness (48). A later study from France analyzing the S1 genes of seven HCoV-OC43 strains also showed high genetic diversity (43). In this study, we showed that there were at least three distinct clusters of HCoV-OC43 strains upon RdRp, S, and N gene analysis. One cluster, clade A, was formed by the ATCC and Paris strains. The other two clusters, clade B and clade C, were formed by the present HK strains and Belgium strains BE03 and BE04. However, 10 unusual strains displayed incongruent phylogenetic positions and belonged to clade B upon RdRp gene analysis and to clade C upon S and N gene analysis. These results suggested the presence of four different genotypes of HCoV-OC43, genotype A (comprising the ATCC and Paris strains), genotype B (including Belgium strain BE03 and five HK strains from 2004), genotype C (including 15 HK strains from 2004 to 2006), and genotype D (including the 10 unusual strains: Belgium strain BE04 and 9 HK strains, 1 from 2004 and 8 from 2008 to 2011). Moreover, genotype D is likely a recombinant genotype which has arisen from recombination between genotype B and C strains at a region between the RdRp and S genes within the genome. To investigate the suspected recombination event, complete genome sequences of two strains, HK04-01 (genotype C) and HK04-02 (genotype D), were determined. Both phylogenetic and bootscan analyses showed possible recombination events between genotypes B and C in the generation of genotype D strains, a situation similar to that reported for HCoV-HKU1 (52). The analysis of more HCoV-OC43 strains from other countries also will reveal the relative prevalence of the different genotypes in different localities and the presence of additional genotypes arising from other recombination events.
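The genotype definitions above amount to a lookup on the clade observed for each of the three genes. The following is a minimal sketch of that logic in Python, not the procedure used in the study; the names `GENOTYPE_BY_CLADES`, `assign_genotype`, and `is_incongruent` are hypothetical, and clade membership is assumed to have been determined beforehand by phylogenetic analysis.

```python
from typing import Optional

# Clade pattern (RdRp, S, N) for each genotype, as described in the text.
# The dictionary and function names are illustrative assumptions.
GENOTYPE_BY_CLADES = {
    ("A", "A", "A"): "A",
    ("B", "B", "B"): "B",
    ("C", "C", "C"): "C",
    ("B", "C", "C"): "D",  # incongruent pattern: putative B/C recombinant
}


def assign_genotype(rdrp: str, s: str, n: str) -> Optional[str]:
    """Return the genotype for a clade pattern, or None if the pattern
    is not one of the four described in the text."""
    return GENOTYPE_BY_CLADES.get((rdrp, s, n))


def is_incongruent(rdrp: str, s: str, n: str) -> bool:
    """True when the three genes do not all fall in the same clade."""
    return len({rdrp, s, n}) > 1


if __name__ == "__main__":
    print(assign_genotype("B", "C", "C"))  # genotype D pattern
    print(is_incongruent("B", "C", "C"))
```

A pattern outside the four described ones returns `None`, which would flag a strain for closer inspection rather than force it into an existing genotype.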
The recombinant genotype D strains may represent an emerging HCoV-OC43 genotype associated with human infections. The present study, the first molecular epidemiology study on HCoV-OC43 infections with clinical characteristics presented, revealed genetic evolution into different genotypes over time. In a previous study from Belgium, three phylogenetic clusters were identified based on S gene analysis, the ATCC cluster and two clusters containing four 2003 strains and three 2004 strains, respectively, suggesting different temporal patterns among different clusters (48). In this study, 29 HK strains collected during a 7-year period were included to better elucidate the genetic evolution of HCoV-OC43 over time. None of the contemporary strains belonged to genotype A, which consisted only of ATCC and Paris strains that likely were isolated 44 years ago. Five of the HK strains from 2004, together with Belgium strain BE03 from 2003, belonged to genotype B. Fifteen HK strains from 2004 to 2006 belonged to genotype C. One HK strain from 2004, eight HK strains from 2008 to 2011, and Belgium strain BE04 belonged to genotype D. While only 1 of the 18 HK strains from 2004 belonged to genotype D, all 8 HK strains from 2008 to 2011 belonged to this recombinant genotype. This suggests that new genotypes of HCoV-OC43 have evolved over time, with the most recent HCoV-OC43 strains circulating in our population being dominated by genotype D, which likely has arisen from recombination as early as 2004. Molecular clock analysis using S and N gene sequences suggested that the most recent common ancestor of all HCoV-OC43 genotypes emerged in the 1950s (mean, 1957), while that of genotypes B and C emerged in the 1980s (means, 1984 and 1989 by S and N gene analysis, respectively), genotype B emerged in the 1990s (means, 1996 and 1998 by S and N gene analysis, respectively), and genotype C emerged in the late 1990s to early 2000s (means, 1999 and 2002 by S and N gene analysis, respectively). Although the time to the most recent common ancestor (tMRCA) of the recombinant genotype D strains could not be studied by molecular clock analysis, the detection of a genotype D HK strain and the reported Belgium strain BE04 from 2004 suggested that this genotype had emerged no later than that year. Moreover, seven of the eight genotype D HK strains from 2008 to 2011 were associated with pneumonia, especially in the elderly, suggesting that this emerging, recombinant genotype may be associated with more severe disease.
However, molecular epidemiology studies involving a larger number of strains and from different geographical areas are required to better understand the molecular evolution of HCoV-OC43 and the relative pathogenicity of the different genotypes. Continuous studies also are warranted to detect the emergence of new genotypes and recombinants of HCoV-OC43 as well as other human coronaviruses and to assess their significance and potential in causing future epidemics. Nevertheless, it should be noted that the amplification and sequencing of a single gene may not be sufficient to define the genotypes of HCoV-OC43, HCoV-HKU1, HCoV-NL63, and probably other coronaviruses (36, 52). Given that recombination events are not uncommon among these human coronaviruses, the amplification and sequencing of at least two gene loci, probably one from ORF1ab (e.g., RdRp or helicase) and one from HE to N (e.g., S or N), should be performed to more accurately understand their molecular epidemiology and reveal novel genotypes due to recombination events.
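The case for typing at least two loci can be illustrated with the clade patterns reported here for HCoV-OC43. The sketch below, in Python, lists which genotypes share the same clade profile for a chosen set of loci; `PROFILES` and `ambiguous_groups` are hypothetical names, and the profiles are restricted to the RdRp, S, and N clades described in this study.

```python
from collections import defaultdict

# Clade observed at each locus for genotypes A to D, as described in the text.
# PROFILES and ambiguous_groups are illustrative names, not an existing API.
PROFILES = {
    "A": {"RdRp": "A", "S": "A", "N": "A"},
    "B": {"RdRp": "B", "S": "B", "N": "B"},
    "C": {"RdRp": "C", "S": "C", "N": "C"},
    "D": {"RdRp": "B", "S": "C", "N": "C"},
}


def ambiguous_groups(loci):
    """Return the groups of genotypes that share an identical clade
    profile when only the given loci are sequenced."""
    groups = defaultdict(list)
    for genotype, profile in PROFILES.items():
        key = tuple(profile[locus] for locus in loci)
        groups[key].append(genotype)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


if __name__ == "__main__":
    for loci in (["RdRp"], ["S"], ["S", "N"], ["RdRp", "S"]):
        print(loci, ambiguous_groups(loci))
```

With RdRp alone, genotypes B and D cannot be told apart; with S alone, or with S and N together, genotypes C and D cannot. Only a pair that spans ORF1ab and the HE-to-N region, such as RdRp plus S, separates all four.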
Although mouse hepatitis virus (MHV) is, historically, the best-studied coronavirus for recombination in in vitro studies (10, 28), there is increasing evidence for natural recombination in other coronaviruses, some of which lead to the generation of new strains or genotypes. In feline coronavirus (FCoV), FCoV type II strains have been found to have originated from a double recombination between FCoV type I and canine coronavirus (CCoV) in the M gene (14). Novel CCoV type II strains also have been suggested recently to have originated from double recombination with porcine transmissible gastroenteritis virus in the S gene (6). In our previous study on the natural recombination of HCoV-HKU1 in the generation of different genotypes, the sites of major recombination were localized at nsp16 just upstream of the stop codon of ORF1ab, at the end of nsp6, and upstream of nsp5 (52). Recently, we also have demonstrated natural recombination events among different bat coronaviruses, including those which could have accounted for the emergence of civet SARSr-CoV (22). It was found that civet SARSr-CoV strain SZ3 was a potential recombinant of horseshoe bat SARSr-Rh-BatCoV strains Rp3 from Guangxi Province and Rf1 from Hubei Province, with a recombination breakpoint identified at the nsp16/S intergenic region (22). In the study on Rousettus bat coronavirus HKU9, which belongs to novel Betacoronavirus subgroup D, recombination was identified at nsp3 and at the nsp15/16 junction (21). In this study, possible recombination was observed at the nsp2/nsp3 and nsp12/nsp13 junctions within ORF1ab and at the NS2a/HE junction of HCoV-OC43 genomes. ORF1ab appears to be the region most susceptible to recombination in coronavirus genomes. Further studies are required to better understand the role of recombination in various coronaviruses and the common sites of recombination in coronavirus genomes.
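The idea of locating a recombination breakpoint can be illustrated with a sliding-window identity scan. This is a deliberately simplified sketch in Python and not the bootscan analysis used in this study: it assumes pre-aligned sequences of equal length, uses toy sequences rather than real genome data, and all function names are hypothetical.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two aligned, equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closest_parent_by_window(query, parents, window, step):
    """For each window, report the start position and the parent sequence
    with the highest identity to the query (ties go to the first parent)."""
    results = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        scores = {name: identity(query[start:end], seq[start:end])
                  for name, seq in parents.items()}
        results.append((start, max(scores, key=scores.get)))
    return results


def candidate_breakpoints(results):
    """Window starts at which the closest parent changes."""
    return [cur_start
            for (_, prev), (cur_start, cur) in zip(results, results[1:])
            if prev != cur]


if __name__ == "__main__":
    # Toy data only: a query resembling one parent in its first half
    # and the other parent in its second half.
    parents = {"genotype_B": "A" * 40, "genotype_C": "G" * 40}
    query = "A" * 20 + "G" * 20
    windows = closest_parent_by_window(query, parents, window=10, step=10)
    print(windows)
    print(candidate_breakpoints(windows))
```

A real analysis would work on a multiple sequence alignment and assess support by bootstrap resampling of trees in each window, which this sketch does not attempt.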
The present results also revealed previously undescribed features in the HCoV-OC43 genomes. Although the TRS preceding most ORFs in HCoV-OC43 genomes has been described previously, the putative TRS shared between NS5a and E suggested for the ATCC strain (39) was absent from the genomes of BE03, BE04, HK04-01, and HK04-02 as a result of a 12-nucleotide deletion. While we identified a potential alternative TRS, a 39-bp sequence homologous to the leader sequence also was found upstream of NS5a. Such a homologous sequence has been suggested as a compensation mechanism for the absence of TRS in HCoV-NL63 and HCoV-HKU1 (34, 35). Further experiments are required to ascertain the mechanism for the translation of NS5a and E protein in HCoV-OC43. The analysis of the Ka/Ks ratios for different regions of the HCoV-OC43 genome also revealed interesting findings. The Ka/Ks ratios were relatively high at the nsp5, HE, S, and N regions of the HCoV-OC43 genome, suggesting that these regions were under higher selective pressure. The nsp5 of HCoV-OC43 encodes the chymotrypsin-like protease, 3C-like protease (3CLpro), that is important for the proteolytic cleavage of the large polyprotein encoded by ORF1ab. The 3CLpro of SARS-CoV also has been found to induce cellular apoptosis (27). Both HE and S are major viral membrane glycoproteins which may be important for tissue tropism, cell attachment, and eliciting neutralizing antibodies (20, 59, 60). The coronavirus N protein, a highly phosphorylated protein required for viral replication, is also an immunogenic protein which elicits subgroup-specific antibody responses (12, 21, 51, 58). Interestingly, the Ka/Ks ratio at the S gene had dropped from 5.134 among genotype A strains isolated in the 1960s to 0.270 among the 29 HK strains from 2004 to 2011. This may reflect the rapid evolution of spike protein to adapt to a new host soon after interspecies transmission, as HCoV-OC43 was thought to have originated from zoonotic transmission, sharing a common ancestor with bovine coronavirus dating back to 1890 (47). Further studies on more ancient and contemporary strains are required to better understand the selective pressure at different regions of the HCoV-OC43 genome and its significance in terms of protein function and evolution.
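The S gene values quoted above can be read with the conventional interpretation of the Ka/Ks ratio, where Ka is the number of nonsynonymous substitutions per nonsynonymous site and Ks is the number of synonymous substitutions per synonymous site. The Python sketch below applies that convention only; it does not estimate Ka or Ks from sequences, and the function names and the `tol` margin around 1 are illustrative assumptions.

```python
def ka_ks_ratio(ka: float, ks: float) -> float:
    """Ratio of nonsynonymous (Ka) to synonymous (Ks) substitution rates."""
    if ks <= 0:
        raise ValueError("Ks must be positive to compute a Ka/Ks ratio")
    return ka / ks


def interpret(ratio: float, tol: float = 0.05) -> str:
    """Conventional reading of a Ka/Ks ratio.
    `tol` is an arbitrary margin around 1 chosen for this sketch."""
    if ratio > 1 + tol:
        return "positive (diversifying) selection"
    if ratio < 1 - tol:
        return "purifying (negative) selection"
    return "approximately neutral"


if __name__ == "__main__":
    # S gene ratios quoted in the text.
    print(interpret(5.134))  # genotype A strains from the 1960s
    print(interpret(0.270))  # 29 HK strains, 2004 to 2011
```

Under this convention, a ratio of 5.134 is consistent with positive selection on the S gene among the early strains, and a ratio of 0.270 is consistent with predominantly purifying selection among the recent ones.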