An unusual feature of the genomic RNA of picornaviruses is that it lacks the 5′-terminal cap structure present in most eucaryotic mRNAs and is instead covalently linked to a small, virally encoded protein, VPg (3, 8). An internal ribosomal entry site located within the 5′ NTR directs the cap-independent translation of the polyprotein (6, 19), so that the cellular translational machinery bypasses the 5′ end of the genome. The polyprotein contains three major functional segments, defined in part by the order of cleavage events that occur during its processing by one or more viral proteases (26). In the case of the enteroviruses and rhinoviruses, the most N-terminal segment, P1, contains four capsid proteins, VP4, VP2, VP3, and VP1, while the P2 and P3 segments are composed of nonstructural proteins involved in protein maturation and RNA replication. These include 2Apro, 2B, 2C, 3A, 3B (VPg), 3Cpro, and 3Dpol and their functional precursors, 2BC, 3AB, and 3CDpro (15).
Following the identification of the HRV-14 cre, similar internal replication signals were identified within the open reading frames of Theiler's virus, a cardiovirus (9), poliovirus type 1 (PV-1) (5), and, more recently, HRV-2 (4). The latter observations suggest that an internally located cre may be a common feature of the RNA replication schemes of all picornaviruses. The RNA segments comprising these putative cres can be folded into relatively simple stem-loop structures, but the elements differ in terms of their primary nucleotide sequence as well as their location within the open reading frame. In cardioviruses, the cre is located in the VP2 region (9). In contrast, the PV-1 cre is not located in the P1 capsid region, but in the 2C (P2) region (5), while the HRV-2 cre is located within the 2A coding sequence (4). Studies by Paul et al. (17) indicate that the cre acts as the primary template for uridylylation of VPg by the 3Dpol polymerase in vitro. In addition, a recent mutational analysis of the poliovirus polymerase indicates that amino acid residues on the surface of the protein that are essential for uridylylation of VPg are also involved in the interaction of the polymerase with the membrane-bound 3AB precursor protein (10). Taken together, these data suggest that the cre plays a critical role in bringing viral RNA into the replication complex and in initiating VPg uridylylation, the first step in the process of viral RNA replication.
Here, we describe experiments aimed at better defining both the sequence and structural requirements for cre function during the replication of HRV-14 RNA. We show that the fully functional cre resides within a 33-nucleotide (nt) RNA segment that is predicted to form a simple stem-loop structure. By introducing single-base substitutions at each position within the loop sequence, we have determined which nucleotides are essential for replication and which nucleotide substitutions are tolerated without significant degradation of cre function. We further show that the ability of individual mutant cre sequences to support RNA replication is closely correlated with their ability to serve as template for the uridylylation of VPg in an in vitro reaction.
DISCUSSION
To date, cres have been identified within the protein-coding sequences of viruses representing three genera of picornaviruses: rhinoviruses, enteroviruses, and cardioviruses (4, 5, 9, 11, 12). These cres are similar in that they are all located within protein-coding sequence but function at the level of RNA. Each is predicted to form a stem-loop structure about 30 to 65 nt in length, generally involving nucleotide tracts with low P-num values, supporting the likelihood that the structure is conserved and thus biologically significant (16). The available evidence suggests that these RNA structures are required within the positive strand of the RNA for efficient initiation of minus-strand RNA synthesis (12) and that they are likely to be common to the RNA replication scheme of most if not all picornaviruses.
However, there are differences in these cres that are also remarkable. First, they are located in different regions (VP1, VP2, 2A, or 2C coding region) of the genome, suggesting that cre function is not dependent on a specific location and that different viruses have evolved cres at different sites within the genome in ways conducive to their dual roles as both replication element and protein-coding sequence. Another surprising difference, considering their common roles in RNA replication, is the extent of the diversity that is evident in their primary sequences as well as predicted secondary structures. The sequence differences present in the HRV-14, PV-1, and HRV-2 cres (Fig. 8A) are even more surprising, since the HRV-14 cre is capable of substituting for the PV-1 and HRV-2 cres in VPg uridylylation reactions with PV-1 and HRV-2 enzymes (4, 17). This suggests that the PV-1 and rhinovirus enzymes involved recognize some common sequence and/or structural features in these cres. Our primary aim in this study was to identify the sequences that are critical for HRV-14 cre function and, by comparing these with other known or predicted cres, to identify those sequences and structural features that are important for RNA replication.
Since previous results suggested that the minimal functional HRV-14 cre (12) was significantly larger (96 nt) than the cres identified subsequently in other picornaviruses (4, 5), we created a series of deletion mutants in which the sequences flanking the apical loop of the structure predicted for the HRV-14 cre were progressively deleted (Fig. 1). The ability of each of these mutants to support the replication of HRV-14 RNA (Fig. 2B) indicated that the minimal functional cre resides within a 33-nt sequence that is predicted to form a simple stem-loop (nt 2354 to 2386) (Fig. 1). The HRV-14 cre is thus no larger than the cre identified in other picornaviruses.
These results indicate that only the top stem and loop sequences of the originally described 96-nt cre structure (12) are required for replication of HRV-14 RNA and that the large internal loop and lower duplex stem (see ΔP1LucCRE structure in Fig. 1) are not essential. Despite this, McKnight and Lemon (12) found that some mutations in the lower stem were lethal to replication of HRV-14 RNA. Since the sequences forming the lower stem are located outside of the region that we have shown here to contain the minimal functional cre (Fig. 1), these results are best explained by changes in the folded structure of the top loop and stem of the cre mediated by mutations in the lower stem. If this interpretation is correct, the tertiary structure of the cre loop is critically important to its ability to function in RNA replication and probably also uridylylation of VPg. In general, however, it appears that a stem of 8 to 10 bp, similar to what we have determined for the minimal HRV-14 cre, is sufficient to support cre function. Disruption of the lower part of the PV-1 cre stem (leaving 9 bp at the base of the loop) had little effect on cre function (24). In addition, the predicted cardiovirus cre structures have stems formed by 8 to 10 bp (9).
We also carried out an extensive mutational analysis of the loop of the HRV-14 cre, because previous work points to its importance in RNA replication (12). The HRV-14 cre presents a unique opportunity for such a mutational analysis, since unlike the PV-1 or HRV-2 cre, it is located within the P1 region that encodes capsid proteins that are not necessary for RNA replication (12). Thus, mutations in the HRV-14 cre do not alter the amino acid sequence of proteins that contribute to the replicase. This distinguishes the cre mutants we have studied here from many of the mutations that have been studied previously in the PV-1 or HRV-2 cres (4, 17) and has also allowed us to do a more complete analysis. The fact that we found a strong correlation between the effects of these HRV-14 mutations on VPg uridylylation in vitro and on RNA replication in vivo (Fig. 7) provides additional indirect, but nonetheless strong, evidence that the ability of the cre to function as a template for VPg uridylylation is essential for viral RNA synthesis.
We found that an AAA triplet (nt 2367 to 2369, or 67A, 68A, and 69A) located in the 5′ half of the loop and 2 nt, 63G and 76A, at the bottom of the loop, are important for cre function, while there are no requirements for specific bases at other positions (Fig. 5). Any substitution within the AAA triplet abrogated RNA replication, except for the substitution of 69A with 69G. These results are reminiscent of previous studies of the PV-1 cre, in which the mutation of either of the first two A's of the AAA triplet within the 5′ half of the loop in the PV-1 element abolished infectivity (24). Mutation of these adenosines in the HRV-2 cre also abrogated uridylylation of VPg and replication of the virus (4). This AAA triplet is part of the AAACA motif identified by Rieder et al. (24) in the PV-1 cre loop and considered to be a common feature of the cre. The mutational analysis of the HRV-14 cre shown in Fig. 4 and 6, however, demonstrates that the CA dinucleotide within the AAACA motif is not necessary for replication or uridylylation. We also found that 63G and 76A, located at the junction of the cre loop and stem, were critically important for cre activity. As with 69A, only purine transition mutations were permissible at these bases.
Alignments of the sequences around the AAA triplet in the HRV-14, PV-1, and HRV-2 cres indicate that the critical base residues we have identified in the HRV-14 cre are perfectly conserved in these other cres (Fig. 8A, boldface type). This includes the 63G and 76A bases at the junction of the HRV-14 cre loop and stem. However, in contrast to the HRV-14 loop sequence, which is predicted by MFOLD to be 14 nt in length, the homologous sequences in the PV-1 and HRV-2 cres are predicted to contain several base pairs and an internal loop and to fold differently from HRV-14 as well as each other (Fig. 8B). These structure predictions have not been tested by any physical means, however, nor by appropriate mutational analyses, so their validity is unknown. Despite the differences in the computer-predicted secondary structures, we suspect that these sequences are similarly structured in all three cres. Such structural homology would make the observation that these cres are functionally exchangeable with each other in uridylylation reactions understandable (4, 17). Alternatively, it is possible that the binding of proteins such as 3CD, 3D, and VPg to the PV-1 and HRV-2 elements in the earliest steps of the uridylylation reaction might cause a change in the conformation of the RNA, opening up internal base pairs so that the PV-1 and HRV-2 structures resemble that of the HRV-14 cre.
Gerber et al. (4) identified potential cres in two rhinoviruses that are closely related to HRV-2, HRV-16 and HRV-1b. These proposed cre sequences were located at essentially the same position in the genome as the HRV-2 cre. Again, the bases that we found to be essential for HRV-14 cre activity are conserved in these putative cres, including 63G and 76A at the junction of the loop and the stem. It is interesting to note, however, that in the proposed HRV-16 cre, the AAA triplet is replaced with AAG. We found this substitution to be functional in the HRV-14 background, both for replication and for uridylylation (Fig. 4 and 6).
Based on our mutational analysis of the HRV-14 cre and comparisons of the HRV-14 cre sequence with the cres of other viruses, we propose a common motif for the loop segment of rhinovirus and enterovirus cres that likely defines a common structure (Fig. 8B): R₁NNNAAR₂NNNNNNR₃.
We predict that an AAR triplet will be present in the 5′ half of the loop sequence in all these replication elements. At the third base position of this triplet (R₂), the base may be either A or G, but there is clearly a preference for A. This is also the case for R₃, while a G is preferable at R₁. R₁ and R₃ are located at the extreme ends of the loop and seem likely to be involved in a non-Watson-Crick base pair interaction (see below). At the remaining positions, A, C, G, or U appears to be equivalently acceptable for both replication and uridylylation, and the specific base present is likely to be determined more by the requirement for coding specific amino acids than by a requirement for preservation of cre function. However, a mutation involving the substitution of 4 contiguous bases within the loop sequence (69A through 72A) was found previously by McKnight and Lemon (12) to be lethal for replication, even though we have shown here that each of the individual base substitutions in this mutant was permissive for replication and uridylylation. It is likely that the failure of the multiple-substitution mutant to function as a cre was due to perturbation of the structure of the loop following substitution of such a large proportion of the bases within it.
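The proposed loop motif can be written as a simple sequence pattern, which may be convenient when screening candidate loop sequences. The following is a minimal sketch, assuming IUPAC conventions (R = A or G, N = any base) and a 14-nt loop; the function name `matches_loop_motif` and the example sequences are hypothetical illustrations, not sequences taken from HRV-14 or any other virus. A match says only that the primary sequence is compatible with the motif; as the multiple-substitution mutant above shows, it does not guarantee a functional cre.

```python
import re

# Loop motif from the text: R1 NNN AA R2 NNNNNN R3 (14 nt in total).
# R (purine) is A or G; N is any of A, C, G, U.
LOOP_MOTIF = re.compile(r"[AG][ACGU]{3}AA[AG][ACGU]{6}[AG]")


def matches_loop_motif(loop: str) -> bool:
    """Return True if a loop sequence is compatible with the proposed motif.

    Accepts RNA or DNA letters in either case; T is treated as U.
    """
    seq = loop.upper().replace("T", "U")
    return LOOP_MOTIF.fullmatch(seq) is not None


if __name__ == "__main__":
    # Hypothetical 14-nt loops, for illustration only.
    print(matches_loop_motif("GCUUAAACAUCUGA"))  # AAA triplet
    print(matches_loop_motif("GCUUAAGCAUCUGA"))  # AAG triplet (R2 = G)
    print(matches_loop_motif("GCUUCAACAUCUGA"))  # first A of the triplet changed
```

The pattern treats A and G as equally acceptable at R₁, R₂, and R₃; it does not encode the preferences described above (G at R₁, A at R₂ and R₃).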
All of the known cardiovirus cre sequences also possess an AAA motif that is predicted to be at least partially single stranded in these viruses. A mutation at the second A of this motif in the Theiler's virus cre was shown to be lethal for RNA replication (9). However, the cardiovirus cre appears to have a very different structure from that of the PV-1 or rhinovirus cre, because the homologous AAA triplet in Theiler's virus appears to be located within a small bulge-loop that is part of a larger hairpin structure (9). This difference in structure is consistent with the failure of the HRV-14 cre to substitute for the cardioviral cre in supporting viral replication (9), as well as the greater phylogenetic distance between the cardioviruses and these other picornaviral genera.
Studies by Paul et al. (17) and Gerber et al. (4) have shown that the PV-1 cre acts as the primary template for VPg uridylylation in vitro, a reaction that requires only synthetic VPg, UTP, purified PV-1 RNA, PV-1 RNA polymerase 3Dpol, and Mg²⁺. Mutations that abolish the ability of the PV-1 or HRV-2 cre to act as template for VPg uridylylation in vitro also eliminate their ability to support viral RNA replication in vivo (4, 17). Substitution of the AAA triplet in the HRV-2 cre with CAA also led to a change in the specificity of the nucleotidylylation reaction, with the covalent addition of guanine to VPg leading to VPg-pG (4). These observations suggest that the AAA triplet functions as a template for the nucleotidylylation of VPg by using a slide-back mechanism in which the most 5′ adenosine of the AAA triplet templates the nucleotide to be linked to VPg. This indicates that the conserved AAR₂ residues within the loop of the rhinovirus and enterovirus cres are likely to be located on the surface of the folded RNA structure. The conserved 63G and 76A residues (Fig. 8B) have the potential to form a non-Watson-Crick closing pair at the base of the loop. Recent studies of synthetic RNA aptamers suggest that the presence of a GA closing pair significantly influences the structure of the adjacent RNA loop and may have a critical role in determining the ability of the loop to form stable loop-loop interactions (2). While there is no evidence that the cre loop is involved in a loop-loop interaction, we speculate that the conserved 63G and 76A residues form such a closing pair and that the presence of this closing pair contributes in an important way to the structure of the cre loop that is required for proper presentation of the AAR₂ triplet as the template for uridylylation.
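The templating rule implied by the observations above (the most 5′ base of the loop triplet specifies the nucleotide linked to VPg) can be illustrated with a short sketch. It assumes plain Watson-Crick complementarity and nothing more; the function name `templated_nucleotide` is hypothetical, and the code does not model the slide-back step, the polymerase, or reaction efficiency.

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}


def templated_nucleotide(triplet: str) -> str:
    """Return the nucleotide expected to be linked to VPg.

    Assumption: the 5'-most base of the three-base template specifies
    the linked nucleotide by Watson-Crick pairing.
    """
    seq = triplet.upper().replace("T", "U")
    if len(seq) != 3 or any(base not in COMPLEMENT for base in seq):
        raise ValueError(f"expected a 3-nt sequence of A/C/G/U, got {triplet!r}")
    return COMPLEMENT[seq[0]]


if __name__ == "__main__":
    for template in ("AAA", "CAA"):
        print(f"{template} -> VPg-p{templated_nucleotide(template)}")
```

Under this assumption, an AAA template gives VPg-pU and a CAA template gives VPg-pG, in line with the change in specificity reported for the HRV-2 cre (4).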