Comparisons within and between sister pairs.
Overall mean dS and dN values were computed for nonoverlapping nonepitope regions, overlapping nonepitope regions, nonoverlapping epitope regions, and overlapping epitope regions, both within and between sister pairs (Table 1). The mean dS value was greater than the mean dN value in all cases, and this difference was statistically significant for all comparisons except the within-pair comparisons of overlapping epitope regions (Table 1). This result implies that, on average, both epitope and nonepitope regions are subject to purifying selection acting at nonsynonymous sites, that is, selection that eliminates mutations harmful to protein structure (22).
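The comparison of mean dS with mean dN can be illustrated with a paired test over individual comparisons. The sketch below assumes a paired z statistic on the per-comparison differences dS − dN with a normal approximation; the test actually used for Table 1 is not specified in this section, and the function name and input values are hypothetical.

```python
import math
from statistics import NormalDist, fmean, stdev


def paired_ds_dn_test(ds, dn):
    """Compare mean dS with mean dN over paired comparisons.

    Returns (mean dS, mean dN, z, two-sided P). z is the mean of the
    per-comparison differences dS - dN divided by its standard error;
    P uses a normal approximation, reasonable only for many comparisons.
    """
    if len(ds) != len(dn) or len(ds) < 2:
        raise ValueError("need at least two paired dS/dN values")
    diffs = [s - n for s, n in zip(ds, dn)]
    se = stdev(diffs) / math.sqrt(len(diffs))
    if se == 0:
        raise ValueError("differences have zero variance")
    z = fmean(diffs) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return fmean(ds), fmean(dn), z, p


# Hypothetical within-pair values for one region category.
ds_values = [0.020, 0.035, 0.010, 0.041, 0.027, 0.018]
dn_values = [0.008, 0.012, 0.010, 0.015, 0.009, 0.011]
mean_ds, mean_dn, z, p = paired_ds_dn_test(ds_values, dn_values)
```

A mean dS significantly greater than mean dN (positive z, small P) is the pattern the text interprets as purifying selection at nonsynonymous sites.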
For the 69 epitope regions, there was a significant positive correlation between the mean dN value in within-pair comparisons and the mean dN value in between-pair comparisons (r = 0.809; P < 0.001) (Fig. 2A). This result implies that functional constraints on CTL epitopes were generally similar over the shorter evolutionary time spans represented by within-pair comparisons and the longer time spans represented by between-pair comparisons.
There was also a significant positive correlation between the mean dS value in within-pair comparisons and the mean dS value in between-pair comparisons (r = 0.443; P < 0.001; Fig. 2B). However, the two correlations were significantly different (P = 0.003; two-tailed test). The weaker correlation for dS than for dN may simply reflect greater stochastic error in dS, because there are fewer synonymous sites than nonsynonymous sites. Nonetheless, the positive correlation between the mean dS value in within-pair comparisons and that in between-pair comparisons suggests that mutation rates in the CTL epitopes were generally similar over the shorter evolutionary time spans represented by within-pair comparisons and the longer time spans represented by between-pair comparisons.
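One common way to test whether two correlation coefficients differ is Fisher's r-to-z transformation. The sketch below assumes two independent samples, which is a simplification: both correlations here are computed over the same 69 epitopes, so a test for dependent correlations may be more appropriate, and this sketch need not reproduce the P = 0.003 reported above. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def compare_correlations(r1, n1, r2, n2):
    """Two-tailed test of r1 == r2 via Fisher's r-to-z transformation.

    Assumes the correlations come from independent samples of size n1, n2.
    Returns (z, P).
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


# Correlations reported in the text, treated (by assumption) as independent.
z, p = compare_correlations(0.809, 69, 0.443, 69)
```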
Selection on CTL epitopes.
Although the overall mean dS value exceeded the mean dN value in within-pair comparisons (Table 1), some within-pair comparisons in every region showed dN greater than dS (Fig. 3). In nonoverlapping regions, epitopes and nonepitope regions differed significantly in the proportions of individual comparisons with dN values greater than dS values, with dS values equal to dN values, and with dS values greater than dN values (χ² = 135.8, 2 df; P < 0.001) (Fig. 3). Similarly, in overlapping regions, epitopes and nonepitope regions differed significantly in these three proportions (χ² = 44.5, 2 df; P < 0.001) (Fig. 3). In each case, the proportions of comparisons with dN values greater than dS values and with dS values greater than dN values were higher in nonepitopes than in epitopes, whereas a much higher proportion of comparisons of epitopes showed dS values equal to dN values (Fig. 3). As noted by Yusim and colleagues (32), the conservation of CTL epitopes in comparison with nonepitope regions may largely be an artifact of the process by which CTL epitopes have been identified.
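The 2 × 3 contingency comparison described above (epitope vs. nonepitope regions, by the three dN/dS outcomes) can be sketched as a Pearson chi-square test of independence. With 2 df the chi-square survival function is exactly exp(−χ²/2), so no statistics library is needed. The helper names and the (dS, dN) values are hypothetical; they are not the data behind Fig. 3.

```python
import math


def classify(ds, dn):
    """Column for one comparison: 0 for dN > dS, 1 for equal, 2 for dS > dN."""
    if dn > ds:
        return 0
    if dn == ds:
        return 1
    return 2


def chi_square_2x3(table):
    """Pearson chi-square test of independence for a 2 x 3 table of counts.

    Rows: epitope and nonepitope regions. Columns: comparisons with
    dN > dS, dN == dS, and dS > dN. With (2 - 1) * (3 - 1) = 2 df the
    chi-square survival function is exactly exp(-x / 2).
    """
    if len(table) != 2 or any(len(row) != 3 for row in table):
        raise ValueError("expected a 2 x 3 table")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            chi2 += (observed - expected) ** 2 / expected
    return chi2, math.exp(-chi2 / 2)


# Hypothetical (dS, dN) comparisons tallied into the two rows of the table.
epitope_pairs = [(0.0, 0.0), (0.02, 0.01), (0.0, 0.0), (0.01, 0.03)]
nonepitope_pairs = [(0.03, 0.01), (0.02, 0.04), (0.01, 0.0), (0.0, 0.0)]
table = [[0, 0, 0], [0, 0, 0]]
for row, pairs in enumerate([epitope_pairs, nonepitope_pairs]):
    for ds, dn in pairs:
        table[row][classify(ds, dn)] += 1
chi2, p = chi_square_2x3(table)
```

Real analyses would use counts large enough that no expected cell is small; the toy table only shows the mechanics.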
Comparison among the 69 CTL epitope regions showed that within-pair comparisons with dN values greater than dS values were not evenly distributed among the regions. Rather, certain epitope regions had very high proportions of such comparisons, whereas in other epitope regions dN did not exceed dS in any comparison (Fig. 4A; see also Table S2 in the supplemental material). The mean value of dN − dS differed significantly among epitope regions by one-way analysis of variance (F(68, 1293) = 1.49; P = 0.007). A nonparametric Kruskal-Wallis test for differences in the median value of dN − dS among epitopes likewise yielded a significant result (P = 0.001). On the basis of these comparisons, we identified 18 epitope regions subject to persistent positive selection, as evidenced by consistently high proportions (>20%) of comparisons with dN values greater than dS values (Table 2). Conversely, we identified 10 epitope regions subject to strong constraint at the amino acid level, as evidenced by the absence of any comparison with a dN value greater than the dS value (Table 2).
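The one-way ANOVA on dN − dS and the threshold rule used to flag epitopes can be sketched as follows. The degrees of freedom reported above (68 and 1,293) imply 69 groups and 1,362 observations in total (presumably within-pair comparisons), since df_within = N − k. The function names, the "unclassified" label for intermediate cases, and the toy data are assumptions; the Kruskal-Wallis test is not reimplemented, and turning F into a P value would need the F distribution (for example from SciPy), which this stdlib-only sketch omits.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA.

    groups: one list of dN - dS values per epitope region.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = fmean([v for g in groups for v in g])
    group_means = [fmean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


def classify_epitope(ds_values, dn_values, threshold=0.20):
    """Label an epitope from its within-pair (dS, dN) values.

    More than `threshold` of comparisons with dN > dS -> positive selection;
    no comparison with dN > dS -> strong constraint; otherwise "unclassified"
    (a hypothetical label for the intermediate cases).
    """
    n = len(ds_values)
    proportion = sum(dn > ds for ds, dn in zip(ds_values, dn_values)) / n
    if proportion > threshold:
        return "positive selection"
    if proportion == 0:
        return "strong constraint"
    return "unclassified"


# Toy dN - dS values for three hypothetical epitope regions.
f, df1, df2 = one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# f == 27.0, df1 == 2, df2 == 6
```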
In order to test for convergent evolution of CTL epitopes, we examined the proportion of amino acid sequence differences (including both amino acid replacements and indels) between sister pairs that also occurred in other sister pairs. Of 436 amino acid sequence differences in CTL epitope regions between sister pairs, 148 (33.9%) were convergent. The proportions of convergent differences were similar in nonoverlapping epitope regions (127 of 383 or 33.2%) and in overlapping epitope regions (21 of 53 or 39.6%). The proportions of convergent changes in nonepitope regions were similar: 308 of 1,072 (35.4%) in nonoverlapping regions and 111 of 300 (37.0%) in overlapping regions. None of these proportions were significantly different from one another by χ2 tests.
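The convergence tally described above can be sketched by giving each within-pair amino acid difference a hashable descriptor and checking whether the same descriptor occurs in another sister pair. The descriptor format (alignment position plus the two residues in sorted order, or an indel label), the function name, and the data are assumptions, not the authors' encoding.

```python
from collections import Counter


def convergent_count(pair_differences):
    """Count within-pair differences that also occur in another sister pair.

    pair_differences: one collection per sister pair, holding hashable
    descriptors such as (alignment_position, residue_1, residue_2) with the
    residues in sorted order. Returns (convergent, total).
    """
    pair_sets = [set(diffs) for diffs in pair_differences]
    # Number of sister pairs in which each descriptor occurs.
    occurrences = Counter(d for diffs in pair_sets for d in diffs)
    total = sum(len(diffs) for diffs in pair_sets)
    convergent = sum(1 for diffs in pair_sets for d in diffs if occurrences[d] > 1)
    return convergent, total


# Hypothetical differences in one epitope region across three sister pairs.
pairs = [
    {(12, "K", "R"), (20, "D", "E")},
    {(12, "K", "R")},
    {(33, "A", "T")},
]
convergent, total = convergent_count(pairs)
# convergent == 2, total == 4
```

Summed over all epitope regions, a tally of this kind corresponds to the 148 of 436 differences (about 33.9%) reported above.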
The proportion of amino acid changes that were convergent differed markedly among epitope regions (Fig. 4B). In several cases, epitope regions with high proportions of comparisons with dN values greater than dS values also had high proportions of convergent change. For example, among the epitope regions with the highest proportions of dN values greater than dS values were regions 8 and 9 of Gag, regions 1 and 14 of Env, and region 2 of Nef (Fig. 4A and Table 2). Each of these epitope regions also showed a high proportion of convergent changes (Fig. 4B). An apparent exception to this trend was CTL epitope region 16 of Gag, which showed no comparisons with dN values greater than dS values, yet 100% of its changes were convergent (Fig. 4); however, only three amino acid sequence changes were observed in this region in total, so this proportion rests on very few observations.
To test further the hypothesis that CTL-driven selection has favored amino acid changes in epitopes, we examined the pattern of correlation between variables relating to the nucleotide substitution pattern and variables relating to amino acid sequence changes in epitopes. Because these variables were intercorrelated in complex ways, we used partial correlation to assess the independent associations between a set of independent variables relating to the nucleotide substitution pattern and dependent variables reflecting amino acid changes in epitopes (Table 3). (These analyses were applied to 68 epitopes, because one epitope showed no amino acid difference in within-pair comparisons in any of the 21 sister pairs [Fig. 4B].)
The first dependent variable we examined was the proportion of sequences in the 21 sister pairs that conserved the immunologically defined “best epitope” sequence (Table 3). For this variable, there were highly significant negative partial correlations with within-pair dS values and with the proportion of within-pair comparisons showing dN values greater than dS values (Table 3). Because dS serves here as an indicator of the mutation rate, the correlation with within-pair dS values implies that epitopes with high mutation rates were more likely to lose the “best epitope” sequence. However, the significant correlation with the proportion of comparisons with dN values greater than dS values is evidence that positive Darwinian selection plays a role in loss of the “best epitope” sequence that is independent of the mutation rate.
In addition, we examined partial correlations between the same set of independent variables and a second dependent variable, the proportion of convergent amino acid sequence differences between sister pairs. In this case, the single significant partial correlation was a positive correlation with the proportion of comparisons with dN values greater than dS values (Table 3). This correlation reflects the fact that epitopes with a high proportion of dN values greater than dS values tended to have high proportions of convergent change (Fig. 4). It implies that positive selection is a factor that enhances the likelihood of convergent changes at the amino acid level in CTL epitopes.