Skip to main content
Intended for healthcare professionals
Open access
Research article
First published online July 28, 2013

Evolutionary Modeling of Genotype-Phenotype Associations, and Application to primate coding and Non-coding mtDNA Rate Variation

Abstract

Variation in substitution rates across a phylogeny can be indicative of shifts in the evolutionary dynamics of a protein or nonprotein coding regions. One way to understand these signals is to seek the phenotypic correlates of rate variation. Here, we extended a previously published likelihood method designed to detect evolutionary associations between genotypic evolutionary rate and phenotype over a phylogeny. In simulation with two discrete categories of phenotype, the method has a low false-positive rate and detects greater than 80% of true-positives with a tree length of three or greater and a three-fold or greater change in substitution rate given the phenotype. In addition, we successfully extend the test from two to four phenotype categories and evaluated its performance. We then applied the method to two major hypotheses for rate variation in the mitochondrial genome of primates–-longevity and generation time as well as body mass which is correlated with many aspects of life history–-using three categories of phenotype through discretization of continuous values. Similar to previous results for mammals, we find that the majority of mitochondrial protein-coding genes show associations consistent with the longevity and body mass predictions and that the predominant signal of association comes from the third codon position. We also found a significant association between maximum lifespan and the evolutionary rate of the control region of the mtDNA. In contrast, 24 protein-coding genes from the nuclear genome do not show a consistent pattern of association, which is inconsistent with the generation time hypothesis. These results show the extended method can robustly identify genotype-phenotype associations up to at least four phenotypic categories, and demonstrate the successful application of the method to study factors affecting neutral evolutionary rate in protein-coding and non-coding loci.

Introduction

One crucial aspect of molecular evolution is the exploration of variation in the rate of substitution between different lineages or under different evolutionary pressures. Investigating this variation can lead to interesting insights into the adaptation or neutral evolution of a gene.13 Many papers have presented statistical methods that distinguish different types of rate variation, such as variation in substitution rate of a single site among different lineages (heterotachy),46 or variation between sites.7,8
Molecular rate variation with a consistent pattern can be indicative of underlying biological processes.3 Using different phylogenetic models of molecular evolution, some life history traits have been shown to correlate with substitution rate including body mass, time to sexual maturity, and lifespan.912 Other traits that have shown to be correlated with substitution rate include sperm competition1316 and sexual dimorphism, either in mass17 or coloration.18 A set of abiotic variables that may be correlated with rate are environmental conditions such as salinity, temperature, latitude, and elevation.1921
The mitochondrial genome (mtDNA) is known to exhibit substantial rate variation,22 and two prominent hypotheses to explain this are the longevity hypothesis and generation time hypothesis. The longevity hypothesis proposes that there is selection for greater fidelity in the replication mechanism of the mtDNA between an organism's conception and production of its gametes, and thus predicts that shorter-lived species will have a faster rate of neutral mtDNA substitution.23 The longevity hypothesis is also consistent with a prediction for selective advantage for decreased reactive oxygen species (ROS) production and/or more resistant macromolecular components.24
The generation time hypothesis predicts a faster substitution rate for both mtDNA and nuclear DNA in species with shorter time to sexual maturity and thus shorter generation length because it assumes a constant error rate and more replications per geological time.25 Generation time was proposed as an exception to the assumption of the ‘molecular clock’ hypothesis26 of a single rate of molecular evolution between species.27 It originated from the results of DNA hybridization experiments between rodents and artiodactyls, and rodents and primates which found a higher rate of nucleotide substitution measured in time, but not generations.28,29 This was later supported in phylogenetic studies where molecular evolution was faster in rodents than primates, consistent with their generation time differences.30,31
The pattern and causes of mtDNA rate variation in primates are of particular interest, and have been the subject of longstanding debate.32 Much early work was concerned with correcting for primate mtDNA rate variation to estimate divergence times.33,34 The generation time hypothesis is of particular interest for primates as it has been invoked to explain the difference in neutral rate in both nuclear and mtDNA genomes between Old World monkeys and apes,31,33,35 but conflicts with genome-wide estimates that include strepsirhines and New World monkeys.36 In addition, lemurs are known to have a low rate of transitions but a similar rate of transversions to apes and monkeys.37
One set of studies compared the three major hypotheses for the third codon position of cytochrome b (CYTB) from the mtDNA and a few nuclear loci across mammals and birds1012 and was later extended to include an amino acid analysis of the protein-coding regions of the mtDNA for the longevity hypothesis.38 Welch et al9 examined the nine longest protein-coding genes of the mtDNA (at the codon level) in mammals and found similar negative correlations with the synonymous rate and body mass, lifespan, fecundity, and time to sexual maturity. In both approaches, primates have been identified as an outlier group for the longevity and body mass phenotypes, where they either reduce the association with lifespan9,38 or cause the association to be positive with body mass.9 Recently, a Bayesian approach which simultaneously estimates phenotype state, divergence dates, and evolutionary rate also found a negative correlation between the evolution of CYTB, and longevity and body mass, but no association with generation time.39 Considering the results described above, the generation time hypothesis was rejected because of lack of a significant association with nuclear loci, whereas the longevity hypothesis was favored over the body mass association because of a stronger genotype-phenotype correlation. Non-protein-coding portions of mtDNA are predicted to show similar signals to the protein-coding regions, but have not been examined.
For these analyses, the main methods used are phylogenetically corrected correlations between the substitution rate and the relevant variable value (eg, life history, external condition, or phenotype) of the extant taxa. One difficulty with this type of approach is that the two values being compared are not properties of the same part of the tree: the substitution rate is reconstructed on the branches, while information about the phenotype is generally located only at the tips of the tree or reconstructed on internal nodes. This type of pairing may be statistically problematic.
In both cases, these methods require a broad comparison between taxa (ie, mammals) and/or concatenation of genes (or gene groups) in order to to achieve sufficient power.10,40 Sufficient evolutionary time is a requirement in forming comparisons in order to correctly estimate the correlations as the full phylogeny is not used. Thus, these papers did not examine the primate clade on its own because the methods are not well-suited to scaling down to a relatively small number of taxa. Some studies have analyzed body mass and generation time in primates, mostly anthropoids,31,33,34,41 but so far primates have not been specifically tested for the longevity hypothesis.
Recently, a model was implemented that can overcome many of these problems by directly estimating the association between phenotype and substitution rate in a relatively small number of taxa.16 It achieves this by comparing extant discrete phenotypes to extant sequences across a phylogeny and, through the maximum likelihood framework, both genotype and phenotype substitutions are allowed on the branches. Thus, the phenotype can evolve across the tree. The rate of genotypic substitution while under a given phenotype can then be estimated throughout the whole tree as opposed to just the terminal branches giving increased power with limited numbers in a taxonomic group. The method uses a mixture model approach that allows for different sites (potentially codon positions) to associate with either a foreground-associated model or a background-unassociated model.16,42
In this paper, we extend the method and examine its robustness through extensive simulations. We then use the method to explore the longevity and generation time hypotheses as well as body mass in primates. We found that the longevity hypothesis is supported for the third codon position of protein-coding genes, consistent with mammal wide estimates,912 and find a similar result for the non-coding control region. We find no consistent association between the phenotypes tested and nuclear genes, including those that encode mitochondrial proteins. In light of our results, we discuss the effect of the three important phenotypes on rate variation applied to primate evolution.

Methods

Theory

The hybrid substitution models and likelihood ratio test (LRT) first introduced in O'Connor and Mundy16 are used in the present study with some modification. This pair of models uses the coevolutionary model of Pagel43 as inspiration to combine other substitution models for both genotype (for example General Time Reversible (GTR))7,44,45 and discrete phenotype models (such as Mk).46 This allows us to detect differences in genotypic evolutionary rates associated with a phenotype.
The background evolutionary rate is estimated by the independent matrix which assumes a single rate of genotypic substitution irrespective of the phenotypic state. The instantaneous independent Q matrix is given as:
Q I [ ( g i , p i ) , ( g i , p i ) ] + { Q p [ p i , p j ] if g i = g j and p i p j Q p [ g i , g j ] if p i = p j and g i g j 0 if g i g j and p i p j
(1)
For every site in the alignment the Q matrix defines the instantaneous rate of change. In this case there are two inputs and outputs, pi is the input phenotype and pj is the output phenotype, with similar notation for the genotype state of gi changing to gj. Here, we test with nucleotide data so gi and gj can take any of four character states (A, C, G, T). The phenotypic characters can take on the characters 0, 1, … Np − 1 depending on the number of phenotypes tested (Np).
For single changes the rate is calculated based on the underlying rate matrix for the type of change; genotypes use the GTR model and phenotypes use the Mk. For example, if the phenotypic character is the one to change, then the rate of change is defined by phenotypic rate matrix (Qp). Similarly, for a genotypic change Qg, the genotypic rate matrix, is used. Double substitutions are fixed to zero following the arguments of Pagel.43
To construct the null model for our likelihood ratio statistic (LRS), we expand from the basic structure of the independent matrix to include rate heterogeneity among sites. Previous studies have shown that real data can be highly susceptible to this type of variation.8 We use a mixture model approach42 with two different rate matrices, otherwise the time and rate are mathematically confounded. Using the independent matrix to detect the background signal creates a contrast for a different matrix to find the associated or heterogeneous signal.16,42,47
In our null model, we combine an independent rate matrix and an independent Q matrix with a scaling weight parameter (W) that multiplies each of the genotypic rate parameters. All of the branch lengths and rate parameters are equal in the two Q matrices, except for the addition of the scaling parameter in the scaled Q matrix. Also, we add a parameter to estimate the proportion of sites (Pr) coming from the independent Q matrix, with 1 - Pr coming from the scaled Q matrix. Thus, only two additional parameters (the proportion of sites and the scaling weight parameter) are added to allow for a scaled rate heterogeneity of sites.
The dependent matrix is designed similarly to the scaled matrix but takes on scaling parameters corresponding to different phenotypes (Wpi). The Q matrix for the dependent matrix is defined as:
Q D [ ( g i , p i ) , ( g j , p j ) ] + { Q p [ p i , p j ] if g i = g j and p i p j Q g [ g i , g j ] * W p i if p i = p j and g i g j 0 if g i g j and p i p j
(2)
When Wpi is set to one for all phenotypic states, it reduces to the independent model. As described before, the null model is a mixture model between the independent matrix and the scaled matrix. The alternative model is constructed as a mixture model with a proportion of sites from the independent matrix (the background unassociated signal) and the dependent matrix (ie, replaces the scaled Q matrix in the previous formulation). The dependent model is a likelihood approximation as the true likelihood would be calculated conditional on all phenotypic intermediate substitutions. This mathematical simplification does not seem to bias the test for any scenario for which we are aware (see additional file 1).
As a result of the approximation, our LRT is constructed by calculating the approximate LRS (ALRS) (two times the difference in log likelihood from the alternative and null models), which is distributed as a Ç2 distribution with the degrees of freedom being the difference in the number of parameters in the hierarchically related models.4850 In the general case, the degrees of freedom are equal to the number of phenotypes minus one; in the binary phenotype case, the degrees of freedom equals one. The simplification of the dependent matrix does not seem to adversely affect the asymptotic properties of our test statistic (the ALRS), even with heterogeneity among sites (see Fig. S1 of additional file 1).
We constrain all weight parameters (W in the scale matrix and all Wpi in the dependent matrix) to be greater than or equal to one.

Simulations

To test the models we developed a simulation program as used in O'Connor and Mundy16 with one minor modification. Previously, the program normalized the input parameters, which caused a number of difficulties. First, the parameters explicitly set were not the ones used for simulation and thus not the values tested. Second, when calculating the rate matrix values for simulation, a normalized matrix with W0 of 1 was not comparable to the probabilities for the independent matrix. Thus, the background rate simulated in part of the data (W = 1) was not the same in a different portion under the alternative model even though both values of input were the same (W0 = 1). Therefore, we modified the java library PAL51 to prevent it from normalizing the rate matrix before calculating the probabilities used in simulation, since the parameters are chosen for particular purposes.
We ran two main sets of simulations to reevaluate the performance of the method.16 First, we used a binary phenotype with 16 taxa, a strictly bifurcating tree, and concatenated 500 sites for both the independent and dependent matrices for a total of 1000 sites. We set the scale value for phenotype 0 (W0) to one and varied the scale value for phenotype 1 (W1) from 1 (false positive (FP)) to 10 (extreme true-positive (TP)) at every integer. We increased the sum of all branches (tree length) from 1 to 10 at every integer, creating 100 different scenarios with 50 tests each. The alternative model used in the ALRS was constrained with W0 < W1. Second, we tested this specification of the alternative model by simulating data sets with greater numbers of phenotypes. This served the dual purpose of evaluating the use of the ALRS beyond the binary case and testing its performance with more constrained alternative models.
We simulated data with 16 or 32 taxa, a tree length of three, and number of phenotypes ranging from two to four. Further, we tested the difference between a stricter hypothesis (W0 < ··· < W3) and a relaxed hypothesis (W0 ≠ … ≠ W3) under both the null model (weights all equal to 1) and an alternative model (Wx = 2x + 1, for x from 0 to Np − 1 with Np set as two, three, or four). This created 24 scenarios (two numbers of taxa by two hypotheses by three different numbers of phenotypes by a null and alternative model) each tested with 200 simulated data sets.

Primate genotype and phenotype data

We applied the method just described to the question of association between the rate of mtDNA evolution in primates and key life history variables. We collected two forms of data, phenotypic and genotypic. We used the primate phylogenies of Goodman et al52 (Old and New World monkeys and apes) and Horvath et al53 (lemurs and lorisoids).
We collected data on maximum lifespan, time to female sexual maturity, and female body mass to test hypotheses of longevity and generation time and general life history associations, respectively.10,23 These continuous trait values were discretized because our method is based on discrete characters. We split the values into three equal groups based on the distribution of all readily available values for primates and then mapped them back to the taxa we used (Table 1). We used three categories so that they would roughly correspond to lower than average, average, and higher than average in the distribution to account for the two tails of the distributions. Also, as will be shown from our simulation results, this is the greatest number of categories with appreciable power. The data were collected from Smith and Junders54 and Isaac et al55 for female body mass and An Age for time to female sexual maturity and maximum lifespan for most of the species.56,57 Less data were available for time to female sexual maturity and thus we used more genus level assignments. The phenotype data and assignments used are reported in additional file 2.
Table 1. Phenotype data summary statistics.
Variable N P0.33 P0.66 Mean σ
Female body mass (In (grams)) 210 7.11 8.68 7.72 1.46
Time to female sexual maturity (In (days)) 112 6.68 7.29 6.90 0.68
Maximum lifespan (In (days)) 149 9.12 9.43 9.26 0.36
Notes: The three key traits used with the number of primates used to estimate the percentiles (N) and the percentiles used to discretize the variables into three groups. P0.33 and P0.66 are the first third and second third log values of the distribution. The discrete phenotypic mapping was: 0 (x < P0.33), 1 (P0.33 < x < P0.66), and 2 (P0.66 < x)
We retrieved 26 primate mitochondrial genomes from Genbank (see additional file 3) and aligned these sequences using MultiZ genome aligner58,59 with a human sequence as reference and extracted individual coding genes based on their annotation in the human sequence (accession number: NC_001807). Again, based on the human reference, we extracted all the tRNAs and concatenated them into a single alignment. We also retrieved the control region, which is known for its high level of divergence.60,61 We then realigned the control region using the alignment program PRANK with the parameter -F, following the recommendation of the authors.62,63 PRANK was designed to align rapidly divergent and gap-inclined alignments and has been shown to perform well with simulations of insertion-deletion models.64 All alignments were manually evaluated for consistency.
In addition to the 17 mtDNA alignments, we collected 24 alignments of nuclear protein-coding genes where data were available for at least 8 taxa. This data included 6 nuclear-encoded mitochondrial complex IV genes which interact with three mtDNA genes (COX1, COX2, COX3) and should be functionally coevolving.65,66 Accession numbers can be found in additional file 3.
For all three hypotheses (including general life history) we constrained the alternative model to consider only W0 > W1 > W2 as described previously in the simulation section as this formulation is consistent with the directionality predictions from the hypotheses.

Results and Discussion

Simulations

Figure 1 shows our first reevaluation of the method presented in O'Connor and Mundy16 using a revised simulation program. Under this new, more intuitive method of simulation the binary case has a high level of power (>80%) for tree lengths greater than three and a W1 value greater than three. This is important as the results reported earlier required a 10-fold change in molecular rate in order to observe any level of power. The power of the method increases quite dramatically, but similarly to other likelihood methods, discriminating between null and weak alternative models can be difficult.67 Here, a doubling of the rate can still be detected when given enough evolutionary divergence, as measured by tree length. For example, with a tree length of 1, 5, and 10 and W1 = 2 we observe 12%, 40%, and 82% significance, respectively.
Figure 1. Results of simulation for binary phenotype. The Z-axis is percentage significant out of 50 tests done for each permutation of tree length (sum of all branch lengths) and W1. Tree length and W1 take on values from 1 to 10 at each integer. All tests have W0 set to 1, a 16 taxa tree, and 50% sites associated.
In addition, the FP rate should be 5% given our significance threshold and all of our tests are less then or equal to 6% for any given tree length. When averaged across all tree lengths the value was 2.4% for 500 simulations.
Table 2 shows a comparison of the strict and relaxed alternative models. The stricter alternative hypothesis performs better than the relaxed hypothesis and we prefer its use. In the 16 taxa case, all three numbers of phenotypes have a lower FP rate with the strict vs. the relaxed. This signal is not as clear with the 32 taxa case, but is visible at the 0.01 level. In addition, we prefer the stricter alternative hypothesis because when tested under correlated phenotypes, the stricter alternative is more discriminatory (unpublished data).
Table 2. Strict vs. relaxed alternative simulations.
Taxa Hypothesis Np Model χ20.05 χ20.01
16 WiWi+1 2 Null 3.5 0.0
16 Wi < Wi+1 2 Null 2.0 0.0
16 WiWi+1 3 Null 2.5 0.5
16 Wi < Wi+1 3 Null 1.0 0.5
16 WiWi+1 4 Null 0.5 0.0
16 Wi < Wi+1 4 Null 0.0 0.0
16 WiWi+1 2 Alt. 81.0 71.5
16 Wi < Wi+1 2 Alt. 86.5 71.5
16 WiWi+1 3 Alt. 82.5 68.0
16 Wi < Wi+1 3 Alt. 87.5 74.0
16 WiWi+1 4 Alt. 69.5 53.5
16 Wi < Wi+1 4 Alt. 67.0 53.5
32 WiWi+1 2 Null 4.0 1.5
32 Wi < Wi+1 2 Null 4.5 1.0
32 WiWi+1 3 Null 3.5 0.0
32 Wi < Wi+1 3 Null 0.5 0.0
32 WiWi+1 4 Null 0.5 0.5
32 Wi < Wi+1 4 Null 1.5 0.0
32 WiWi+1 2 Alt. 88.0 78.5
32 Wi < Wi+1 2 Alt. 91.5 78.5
32 WiWi+1 3 Alt. 84.0 73.5
32 Wi < Wi+1 3 Alt. 85.5 76.5
32 WiWi+1 4 Alt. 78.5 67.5
32 Wi < Wi+1 4 Alt. 66.0 50.5
Notes: Simulation results from a symmetrical tree for either 16 or 32 taxa. Np are the number of phenotypes being tested simulated under either the null or alternative hypotheses (Alt.). These are examined with a test using a strict (Wi < Wi+1) or relaxed (Wi ≠ 6 Wi+1) alternative hypothesis. The ALRSs are compared against a χ2 with degrees of freedom equal to the number of phenotypes minus one and here are reported as the percentage out of 200 simulated tests that were significant.
The overall power under the alternative simulation scenario for two and three phenotype categories was similar, but in the four category case the percentage of significant TP rate decreased. With four phenotypes the significance of the ALRS is more conservative and future use of the current implementation with four phenotype categories will need to take this into account.

Comparison with other approaches

A few approaches have been developed to address similar questions to those addressed in our method. For instance, following the comparative approach,68 one method is to regress the difference in trait value against a difference in either the number of predicted substitutions40 or the branch length.1012 Welch and Waxman40 estimate the expected number of substitutions along each branch by first breaking a full tree into independent taxa quartets. A quartet contains a pair of sister taxa which are selected based on sufficient divergence to allow for a better estimation of the number substitutions on each branch and rooted by an outgroup pair or single species. Internal contrasts are not examined.
The major method used in Nabholz et al10 recurses down the overall phylogeny based on Genbank's taxonomic classification until the median pairwise distance is less than a specified value for all taxa pairs in the group or the genus level is met. Next, through the use of fossil calibration points and amino acid data the neutral branch lengths for each taxa are calculated. The terminal branch values as calculated through auto-correlation of the rate are then used as an estimate of the species-specific rate and compared to trait values by general linear regression of the phylogenetic contrasts.
Our method uses tip values of nucleotide and phenotype characters and allow both to evolve across the entire phylogeny. The comparative approach, using only terminal comparisons or rate values (as in Welch and Waxman),40 reduces the proportion of an unrooted phylogeny used to estimate the association from 2 * N − 3 branches to N with N taxa.
Some researchers use the terminal branch values as an estimate of evolutionary rate and input this into the full comparative method (including internal contrasts).10,12 This use of the branches may be problematic as it matches parameter estimates from the phylogeny (and hence independent of the phylogeny) and phenotypic values (which are not independent of the phylogeny). Correcting for phylogeny may then overcorrect the substitution rate and lead to low power, thus requiring significant numbers of taxa in order to obtain sufficient contrasts for a regression with power. One exception to this may be the use of root-to-tip measurements of substitution rates as these are not independent of phylogeny and are more properties of the taxa than terminal branches alone.6971
Studies by Wlasiuk and Nachman72 and Tsantes and Steiper73 used this combination of contrasts approach. The analysis proceeds by generating contrasts for each of the genes and then combining the contrasts from each into a single regression to assess significance. In both studies, power was an issue. Further, there may also be a problem with non-independence as each of the contrasts is not independent in the full analysis as they contain the same taxon comparisons from multiple genes.
In contrast, our method does not use a comparative contrasts approach. We do not create a fixed ancestral reconstruction; instead, the maximum likelihood framework accounts for internal evolution of the life history traits as well as the change in substitution. This allows us to use the full phylogeny and thus multiple lineages to estimate the difference in substitution rate between phenotypes74 and model phenotypic evolution concurrently with the change in substitution rate. Thus, our method has the advantage over many other methods of having good power (eg, with single loci) with relatively few taxa.
Our models use discretization of continuous values. As a result of this, the method may have a loss of signal, but this could be beneficial if the trait is not distributed normally.75 Further, our model does not incorporate multiple phenotypes into a single analysis as does multiple regression,912 but requires the distribution of ALRSs to distinguish between different phenotypes (see following discussion of Wilcoxan Rank Test).
A recently published approach uses an LRT of molecular clocks to investigate genotype phenotype associations and has shown significant power.21 The proportion of time any given branch is under alternate binary characters is calculated through a system of probabilistic mapping of phenotypes onto an ultrametrized tree. This proportion is then scaled and expanded based on a single parameter to create a two-rate molecular clock dependent on the phenotype and can form an LRT with a null model of a single molecular clock. This is philosophically similar to our own method, especially as a model of phenotypes substitution is allowed across the tree. Interestingly, given roughly similar parameters the two methods have a comparable power level (when r = 2.6, under the Mayrose and Otto model, it has over 80% power, and when our models has W1/W0 = 3, it also has over 80% power).
While the two models are similar in both formulation and the questions they address, they make different assumptions that will lead to different results and biological interpretations. For example, Mayrose and Otto21 compared our method with theirs on an rRNA gene in daphniids: their model showed a significant signal, while our method did not. This most likely stems from a difference in assumption of how the association is applied to the gene. Our method is designed to detect associations that occur in part of the gene and not in another portion, which is relevant to the neutrality assumptions of the longevity and generation time hypotheses of protein-coding genes. Their method is geared to detect associations between full-length genes, an assumption made for the daphniids. Which condition is more biologically plausible is open to debate and likely depends on the system in question. This difference makes the tools complementary, given the hypothesis of the investigator. For instance, in our analysis of mtDNA rate variation, the prediction is that only neutral sites will be affected, with purifying selection acting as a relative background. The model of Mayrose and Otto is more pertinent for tests of change in the molecular clock acting on all sites, relevant to the situation in rRNA genes where sites are less differentiated (compared to codon positions), and it may have reduced power when only a subset of sites is associated.

Association of life history variables with rate variation in primates Mitochondrial genes

We apply our method to mtDNA rate variation in primates for the two main hypotheses described in the introduction–-longevity and generation time as well as general life history associations (Table 3). After correcting for multiple testing, nine of the thirteen protein-coding regions showed a significant association with maximum lifespan (longevity hypothesis) and eight protein-coding regions were significantly associated with female body mass. In contrast, time to female sexual maturity, relevant to the generation time hypothesis, was only significantly associated with two of the protein-coding genes.
Table 3. Results of mitochondrial coding and non-protein-coding genes under three different phenotypic hypotheses.
Gene N Length Sexual maturity Max. lifespan Body mass
ALRS P-value ALRS P-value ALRS P-value
Protein-co                
ATP6 26 678 1.962 1.00 38.518 1.8 × 10−7** 21.887 7.2 × 10−4**
ATP8 26 207 0.201 1.00 −0.056 1.00 0.935 1.00
COX1 26 1536 1.648 1.00 56.729 2.0 × 10−11** 14.735 2.6 × 10−2*
COX2 26 681 0.485 1.00 0.000 1.00 0.001 1.00
COX3 26 781 0.001 1.00 72.172 8.7 × 10−15** 19.758 2.1 × 10−3**
CYTB 26 1141 1.457 1.00 67.133 1.1 × 10−13** 33.677 2.0 × 10−6**
ND1 26 954 17.741 6.0 × 10−3** 49.468 7.4 × 10−10** 18.118 4.8 × 10−3**
ND2 26 1057 0.033 1.00 58.806 7.0 × 10−12** −0.019 1.00
ND3 26 344 4.394 1.00 39.372 1.2 × 10−7** 27.643 4.1 × 10−5**
ND4 26 1378 −0.037 1.00 115.289 3.8 × 10−24** 14.915 2.4 × 10−2*
ND4L 26 294 0.037 1.00 4.039 1.00 6.657 1.00
ND5 26 1812 21.683 1.0 × 10−3** 192.480 6.5 × 10−41** 39.047 1.4 × 10−7**
ND6 26 528 −0.001 1.00 7.762 0.85 3.120 1.00
Other                
12s rRNA 26 1031 0.015 1.00 −0.023 1.00 −0.001 1.00
16s rRNA 26 1680 0.000 1.00 0.536 1.00 0.602 1.00
tRNA 26 1636 1.646 1.00 0.000 1.00 0.000 1.00
Cont. reg. 24 1746 −0.007 1.00 21.806 1.0 × 10−3** −0.023 1.00
Notes: Time to sexual maturity, maximum lifespan, and body mass are used as indicators for the generation time, longevity, and general life history hypotheses respectively. P-values are corrected by Bonferroni multiple test correction (N = 41). Significant values with a P-value less than 0.05 or 0.01 are signified with * and ** respectively using a χ2 with two degrees of freedom. The control region is signified as “Cont. reg.”.
We examined the hypothesis of third codon positions by calculating the distribution of site ALRSs based on the parameters estimated across the whole gene. Thus, at each site, we calculate the relative support for the null (negative value) or alternative model (positive value). We then selected the codon positions of each gene and summed the site ALRSs.
In Table 5, we report the results of this experiment for our significant protein-coding genes, as identified by the original ALRS. For maximum lifespan and body mass the mtDNA protein-coding genes show a clear signal from the third codon position, implying changes in neutral rates. Comparing the distribution of sum of site ALRS for all 13 mtDNA protein-coding genes by a paired Wilcoxon signed rank test shows that the third codon distribution is significantly different for maximum lifespan for both first (P-value = 7.3 × 10−4) and second codon positions (P-value = 2.4 × 10−4). Similarly, for body mass, third codon positions are significantly different than either first or second positions (first: P-value = 4.9 × 10−4 and second: P-value = 4.9 × 10−4). In contrast, the first and second positions are not significantly different for any phenotype. This final method works to test groups of sites we have selected as a whole, but we do not know which of the third codon positions are associated and which are not, nor if some first or second codon positions are associated as well; we only explore the overall trend from a particular group of sites. Most synonymous substitutions, those that do not alter the protein, are found in the third codon position, consistent with the neutral rate effects predicted by the longevity hypotheses.
The association between neutral substitutions can be further tested by examining the non-protein-coding portions of the mtDNA. 12s rRNA, 16s rRNA, and the tRNA genes were not significant for any of the tested phenotypic associations. One possible explanation for this is that the ribosomal and tRNA genes are strongly conserved and as a result there is relatively low power to detect an association.76 Another explanation is that the association occurs across the entire gene, similar to the results of the rRNA gene of daphniids.21 In contrast, the control region is highly divergent with few distinct classes of sites, but was detected as significantly associated with maximum lifespan. This, along with the results examining the third codon position of protein-coding loci, supports the prediction of the longevity hypothesis that neutral rates of mtDNA substitution are associated with the lifespan of primates.

Nuclear genes

In contrast to the mtDNA, the nuclear genome does not have a consistent pattern of association with the phenotypes (Table 4). 61.5% of the mtDNA protein-coding genes were significantly associated with body mass whereas only 12.5% of the nuclear genes tested were significant. Similarly, maximum lifespan is significantly associated with 69.2% of the mtDNA protein-coding genes and 4.2% of the nuclear genes. Time to sexual maturity shows a low but similar quantity of signal in both the mtDNA and nuclear genes, with 15.3% and 8.3% respectively.
Table 4. Results of nuclear genes under three different phenotypic hypotheses.
Gene N Length Sexual maturity Max. lifespan Body mass
ALRS P- value ALRS P- value ALRS P- value
Male associated                
ACPP 13 1161 −0.105 1.00 2.054 1.00 2.463 1.00
BRCA1 11 2893 −0.005 1.00 −0.012 1.00 23.393 3.4 × 10−4**
CATSPER1 20 2514 13.010 6.1 × 10−2 63.034 8.4 × 10−13** 49.779 6.4 × 10−10**
DBI 18 315 0.000 1.00 0.133 1.00 1.158 1.00
KLK2 12 786 0.230 1.00 0.185 1.00 0.191 1.00
PIP 13 441 −0.035 1.00 −0.001 1.00 −0.001 1.00
PSA 11 786 −0.007 1.00 2.212 1.00 −0.517 1.00
SEMG1 14 2649 15.865 1.5 × 10−2* 4.458 1.00 14.330 3.2 × 10−2*
SEMG2 16 4245 12.635 7.4 × 10−2 0.100 1.00 4.831 1.00
SRY 37 648 1.584 1.0 12.568 7.7 × 10−2 −0.014 1.00
TGM4 15 2055 0.893 1.00 0.000 1.00 0.000 1.00
TMPRSS2 15 1479 0.000 1.00 0.000 1.00 0.000 1.00
VAT1 14 1212 19.557 2.3 × 10−3** 3.687 1.00 −0.005 1.00
ZAN 16 555 0.000 1.00 0.761 1.00 2.460 1.00
Mt complex IV                
COX4 19 438 2.790 1.00 0.251 1.00 0.064 1.00
COX5A 19 459 0.467 1.00 0.000 1.00 0.000 1.00
COX6C 12 228 0.219 1.00 3.247 1.00 −0.225 1.00
COX7A1 10 243 4.499 1.00 7.640 0.90 8.507 0.58
COX7C 12 192 3.528 1.00 4.609 1.00 0.122 1.00
COX8L 19 210 1.070 1.00 0.669 1.00 3.809 1.00
Other                
CD45 14 3963 −0.003 1.00 −0.003 1.00 −0.004 1.00
HBB 8 444 0.784 1.00 1.598 1.00 0.021 1.00
LYZ 26 447 −0.001 1.00 0.002 1.00 0.665 1.00
MC1R 18 954 2.271 1.00 4.287 1.00 0.753 1.00
Notes: Male associated genes are genes expressed in semen, found on the Y chromosome, or implicated in sperm competition. P-values are corrected by Bonferroni multiple test correction (N = 41). Significant values with a P-value less than 0.05 or 0.01 are signified with * and ** respectively using a χ2 with two degrees of freedom.
Two specific molecular functions predominate the nuclear genes we tested: nuclear encoded genes that function in the electron transport chain of the mitochondria and male associated genes.
In the first set, we tested six of the mitochondrial complex IV genes and found no significant associations. Thus, the associations found with mitochondrially-encoded proteins do not extend to mitochondrial proteins encoded in the nuclear genome. Even after all six loci are concatenated, they are not significant for any phenotype tested (uncorrected P-values; body mass = 0.980; maximum lifespan = 0.850; and time to female maturity = 0.743).
The second main functional group of nuclear genes, male associated genes, show the most significant associations with phenotypes for the nuclear genome. These are genes either known to be expressed in the testis, found in the ejaculate, or involved in sperm function. In this case, however, in contrast to the mtDNA genes, the strength of signal from the third codon position is not generally greater than that from the first two codon positions (Table 5). This indicates that some of the associations in male genes may be due to positive selection, which is consistent with the frequent occurrence of adaptive evolution at these loci.14,15,77,78 However, the cause of the specific associations seen with different phenotypes is unclear. Several of these loci are related to sperm competition, and there is a possibility of an association arising indirectly from a correlation between the traits tested and the degree of male competition, but this requires further analysis.
Table 5. ALRS by codon position.
Gene Phenotype Codon position ALRS
1st 2nd 3rd 3rd–-1st–-3rd
ATP6 Body mass 7.06 −15.98 30.81 39.73
COX1 Body mass −30.98 −20.47 66.18 117.63
COX3 Body mass −2.63 −8.89 31.06 42.58
CYTB Body mass 5.16 −33.56 62.09 90.49
ND1 Body mass −13.86 −6.8 38.78 59.44
ND3 Body mass 1.71 2.8 23.07 18.57
ND4 Body mass −22.28 −21.2 58.17 101.65
ND5 Body mass 1.35 −53.1 90.47 142.22
BRCA1 Body mass 6.69 6.5 10.22 −2.97
CATSPER1 Body mass 11.14 25.96 12.68 −24.41
SEMG1 Body mass 4.52 7.48 2.33 −9.67
ATP6 Max. lifespan −9.89 −13.11 61.52 84.52
COX1 Max. lifespan −39.06 −28.49 124.28 191.84
COX3 Max. lifespan −12.11 1.13 82.96 93.94
CYTB Max. lifespan −17.38 −29.70 114.27 161.35
ND1 Max. lifespan −9.61 −48.24 107.32 165.17
ND2 Max. lifespan 2.26 −42.45 98.99 139.18
ND3 Max. lifespan −20.23 3.03 56.57 73.77
ND4 Max. lifespan −11.72 −33.84 160.6 206.15
ND5 Max. lifespan 32.68 −46.87 206.67 220.87
CATSPER1 Max. lifespan 15.07 33.12 14.84 −33.35
ND1 Sexual maturity 18.72 −4.91 3.93 −9.87
ND5 Sexual maturity 19.25 14.96 −12.53 −46.74
SEMG1 Sexual maturity 1.54 10.88 3.44 −8.98
VAT1 Sexual maturity 9.06 −0.14 10.63 1.71
Notes: Break down of the ALRS by codon position for significant results (see Tables 3 and 4). Negative values occur because the parameters are optimized for the entire gene. The final column shows the strength of signal from the third codon position compared to the other two. Genes from the mitochondrial genome are in bold.

Assessing the three phenotypes and their association with mtDNA rate variation in primates

Two of the three phenotypes we tested, body mass and maximum lifespan, are associated with mtDNA rate variation in primates. It is difficult to differentiate between the two significant phenotypes (corresponding to general life history and the longevity hypotheses) because the two traits are highly correlated.25 In our results, the proportion of mtDNA loci supporting the longevity hypothesis (10/17) was very similar to that for the association with body mass (8/17), providing no discrimination between the two. When examining the distribution of ALRSs for each hypothesis, we are able to differentiate between the two. Using a two-tailed paired Wilcoxon Rank Test for the 17 regions of the mtDNA, we find that maximum lifespan is significantly different from body mass (P-value = 6.63 × 10−3; median maximum lifespan 38.518; median body mass 6.657), thus providing more support for a longevity association than of general life history for primates. We now discuss how our results relate to each phenotype in more detail.

Generation time hypothesis

The generation time hypothesis predicts that if a species has shorter intervals between generations then the mutation rate per year will increases with decreased generation time and there will be be greater number of substitutions fixed because of the increased number of genomic replications.35,79 Some studies have found evidence in support of this hypothesis across a variety of taxonomic groups.25,35,7982
The generation time hypothesis predicts that both nuclear and mitochondrial substitution rates should be affected. In contrast, when examining CYTB in mammals, Nabholz et al10 found a correlation with time to female maturity (though not as the main predictor), but not in the the nuclear gene interphotoreceptor retinoid-binding protein exon 1.
Our results provide little support for the generation time hypothesis for primates in either the mtDNA or nuclear genomes. Initially, this may seem surprising as previous studies examining primate rate variation have invoked the generation time hypothesis as an explanation for the rate variation between Old World monkeys and apes (also known as the hominoid rate-slowdown hypothesis).31,35
However, these findings are consistent with what we know about substitution rates in primates. Based on a genome-wide analysis, strepsirhines and New World monkeys are known to have a significantly slower rate, about half that of apes and Old World monkeys.36 Similarly, the mtDNA specifically is known to have a slower rate in lemurs than anthropoids, especially the rate of transitions.37 In contrast, strepsirhines have a faster generation time than apes and monkeys (t-test of log transformed maturity time: P-value = 7.18 × 10−14 for females only;56P-value = 8.42 × 10−16 for a maturity time function of both male and female83) and New World monkeys also have a faster generation time than apes and Old World monkeys (P-value = 1.98 × 10−6;56P-value = 1.85 × 10−5 83). Thus, the generation time hypothesis predicts the following order for rates, from lowest to highest: apes, Old World monkeys, New World monkeys, strepsirhines, but the actual order based on36 is: New World monkeys and strepsirhines, apes, Old World monkeys. This implies some additional trait may be involved in the differences between apes and Old World monkeys, and that the hominoid slowdown hypothesis may have alternative explanations other than generation time. Similarly, if the hominoid slowdown hypothesis is explained by generation time, then some other feature is obscuring the general signal in primates.
Another potential explanation for our lack of significant association is that the generation time signal is associated with longer evolutionary divergence that is too noisy when examining closely related species such as primates. One study examining generation time in invertebrates found that it was the best predictor of rate variation, but over a considerably longer evolutionary time than that considered here.79 Even among mammals, the time may not be sufficient to detect a significant association other than that found in common with other life history variables.9,10,12,39,79
Some caution is necessary in evaluating the generation time hypothesis for primates, because the phenotypic data are less complete with age at female sexual maturity and we had to make a greater number of genus level assignments to generate the taxon mapping of the discretization (see Methods). Further, only two of the 26 taxa were in the smallest phenotypic category. This may have effectively reduced our test from a three category to a two category test but significance still evaluated by a Ç2 distribution with degrees of freedom equal to two. We reexamined this phenotype by reducing the number of categories to two (based on either the median or the P0.66 threshold) and compared it to a Ç2 distribution with one degree of freedom. In both cases, we still failed to achieve any level of significance greater than that reported in Table 3.

Longevity hypothesis

The longevity hypothesis predicts that species where long life is adaptive will show selection on the molecular pathways that mitigate or reduce the production of radical oxidative species and as well as the molecular machinery involved in mtDNA replication. As a result, the selective effect should reduce the rate of mutation in the mtDNA.23
Previous studies have supported this hypothesis in one of two ways: biochemistry and molecular phylogenetic rate variation. Biochemically, it has been shown that long-lived species have reduced rates of reactive oxygen species (ROS) generation and increased resistance to damage in structural components (for reviews see Pamplona and Barja24 and Galtier et al23). In terms of rate variation, the third codon position of CYTB in mammals and birds was shown to be correlated with substitution rate.10,12 Further, maximum lifespan was the strongest predictor of synonymous rate for the nine longest protein-coding genes in the mtDNA and some nuclear encoded genes, even with multiple regressions.9 This association was found across all Mammalia using 160 taxa and concatenated genes (all nine coding genes as a single alignment) in order to obtain sufficient power.9 Here, we have included a larger number of loci (including non-protein-coding mtDNA genes), but a smaller number of taxa and we were still able to detect an association. We only required 26 taxa within a single order and used separate genes to detect the association. Moreover, we were able to examine the control region because of the close evolutionary relationship of primates and because our model uses nucleotide models for the genotype substitution.
Further, these results in primates are important as some studies have found either a lack of association9 (when only examining primates and rodents) or an insignificant mammalian association with primates and a negative association when primates are excluded.38 One possibility is the difference in power between tests, as mentioned previously.
We found a significant association (Table 3) with at least one representative alignment for all gene groups of Galtier et al.38 As previously stated, we analyzed gene by gene not by gene groups. Our results also imply that the primary signal comes from neutral changes rather than positive selection or relaxation of negative selection. This third codon signal would have been missed by Galtier et al38 as that study only examined amino acids, but our results for primates are consistent with their overall conclusion of a negative association of neutral substitution and longevity across mammals for the mtDNA.

General life history and metabolic rate

Body mass is associated with many aspects of life history.8488 We have so far interpreted the body mass association as one of general life history, but many studies have used body mass as a surrogate for metabolic rate.
The metabolic rate hypothesis predicts that organisms with a higher metabolic rate should have increased production of ROS and thus higher mutation rates in the mtDNA.23 Some have hypothesized that the correlation between metabolic rate and substitution rate is general enough to recalibrate the molecular clock.19,89 The metabolic rate hypothesis has had some support within birds and mammals,41,90 but these studies have been criticized for their small data sets.91 A few investigations using larger data sets have failed to find a correlation between mass-specific metabolic and substitution rates, again within birds and mammals,25 and also in invertebrates.91,92 Other papers with large data sets have found a negative correlation between body mass and substitution rate,10,12 but for mammals this is removed when regressed with other life history variables.10
Our results support a connection between body mass and substitution rate that may be separate from the metabolic rate hypothesis. Body mass is a known correlate of many life history and ecological traits8488 and it is plausible that an unknown phenotype, or longevity alone as indicated by the Wilcoxon Test between their ALRSs, is the root correlate.

Conclusions

We have successfully extended the method of16 to account for a greater number of phenotypic categories and tested it on coding and non-coding sequence data from primate mitochondrial and nuclear genomes. Simulation studies showed that the method performs well, with a false-positive rate within acceptable values for a 5% cutoff as well as reasonable power (>80%) when there is greater than a three-fold change in substitution rate and greater than a tree length of three. Further, we have shown that a stricter alternative hypothesis (W1 > W0) is preferred over a more relaxed hypothesis for two or three phenotype categories. In simulation, the number of phenotypic categories was tested up to four and shown to have a low FP rate, but lower power with the four phenotype category case.
When applied to primate data with three phenotypic categories, the method identified many of the genes from the mtDNA as having an association with maximum lifespan and body mass consistent with the longevity hypothesis and general life history respectively. The significant signal predominantly originates from the third codon position for both hypotheses in the mtDNA protein-coding genes, implying a neutral signal on the mtDNA. The control region, which is non-coding and highly divergent, was also significantly associated with maximum lifespan and again supports the prediction of the longevity hypothesis. Those few nuclear encoded genes that were significant have a different pattern of contribution from the codon positions that indicate a separate and unknown explanation for the associations.
Primates had previously been identified as an outlier group among mammals for the maximum lifespan and body mass associations as it was inconsistent with the general mammalian pattern.9,38 Also, some signal of adaptive evolution has been identified in the primate mtDNA.22,93 Examining this pattern with our method, we show that primate mtDNA is evolving in a manner consistent with previous predictors for the phenotypes and that the signal coming from the third codon position is indicative of the neutral rate, which may have been overlooked in some of the previous studies. We find little support for the generation time hypothesis, which is inconsistent with the typical rationale for the hominoid slowdown,31,33,35 but consistent with what we know of substitution rates between apes and Old World monkeys, and New World monkeys and strepsirhine primates.36
Our method is a useful tool in identifying variation in the neutral rate of evolution associated with phenotype. Here, it has detected an association with longevity and body size, and mtDNA rate variation in primates missed by other methods, showing that primates are consistent with other mammals in this respect.

Author Contributions

Conceived and designed the experiments: TDO, NIM. Analyzed the data: TDO. Wrote the first draft of the manuscript: TDO, NIM. Agree with manuscript results and conclusions: TDO, NIM. Made critical revisions and approved final version: TDO, NIM. All authors reviewed and approved of the final manuscript.

Funding

TDO was funded by a Gates Scholarship.

Competing Interests

Author(s) disclose no potential conflicts of interest.

Disclosures and Ethics

As a requirement of publication the authors have provided signed confirmation of their compliance with ethical and legal obligations including but not limited to compliance with ICMJE authorship and competing interests guidelines, that the article is neither under consideration for publication nor published elsewhere, of their compliance with legal and ethical guidelines concerning human and animal research participants (if applicable), and that permission has been obtained for reproduction of any copyrighted material. This article was subject to blind, independent, expert peer review. The reviewers reported no competing interests.

Acknowledgments

We would like to thank Adrian Friday and Stephen Montgomery for useful discussion and comments on a previous draft of this paper and anonymous reviewers for their comments. The simulation and analyzes were done using the CamGrid cluster (http://www.escience.cam.ac.uk/projects/camgrid/) at the University of Cambridge, which is maintained by the Cambridge eScience Center. TDO was supported by the Gates Cambridge Trust and the Overseas Research Student Award. NIM was supported by the Leverhulme Trust.

References

1. Anisimova M., Liberles D.A. The quest for natural selection in the age of comparative genomics. Heredity. 2007; 99: 567–79.
2. Baer C.F., Miyamoto M.M., Denver D.R. Mutation rate variation in multicellular eukaryotes: causes and consequences. Nature Rev Genet. 2007; 8: 619–31.
3. Bromham L. Why do species vary in their rate of molecular evolution? Biol Lett. 2009; 5: 401.
4. Lopez P., Casane D., Philippe H. Heterotachy, an important process of protein evolution. Mol Biol Evol. 2002; 19: 1–7.
5. Lockhart P., Steel M. A tale of two processes. Syst Biol. 2005; 54: 948–51.
6. Zhou Y., Rodrigue N., Lartillot N., Philippe H. Evaluation of the models handling heterotachy in phylogenetic inference. BMC Evol Biol. 2007; 7: 1471–2148.
7. Yang Z. Estimating the pattern of nucleotide substitution. J Mol Evol. 1994; 39: 105–11.
8. Yang Z. Among-site rate variation and its impact on phylogenetic analyses. Trends Ecol Evol. 1996; 11: 367–72.
9. Welch J.J., Bininda-Emonds O.R.P., Bromham L. Correlates of substitution rate variation in mammalian protein-coding sequences. BMC Evol Biol. 2008; 8: 53.
10. Nabholz B., Glemin S., Galtier N. Strong variations of mitochondrial mutation rate across mammals—the longevity hypothesis. Mol Biol Evol. 2008; 25: 120.
11. Nabholz B., Mauffrey J.F., Bazin E., Galtier N., Glemin S. Determination of mitochondrial genetic diversity in mammals. Genetics. 2008; 178: 351.
12. Nabholz B., Glémin S., Galtier N. The erratic mitochondrial clock: variations of mutation rate, not population size, affect mtDNA diversity across birds and mammals. BMC Evol Biol. 2009; 9: 54.
13. Dorus S., Evans P.D., Wyckoff G.J., Choi S.S., Lahn B.T. Rate of molecular evolution of the seminal protein gene SEMG2 correlates with levels of female promiscuity. Nature Genet. 2004; 36: 1326–9.
14. Hurle B., Swanson W.J.; NISC Comparative Sequencing Program, Green ED. Comparative sequence analyses reveal rapid and divergent evolutionary changes of the WFDC locus in the primate lineage. Genome Res. 2007; 17: 276.
15. Ramm S.A., Oliver P.L., Ponting C.P., Stockley P., Emes R.D. Sexual selection and the adaptive evolution of mammalian ejaculate proteins. Mol Biol Evol. 2008; 25: 207.
16. O'Connor T.D., Mundy N.I. Genotype-phenotype associations: substitution models to detect evolutionary associations between phenotypic variables and genotypic evolutionary rate. Bioinformatics. 2009; 25: i94–100.
17. Herlyn H., Zischler H. Sequence evolution of the sperm ligand zonadhesin correlates negatively with body weight dimophism in primates. Evolution. 2007; 61: 289–98.
18. Nadeau N.J., Burke T., Mundy N.I. Evolution of an avian pigmentation gene correlates with a measure of sexual selection. Proc R Soc B Lon. 2007; 274.
19. Gillooly J.F., Allen A.P., West G.B., Brown J.H. The rate of DNA evolution: effects of body size and temperature on the molecular clock. Proc Natl Acad Sci. 2005; 102: 140–5.
20. Gillman L.N., Keeling D.J., Ross H.A., Wright S.D. Latitude, elevation and the tempo of molecular evolution in mammals. P Roy Soc B-Biol Sci. 2009.
21. Mayrose I., Otto S.P. A likelihood method for detecting trait-dependent shifts in the rate of molecular evolution. Mol Biol Evol. 2011; 28: 759–70.
22. Pupko T., Galtier N. A covarion-based method for detecting molecular adaptation: application to the evolution of primate mitochondrial genomes. Proc R Soc B Lon. 2002; 269: 1313–6.
23. Galtier N., Jobson R.W., Nabholz B., Glémin S., Blier P.U. Mitochondrial whims: metabolic rate, longevity and the rate of molecular evolution. Biol Lett. 2009; 5: 413.
24. Pamplona R., Barja G. Highly resistant macromolecular components and low rate of generation of endogenous damage: two key traits of longevity. Ageing Res Rev. 2007; 6: 189–210.
25. Bromham L., Rambaut A., Harvey P.H. Determinants of rate variation in mammalian DNA sequence evolution. J Mol Evol. 1996; 43: 610–21.
26. Li W.H., Tanimura M., Sharp P.M. An evaluation of the molecular clock hypothesis using mammalian DNA sequences. J Mol Evol. 1987; 25: 330–42.
27. Zuckerkandl E., Pauling L. Evolving Genes and Proteins. In: Bryson V., Vogelch H.J., editors. Evolutionary divergence and convergence in proteins. New York: Academic Press; 1965: 97–166.
28. Laird C.D., McConaughy B.L., McCarthy B.J. Rate of fixation of nucleotide substitutions in evolution. Nature. 1969; 224: 149–54.
29. Kohne D.E. Evolution of higher-organism DNA. Quart Rev Biophys. 1970; 3: 327–75.
30. Wu C.I., Li W.H. Evidence for higher rates of nucleotide substitution in rodents than in man. Proc Natl Acad Sci. 1985; 82: 1741–5.
31. Li W.H., Ellsworth D.L., Krushkal J., Chang B.H.J., Hewett-Emmett D. Rates of nucleotide substitution in primates and rodents and the generation-time effect hypothesis. Mol Phylo Evol. 1996; 5: 182–7.
32. Tëtushkin E.Y. Rates of molecular evolution of primates. Rus J Genet. 2003; 39: 721–36.
33. Hayasaka K., Gojobori T., Horai S. Molecular phylogeny and evolution of primate mitochondrial DNA. Mol Biol Evol. 1988; 5: 626–44.
34. Hasegawa M. Phylogeny and molecular evolution in primates. Jpn J Genet. 1990; 65: 243–66.
35. Li W.H., Tanimura M. The molecular clock runs more slowly in man than in apes and monkeys. Nature. 1987; 326: 93–6.
36. Steiper M.E., Young N.M. Primate molecular divergence dates. Mol Phylo Evol. 2006; 41: 384–94.
37. Hasegawa M., Kishino H., Hayasaka K., Horai S. Mitochondrial DNA evolution in primates: transition rate has been extremely low in the lemur. J Mol Evol. 1990; 31: 113–21.
38. Galtier N., Blier P.U., Nabholz B. Inverse relationship between longevity and evolutionary rate of mitochondrial proteins in mammals and birds. Mitochondrion. 2009; 9: 51–7.
39. Lartillot N., Poujol R. A phylogenetic model for investigating correlated evolution of substitution rates and continuous phenotypic characters. Mol Biol Evol. 2011; 28: 729–44.
40. Welch J.J., Waxman D. Calculating independent contrasts for the comparative study of substitution rates. J Theor Biol. 2008; 251: 667–78.
41. Martin A.P., Palumbi S.R. Body size, metabolic rate, generation time, and the molecular clock. Proc Natl Acad Sci. 1993; 90: 4087–91.
42. Pagel M., Meade A. A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data. Syst Biol. 2004; 53: 571–81.
43. Pagel M. Detecting correlated evolution on phylogenies: A general method for the comparative analysis of discrete characters. P Roy Soc B-Biol Sci. 1994; 255: 37–45.
44. Tavaré S. Some probabilistic and statistical problems in the analysis of DNA sequences. Lect Math Life Sci. 1986; 17: 57–86.
45. Zharkikh A. Estimation of evolutionary distances between nucleotide sequences. J Mol Evol. 1994; 39: 315–29.
46. Lewis P.O. A likelihood approach to estimating phylogeny from discrete morphological character data. Syst Biol. 2001; 50: 913–25.
47. Yang Z. Computational Molecular Evolution. Oxford University Press, USA; 2006.
48. Silvey S.D. Statistical Inference. Chapman & Hall/CRC; 1975.
49. Kendall M., Stuart A. The Advanced Theory of Statistics. Charles Griffn, London; 1979; 2.
50. Lindgren B.W. Statistical Theory. Chapman & Hall/CRC; 1993.
51. Drummond A., Strimmer K. PAL: an object-oriented programming library for molecular evolution and phylogenetics. Bioinformatics. 2001; 17: 662–3.
52. Goodman M., Grossman L.I., Wildman D.E. Moving primate genomics beyond the chimpanzee genome. Trends Genet. 2005; 21: 511–7.
53. Horvath J.E., Weisrock D.W., Embry S.L.et al. Development and application of a phylogenomic toolkit: Resolving the evolutionary history of Madagascar's lemurs. Genome Res. 2008; 18: 489.
54. Smith R.J., Jungers W.L. Body mass in comparative primatology. J of Human Evol. 1997; 32: 523–59.
55. Isaac N.J.B., Jones K.E., Gittleman J.L., Purvis A. Correlates of species richness in mammals: body size, life history, and ecology. Amer Naturalist. 2005; 165: 600–7.
56. Magalhães J.P., Budovsky A., Lehmann G.et al. The Human Ageing Genomic Resources: online databases and tools for biogerontologists. Aging Cell. 2009; 8: 65–72.
57. Magalhães J.P., Costa J. A database of vertebrate longevity records and their relation to other life-history traits. J Evol Biol. 2009; 22: 1770–4.
58. Schwartz S., Kent W.J., Smit A.et al. Human-mouse alignments with BLASTZ. Genome Res. 2003; 13: 103–7.
59. Blanchette M., Kent W.J., Riemer C.et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004; 14: 708–15.
60. Aquadro C.F., Greenberg B.D. Human mitochondrial DNA variation and evolution: analysis of nucleotide sequences from seven individuals. Genetics. 1983; 103: 287–312.
61. Cann R.L., Brown W.M., Wilson A.C. Polymorphic sites and the mechanism of evolution in human mitochondrial DNA. Genetics. 1984; 106: 479–99.
62. Loytynoja A., Goldman N. An algorithm for progressive multiple alignment of sequences with insertions. Proc Natl Acad Sci. 2005; 102: 10557–62.
63. Loytynoja A., Goldman N. Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis. Science. 2008; 320: 1632.
64. Fletcher W., Yang Z. The effect of insertions, deletions, and alignment errors on the branch-site test of positive selection. Mol Biol Evol. 2010; 27: 2257–67.
65. Shoemaker B.A., Panchenko A.R. Deciphering protein–protein interactions. Part I: Experimental techniques and databases. PLoS Comput Biol. 2007; 3: e42.
66. Shoemaker B.A., Panchenko A.R. Deciphering protein–protein interactions. Part II. Computational methods to predict protein and domain interaction partners. PLoS Comput Biol. 2007; 3: e43.
67. Wong W.S.W., Yang Z., Goldman N., Nielsen R. Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites. Genetics. 2004; 168: 1041–51.
68. Felsenstein J. Phylogenies and the comparative method. American Naturalist. 1985; 125: 1.
69. Pagel M. Inferring evolutionary processes from phylogenies. Zoologica Scripta. 1997; 26: 331–48.
70. Pagel M. Inferring the historical patterns of biological evolution. Nature. 1999; 401: 877–84.
71. Freckleton R.P., Harvey P.H., Pagel M. Phylogenetic analysis and comparative data: a test and review of evidence. Am Nat. 2002; 160: 712–26.
72. Wlasiuk G., Nachman M.W. Promiscuity and the rate of molecular evolutionat primate immunity genes. Evolution. 2010; 64: 2204–20.
73. Tsantes C., Steiper M.E. Age at first reproduction explains rate variation in the strepsirrhine molecular clock. Proc Natl Acad Sci. 2009; 106: 18165–70.
74. Schwartz R.S., Mueller R.L. Variation in DNA substitution rates among lineages erroneously inferred from simulated clock-like data. PLoS ONE. 2010; 5: e9649.
75. Ives A.R., Garland T. Jr. Phylogenetic logistic regression for binary dependent variables. Syst Biol. 2010; 59: 9–26.
76. Moritz C., Dowling T.E., Brown W.M. Evolution of animal mitochondrial DNA: relevance for population biology and systematics. Annu Rev Ecol Evol Syst. 1987; 18: 269–92.
77. Pamilo P., O'Neill R.J.W. Evolution of the sry genes. Mol Biol Evol. 1997; 14: 49–55.
78. Clark N.L., Swanson W.J. Pervasive adaptive evolution in primate seminal proteins. PLoS Genet. 2005; 1: e35.
79. Thomas J.A., Welch J.J., Lanfear R., Bromham L. A generation time effect on the rate of molecular evolution in invertebrates. Mol Biol Evol. 2010; 27: 1173–80.
80. Ohta T. An examination of the generation-time effect on molecular evolution. Proc Natl Acad Sci. 1993; 90: 10676–80.
81. Mooers AØ, Harvey P.H. Metabolic rate, generation time, and the rate of molecular evolution in birds. Mol Phylo Evol. 1994; 3: 344–50.
82. Nikolaev S.I., Montoya-Burgos J.I., Popadin K., Parand L., Margulies E.H., Antonarakis S.E. Life-history traits drive the evolutionary rates of mammalian coding and noncoding genomic elements. Proc Natl Acad Sci. 2007; 104: 20443.
83. Jones K.E., Bielby J., Cardillo M.et al. PanTHERIA: a species-level database of life history, ecology, and geography of extant and recently extinct mammals. Ecology. 2009; 90: 2648.
84. Millar J.S. Adaptive features of mammalian reproduction. Evolution. 1977; 31: 370–86.
85. Blueweiss L., Fox H., Kudzma V., Nakashima D., Peters R., Sams S. Relationships between body size and some life history parameters. Oecologia. 1978; 37: 257–72.
86. Millar R.M. Life histories of mammals: an analysis of life tables. Ecology. 1983; 64: 631–5.
87. Peters R.H. The Ecological Significance of Body Size. Cambridge University Press, Cambridge, UK; 1983.
88. Calder W.A.L. Size, Function and Life History. Harvard University Press, Cambridge, MA, USA; 1984.
89. Gillooly J.F., Brown J.H., West G.B., Savage V.M., Charnov E.L. Effects of size and temperature on metabolic rate. Science. 2001; 293: 2248.
90. Bleiweiss R. Relative-rate tests and biological causes of molecular evolution in hummingbirds. Mol Biol Evol. 1998; 15: 481–91.
91. Lanfear R., Thomas J.A., Welch J.J., Brey T., Bromham L. Metabolic rate does not calibrate the molecular clock. Proc Natl Acad Sci. 2007; 104: 15388.
92. Thomas J.A., Welch J.J., Woolfit M., Bromham L. There is no universal molecular clock for invertebrates, but rate variation does not scale with body size. Proc Natl Acad Sci. 2006; 103: 7366.
93. Grossman L.I., Wildman D.E., Schmidt T.R., Goodman M. Accelerated evolution of the electron transport chain in anthropoid primates. Trends Genet. 2004; 20: 578–85.

Cite article

Cite article

Cite article

OR

Download to reference manager

If you have citation software installed, you can download article citation data to the citation manager of your choice

Share options

Share

Share this article

Share with email
EMAIL ARTICLE LINK
Share on social media

Share access to this article

Sharing links are not relevant where the article is open access and not available if you do not have a subscription.

For more information view the Sage Journals article sharing page.

Information, rights and permissions

Information

Published In

Article first published online: July 28, 2013
Issue published: January - December 2013

Keywords

  1. genotype-phenotype
  2. generation time hypothesis
  3. longevity hypothesis
  4. mitochondria
  5. primate

Rights and permissions

© 2013 SAGE Publications.
Creative Commons License (CC BY-NC 3.0)
This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 3.0 License (http://www.creativecommons.org/licenses/by-nc/3.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).
Request permissions for this article.
PubMed: 23926418

Authors

Affiliations

Timothy D. O'Connor
Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA.
Nicholas I. Mundy
Department of Zoology, Downing Street, University of Cambridge, Cambridge CB2 3EJ, UK.

Notes

This article is available from http://www.la-press.com.

Metrics and citations

Metrics

Journals metrics

This article was published in Evolutionary Bioinformatics.

VIEW ALL JOURNAL METRICS

Article usage*

Total views and downloads: 313

*Article usage tracking started in December 2016


Altmetric

See the impact this article is making through the number of times it’s been read, and the Altmetric Score.
Learn more about the Altmetric Scores



Articles citing this one

Receive email alerts when this article is cited

Web of Science: 7 view articles Opens in new tab

Crossref: 0

  1. PhyloAcc-GT: A Bayesian Method for Inferring Patterns of Substitution ...
    Go to citation Crossref Google Scholar
  2. Genes underlying the evolution of tetrapod testes size
    Go to citation Crossref Google Scholar
  3. Noncoding regions underpin avian bill shape diversification at macroev...
    Go to citation Crossref Google Scholar
  4. Biological Processes Modulating Longevity across Primates: A Phylogene...
    Go to citation Crossref Google Scholar

Figures and tables

Figures & Media

Tables

View Options

View options

PDF/ePub

View PDF/ePub

Get access

Access options

If you have access to journal content via a personal subscription, university, library, employer or society, select from the options below:


Alternatively, view purchase options below:

Access journal content via a DeepDyve subscription or find out more about this option.