New content
Evolutionary Bioinformatics

Open access

Research article

First published online July 28, 2013

Evolutionary Modeling of Genotype-Phenotype Associations, and Application to primate coding and Non-coding mtDNA Rate Variation

Timothy D. O'Connor [email protected] and Nicholas I. MundyView all authors and affiliations

All Articles

Abstract

Variation in substitution rates across a phylogeny can be indicative of shifts in the evolutionary dynamics of a protein or nonprotein coding regions. One way to understand these signals is to seek the phenotypic correlates of rate variation. Here, we extended a previously published likelihood method designed to detect evolutionary associations between genotypic evolutionary rate and phenotype over a phylogeny. In simulation with two discrete categories of phenotype, the method has a low false-positive rate and detects greater than 80% of true-positives with a tree length of three or greater and a three-fold or greater change in substitution rate given the phenotype. In addition, we successfully extend the test from two to four phenotype categories and evaluated its performance. We then applied the method to two major hypotheses for rate variation in the mitochondrial genome of primates–-longevity and generation time as well as body mass which is correlated with many aspects of life history–-using three categories of phenotype through discretization of continuous values. Similar to previous results for mammals, we find that the majority of mitochondrial protein-coding genes show associations consistent with the longevity and body mass predictions and that the predominant signal of association comes from the third codon position. We also found a significant association between maximum lifespan and the evolutionary rate of the control region of the mtDNA. In contrast, 24 protein-coding genes from the nuclear genome do not show a consistent pattern of association, which is inconsistent with the generation time hypothesis. These results show the extended method can robustly identify genotype-phenotype associations up to at least four phenotypic categories, and demonstrate the successful application of the method to study factors affecting neutral evolutionary rate in protein-coding and non-coding loci.

Introduction

One crucial aspect of molecular evolution is the exploration of variation in the rate of substitution between different lineages or under different evolutionary pressures. Investigating this variation can lead to interesting insights into the adaptation or neutral evolution of a gene.^1–3 Many papers have presented statistical methods that distinguish different types of rate variation, such as variation in substitution rate of a single site among different lineages (heterotachy),^4–6 or variation between sites.^7,8

Molecular rate variation with a consistent pattern can be indicative of underlying biological processes.³ Using different phylogenetic models of molecular evolution, some life history traits have been shown to correlate with substitution rate including body mass, time to sexual maturity, and lifespan.^9–12 Other traits that have shown to be correlated with substitution rate include sperm competition^13–16 and sexual dimorphism, either in mass¹⁷ or coloration.¹⁸ A set of abiotic variables that may be correlated with rate are environmental conditions such as salinity, temperature, latitude, and elevation.^19–21

The mitochondrial genome (mtDNA) is known to exhibit substantial rate variation,²² and two prominent hypotheses to explain this are the longevity hypothesis and generation time hypothesis. The longevity hypothesis proposes that there is selection for greater fidelity in the replication mechanism of the mtDNA between an organism's conception and production of its gametes, and thus predicts that shorter-lived species will have a faster rate of neutral mtDNA substitution.²³ The longevity hypothesis is also consistent with a prediction for selective advantage for decreased reactive oxygen species (ROS) production and/or more resistant macromolecular components.²⁴

The generation time hypothesis predicts a faster substitution rate for both mtDNA and nuclear DNA in species with shorter time to sexual maturity and thus shorter generation length because it assumes a constant error rate and more replications per geological time.²⁵ Generation time was proposed as an exception to the assumption of the ‘molecular clock’ hypothesis²⁶ of a single rate of molecular evolution between species.²⁷ It originated from the results of DNA hybridization experiments between rodents and artiodactyls, and rodents and primates which found a higher rate of nucleotide substitution measured in time, but not generations.^28,29 This was later supported in phylogenetic studies where molecular evolution was faster in rodents than primates, consistent with their generation time differences.^30,31

The pattern and causes of mtDNA rate variation in primates are of particular interest, and have been the subject of longstanding debate.³² Much early work was concerned with correcting for primate mtDNA rate variation to estimate divergence times.^33,34 The generation time hypothesis is of particular interest for primates as it has been invoked to explain the difference in neutral rate in both nuclear and mtDNA genomes between Old World monkeys and apes,^31,33,35 but conflicts with genome-wide estimates that include strepsirhines and New World monkeys.³⁶ In addition, lemurs are known to have a low rate of transitions but a similar rate of transversions to apes and monkeys.³⁷

One set of studies compared the three major hypotheses for the third codon position of cytochrome b (CYTB) from the mtDNA and a few nuclear loci across mammals and birds^10–12 and was later extended to include an amino acid analysis of the protein-coding regions of the mtDNA for the longevity hypothesis.³⁸ Welch et al⁹ examined the nine longest protein-coding genes of the mtDNA (at the codon level) in mammals and found similar negative correlations with the synonymous rate and body mass, lifespan, fecundity, and time to sexual maturity. In both approaches, primates have been identified as an outlier group for the longevity and body mass phenotypes, where they either reduce the association with lifespan^9,38 or cause the association to be positive with body mass.⁹ Recently, a Bayesian approach which simultaneously estimates phenotype state, divergence dates, and evolutionary rate also found a negative correlation between the evolution of CYTB, and longevity and body mass, but no association with generation time.³⁹ Considering the results described above, the generation time hypothesis was rejected because of lack of a significant association with nuclear loci, whereas the longevity hypothesis was favored over the body mass association because of a stronger genotype-phenotype correlation. Non-protein-coding portions of mtDNA are predicted to show similar signals to the protein-coding regions, but have not been examined.

For these analyses, the main methods used are phylogenetically corrected correlations between the substitution rate and the relevant variable value (eg, life history, external condition, or phenotype) of the extant taxa. One difficulty with this type of approach is that the two values being compared are not properties of the same part of the tree: the substitution rate is reconstructed on the branches, while information about the phenotype is generally located only at the tips of the tree or reconstructed on internal nodes. This type of pairing may be statistically problematic.

In both cases, these methods require a broad comparison between taxa (ie, mammals) and/or concatenation of genes (or gene groups) in order to to achieve sufficient power.^10,40 Sufficient evolutionary time is a requirement in forming comparisons in order to correctly estimate the correlations as the full phylogeny is not used. Thus, these papers did not examine the primate clade on its own because the methods are not well-suited to scaling down to a relatively small number of taxa. Some studies have analyzed body mass and generation time in primates, mostly anthropoids,^31,33,34,41 but so far primates have not been specifically tested for the longevity hypothesis.

Recently, a model was implemented that can overcome many of these problems by directly estimating the association between phenotype and substitution rate in a relatively small number of taxa.¹⁶ It achieves this by comparing extant discrete phenotypes to extant sequences across a phylogeny and, through the maximum likelihood framework, both genotype and phenotype substitutions are allowed on the branches. Thus, the phenotype can evolve across the tree. The rate of genotypic substitution while under a given phenotype can then be estimated throughout the whole tree as opposed to just the terminal branches giving increased power with limited numbers in a taxonomic group. The method uses a mixture model approach that allows for different sites (potentially codon positions) to associate with either a foreground-associated model or a background-unassociated model.^16,42

In this paper, we extend the method and examine its robustness through extensive simulations. We then use the method to explore the longevity and generation time hypotheses as well as body mass in primates. We found that the longevity hypothesis is supported for the third codon position of protein-coding genes, consistent with mammal wide estimates,^9–12 and find a similar result for the non-coding control region. We find no consistent association between the phenotypes tested and nuclear genes, including those that encode mitochondrial proteins. In light of our results, we discuss the effect of the three important phenotypes on rate variation applied to primate evolution.

Methods

Theory

The hybrid substitution models and likelihood ratio test (LRT) first introduced in O'Connor and Mundy¹⁶ are used in the present study with some modification. This pair of models uses the coevolutionary model of Pagel⁴³ as inspiration to combine other substitution models for both genotype (for example General Time Reversible (GTR))^7,44,45 and discrete phenotype models (such as Mk).⁴⁶ This allows us to detect differences in genotypic evolutionary rates associated with a phenotype.

The background evolutionary rate is estimated by the independent matrix which assumes a single rate of genotypic substitution irrespective of the phenotypic state. The instantaneous independent Q matrix is given as:

Q_{I} [(g_{i}, p_{i}), (g_{i}, p_{i})] + {\begin{array}{l} Q_{p} [p_{i}, p_{j}] & if g_{i} = g_{j} and p_{i} \neq p_{j} \\ Q_{p} [g_{i}, g_{j}] & if p_{i} = p_{j} and g_{i} \neq g_{j} \\ 0 & if g_{i} \neq g_{j} and p_{i} \neq p_{j} \end{array}

(1)

For every site in the alignment the Q matrix defines the instantaneous rate of change. In this case there are two inputs and outputs, p_i is the input phenotype and p_j is the output phenotype, with similar notation for the genotype state of g_i changing to g_j. Here, we test with nucleotide data so g_i and g_j can take any of four character states (A, C, G, T). The phenotypic characters can take on the characters 0, 1, … N_p − 1 depending on the number of phenotypes tested (N_p).

For single changes the rate is calculated based on the underlying rate matrix for the type of change; genotypes use the GTR model and phenotypes use the Mk. For example, if the phenotypic character is the one to change, then the rate of change is defined by phenotypic rate matrix (Q_p). Similarly, for a genotypic change Q_g, the genotypic rate matrix, is used. Double substitutions are fixed to zero following the arguments of Pagel.⁴³

To construct the null model for our likelihood ratio statistic (LRS), we expand from the basic structure of the independent matrix to include rate heterogeneity among sites. Previous studies have shown that real data can be highly susceptible to this type of variation.⁸ We use a mixture model approach⁴² with two different rate matrices, otherwise the time and rate are mathematically confounded. Using the independent matrix to detect the background signal creates a contrast for a different matrix to find the associated or heterogeneous signal.^16,42,47

In our null model, we combine an independent rate matrix and an independent Q matrix with a scaling weight parameter (W) that multiplies each of the genotypic rate parameters. All of the branch lengths and rate parameters are equal in the two Q matrices, except for the addition of the scaling parameter in the scaled Q matrix. Also, we add a parameter to estimate the proportion of sites (Pr) coming from the independent Q matrix, with 1 - Pr coming from the scaled Q matrix. Thus, only two additional parameters (the proportion of sites and the scaling weight parameter) are added to allow for a scaled rate heterogeneity of sites.

The dependent matrix is designed similarly to the scaled matrix but takes on scaling parameters corresponding to different phenotypes (W_pi). The Q matrix for the dependent matrix is defined as:

Q_{D} [(g_{i}, p_{i}), (g_{j}, p_{j})] + {\begin{array}{l} Q_{p} [p_{i}, p_{j}] & if g_{i} = g_{j} and p_{i} \neq p_{j} \\ Q_{g} [g_{i}, g_{j}] * W_{p i} & if p_{i} = p_{j} and g_{i} \neq g_{j} \\ 0 & if g_{i} \neq g_{j} and p_{i} \neq p_{j} \end{array}

(2)

When W_pi is set to one for all phenotypic states, it reduces to the independent model. As described before, the null model is a mixture model between the independent matrix and the scaled matrix. The alternative model is constructed as a mixture model with a proportion of sites from the independent matrix (the background unassociated signal) and the dependent matrix (ie, replaces the scaled Q matrix in the previous formulation). The dependent model is a likelihood approximation as the true likelihood would be calculated conditional on all phenotypic intermediate substitutions. This mathematical simplification does not seem to bias the test for any scenario for which we are aware (see additional file 1).

As a result of the approximation, our LRT is constructed by calculating the approximate LRS (ALRS) (two times the difference in log likelihood from the alternative and null models), which is distributed as a Ç² distribution with the degrees of freedom being the difference in the number of parameters in the hierarchically related models.^48–50 In the general case, the degrees of freedom are equal to the number of phenotypes minus one; in the binary phenotype case, the degrees of freedom equals one. The simplification of the dependent matrix does not seem to adversely affect the asymptotic properties of our test statistic (the ALRS), even with heterogeneity among sites (see Fig. S1 of additional file 1).

We constrain all weight parameters (W in the scale matrix and all W_pi in the dependent matrix) to be greater than or equal to one.

Simulations

To test the models we developed a simulation program as used in O'Connor and Mundy¹⁶ with one minor modification. Previously, the program normalized the input parameters, which caused a number of difficulties. First, the parameters explicitly set were not the ones used for simulation and thus not the values tested. Second, when calculating the rate matrix values for simulation, a normalized matrix with W₀ of 1 was not comparable to the probabilities for the independent matrix. Thus, the background rate simulated in part of the data (W = 1) was not the same in a different portion under the alternative model even though both values of input were the same (W₀ = 1). Therefore, we modified the java library PAL⁵¹ to prevent it from normalizing the rate matrix before calculating the probabilities used in simulation, since the parameters are chosen for particular purposes.

We ran two main sets of simulations to reevaluate the performance of the method.¹⁶ First, we used a binary phenotype with 16 taxa, a strictly bifurcating tree, and concatenated 500 sites for both the independent and dependent matrices for a total of 1000 sites. We set the scale value for phenotype 0 (W₀) to one and varied the scale value for phenotype 1 (W₁) from 1 (false positive (FP)) to 10 (extreme true-positive (TP)) at every integer. We increased the sum of all branches (tree length) from 1 to 10 at every integer, creating 100 different scenarios with 50 tests each. The alternative model used in the ALRS was constrained with W₀ < W₁. Second, we tested this specification of the alternative model by simulating data sets with greater numbers of phenotypes. This served the dual purpose of evaluating the use of the ALRS beyond the binary case and testing its performance with more constrained alternative models.

We simulated data with 16 or 32 taxa, a tree length of three, and number of phenotypes ranging from two to four. Further, we tested the difference between a stricter hypothesis (W₀ < ··· < W₃) and a relaxed hypothesis (W₀ ≠ … ≠ W₃) under both the null model (weights all equal to 1) and an alternative model (W_x = 2x + 1, for x from 0 to N_p − 1 with N_p set as two, three, or four). This created 24 scenarios (two numbers of taxa by two hypotheses by three different numbers of phenotypes by a null and alternative model) each tested with 200 simulated data sets.

Primate genotype and phenotype data

We applied the method just described to the question of association between the rate of mtDNA evolution in primates and key life history variables. We collected two forms of data, phenotypic and genotypic. We used the primate phylogenies of Goodman et al⁵² (Old and New World monkeys and apes) and Horvath et al⁵³ (lemurs and lorisoids).

We collected data on maximum lifespan, time to female sexual maturity, and female body mass to test hypotheses of longevity and generation time and general life history associations, respectively.^10,23 These continuous trait values were discretized because our method is based on discrete characters. We split the values into three equal groups based on the distribution of all readily available values for primates and then mapped them back to the taxa we used (Table 1). We used three categories so that they would roughly correspond to lower than average, average, and higher than average in the distribution to account for the two tails of the distributions. Also, as will be shown from our simulation results, this is the greatest number of categories with appreciable power. The data were collected from Smith and Junders⁵⁴ and Isaac et al⁵⁵ for female body mass and An Age for time to female sexual maturity and maximum lifespan for most of the species.^56,57 Less data were available for time to female sexual maturity and thus we used more genus level assignments. The phenotype data and assignments used are reported in additional file 2.

Table 1. Phenotype data summary statistics.

Variable	N	P_0.33	P_0.66	Mean	σ
Female body mass (In (grams))	210	7.11	8.68	7.72	1.46
Time to female sexual maturity (In (days))	112	6.68	7.29	6.90	0.68
Maximum lifespan (In (days))	149	9.12	9.43	9.26	0.36

Notes: The three key traits used with the number of primates used to estimate the percentiles (N) and the percentiles used to discretize the variables into three groups. P_0.33 and P_0.66 are the first third and second third log values of the distribution. The discrete phenotypic mapping was: 0 (x < P_0.33), 1 (P_0.33 < x < P_0.66), and 2 (P_0.66 < x)

We retrieved 26 primate mitochondrial genomes from Genbank (see additional file 3) and aligned these sequences using MultiZ genome aligner^58,59 with a human sequence as reference and extracted individual coding genes based on their annotation in the human sequence (accession number: NC_001807). Again, based on the human reference, we extracted all the tRNAs and concatenated them into a single alignment. We also retrieved the control region, which is known for its high level of divergence.^60,61 We then realigned the control region using the alignment program PRANK with the parameter -F, following the recommendation of the authors.^62,63 PRANK was designed to align rapidly divergent and gap-inclined alignments and has been shown to perform well with simulations of insertion-deletion models.⁶⁴ All alignments were manually evaluated for consistency.

In addition to the 17 mtDNA alignments, we collected 24 alignments of nuclear protein-coding genes where data were available for at least 8 taxa. This data included 6 nuclear-encoded mitochondrial complex IV genes which interact with three mtDNA genes (COX1, COX2, COX3) and should be functionally coevolving.^65,66 Accession numbers can be found in additional file 3.

For all three hypotheses (including general life history) we constrained the alternative model to consider only W₀ > W₁ > W₂ as described previously in the simulation section as this formulation is consistent with the directionality predictions from the hypotheses.

Results and Discussion

Simulations

Figure 1 shows our first reevaluation of the method presented in O'Connor and Mundy¹⁶ using a revised simulation program. Under this new, more intuitive method of simulation the binary case has a high level of power (>80%) for tree lengths greater than three and a W₁ value greater than three. This is important as the results reported earlier required a 10-fold change in molecular rate in order to observe any level of power. The power of the method increases quite dramatically, but similarly to other likelihood methods, discriminating between null and weak alternative models can be difficult.⁶⁷ Here, a doubling of the rate can still be detected when given enough evolutionary divergence, as measured by tree length. For example, with a tree length of 1, 5, and 10 and W₁ = 2 we observe 12%, 40%, and 82% significance, respectively.

Figure 1. Results of simulation for binary phenotype. The Z-axis is percentage significant out of 50 tests done for each permutation of tree length (sum of all branch lengths) and W₁. Tree length and W₁ take on values from 1 to 10 at each integer. All tests have W₀ set to 1, a 16 taxa tree, and 50% sites associated.

In addition, the FP rate should be 5% given our significance threshold and all of our tests are less then or equal to 6% for any given tree length. When averaged across all tree lengths the value was 2.4% for 500 simulations.

Table 2 shows a comparison of the strict and relaxed alternative models. The stricter alternative hypothesis performs better than the relaxed hypothesis and we prefer its use. In the 16 taxa case, all three numbers of phenotypes have a lower FP rate with the strict vs. the relaxed. This signal is not as clear with the 32 taxa case, but is visible at the 0.01 level. In addition, we prefer the stricter alternative hypothesis because when tested under correlated phenotypes, the stricter alternative is more discriminatory (unpublished data).

Table 2. Strict vs. relaxed alternative simulations.

Taxa	Hypothesis	N_p	Model	χ²_0.05	χ²_0.01
16	W_i ≠ W_i+1	2	Null	3.5	0.0
16	W_i < W_i+1	2	Null	2.0	0.0
16	W_i ≠ W_i+1	3	Null	2.5	0.5
16	W_i < W_i+1	3	Null	1.0	0.5
16	W_i ≠ W_i+1	4	Null	0.5	0.0
16	W_i < W_i+1	4	Null	0.0	0.0
16	W_i ≠ W_i+1	2	Alt.	81.0	71.5
16	W_i < W_i+1	2	Alt.	86.5	71.5
16	W_i ≠ W_i+1	3	Alt.	82.5	68.0
16	W_i < W_i+1	3	Alt.	87.5	74.0
16	W_i ≠ W_i+1	4	Alt.	69.5	53.5
16	W_i < W_i+1	4	Alt.	67.0	53.5
32	W_i ≠ W_i+1	2	Null	4.0	1.5
32	W_i < W_i+1	2	Null	4.5	1.0
32	W_i ≠ W_i+1	3	Null	3.5	0.0
32	W_i < W_i+1	3	Null	0.5	0.0
32	W_i ≠ W_i+1	4	Null	0.5	0.5
32	W_i < W_i+1	4	Null	1.5	0.0
32	W_i ≠ W_i+1	2	Alt.	88.0	78.5
32	W_i < W_i+1	2	Alt.	91.5	78.5
32	W_i ≠ W_i+1	3	Alt.	84.0	73.5
32	W_i < W_i+1	3	Alt.	85.5	76.5
32	W_i ≠ W_i+1	4	Alt.	78.5	67.5
32	W_i < W_i+1	4	Alt.	66.0	50.5

Notes: Simulation results from a symmetrical tree for either 16 or 32 taxa. N_p are the number of phenotypes being tested simulated under either the null or alternative hypotheses (Alt.). These are examined with a test using a strict (W_i < W_i+1) or relaxed (W_i ≠ 6 W_i+1) alternative hypothesis. The ALRSs are compared against a χ² with degrees of freedom equal to the number of phenotypes minus one and here are reported as the percentage out of 200 simulated tests that were significant.

The overall power under the alternative simulation scenario for two and three phenotype categories was similar, but in the four category case the percentage of significant TP rate decreased. With four phenotypes the significance of the ALRS is more conservative and future use of the current implementation with four phenotype categories will need to take this into account.

Comparison with other approaches

A few approaches have been developed to address similar questions to those addressed in our method. For instance, following the comparative approach,⁶⁸ one method is to regress the difference in trait value against a difference in either the number of predicted substitutions⁴⁰ or the branch length.^10–12 Welch and Waxman⁴⁰ estimate the expected number of substitutions along each branch by first breaking a full tree into independent taxa quartets. A quartet contains a pair of sister taxa which are selected based on sufficient divergence to allow for a better estimation of the number substitutions on each branch and rooted by an outgroup pair or single species. Internal contrasts are not examined.

The major method used in Nabholz et al¹⁰ recurses down the overall phylogeny based on Genbank's taxonomic classification until the median pairwise distance is less than a specified value for all taxa pairs in the group or the genus level is met. Next, through the use of fossil calibration points and amino acid data the neutral branch lengths for each taxa are calculated. The terminal branch values as calculated through auto-correlation of the rate are then used as an estimate of the species-specific rate and compared to trait values by general linear regression of the phylogenetic contrasts.

Our method uses tip values of nucleotide and phenotype characters and allow both to evolve across the entire phylogeny. The comparative approach, using only terminal comparisons or rate values (as in Welch and Waxman),⁴⁰ reduces the proportion of an unrooted phylogeny used to estimate the association from 2 * N − 3 branches to N with N taxa.

Some researchers use the terminal branch values as an estimate of evolutionary rate and input this into the full comparative method (including internal contrasts).^10,12 This use of the branches may be problematic as it matches parameter estimates from the phylogeny (and hence independent of the phylogeny) and phenotypic values (which are not independent of the phylogeny). Correcting for phylogeny may then overcorrect the substitution rate and lead to low power, thus requiring significant numbers of taxa in order to obtain sufficient contrasts for a regression with power. One exception to this may be the use of root-to-tip measurements of substitution rates as these are not independent of phylogeny and are more properties of the taxa than terminal branches alone.^69–71

Studies by Wlasiuk and Nachman⁷² and Tsantes and Steiper⁷³ used this combination of contrasts approach. The analysis proceeds by generating contrasts for each of the genes and then combining the contrasts from each into a single regression to assess significance. In both studies, power was an issue. Further, there may also be a problem with non-independence as each of the contrasts is not independent in the full analysis as they contain the same taxon comparisons from multiple genes.

In contrast, our method does not use a comparative contrasts approach. We do not create a fixed ancestral reconstruction; instead, the maximum likelihood framework accounts for internal evolution of the life history traits as well as the change in substitution. This allows us to use the full phylogeny and thus multiple lineages to estimate the difference in substitution rate between phenotypes⁷⁴ and model phenotypic evolution concurrently with the change in substitution rate. Thus, our method has the advantage over many other methods of having good power (eg, with single loci) with relatively few taxa.

Our models use discretization of continuous values. As a result of this, the method may have a loss of signal, but this could be beneficial if the trait is not distributed normally.⁷⁵ Further, our model does not incorporate multiple phenotypes into a single analysis as does multiple regression,^9–12 but requires the distribution of ALRSs to distinguish between different phenotypes (see following discussion of Wilcoxan Rank Test).

A recently published approach uses an LRT of molecular clocks to investigate genotype phenotype associations and has shown significant power.²¹ The proportion of time any given branch is under alternate binary characters is calculated through a system of probabilistic mapping of phenotypes onto an ultrametrized tree. This proportion is then scaled and expanded based on a single parameter to create a two-rate molecular clock dependent on the phenotype and can form an LRT with a null model of a single molecular clock. This is philosophically similar to our own method, especially as a model of phenotypes substitution is allowed across the tree. Interestingly, given roughly similar parameters the two methods have a comparable power level (when r = 2.6, under the Mayrose and Otto model, it has over 80% power, and when our models has W₁/W₀ = 3, it also has over 80% power).

While the two models are similar in both formulation and the questions they address, they make different assumptions that will lead to different results and biological interpretations. For example, Mayrose and Otto²¹ compared our method with theirs on an rRNA gene in daphniids: their model showed a significant signal, while our method did not. This most likely stems from a difference in assumption of how the association is applied to the gene. Our method is designed to detect associations that occur in part of the gene and not in another portion, which is relevant to the neutrality assumptions of the longevity and generation time hypotheses of protein-coding genes. Their method is geared to detect associations between full-length genes, an assumption made for the daphniids. Which condition is more biologically plausible is open to debate and likely depends on the system in question. This difference makes the tools complementary, given the hypothesis of the investigator. For instance, in our analysis of mtDNA rate variation, the prediction is that only neutral sites will be affected, with purifying selection acting as a relative background. The model of Mayrose and Otto is more pertinent for tests of change in the molecular clock acting on all sites, relevant to the situation in rRNA genes where sites are less differentiated (compared to codon positions), and it may have reduced power when only a subset of sites is associated.

Association of life history variables with rate variation in primates Mitochondrial genes

We apply our method to mtDNA rate variation in primates for the two main hypotheses described in the introduction–-longevity and generation time as well as general life history associations (Table 3). After correcting for multiple testing, nine of the thirteen protein-coding regions showed a significant association with maximum lifespan (longevity hypothesis) and eight protein-coding regions were significantly associated with female body mass. In contrast, time to female sexual maturity, relevant to the generation time hypothesis, was only significantly associated with two of the protein-coding genes.

Table 3. Results of mitochondrial coding and non-protein-coding genes under three different phenotypic hypotheses.

Gene	N	Length	Sexual maturity		Max. lifespan		Body mass
Gene	N	Length	ALRS	P-value	ALRS	P-value	ALRS	P-value
Protein-co
ATP6	26	678	1.962	1.00	38.518	1.8 × 10⁻⁷**	21.887	7.2 × 10⁻⁴**
ATP8	26	207	0.201	1.00	−0.056	1.00	0.935	1.00
COX1	26	1536	1.648	1.00	56.729	2.0 × 10⁻¹¹**	14.735	2.6 × 10⁻²*
COX2	26	681	0.485	1.00	0.000	1.00	0.001	1.00
COX3	26	781	0.001	1.00	72.172	8.7 × 10⁻¹⁵**	19.758	2.1 × 10⁻³**
CYTB	26	1141	1.457	1.00	67.133	1.1 × 10⁻¹³**	33.677	2.0 × 10⁻⁶**
ND1	26	954	17.741	6.0 × 10⁻³**	49.468	7.4 × 10⁻¹⁰**	18.118	4.8 × 10⁻³**
ND2	26	1057	0.033	1.00	58.806	7.0 × 10⁻¹²**	−0.019	1.00
ND3	26	344	4.394	1.00	39.372	1.2 × 10⁻⁷**	27.643	4.1 × 10⁻⁵**
ND4	26	1378	−0.037	1.00	115.289	3.8 × 10⁻²⁴**	14.915	2.4 × 10⁻²*
ND4L	26	294	0.037	1.00	4.039	1.00	6.657	1.00
ND5	26	1812	21.683	1.0 × 10⁻³**	192.480	6.5 × 10⁻⁴¹**	39.047	1.4 × 10⁻⁷**
ND6	26	528	−0.001	1.00	7.762	0.85	3.120	1.00
Other
12s rRNA	26	1031	0.015	1.00	−0.023	1.00	−0.001	1.00
16s rRNA	26	1680	0.000	1.00	0.536	1.00	0.602	1.00
tRNA	26	1636	1.646	1.00	0.000	1.00	0.000	1.00
Cont. reg.	24	1746	−0.007	1.00	21.806	1.0 × 10⁻³**	−0.023	1.00

Notes: Time to sexual maturity, maximum lifespan, and body mass are used as indicators for the generation time, longevity, and general life history hypotheses respectively. P-values are corrected by Bonferroni multiple test correction (N = 41). Significant values with a P-value less than 0.05 or 0.01 are signified with * and ** respectively using a χ² with two degrees of freedom. The control region is signified as “Cont. reg.”.

We examined the hypothesis of third codon positions by calculating the distribution of site ALRSs based on the parameters estimated across the whole gene. Thus, at each site, we calculate the relative support for the null (negative value) or alternative model (positive value). We then selected the codon positions of each gene and summed the site ALRSs.

In Table 5, we report the results of this experiment for our significant protein-coding genes, as identified by the original ALRS. For maximum lifespan and body mass the mtDNA protein-coding genes show a clear signal from the third codon position, implying changes in neutral rates. Comparing the distribution of sum of site ALRS for all 13 mtDNA protein-coding genes by a paired Wilcoxon signed rank test shows that the third codon distribution is significantly different for maximum lifespan for both first (P-value = 7.3 × 10⁻⁴) and second codon positions (P-value = 2.4 × 10⁻⁴). Similarly, for body mass, third codon positions are significantly different than either first or second positions (first: P-value = 4.9 × 10⁻⁴ and second: P-value = 4.9 × 10⁻⁴). In contrast, the first and second positions are not significantly different for any phenotype. This final method works to test groups of sites we have selected as a whole, but we do not know which of the third codon positions are associated and which are not, nor if some first or second codon positions are associated as well; we only explore the overall trend from a particular group of sites. Most synonymous substitutions, those that do not alter the protein, are found in the third codon position, consistent with the neutral rate effects predicted by the longevity hypotheses.

The association between neutral substitutions can be further tested by examining the non-protein-coding portions of the mtDNA. 12s rRNA, 16s rRNA, and the tRNA genes were not significant for any of the tested phenotypic associations. One possible explanation for this is that the ribosomal and tRNA genes are strongly conserved and as a result there is relatively low power to detect an association.⁷⁶ Another explanation is that the association occurs across the entire gene, similar to the results of the rRNA gene of daphniids.²¹ In contrast, the control region is highly divergent with few distinct classes of sites, but was detected as significantly associated with maximum lifespan. This, along with the results examining the third codon position of protein-coding loci, supports the prediction of the longevity hypothesis that neutral rates of mtDNA substitution are associated with the lifespan of primates.

Nuclear genes

In contrast to the mtDNA, the nuclear genome does not have a consistent pattern of association with the phenotypes (Table 4). 61.5% of the mtDNA protein-coding genes were significantly associated with body mass whereas only 12.5% of the nuclear genes tested were significant. Similarly, maximum lifespan is significantly associated with 69.2% of the mtDNA protein-coding genes and 4.2% of the nuclear genes. Time to sexual maturity shows a low but similar quantity of signal in both the mtDNA and nuclear genes, with 15.3% and 8.3% respectively.

Table 4. Results of nuclear genes under three different phenotypic hypotheses.

Gene	N	Length	Sexual maturity		Max. lifespan		Body mass
Gene	N	Length	ALRS	P- value	ALRS	P- value	ALRS	P- value
Male associated
ACPP	13	1161	−0.105	1.00	2.054	1.00	2.463	1.00
BRCA1	11	2893	−0.005	1.00	−0.012	1.00	23.393	3.4 × 10⁻⁴**
CATSPER1	20	2514	13.010	6.1 × 10⁻²	63.034	8.4 × 10⁻¹³**	49.779	6.4 × 10⁻¹⁰**
DBI	18	315	0.000	1.00	0.133	1.00	1.158	1.00
KLK2	12	786	0.230	1.00	0.185	1.00	0.191	1.00
PIP	13	441	−0.035	1.00	−0.001	1.00	−0.001	1.00
PSA	11	786	−0.007	1.00	2.212	1.00	−0.517	1.00
SEMG1	14	2649	15.865	1.5 × 10⁻²*	4.458	1.00	14.330	3.2 × 10⁻²*
SEMG2	16	4245	12.635	7.4 × 10⁻²	0.100	1.00	4.831	1.00
SRY	37	648	1.584	1.0	12.568	7.7 × 10⁻²	−0.014	1.00
TGM4	15	2055	0.893	1.00	0.000	1.00	0.000	1.00
TMPRSS2	15	1479	0.000	1.00	0.000	1.00	0.000	1.00
VAT1	14	1212	19.557	2.3 × 10⁻³**	3.687	1.00	−0.005	1.00
ZAN	16	555	0.000	1.00	0.761	1.00	2.460	1.00
Mt complex IV
COX4	19	438	2.790	1.00	0.251	1.00	0.064	1.00
COX5A	19	459	0.467	1.00	0.000	1.00	0.000	1.00
COX6C	12	228	0.219	1.00	3.247	1.00	−0.225	1.00
COX7A1	10	243	4.499	1.00	7.640	0.90	8.507	0.58
COX7C	12	192	3.528	1.00	4.609	1.00	0.122	1.00
COX8L	19	210	1.070	1.00	0.669	1.00	3.809	1.00
Other
CD45	14	3963	−0.003	1.00	−0.003	1.00	−0.004	1.00
HBB	8	444	0.784	1.00	1.598	1.00	0.021	1.00
LYZ	26	447	−0.001	1.00	0.002	1.00	0.665	1.00
MC1R	18	954	2.271	1.00	4.287	1.00	0.753	1.00

Notes: Male associated genes are genes expressed in semen, found on the Y chromosome, or implicated in sperm competition. P-values are corrected by Bonferroni multiple test correction (N = 41). Significant values with a P-value less than 0.05 or 0.01 are signified with * and ** respectively using a χ² with two degrees of freedom.

Two specific molecular functions predominate the nuclear genes we tested: nuclear encoded genes that function in the electron transport chain of the mitochondria and male associated genes.

In the first set, we tested six of the mitochondrial complex IV genes and found no significant associations. Thus, the associations found with mitochondrially-encoded proteins do not extend to mitochondrial proteins encoded in the nuclear genome. Even after all six loci are concatenated, they are not significant for any phenotype tested (uncorrected P-values; body mass = 0.980; maximum lifespan = 0.850; and time to female maturity = 0.743).

The second main functional group of nuclear genes, male associated genes, show the most significant associations with phenotypes for the nuclear genome. These are genes either known to be expressed in the testis, found in the ejaculate, or involved in sperm function. In this case, however, in contrast to the mtDNA genes, the strength of signal from the third codon position is not generally greater than that from the first two codon positions (Table 5). This indicates that some of the associations in male genes may be due to positive selection, which is consistent with the frequent occurrence of adaptive evolution at these loci.^14,15,77,78 However, the cause of the specific associations seen with different phenotypes is unclear. Several of these loci are related to sperm competition, and there is a possibility of an association arising indirectly from a correlation between the traits tested and the degree of male competition, but this requires further analysis.

Table 5. ALRS by codon position.

Gene	Phenotype	Codon position ALRS
Gene	Phenotype	1st	2nd	3rd	3rd–-1st–-3rd
ATP6	Body mass	7.06	−15.98	30.81	39.73
COX1	Body mass	−30.98	−20.47	66.18	117.63
COX3	Body mass	−2.63	−8.89	31.06	42.58
CYTB	Body mass	5.16	−33.56	62.09	90.49
ND1	Body mass	−13.86	−6.8	38.78	59.44
ND3	Body mass	1.71	2.8	23.07	18.57
ND4	Body mass	−22.28	−21.2	58.17	101.65
ND5	Body mass	1.35	−53.1	90.47	142.22
BRCA1	Body mass	6.69	6.5	10.22	−2.97
CATSPER1	Body mass	11.14	25.96	12.68	−24.41
SEMG1	Body mass	4.52	7.48	2.33	−9.67
ATP6	Max. lifespan	−9.89	−13.11	61.52	84.52
COX1	Max. lifespan	−39.06	−28.49	124.28	191.84
COX3	Max. lifespan	−12.11	1.13	82.96	93.94
CYTB	Max. lifespan	−17.38	−29.70	114.27	161.35
ND1	Max. lifespan	−9.61	−48.24	107.32	165.17
ND2	Max. lifespan	2.26	−42.45	98.99	139.18
ND3	Max. lifespan	−20.23	3.03	56.57	73.77
ND4	Max. lifespan	−11.72	−33.84	160.6	206.15
ND5	Max. lifespan	32.68	−46.87	206.67	220.87
CATSPER1	Max. lifespan	15.07	33.12	14.84	−33.35
ND1	Sexual maturity	18.72	−4.91	3.93	−9.87
ND5	Sexual maturity	19.25	14.96	−12.53	−46.74
SEMG1	Sexual maturity	1.54	10.88	3.44	−8.98
VAT1	Sexual maturity	9.06	−0.14	10.63	1.71

Notes: Break down of the ALRS by codon position for significant results (see Tables 3 and 4). Negative values occur because the parameters are optimized for the entire gene. The final column shows the strength of signal from the third codon position compared to the other two. Genes from the mitochondrial genome are in bold.

Assessing the three phenotypes and their association with mtDNA rate variation in primates

Two of the three phenotypes we tested, body mass and maximum lifespan, are associated with mtDNA rate variation in primates. It is difficult to differentiate between the two significant phenotypes (corresponding to general life history and the longevity hypotheses) because the two traits are highly correlated.²⁵ In our results, the proportion of mtDNA loci supporting the longevity hypothesis (10/17) was very similar to that for the association with body mass (8/17), providing no discrimination between the two. When examining the distribution of ALRSs for each hypothesis, we are able to differentiate between the two. Using a two-tailed paired Wilcoxon Rank Test for the 17 regions of the mtDNA, we find that maximum lifespan is significantly different from body mass (P-value = 6.63 × 10⁻³; median maximum lifespan 38.518; median body mass 6.657), thus providing more support for a longevity association than of general life history for primates. We now discuss how our results relate to each phenotype in more detail.

Generation time hypothesis

The generation time hypothesis predicts that if a species has shorter intervals between generations then the mutation rate per year will increases with decreased generation time and there will be be greater number of substitutions fixed because of the increased number of genomic replications.^35,79 Some studies have found evidence in support of this hypothesis across a variety of taxonomic groups.^{25,35,79–82}

The generation time hypothesis predicts that both nuclear and mitochondrial substitution rates should be affected. In contrast, when examining CYTB in mammals, Nabholz et al¹⁰ found a correlation with time to female maturity (though not as the main predictor), but not in the the nuclear gene interphotoreceptor retinoid-binding protein exon 1.

Our results provide little support for the generation time hypothesis for primates in either the mtDNA or nuclear genomes. Initially, this may seem surprising as previous studies examining primate rate variation have invoked the generation time hypothesis as an explanation for the rate variation between Old World monkeys and apes (also known as the hominoid rate-slowdown hypothesis).31,35

However, these findings are consistent with what we know about substitution rates in primates. Based on a genome-wide analysis, strepsirhines and New World monkeys are known to have a significantly slower rate, about half that of apes and Old World monkeys.³⁶ Similarly, the mtDNA specifically is known to have a slower rate in lemurs than anthropoids, especially the rate of transitions.³⁷ In contrast, strepsirhines have a faster generation time than apes and monkeys (t-test of log transformed maturity time: P-value = 7.18 × 10⁻¹⁴ for females only;⁵⁶P-value = 8.42 × 10⁻¹⁶ for a maturity time function of both male and female⁸³) and New World monkeys also have a faster generation time than apes and Old World monkeys (P-value = 1.98 × 10⁻⁶;⁵⁶P-value = 1.85 × 10^{−5 83}). Thus, the generation time hypothesis predicts the following order for rates, from lowest to highest: apes, Old World monkeys, New World monkeys, strepsirhines, but the actual order based on³⁶ is: New World monkeys and strepsirhines, apes, Old World monkeys. This implies some additional trait may be involved in the differences between apes and Old World monkeys, and that the hominoid slowdown hypothesis may have alternative explanations other than generation time. Similarly, if the hominoid slowdown hypothesis is explained by generation time, then some other feature is obscuring the general signal in primates.

Another potential explanation for our lack of significant association is that the generation time signal is associated with longer evolutionary divergence that is too noisy when examining closely related species such as primates. One study examining generation time in invertebrates found that it was the best predictor of rate variation, but over a considerably longer evolutionary time than that considered here.⁷⁹ Even among mammals, the time may not be sufficient to detect a significant association other than that found in common with other life history variables.^{9,10,12,39,79}

Some caution is necessary in evaluating the generation time hypothesis for primates, because the phenotypic data are less complete with age at female sexual maturity and we had to make a greater number of genus level assignments to generate the taxon mapping of the discretization (see Methods). Further, only two of the 26 taxa were in the smallest phenotypic category. This may have effectively reduced our test from a three category to a two category test but significance still evaluated by a Ç² distribution with degrees of freedom equal to two. We reexamined this phenotype by reducing the number of categories to two (based on either the median or the P_0.66 threshold) and compared it to a Ç² distribution with one degree of freedom. In both cases, we still failed to achieve any level of significance greater than that reported in Table 3.

Longevity hypothesis

The longevity hypothesis predicts that species where long life is adaptive will show selection on the molecular pathways that mitigate or reduce the production of radical oxidative species and as well as the molecular machinery involved in mtDNA replication. As a result, the selective effect should reduce the rate of mutation in the mtDNA.²³

Previous studies have supported this hypothesis in one of two ways: biochemistry and molecular phylogenetic rate variation. Biochemically, it has been shown that long-lived species have reduced rates of reactive oxygen species (ROS) generation and increased resistance to damage in structural components (for reviews see Pamplona and Barja²⁴ and Galtier et al²³). In terms of rate variation, the third codon position of CYTB in mammals and birds was shown to be correlated with substitution rate.^10,12 Further, maximum lifespan was the strongest predictor of synonymous rate for the nine longest protein-coding genes in the mtDNA and some nuclear encoded genes, even with multiple regressions.⁹ This association was found across all Mammalia using 160 taxa and concatenated genes (all nine coding genes as a single alignment) in order to obtain sufficient power.⁹ Here, we have included a larger number of loci (including non-protein-coding mtDNA genes), but a smaller number of taxa and we were still able to detect an association. We only required 26 taxa within a single order and used separate genes to detect the association. Moreover, we were able to examine the control region because of the close evolutionary relationship of primates and because our model uses nucleotide models for the genotype substitution.

Further, these results in primates are important as some studies have found either a lack of association⁹ (when only examining primates and rodents) or an insignificant mammalian association with primates and a negative association when primates are excluded.³⁸ One possibility is the difference in power between tests, as mentioned previously.

We found a significant association (Table 3) with at least one representative alignment for all gene groups of Galtier et al.³⁸ As previously stated, we analyzed gene by gene not by gene groups. Our results also imply that the primary signal comes from neutral changes rather than positive selection or relaxation of negative selection. This third codon signal would have been missed by Galtier et al³⁸ as that study only examined amino acids, but our results for primates are consistent with their overall conclusion of a negative association of neutral substitution and longevity across mammals for the mtDNA.

General life history and metabolic rate

Body mass is associated with many aspects of life history.^84–88 We have so far interpreted the body mass association as one of general life history, but many studies have used body mass as a surrogate for metabolic rate.

The metabolic rate hypothesis predicts that organisms with a higher metabolic rate should have increased production of ROS and thus higher mutation rates in the mtDNA.²³ Some have hypothesized that the correlation between metabolic rate and substitution rate is general enough to recalibrate the molecular clock.^19,89 The metabolic rate hypothesis has had some support within birds and mammals,^41,90 but these studies have been criticized for their small data sets.⁹¹ A few investigations using larger data sets have failed to find a correlation between mass-specific metabolic and substitution rates, again within birds and mammals,²⁵ and also in invertebrates.^91,92 Other papers with large data sets have found a negative correlation between body mass and substitution rate,^10,12 but for mammals this is removed when regressed with other life history variables.¹⁰

Our results support a connection between body mass and substitution rate that may be separate from the metabolic rate hypothesis. Body mass is a known correlate of many life history and ecological traits^84–88 and it is plausible that an unknown phenotype, or longevity alone as indicated by the Wilcoxon Test between their ALRSs, is the root correlate.

Conclusions

We have successfully extended the method of¹⁶ to account for a greater number of phenotypic categories and tested it on coding and non-coding sequence data from primate mitochondrial and nuclear genomes. Simulation studies showed that the method performs well, with a false-positive rate within acceptable values for a 5% cutoff as well as reasonable power (>80%) when there is greater than a three-fold change in substitution rate and greater than a tree length of three. Further, we have shown that a stricter alternative hypothesis (W₁ > W₀) is preferred over a more relaxed hypothesis for two or three phenotype categories. In simulation, the number of phenotypic categories was tested up to four and shown to have a low FP rate, but lower power with the four phenotype category case.

When applied to primate data with three phenotypic categories, the method identified many of the genes from the mtDNA as having an association with maximum lifespan and body mass consistent with the longevity hypothesis and general life history respectively. The significant signal predominantly originates from the third codon position for both hypotheses in the mtDNA protein-coding genes, implying a neutral signal on the mtDNA. The control region, which is non-coding and highly divergent, was also significantly associated with maximum lifespan and again supports the prediction of the longevity hypothesis. Those few nuclear encoded genes that were significant have a different pattern of contribution from the codon positions that indicate a separate and unknown explanation for the associations.

Primates had previously been identified as an outlier group among mammals for the maximum lifespan and body mass associations as it was inconsistent with the general mammalian pattern.^9,38 Also, some signal of adaptive evolution has been identified in the primate mtDNA.^22,93 Examining this pattern with our method, we show that primate mtDNA is evolving in a manner consistent with previous predictors for the phenotypes and that the signal coming from the third codon position is indicative of the neutral rate, which may have been overlooked in some of the previous studies. We find little support for the generation time hypothesis, which is inconsistent with the typical rationale for the hominoid slowdown,^31,33,35 but consistent with what we know of substitution rates between apes and Old World monkeys, and New World monkeys and strepsirhine primates.³⁶

Our method is a useful tool in identifying variation in the neutral rate of evolution associated with phenotype. Here, it has detected an association with longevity and body size, and mtDNA rate variation in primates missed by other methods, showing that primates are consistent with other mammals in this respect.

Author Contributions

Conceived and designed the experiments: TDO, NIM. Analyzed the data: TDO. Wrote the first draft of the manuscript: TDO, NIM. Agree with manuscript results and conclusions: TDO, NIM. Made critical revisions and approved final version: TDO, NIM. All authors reviewed and approved of the final manuscript.

Funding

TDO was funded by a Gates Scholarship.

Competing Interests

Author(s) disclose no potential conflicts of interest.

Disclosures and Ethics

As a requirement of publication the authors have provided signed confirmation of their compliance with ethical and legal obligations including but not limited to compliance with ICMJE authorship and competing interests guidelines, that the article is neither under consideration for publication nor published elsewhere, of their compliance with legal and ethical guidelines concerning human and animal research participants (if applicable), and that permission has been obtained for reproduction of any copyrighted material. This article was subject to blind, independent, expert peer review. The reviewers reported no competing interests.

Acknowledgments

We would like to thank Adrian Friday and Stephen Montgomery for useful discussion and comments on a previous draft of this paper and anonymous reviewers for their comments. The simulation and analyzes were done using the CamGrid cluster (http://www.escience.cam.ac.uk/projects/camgrid/) at the University of Cambridge, which is maintained by the Cambridge eScience Center. TDO was supported by the Gates Cambridge Trust and the Overseas Research Student Award. NIM was supported by the Leverhulme Trust.

References

1. Anisimova M., Liberles D.A. The quest for natural selection in the age of comparative genomics. Heredity. 2007; 99: 567–79.

Crossref

Google Scholar

2. Baer C.F., Miyamoto M.M., Denver D.R. Mutation rate variation in multicellular eukaryotes: causes and consequences. Nature Rev Genet. 2007; 8: 619–31.

Crossref

Google Scholar

3. Bromham L. Why do species vary in their rate of molecular evolution? Biol Lett. 2009; 5: 401.

Crossref

Google Scholar

4. Lopez P., Casane D., Philippe H. Heterotachy, an important process of protein evolution. Mol Biol Evol. 2002; 19: 1–7.

Crossref

Google Scholar

5. Lockhart P., Steel M. A tale of two processes. Syst Biol. 2005; 54: 948–51.

Crossref

Google Scholar

6. Zhou Y., Rodrigue N., Lartillot N., Philippe H. Evaluation of the models handling heterotachy in phylogenetic inference. BMC Evol Biol. 2007; 7: 1471–2148.

Crossref

Google Scholar

7. Yang Z. Estimating the pattern of nucleotide substitution. J Mol Evol. 1994; 39: 105–11.

Crossref

Google Scholar

8. Yang Z. Among-site rate variation and its impact on phylogenetic analyses. Trends Ecol Evol. 1996; 11: 367–72.

Crossref

PubMed

Google Scholar

9. Welch J.J., Bininda-Emonds O.R.P., Bromham L. Correlates of substitution rate variation in mammalian protein-coding sequences. BMC Evol Biol. 2008; 8: 53.

Crossref

Google Scholar

10. Nabholz B., Glemin S., Galtier N. Strong variations of mitochondrial mutation rate across mammals—the longevity hypothesis. Mol Biol Evol. 2008; 25: 120.

Crossref

Google Scholar

11. Nabholz B., Mauffrey J.F., Bazin E., Galtier N., Glemin S. Determination of mitochondrial genetic diversity in mammals. Genetics. 2008; 178: 351.

Crossref

Google Scholar

12. Nabholz B., Glémin S., Galtier N. The erratic mitochondrial clock: variations of mutation rate, not population size, affect mtDNA diversity across birds and mammals. BMC Evol Biol. 2009; 9: 54.

Crossref

Google Scholar

13. Dorus S., Evans P.D., Wyckoff G.J., Choi S.S., Lahn B.T. Rate of molecular evolution of the seminal protein gene SEMG2 correlates with levels of female promiscuity. Nature Genet. 2004; 36: 1326–9.

Crossref

Google Scholar

14. Hurle B., Swanson W.J.; NISC Comparative Sequencing Program, Green ED. Comparative sequence analyses reveal rapid and divergent evolutionary changes of the WFDC locus in the primate lineage. Genome Res. 2007; 17: 276.

Crossref

PubMed

ISI

Google Scholar

15. Ramm S.A., Oliver P.L., Ponting C.P., Stockley P., Emes R.D. Sexual selection and the adaptive evolution of mammalian ejaculate proteins. Mol Biol Evol. 2008; 25: 207.

Crossref

Google Scholar

16. O'Connor T.D., Mundy N.I. Genotype-phenotype associations: substitution models to detect evolutionary associations between phenotypic variables and genotypic evolutionary rate. Bioinformatics. 2009; 25: i94–100.

Crossref

Google Scholar

17. Herlyn H., Zischler H. Sequence evolution of the sperm ligand zonadhesin correlates negatively with body weight dimophism in primates. Evolution. 2007; 61: 289–98.

Crossref

Google Scholar

18. Nadeau N.J., Burke T., Mundy N.I. Evolution of an avian pigmentation gene correlates with a measure of sexual selection. Proc R Soc B Lon. 2007; 274.

Google Scholar

19. Gillooly J.F., Allen A.P., West G.B., Brown J.H. The rate of DNA evolution: effects of body size and temperature on the molecular clock. Proc Natl Acad Sci. 2005; 102: 140–5.

Crossref

Google Scholar

20. Gillman L.N., Keeling D.J., Ross H.A., Wright S.D. Latitude, elevation and the tempo of molecular evolution in mammals. P Roy Soc B-Biol Sci. 2009.

Crossref

Google Scholar

21. Mayrose I., Otto S.P. A likelihood method for detecting trait-dependent shifts in the rate of molecular evolution. Mol Biol Evol. 2011; 28: 759–70.

Crossref

Google Scholar

22. Pupko T., Galtier N. A covarion-based method for detecting molecular adaptation: application to the evolution of primate mitochondrial genomes. Proc R Soc B Lon. 2002; 269: 1313–6.

Crossref

Google Scholar

23. Galtier N., Jobson R.W., Nabholz B., Glémin S., Blier P.U. Mitochondrial whims: metabolic rate, longevity and the rate of molecular evolution. Biol Lett. 2009; 5: 413.

Crossref

Google Scholar

24. Pamplona R., Barja G. Highly resistant macromolecular components and low rate of generation of endogenous damage: two key traits of longevity. Ageing Res Rev. 2007; 6: 189–210.

Crossref

PubMed

ISI

Google Scholar

25. Bromham L., Rambaut A., Harvey P.H. Determinants of rate variation in mammalian DNA sequence evolution. J Mol Evol. 1996; 43: 610–21.

Crossref

Google Scholar

26. Li W.H., Tanimura M., Sharp P.M. An evaluation of the molecular clock hypothesis using mammalian DNA sequences. J Mol Evol. 1987; 25: 330–42.

Crossref

Google Scholar

27. Zuckerkandl E., Pauling L. Evolving Genes and Proteins. In: Bryson V., Vogelch H.J., editors. Evolutionary divergence and convergence in proteins. New York: Academic Press; 1965: 97–166.

Google Scholar

28. Laird C.D., McConaughy B.L., McCarthy B.J. Rate of fixation of nucleotide substitutions in evolution. Nature. 1969; 224: 149–54.

Crossref

Google Scholar

29. Kohne D.E. Evolution of higher-organism DNA. Quart Rev Biophys. 1970; 3: 327–75.

Crossref

Google Scholar

30. Wu C.I., Li W.H. Evidence for higher rates of nucleotide substitution in rodents than in man. Proc Natl Acad Sci. 1985; 82: 1741–5.

Crossref

Google Scholar

31. Li W.H., Ellsworth D.L., Krushkal J., Chang B.H.J., Hewett-Emmett D. Rates of nucleotide substitution in primates and rodents and the generation-time effect hypothesis. Mol Phylo Evol. 1996; 5: 182–7.

Crossref

PubMed

ISI

Google Scholar

32. Tëtushkin E.Y. Rates of molecular evolution of primates. Rus J Genet. 2003; 39: 721–36.

Crossref

Google Scholar

33. Hayasaka K., Gojobori T., Horai S. Molecular phylogeny and evolution of primate mitochondrial DNA. Mol Biol Evol. 1988; 5: 626–44.

Google Scholar

34. Hasegawa M. Phylogeny and molecular evolution in primates. Jpn J Genet. 1990; 65: 243–66.

Crossref

Google Scholar

35. Li W.H., Tanimura M. The molecular clock runs more slowly in man than in apes and monkeys. Nature. 1987; 326: 93–6.

Crossref

Google Scholar

36. Steiper M.E., Young N.M. Primate molecular divergence dates. Mol Phylo Evol. 2006; 41: 384–94.

Crossref

PubMed

Google Scholar

37. Hasegawa M., Kishino H., Hayasaka K., Horai S. Mitochondrial DNA evolution in primates: transition rate has been extremely low in the lemur. J Mol Evol. 1990; 31: 113–21.

Crossref

Google Scholar

38. Galtier N., Blier P.U., Nabholz B. Inverse relationship between longevity and evolutionary rate of mitochondrial proteins in mammals and birds. Mitochondrion. 2009; 9: 51–7.

Crossref

Google Scholar

39. Lartillot N., Poujol R. A phylogenetic model for investigating correlated evolution of substitution rates and continuous phenotypic characters. Mol Biol Evol. 2011; 28: 729–44.

Crossref

Google Scholar

40. Welch J.J., Waxman D. Calculating independent contrasts for the comparative study of substitution rates. J Theor Biol. 2008; 251: 667–78.

Crossref

Google Scholar

41. Martin A.P., Palumbi S.R. Body size, metabolic rate, generation time, and the molecular clock. Proc Natl Acad Sci. 1993; 90: 4087–91.

Crossref

Google Scholar

42. Pagel M., Meade A. A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character-state data. Syst Biol. 2004; 53: 571–81.

Crossref

PubMed

ISI

Google Scholar

43. Pagel M. Detecting correlated evolution on phylogenies: A general method for the comparative analysis of discrete characters. P Roy Soc B-Biol Sci. 1994; 255: 37–45.

Crossref

ISI

Google Scholar

44. Tavaré S. Some probabilistic and statistical problems in the analysis of DNA sequences. Lect Math Life Sci. 1986; 17: 57–86.

Google Scholar

45. Zharkikh A. Estimation of evolutionary distances between nucleotide sequences. J Mol Evol. 1994; 39: 315–29.

Crossref

Google Scholar

46. Lewis P.O. A likelihood approach to estimating phylogeny from discrete morphological character data. Syst Biol. 2001; 50: 913–25.

Crossref

Google Scholar

47. Yang Z. Computational Molecular Evolution. Oxford University Press, USA; 2006.

Crossref

Google Scholar

48. Silvey S.D. Statistical Inference. Chapman & Hall/CRC; 1975.

Google Scholar

49. Kendall M., Stuart A. The Advanced Theory of Statistics. Charles Griffn, London; 1979; 2.

Google Scholar

50. Lindgren B.W. Statistical Theory. Chapman & Hall/CRC; 1993.

Google Scholar

51. Drummond A., Strimmer K. PAL: an object-oriented programming library for molecular evolution and phylogenetics. Bioinformatics. 2001; 17: 662–3.

Crossref

Google Scholar

52. Goodman M., Grossman L.I., Wildman D.E. Moving primate genomics beyond the chimpanzee genome. Trends Genet. 2005; 21: 511–7.

Crossref

Google Scholar

53. Horvath J.E., Weisrock D.W., Embry S.L.et al. Development and application of a phylogenomic toolkit: Resolving the evolutionary history of Madagascar's lemurs. Genome Res. 2008; 18: 489.

Crossref

Google Scholar

54. Smith R.J., Jungers W.L. Body mass in comparative primatology. J of Human Evol. 1997; 32: 523–59.

Crossref

PubMed

ISI

Google Scholar

55. Isaac N.J.B., Jones K.E., Gittleman J.L., Purvis A. Correlates of species richness in mammals: body size, life history, and ecology. Amer Naturalist. 2005; 165: 600–7.

Crossref

Google Scholar

56. Magalhães J.P., Budovsky A., Lehmann G.et al. The Human Ageing Genomic Resources: online databases and tools for biogerontologists. Aging Cell. 2009; 8: 65–72.

Crossref

Google Scholar

57. Magalhães J.P., Costa J. A database of vertebrate longevity records and their relation to other life-history traits. J Evol Biol. 2009; 22: 1770–4.

Crossref

Google Scholar

58. Schwartz S., Kent W.J., Smit A.et al. Human-mouse alignments with BLASTZ. Genome Res. 2003; 13: 103–7.

Crossref

ISI

Google Scholar

59. Blanchette M., Kent W.J., Riemer C.et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004; 14: 708–15.

Crossref

Google Scholar

60. Aquadro C.F., Greenberg B.D. Human mitochondrial DNA variation and evolution: analysis of nucleotide sequences from seven individuals. Genetics. 1983; 103: 287–312.

Google Scholar

61. Cann R.L., Brown W.M., Wilson A.C. Polymorphic sites and the mechanism of evolution in human mitochondrial DNA. Genetics. 1984; 106: 479–99.

Google Scholar

62. Loytynoja A., Goldman N. An algorithm for progressive multiple alignment of sequences with insertions. Proc Natl Acad Sci. 2005; 102: 10557–62.

Crossref

ISI

Google Scholar

63. Loytynoja A., Goldman N. Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis. Science. 2008; 320: 1632.

Crossref

ISI

Google Scholar

64. Fletcher W., Yang Z. The effect of insertions, deletions, and alignment errors on the branch-site test of positive selection. Mol Biol Evol. 2010; 27: 2257–67.

Crossref

ISI

Google Scholar

65. Shoemaker B.A., Panchenko A.R. Deciphering protein–protein interactions. Part I: Experimental techniques and databases. PLoS Comput Biol. 2007; 3: e42.

Crossref

Google Scholar

66. Shoemaker B.A., Panchenko A.R. Deciphering protein–protein interactions. Part II. Computational methods to predict protein and domain interaction partners. PLoS Comput Biol. 2007; 3: e43.

Crossref

Google Scholar

67. Wong W.S.W., Yang Z., Goldman N., Nielsen R. Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites. Genetics. 2004; 168: 1041–51.

Crossref

Google Scholar

68. Felsenstein J. Phylogenies and the comparative method. American Naturalist. 1985; 125: 1.

Crossref

ISI

Google Scholar

69. Pagel M. Inferring evolutionary processes from phylogenies. Zoologica Scripta. 1997; 26: 331–48.

Crossref

Google Scholar

70. Pagel M. Inferring the historical patterns of biological evolution. Nature. 1999; 401: 877–84.

Crossref

PubMed

ISI

Google Scholar

71. Freckleton R.P., Harvey P.H., Pagel M. Phylogenetic analysis and comparative data: a test and review of evidence. Am Nat. 2002; 160: 712–26.

Crossref

Google Scholar

72. Wlasiuk G., Nachman M.W. Promiscuity and the rate of molecular evolutionat primate immunity genes. Evolution. 2010; 64: 2204–20.

Google Scholar

73. Tsantes C., Steiper M.E. Age at first reproduction explains rate variation in the strepsirrhine molecular clock. Proc Natl Acad Sci. 2009; 106: 18165–70.

Crossref

Google Scholar

74. Schwartz R.S., Mueller R.L. Variation in DNA substitution rates among lineages erroneously inferred from simulated clock-like data. PLoS ONE. 2010; 5: e9649.

Crossref

Google Scholar

75. Ives A.R., Garland T. Jr. Phylogenetic logistic regression for binary dependent variables. Syst Biol. 2010; 59: 9–26.

Crossref

Google Scholar

76. Moritz C., Dowling T.E., Brown W.M. Evolution of animal mitochondrial DNA: relevance for population biology and systematics. Annu Rev Ecol Evol Syst. 1987; 18: 269–92.

Crossref

Google Scholar

77. Pamilo P., O'Neill R.J.W. Evolution of the sry genes. Mol Biol Evol. 1997; 14: 49–55.

Crossref

Google Scholar

78. Clark N.L., Swanson W.J. Pervasive adaptive evolution in primate seminal proteins. PLoS Genet. 2005; 1: e35.

Crossref

Google Scholar

79. Thomas J.A., Welch J.J., Lanfear R., Bromham L. A generation time effect on the rate of molecular evolution in invertebrates. Mol Biol Evol. 2010; 27: 1173–80.

Crossref

PubMed

Google Scholar

80. Ohta T. An examination of the generation-time effect on molecular evolution. Proc Natl Acad Sci. 1993; 90: 10676–80.

Crossref

Google Scholar

81. Mooers AØ, Harvey P.H. Metabolic rate, generation time, and the rate of molecular evolution in birds. Mol Phylo Evol. 1994; 3: 344–50.

Crossref

Google Scholar

82. Nikolaev S.I., Montoya-Burgos J.I., Popadin K., Parand L., Margulies E.H., Antonarakis S.E. Life-history traits drive the evolutionary rates of mammalian coding and noncoding genomic elements. Proc Natl Acad Sci. 2007; 104: 20443.

Crossref

Google Scholar

83. Jones K.E., Bielby J., Cardillo M.et al. PanTHERIA: a species-level database of life history, ecology, and geography of extant and recently extinct mammals. Ecology. 2009; 90: 2648.

Crossref

Google Scholar

84. Millar J.S. Adaptive features of mammalian reproduction. Evolution. 1977; 31: 370–86.

Crossref

Google Scholar

85. Blueweiss L., Fox H., Kudzma V., Nakashima D., Peters R., Sams S. Relationships between body size and some life history parameters. Oecologia. 1978; 37: 257–72.

Crossref

Google Scholar

86. Millar R.M. Life histories of mammals: an analysis of life tables. Ecology. 1983; 64: 631–5.

Crossref

Google Scholar

87. Peters R.H. The Ecological Significance of Body Size. Cambridge University Press, Cambridge, UK; 1983.

Crossref

Google Scholar

88. Calder W.A.L. Size, Function and Life History. Harvard University Press, Cambridge, MA, USA; 1984.

Google Scholar

89. Gillooly J.F., Brown J.H., West G.B., Savage V.M., Charnov E.L. Effects of size and temperature on metabolic rate. Science. 2001; 293: 2248.

Crossref

PubMed

ISI

Google Scholar

90. Bleiweiss R. Relative-rate tests and biological causes of molecular evolution in hummingbirds. Mol Biol Evol. 1998; 15: 481–91.

Crossref

Google Scholar

91. Lanfear R., Thomas J.A., Welch J.J., Brey T., Bromham L. Metabolic rate does not calibrate the molecular clock. Proc Natl Acad Sci. 2007; 104: 15388.

Crossref

Google Scholar

92. Thomas J.A., Welch J.J., Woolfit M., Bromham L. There is no universal molecular clock for invertebrates, but rate variation does not scale with body size. Proc Natl Acad Sci. 2006; 103: 7366.

Crossref

Google Scholar

93. Grossman L.I., Wildman D.E., Schmidt T.R., Goodman M. Accelerated evolution of the electron transport chain in anthropoid primates. Trends Genet. 2004; 20: 578–85.

Crossref

Google Scholar

Cite article

If you have citation software installed, you can download article citation data to the citation manager of your choice

Information, rights and permissions

Information

Published In

Evolutionary Bioinformatics

Volume 9

Article first published online: July 28, 2013

Issue published: January - December 2013

Keywords

Rights and permissions

This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 3.0 License (http://www.creativecommons.org/licenses/by-nc/3.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).

Request permissions for this article.

Request Permissions

PubMed: 23926418

Authors

Affiliations

Timothy D. O'Connor

Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA.

[email protected]

View all articles by this author

Nicholas I. Mundy

Department of Zoology, Downing Street, University of Cambridge, Cambridge CB2 3EJ, UK.

View all articles by this author

Notes

email: [email protected]

This article is available from http://www.la-press.com.

Metrics and citations

Metrics

This article was published in Evolutionary Bioinformatics.

VIEW ALL JOURNAL METRICS

Total views and downloads: 313

^*Article usage tracking started in December 2016

See the impact this article is making through the number of times it’s been read, and the Altmetric Score.
Learn more about the Altmetric Scores

Receive email alerts when this article is cited

Web of Science: 7 view articles Opens in new tab

Crossref: 0

PhyloAcc-GT: A Bayesian Method for Inferring Patterns of Substitution ...

Go to citation Crossref Google Scholar
Genes underlying the evolution of tetrapod testes size

Go to citation Crossref Google Scholar
Noncoding regions underpin avian bill shape diversification at macroev...

Go to citation Crossref Google Scholar
Biological Processes Modulating Longevity across Primates: A Phylogene...

Go to citation Crossref Google Scholar

Figures and tables

Figures & Media

Tables

View Options

View options

PDF/ePub

View PDF/ePub

Get access

If you have access to journal content via a personal subscription, university, library, employer or society, select from the options below:

Sage Journals profile

Sign in

Access personal subscriptions, purchases, paired institutional or society access and free tools such as email alerts and saved searches.

Required fields

Email:

Password:

Remember me

Forgotten your password?

Create profile

Institution

Society

Alternatively, view purchase options below:

Purchase access

Read with DeepDyve

Need help?

Abstract

Introduction

Methods

Theory

Simulations

Primate genotype and phenotype data

Results and Discussion

Simulations

Comparison with other approaches

Association of life history variables with rate variation in primates Mitochondrial genes

Nuclear genes

Assessing the three phenotypes and their association with mtDNA rate variation in primates

Generation time hypothesis

Longevity hypothesis

General life history and metabolic rate

Conclusions

Author Contributions

Funding

Competing Interests

Disclosures and Ethics

Acknowledgments

References

Cite article

Cite article

Download to reference manager

Share

Share this article

Share with email

Share on social media

Share access to this article

Information

Published In

Keywords

Rights and permissions

Authors

Affiliations

Notes

Metrics

Journals metrics

Article usage*

Altmetric

Articles citing this one

Figures & Media

Tables

View options

PDF/ePub

Get access

Access options

Sign in

Also from Sage

Article usage^*