Abstract
Mapping evolutionary trajectories of discrete traits onto phylogenies receives considerable attention in evolutionary biology. Given trait observations at the tips of a phylogenetic tree, researchers are often interested in where on the tree the trait changes its state and whether some changes occur preferentially in certain parts of the tree. In a model-based phylogenetic framework, such questions translate into characterizing probabilistic properties of evolutionary trajectories. Current methods of assessing these properties rely on computationally expensive simulations. In this paper, we present an efficient, simulation-free algorithm for computing two important and ubiquitous properties of evolutionary trajectories. The first is the mean number of trait changes, where changes can be divided into classes of interest (e.g. synonymous/non-synonymous mutations). The second is the mean evolutionary reward, accrued in proportion to the time a trait occupies each of its states. To illustrate the usefulness of our results, we first employ our simulation-free stochastic mapping to carry out a posterior predictive test of correlation between two evolutionary traits. We conclude by mapping synonymous and non-synonymous mutations onto the branches of an HIV intrahost phylogenetic tree and comparing selection pressure on terminal and internal tree branches.
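To make the two properties concrete, the sketch below computes them without simulation for a single branch of a continuous-time Markov chain. It is not the algorithm of this paper, whose details the abstract does not give. It relies instead on a standard block-matrix identity (often attributed to Van Loan): the upper-right block of exp([[Q, B], [0, Q]] t) equals the integral over the branch of P(s) B P(t - s). Taking B to be the rates of the labeled changes gives joint mean counts, and taking B to be a diagonal matrix of reward rates gives joint mean rewards. The two-state rate matrix, the branch length and all function names are assumptions made for illustration only.

```python
def mat_mul(A, B):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]


def mat_exp(A, terms=30, squarings=10):
    """Matrix exponential by a truncated Taylor series with scaling and squaring."""
    n = len(A)
    scale = 2.0 ** squarings
    S = [[x / scale for x in row] for row in A]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        # term holds S^k / k! after this update
        term = [[x / k for x in row] for row in mat_mul(term, S)]
        result = [[r + x for r, x in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def branch_moments(Q, B, t):
    """Return (P, M) for one branch of length t.

    P[i][j] is the transition probability from state i to state j.
    M[i][j] is the integral over s in [0, t] of (P(s) B P(t - s))[i][j],
    the mean count or reward on the branch jointly with ending in state j,
    given a start in state i. The conditional mean is M[i][j] / P[i][j].
    """
    n = len(Q)
    top = [[q * t for q in Q[i]] + [b * t for b in B[i]] for i in range(n)]
    bottom = [[0.0] * n + [q * t for q in Q[i]] for i in range(n)]
    E = mat_exp(top + bottom)
    P = [row[:n] for row in E[:n]]
    M = [row[n:] for row in E[:n]]
    return P, M


# Hypothetical two-state trait with a symmetric rate (assumed values).
a, t = 1.0, 2.0
Q = [[-a, a], [a, -a]]
all_changes = [[0.0, a], [a, 0.0]]      # count every change of state
time_in_state0 = [[1.0, 0.0], [0.0, 0.0]]  # reward rate 1 while in state 0

P, M_changes = branch_moments(Q, all_changes, t)
_, M_time0 = branch_moments(Q, time_in_state0, t)

# Means conditional on starting in state 0 and ending in state 1.
mean_changes_0_to_1 = M_changes[0][1] / P[0][1]
mean_time0_0_to_1 = M_time0[0][1] / P[0][1]
```

Restricting `all_changes` to a subset of off-diagonal rates would count only one class of changes. A full stochastic mapping would still need to combine such branch-level quantities with the tip data across the whole tree, which this sketch does not attempt.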