ABSTRACT
Phylogenetic placement is the task of placing a query taxon into a reference phylogeny using computational analysis of biomolecular sequence data or other evolutionary characters. A chief advantage of phylogenetic placement over one-shot phylogenetic reconstruction is its greatly reduced computational requirements, and it has been applied to many different problems in phylogenetics. One of the more recent applications has been enabled by rapid advances in biomolecular sequencing technology: classification of genomes, metagenomes, and metagenome-assembled genomes (MAGs) in large-scale datasets produced by next-generation sequencing. A number of methods have been developed for this purpose, and all share the simplifying assumption that a phylogenetic tree suffices for modeling the evolutionary history of all genomes and/or metagenomes under study. A parallel development in today's post-genomic era is a greater understanding of the prevalence and importance of non-tree-like evolution in the Tree of Life (the evolutionary history of all life on Earth), which in fact may not be a tree at all. More general graph representations such as phylogenetic networks have therefore been proposed, and a new generation of phylogenetic network reconstruction methods is under active development. However, the simplifying assumption made by phylogenetic tree placement methods is fundamentally at odds with the non-tree-like evolutionary histories of many microbes and other organisms. The consequences of this conflict are poorly understood.
Here, we conduct a comprehensive performance study to directly assess the impact of non-tree-like evolution on phylogenetic tree placement of genomes and metagenomes. Our study includes in silico simulation experiments as well as empirical data analyses. We find that the topological accuracy of phylogenetic tree placement degrades quickly as genomic sequence evolution becomes increasingly non-tree-like. We then introduce a new statistical method for phylogenetic network placement of genomes and metagenomes, which we refer to as NetPlacer version 0. Initial experiments with NetPlacer provide a proof of concept, but also point to the need for greater computational scalability. We conclude with thoughts on algorithmic techniques to enable fast and accurate phylogenetic network placement.
Phylogenetic Placement of Aligned Genomes and Metagenomes with Non-tree-like Evolutionary Histories