DOI: 10.1145/3584371.3612981
research-article

Phylogenetic Placement of Aligned Genomes and Metagenomes with Non-tree-like Evolutionary Histories

Published: 04 October 2023

ABSTRACT

Phylogenetic placement is the computational task that places a query taxon into a reference phylogeny using computational analysis of biomolecular sequence data or other evolutionary characters. A chief advantage of phylogenetic placement over one-shot phylogenetic reconstruction is greatly reduced computational requirements, and the former has been applied to many different topics in phylogenetics. One of the more recent applications has been enabled by rapid advances in biomolecular sequencing technology: classification of genomes, metagenomes, and metagenome-assembled genomes (MAGs) in large-scale datasets produced by next-generation sequencing. A number of methods have been developed for this purpose, and all share the common simplifying assumption that a phylogenetic tree suffices for modeling the evolutionary history of all genomes and/or metagenomes under study. A parallel development in today's post-genomic era is a greater understanding of the prevalence and importance of non-tree-like evolution in the Tree of Life (the evolutionary history of all life on Earth), which in fact may not be a tree at all. More general graph representations such as phylogenetic networks have therefore been proposed, and a new generation of phylogenetic network reconstruction methods is under active development. But the simplifying assumption made by phylogenetic tree placement methods is fundamentally at odds with the non-tree-like evolutionary histories of many microbes and other organisms. The consequences of this conflict are poorly understood.

In this work, we conduct a comprehensive performance study to directly assess the impact of non-tree-like evolution on phylogenetic tree placement of genomes and metagenomes. Our study includes in silico simulation experiments as well as empirical data analyses. We find that the topological accuracy of phylogenetic tree placement degrades quickly as genomic sequence evolution becomes increasingly non-tree-like. We then introduce a new statistical method for phylogenetic network placement of genomes and metagenomes, which we refer to as NetPlacer version 0. Initial experiments with NetPlacer provide a proof-of-concept, but also point to the need for greater computational scalability. We conclude with thoughts on algorithmic techniques to enable fast and accurate phylogenetic network placement.

Published in

BCB '23: Proceedings of the 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
September 2023
626 pages
ISBN: 9798400701269
DOI: 10.1145/3584371

Copyright © 2023 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Overall acceptance rate: 254 of 885 submissions (29%)