Elsevier

Biochimie

Volume 95, Issue 8, August 2013, Pages 1593-1604

Research paper
Rooted phylogeny of the three superkingdoms

https://doi.org/10.1016/j.biochi.2013.04.016

Highlights

  • Three superkingdom phylogeny was inferred from genome content of protein domains.

  • The most recent universal common ancestor (MRUCA) is the complex root of the crown.

  • Eukaryotes evolve from MRUCA, not from bacteria or archaea.

  • MRUCA diverged primarily by reductive loss and duplication of protein domains.

  • There is little genesis of novel domains in the path from MRUCA to the crown.

Abstract

The traditional bacterial rooting of the three superkingdoms in sequence-based gene trees is inconsistent with new phylogenetic reconstructions based on genome content of compact protein domains. We find that protein domains at the level of the SCOP superfamily (SF) from sequenced genomes yield, under maximum parsimony, fully resolved rooted trees. Such genome content trees identify archaea and bacteria (akaryotes) as sister clades that diverge from an akaryote common ancestor, LACA. Several eukaryote sister clades diverge from a eukaryote common ancestor, LECA. LACA and LECA descend in parallel from the most recent universal common ancestor (MRUCA), which is not a bacterium. Rather, MRUCA presents 75% of the unique SFs encoded by extant genomes of the three superkingdoms, each encoding a proteome that partially overlaps all others. This alone implies that the common ancestor to the superkingdoms was very complex. Such ancestral complexity is confirmed by phylogenetic reconstructions. In addition, the divergence of proteomes from the complex ancestor in each superkingdom is both reductive in the number of unique SFs and cumulative in the abundance of surviving SFs. These data suggest that the common ancestor was not the first cell lineage and that modern global phylogeny is the crown of a “recently” re-rooted tree. We suggest that a bottlenecked survivor of an environmental collapse, which preceded the flourishing of the modern crown, seeded the current phylogenetic tree.

Introduction

Charles Darwin's formulations of descent with modification and natural selection were rooted in the Nineteenth Century's Earth sciences and the perspective of long-term, continuous evolution of biological diversity inferred from the geological record [1], [2]. Nevertheless, a concurrent but contrasting view of abrupt evolutionary transition was championed by Georges Cuvier, who, among others, was influenced by observations of cataclysmic shifts of geological strata associated with abrupt breaks in the fossil succession [2]. Likewise, Louis Agassiz recognized the ice ages as powerful environmental signatures capable of punctuating the fossil record [2]. Though the Darwinian view now dominates evolutionary thought, six well-documented major global extinctions have been identified in the most recent 800 MY of the geologic record [3], [4]. Five of these have been identified as global events in a paleontologic record that is also punctuated with numerous less extensive extinctions [3], [4]. The sixth event, the so-called “Snowball Earth” of the late Neoproterozoic, may have produced survivors that were the ancestors to the Cambrian radiation [4]. Is the footprint of such a catastrophic event recognizable in the phylogenies of modern organisms?

When conditions are so inhospitable to life, as in the Snowball Earth scenario, the culling of species might be so extreme that few clades survive to propagate when conditions become more tolerable. If all but one or a few clades had been eliminated in such a mass extinction, the survivors, though originally crown organisms, would appear from a present-day perspective to be the root ancestors of a new tree of phylogenetic diversity.

One sure sign that the present crown of the phylogenetic tree had been re-rooted and diversified after a mass extinction would be the identification, at the root of the modern tree, of an ancestor with an incongruously complex body plan, such as that of a frog. Frogs are no one's idea of the first cellular common ancestor to the global phylogenetic tree. Equally incongruous would be a genome that encodes three-fourths of all the compact protein domains so far identified in proteomes of the three modern superkingdoms. Such a genome is no one's idea of the genome of the ur-ancestor, i.e., the first cell and the root of Earth's first phylogenetic tree. It is just such an incongruously complex genome that we have reconstructed for the most recent universal common ancestor (MRUCA) of the modern crown of global phylogeny. Accordingly, we suggest that the modern crown is a re-diversified tree rooted in complex survivors of mass extinction events that occurred some time before the Cambrian radiation.

The data supporting these interpretations were obtained by phylogenetic analysis of roughly 1700 compact protein domains, each representing a cohort of structural and functional homologs that were identified by hidden Markov model annotation at the level of superfamily in hundreds of genomes [5], [6]. The resulting genome content cladogram (tree) links archaea and bacteria (the akaryotes) as sister clades that diverge from a last akaryote common ancestor (LACA). In parallel, several eukaryote sister clades diverge independently from a last eukaryote common ancestor (LECA). Here, LACA and LECA diverge independently from a more complex MRUCA. Reconstructions of the proteomes of the three ancestors, LACA, LECA and MRUCA, confirm the independent divergence of akaryotes and eukaryotes.
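As a rough illustration of what "genome content" means as phylogenetic data, the following minimal Python sketch encodes each genome as a presence/absence vector over superfamilies and pools the superfamilies found in each superkingdom. The genome names, superfamily identifiers and helper functions are hypothetical and invented for this example; they are not taken from the paper's data set or pipeline.

```python
from collections import defaultdict

# Hypothetical toy annotations: genome -> (superkingdom, set of SF identifiers).
# Real analyses use roughly 1700 superfamilies across hundreds of genomes.
GENOMES = {
    "archaeon_1": ("archaea", {"sf_a", "sf_b", "sf_c"}),
    "bacterium_1": ("bacteria", {"sf_a", "sf_b", "sf_d"}),
    "bacterium_2": ("bacteria", {"sf_a", "sf_d", "sf_e"}),
    "eukaryote_1": ("eukarya", {"sf_a", "sf_b", "sf_d", "sf_f"}),
}


def presence_matrix(genomes):
    """Return (sorted SF list, {genome: [0/1 per SF]}) as binary characters."""
    sfs = sorted(set().union(*(s for _, s in genomes.values())))
    rows = {g: [1 if sf in s else 0 for sf in sfs] for g, (_, s) in genomes.items()}
    return sfs, rows


def superkingdom_pools(genomes):
    """Union of SFs observed in any genome of each superkingdom."""
    pools = defaultdict(set)
    for kingdom, sfs in genomes.values():
        pools[kingdom] |= sfs
    return dict(pools)


def shared_by_all(pools):
    """SFs present in every superkingdom pool (the centre of a Venn diagram)."""
    return set.intersection(*pools.values())
```

Each column of the matrix is one character (a superfamily) and each row one taxon, which is the form of input that a parsimony program expects. The overlap of the pools is only a descriptive summary; it is not by itself a reconstruction of an ancestral proteome.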

Speculations concerning the endosymbiotic origins of mitochondria and chloroplasts based on the previous identification of bacteria as the root of sequence-based gene trees [7], [8], [9], [10] are not supported by the present data. Instead, genome content-based trees confirm the numerous challenges to the bacterial rooting of modern phylogeny along with the rejection of the evolutionary schemes such trees claim to support [11], [12], [13], [14], [15], [16], [17], [18], [19]. In brief, the data suggest that most of the protein elements necessary for the construction of cells of the three superkingdoms, including eukaryote organelles, were already expressed in the bottlenecked population that re-rooted the phylogenetic tree following a cataclysmic collapse of the biosphere. According to our phylogenetic reconstructions, bacteria and archaea are not identifiable as ancestors to eukaryotes. Instead, they diverge from a common ancestor independently of the eukaryotes as highly specialized, fast-growing unicellular organisms that have evolved efficient simplicity as the hallmark of their cellular architectures [20], [21] to survive predation by their relatively complex eukaryote cousins [19].

Section snippets

Data sources

Structural and functional annotations of proteins from completely sequenced genomes were obtained from the SUPERFAMILY (1.75) database. Here, annotations are based on hidden Markov models (HMM) that identify recurrent protein domains at the superfamily level of the SCOP (Structural Classification of Proteins) hierarchy [22]. In this hierarchy, the domains correspond to stable tertiary folds that have been identified by X-ray crystallographic and/or NMR spectroscopic methods [22]. At the

Phylogenomic approach

One general reason for abandoning sequence-based reconstructions for deep rooting of phylogeny is that, contrary to their label, they are not, strictly speaking, “sequence-based”. Instead, most of the sequence information is lost in reconstructions because they are in reality “alignment composition-based”, which enhances their vulnerability to distortion over long evolutionary distances [14], [32], [33]. We have chosen instead to reconstruct phylogeny based on genome content of SFs for several

A view from the crown

The present genome content trees (Fig. 5) identify archaea and bacteria (akaryotes) as sister clades that diverge from an akaryote common ancestor, LACA. Several eukaryote sister clades diverge from a eukaryote common ancestor, LECA. In effect, LACA and LECA descend independently in parallel from the most recent universal common ancestor (MRUCA), which is not a bacterium but a very complex ancestor with a proteome featuring homologies to many eukaryote SFs as well as to many akaryote SFs.
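To make concrete how a parsimony criterion can favor a complex ancestor, here is a minimal Python sketch of generalized (Sankoff-style) parsimony for a single presence/absence character on a fixed rooted tree. The toy topology, the leaf states and the cost values are assumptions chosen for illustration, and the sketch is not claimed to reproduce the reconstruction procedure or parameters used for the trees described above.

```python
import math

# Hypothetical rooted toy tree as nested tuples; leaves are strings.
TREE = (("archaeon", "bacterium"), ("eukaryote_1", "eukaryote_2"))

# COST[i][j]: cost of a change from parent state i to child state j
# (0 = superfamily absent, 1 = present). Assumed values: a gain (0 -> 1)
# costs twice as much as a loss (1 -> 0).
ASYMMETRIC_COST = [[0, 2], [1, 0]]
SYMMETRIC_COST = [[0, 1], [1, 0]]


def sankoff(node, leaf_states, cost):
    """Return [min cost if node is absent (0), min cost if node is present (1)]."""
    if isinstance(node, str):
        # A leaf can only be in its observed state.
        return [0 if leaf_states[node] == s else math.inf for s in (0, 1)]
    child_tables = [sankoff(child, leaf_states, cost) for child in node]
    # For each parent state, every child independently picks its cheapest state.
    return [
        sum(min(cost[s][t] + table[t] for t in (0, 1)) for table in child_tables)
        for s in (0, 1)
    ]


# A superfamily observed only in the two eukaryote leaves.
LEAF_STATES = {"archaeon": 0, "bacterium": 0, "eukaryote_1": 1, "eukaryote_2": 1}
```

With these assumed inputs, `sankoff(TREE, LEAF_STATES, ASYMMETRIC_COST)` returns `[2, 1]`: presence at the root followed by one loss on the akaryote stem is cheaper than absence followed by one gain. Under `SYMMETRIC_COST` the two root states tie at `[1, 1]`, which shows how strongly the inferred ancestral state depends on the chosen gain and loss costs.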

The

Acknowledgments

We thank Minglei Wang, K. M. Kim, and G. Caetano-Anolles for teaching us about superfamilies; S. G. E. Andersson, Otto Berg, Björn Canbäck, M. A. Huynen, David Penny, Susannah Porter and I. Winkler for often scathing criticism; the Swedish Science Council (VR) for support to A. T.; the Nobel Committee for Chemistry of the Royal Swedish Science Academy and the Royal Physiographic Society, Lund for support to CGK.

References (58)

  • J.J. Sepkoski

    Biodiversity; past, present, and future

    J. Paleontol.

    (1997)
  • P.F. Hoffman et al.

    A neoproterozoic snowball earth

    Science

    (1998)
  • C.L. Worth et al.

    Structural and functional constraints in the evolution of protein families

    Nat. Rev. Mol. Cell. Biol.

    (2009)
  • S.L. Baldauf et al.

    The root of the universal tree and the origin of eukaryotes based on elongation factor phylogeny

    Proc. Nat. Acad. Sci. U.S.A.

    (1996)
  • T. Hashimoto, M. Hasegawa, Origin and early evolution of eukaryotes inferred from the amino acid sequences of...
  • L. Sagan

    On the origin of mitosing cells

    J. Theor. Biol.

    (1967)
  • W. Martin et al.

    The hydrogen hypothesis for the first eukaryote

    Nature

    (1998)
  • P. Forterre et al.

    Where is the root of the universal tree of life?

    BioEssays

    (1999)
  • H. Philippe et al.

    The rooting of the universal tree of life is not reliable

    J. Mol. Evol.

    (1999)
  • D. Penny et al.

    Evolutionary genomics leads the way

    Evolutionary Genomics and Systems Biology

    (2010)
  • A. Rokas et al.

    Bushes in the tree of life

    PLoS Biol.

    (2006)
  • D.A. Morrison

    Why would phylogeneticists ignore computerized sequence alignment?

    Syst. Biol.

    (2009)
  • H. Philippe et al.

    Resolving difficult phylogenetic questions: why more sequences are not enough

    PLoS Biol.

    (2011)
  • S. Gribaldo et al.

    The origin of eukaryotes and their relationship with the archaea: are we at a phylogenomic impasse?

    Nat. Rev. Microbiol.

    (2010)
  • C.G. Kurland et al.

    Genomics and the irreducible nature of eukaryote cells

    Science

    (2006)
  • M. Ehrenberg et al.

    Costs of accuracy determined by a maximal growth rate constraint

    Q. Rev. Biophys.

    (1984)
  • C.G. Kurland

    Translational accuracy and the fitness of bacteria

    Annu. Rev. Genet.

    (1992)
  • S. Yang et al.

    Phylogeny determined by protein domain content

    Proc. Nat. Acad. Sci. U.S.A.

    (2005)
  • K. Illergård et al.

    Structure is three to ten times more conserved than sequence - a study of structural response in protein cores

    Proteins

    (2009)