Research paper
Rooted phylogeny of the three superkingdoms
Introduction
Charles Darwin's formulations of descent with modification and natural selection were rooted in nineteenth-century Earth sciences and in the perspective of long-term, continuous evolution of biological diversity inferred from the geological record [1], [2]. Nevertheless, a concurrent but contrasting view of abrupt evolutionary transition was championed by Georges Cuvier, who, among others, was influenced by observations of cataclysmic shifts in geological strata associated with abrupt breaks in the fossil succession [2]. Likewise, Louis Agassiz recognized the ice ages as powerful environmental signatures capable of punctuating the fossil record [2]. Though the Darwinian view now dominates evolutionary thought, six well-documented major global extinctions have been identified in the most recent 800 MY of the geologic record [3], [4]. Five of these have been identified as global events in a paleontologic record that is also punctuated by numerous less extensive extinctions [3], [4]. The sixth event, the so-called “Snowball Earth” of the late Neoproterozoic, may have produced survivors that were the ancestors of the Cambrian radiation [4]. Is the footprint of such a catastrophic event recognizable in the phylogenies of modern organisms?
When conditions are as inhospitable to life as in the Snowball Earth scenario, the culling of species might be so extreme that few clades survive to propagate when conditions become more tolerable. If all but one or a few clades had been eliminated in such a mass extinction, the survivors, though originally crown organisms, would appear from a present-day perspective to be the root ancestors of a new tree of phylogenetic diversity.
One sure sign that the present crown of the phylogenetic tree had been re-rooted and diversified after a mass extinction would be the identification, at the root of the modern tree, of an ancestor with an incongruously complex body plan, such as that of a frog. Frogs are no one's idea of the first cellular common ancestor of the global phylogenetic tree. Equally incongruous would be a genome that encodes three-fourths of all the compact protein domains so far identified in proteomes of the three modern superkingdoms. Such a genome is no one's idea of the genome of the ur-ancestor, i.e., the first cell and the root of Earth's first phylogenetic tree. It is just such an incongruously complex genome that we have reconstructed for the most recent universal common ancestor (MRUCA) of the modern crown of global phylogeny. Accordingly, we suggest that the modern crown is a re-diversified tree rooted in complex survivors of mass extinction events that occurred some time before the Cambrian radiation.
The data supporting these interpretations were obtained by phylogenetic analysis of roughly 1700 compact protein domains, each representing a cohort of structural and functional homologs that were identified by hidden Markov model annotation at the superfamily level in hundreds of genomes [5], [6]. The resulting genome content cladogram (tree) links archaea and bacteria (the akaryotes) as sister clades that diverge from a last akaryote common ancestor (LACA). In parallel, several eukaryote sister clades diverge independently from a last eukaryote common ancestor (LECA). Here, LACA and LECA diverge independently from a more complex MRUCA. Reconstructions of the proteomes of the three ancestors, LACA, LECA, and MRUCA, confirm the independent divergence of akaryotes and eukaryotes.
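As a rough illustration of what "genome content" means as phylogenetic data, the sketch below builds a presence/absence character matrix of superfamilies across genomes and pools the content by superkingdom. It is a minimal sketch under stated assumptions: the genome names, superfamily labels, and helper functions are hypothetical and are not taken from the authors' pipeline or from any database.

```python
# Toy genome-content data: each genome maps to the set of superfamilies (SFs)
# annotated in its proteome. Genome names and SF labels are hypothetical.
genomes = {
    "archaeon_A": {"sf1", "sf2", "sf3"},
    "bacterium_B": {"sf1", "sf2", "sf4"},
    "eukaryote_E": {"sf1", "sf3", "sf4", "sf5", "sf6"},
}
superkingdom = {
    "archaeon_A": "Archaea",
    "bacterium_B": "Bacteria",
    "eukaryote_E": "Eukarya",
}


def character_matrix(genomes):
    """Return (sorted SF list, {genome: [0/1 per SF]}) for presence/absence."""
    sfs = sorted(set().union(*genomes.values()))
    rows = {
        g: [1 if sf in content else 0 for sf in sfs]
        for g, content in genomes.items()
    }
    return sfs, rows


def superkingdom_content(genomes, superkingdom):
    """Pool the SFs observed in any genome of each superkingdom."""
    pooled = {}
    for g, content in genomes.items():
        pooled.setdefault(superkingdom[g], set()).update(content)
    return pooled


sfs, matrix = character_matrix(genomes)
pooled = superkingdom_content(genomes, superkingdom)
# SFs found in all three superkingdoms (the centre of a three-set Venn diagram).
shared_by_all = set.intersection(*pooled.values())
```

Each row of `matrix` is one genome and each column one superfamily; a matrix of this general shape is the kind of input a character-based tree-building method would take, whereas an alignment-based method would start from aligned sequences.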
Speculations concerning the endosymbiotic origins of mitochondria and chloroplasts based on the previous identification of bacteria as the root of sequence-based gene trees [7], [8], [9], [10] are not supported by the present data. Instead, genome content-based trees confirm the numerous challenges to the bacterial rooting of modern phylogeny, along with the rejection of the evolutionary schemes such trees claim to support [11], [12], [13], [14], [15], [16], [17], [18], [19]. In brief, the data suggest that most of the protein elements necessary for the construction of cells of the three superkingdoms, including eukaryote organelles, were already expressed in the bottlenecked population that re-rooted the phylogenetic tree following a cataclysmic collapse of the biosphere. According to our phylogenetic reconstructions, bacteria and archaea are not identifiable as ancestors of eukaryotes. Instead, they diverge from a common ancestor independently of the eukaryotes as highly specialized, fast-growing unicellular organisms that have evolved efficient simplicity as the hallmark of their cellular architectures [20], [21] to survive predation by their relatively complex eukaryote cousins [19].
Section snippets
Data sources
Structural and functional annotations of proteins from completely sequenced genomes were obtained from the SUPERFAMILY (1.75) database. Here, annotations are based on hidden Markov models (HMM) that identify recurrent protein domains at the superfamily level of the SCOP (Structural Classification of Proteins) hierarchy [22]. In this hierarchy, the domains correspond to stable tertiary folds that have been identified by X-ray crystallographic and/or NMR spectroscopic methods [22]. At the
Phylogenomic approach
One general reason for abandoning sequence-based reconstructions for deep rooting of phylogeny is that, contrary to their label, they are not, strictly speaking, “sequence-based”. Instead, most of the sequence information is lost in reconstructions because they are in reality “alignment composition-based”, which enhances their vulnerability to distortion over long evolutionary distances [14], [32], [33]. We have chosen instead to reconstruct phylogeny based on genome content of SFs for several
A view from the crown
The present genome content trees (Fig. 5) identify archaea and bacteria (akaryotes) as sister clades that diverge from an akaryote common ancestor, LACA. Several eukaryote sister clades diverge from a eukaryote common ancestor, LECA. In effect, LACA and LECA descend independently in parallel from the most recent universal common ancestor (MRUCA), which is not a bacterium but a very complex ancestor with a proteome featuring homologies to many eukaryote SFs as well as to many akaryote SFs.
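To make the idea of reconstructing ancestral content on such a tree concrete, the sketch below infers the presence or absence of a single superfamily at LACA, LECA, and MRUCA on a toy tree with the topology described above. This excerpt does not state which reconstruction algorithm was used; the sketch assumes generalized (Sankoff) parsimony, and the leaf names, observed states, and asymmetric cost values are all hypothetical.

```python
import math

# Rooted toy tree mirroring the topology described in the text.
# Internal node -> children; anything not listed as a key is a leaf.
tree = {
    "MRUCA": ["LACA", "LECA"],
    "LACA": ["archaea", "bacteria"],
    "LECA": ["euk_clade_1", "euk_clade_2"],
}
# Hypothetical asymmetric costs, keyed by (parent_state, child_state).
# Assumption: gaining an SF (0 -> 1) costs more than losing it (1 -> 0).
cost = {(0, 0): 0, (1, 1): 0, (0, 1): 2, (1, 0): 1}
STATES = (0, 1)  # 0 = SF absent, 1 = SF present


def sankoff_up(root, leaf_state):
    """Bottom-up pass: {node: {state: minimum cost of the subtree below}}."""
    table = {}

    def visit(n):
        if n not in tree:  # leaf: only the observed state is allowed
            table[n] = {s: 0 if s == leaf_state[n] else math.inf for s in STATES}
            return
        for c in tree[n]:
            visit(c)
        table[n] = {
            s: sum(min(cost[(s, t)] + table[c][t] for t in STATES) for c in tree[n])
            for s in STATES
        }

    visit(root)
    return table


def reconstruct(root, table):
    """Top-down pass: cheapest root state, then cheapest state per child."""
    states = {}

    def assign(n, s):
        states[n] = s
        for c in tree.get(n, []):
            assign(c, min(STATES, key=lambda t: cost[(s, t)] + table[c][t]))

    assign(root, min(STATES, key=lambda s: table[root][s]))
    return states


# One hypothetical SF observed only in the eukaryote clades.
leaf_state = {"archaea": 0, "bacteria": 0, "euk_clade_1": 1, "euk_clade_2": 1}
table = sankoff_up("MRUCA", leaf_state)
states = reconstruct("MRUCA", table)
```

With these assumed costs the cheapest scenario places the superfamily in MRUCA and LECA and records a single loss on the branch to LACA. The outcome is driven by the cost asymmetry: with symmetric costs, presence and absence at MRUCA would be equally parsimonious for this character, so the choice of cost scheme is a modelling decision and not something the toy data can settle.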
The
Acknowledgments
We thank Minglei Wang, K. M. Kim, and G. Caetano-Anolles for teaching us about superfamilies; S. G. E. Andersson, Otto Berg, Björn Canbäck, M. A. Huynen, David Penny, Susannah Porter and I. Winkler for often scathing criticism; the Swedish Science Council (VR) for support to A. T.; the Nobel Committee for Chemistry of the Royal Swedish Science Academy and the Royal Physiographic Society, Lund for support to CGK.
References (58)
- et al. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. (1995)
- et al. Accounting for evolutionary rate variation among sequence sites consistently changes universal phylogenies deduced from rRNA and protein-coding genes. Mol. Phylogenet. Evol. (1999)
- et al. Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. J. Mol. Biol. (2001)
- The holy grail of the perfect character: the cladistic treatment of morphometric data. Cladistics (1993)
- et al. The origins of modern proteomes. Biochimie (2007)
- Evolutionary aspects of whole-genome biology. Curr. Opin. Struct. Biol. (2005)
- et al. Lateral gene transfer. Curr. Biol. (2011)
- et al. A minimal estimate for the gene content of the last universal common ancestor – exobiology from a terrestrial perspective. Res. Microbiol. (2006)
- On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life (1859)
- The Growth of Biological Thought (1982)
- Biodiversity; past, present, and future. J. Paleontol.
- A Neoproterozoic snowball Earth. Science
- Structural and functional constraints in the evolution of protein families. Nat. Rev. Mol. Cell. Biol.
- The root of the universal tree and the origin of eukaryotes based on elongation factor phylogeny. Proc. Nat. Acad. Sci. U.S.A.
- On the origin of mitosing cells. J. Theor. Biol.
- The hydrogen hypothesis for the first eukaryote. Nature
- Where is the root of the universal tree of life? BioEssays
- The rooting of the universal tree of life is not reliable. J. Mol. Evol.
- Evolutionary Genomics Leads the Way, Evolutionary Genomics and Systems Biology
- Bushes in the tree of life. PLoS Biol.
- Why would phylogeneticists ignore computerized sequence alignment? Syst. Biol.
- Resolving difficult phylogenetic questions: why more sequences are not enough. PLoS Biol.
- The origin of eukaryotes and their relationship with the archaea: are we at a phylogenomic impasse? Nat. Rev. Microbiol.
- Genomics and the irreducible nature of eukaryote cells. Science
- Costs of accuracy determined by a maximal growth rate constraint. Q. Rev. Biophys.
- Translational accuracy and the fitness of bacteria. Annu. Rev. Genet.
- Phylogeny determined by protein domain content. Proc. Nat. Acad. Sci. U.S.A.
- Structure is three to ten times more conserved than sequence - a study of structural response in protein cores. Proteins