Research Article

EVOLUTIONARY BIOLOGY

Microbial genomic trait evolution is dominated by frequent and rare pulsed evolution

Yingnan Gao https://orcid.org/0000-0002-0960-4519 and Martin Wu https://orcid.org/0000-0003-3093-4077

Science Advances

15 Jul 2022

Vol 8, Issue 28

DOI: 10.1126/sciadv.abn1916

Abstract

On the macroevolutionary time scale, does trait evolution proceed gradually or by rapid bursts (pulses) separated by prolonged periods of stasis or slow evolution? Although studies have shown that pulsed evolution is prevalent in animals, our knowledge about the tempo and mode of evolution across the tree of life is very limited. This long-standing debate calls for a test in bacteria and archaea, the most ancient and diverse forms of life with unique population genetic properties. Using a likelihood-based framework, we show that pulsed evolution is not only present but also prevalent and predominant in microbial genomic trait evolution. We detected two distinct types of pulsed evolution (small frequent and large rare jumps) that are predicted by the punctuated equilibrium and quantum evolution theories. Our findings suggest that major bacterial lineages could have originated in quick bursts and that pulsed evolution is a common theme across the tree of life.

INTRODUCTION

There has been a long-standing debate about the tempo and mode of trait evolution on the macroevolutionary time scale. The gradualism theory states that evolution occurs gradually by small changes that accumulate over a long period of time (1). The pulsed evolution theory, on the other hand, argues that evolution mostly proceeds in bursts of larger changes (jumps) separated by long periods of stasis or slow evolution (1–3). Two types of jumps have been proposed in pulsed evolution. Simpson’s quantum evolution theory postulates that jumps happen when lineages shift into new adaptive zones and these jumps play an important role in the origination of higher taxa (2), while Eldredge and Gould’s later punctuated equilibrium theory focuses exclusively on jumps associated with speciation (1). Conceptually, these two types of jumps exist side by side but differ in their frequencies and magnitudes. Studies of animal fossil records support the punctuated equilibrium theory (4, 5), and more recent phylogenetic comparative studies of vertebrate body size (6–8) also provide evidence for quantum evolution. Together, they show that evolution is composed of not only slow and gradual changes but also instant jumps on the macroevolution time scale.

Analogous studies in bacteria and archaea, the most ancient and diverse forms of life on Earth, are lacking, largely because of the scarcity of fossil records and well-measured quantitative phenotypic traits in microbes. Fortunately, the phenotypic evolution of microbial species can be reconstructed from extant genome sequences. Several genomic features are highly correlated with the microbial life strategy. For example, the GC% (% guanine and cytosine content) of the ribosomal RNA (rRNA) gene is correlated with the optimal growth temperature of bacteria and archaea (9). According to the genome streamlining theory, genome size, genomic GC%, and nitrogen content of proteins all evolve in response to changes in nutrient levels in the environment (10, 11). These genomic features can be accurately determined from thousands of complete genomes now available that represent a broad range of closely and distantly related lineages, making it possible to study the tempo and mode of trait evolution in microbes over a broad spectrum of macroevolution time scales.

Long-term experimental evolution has shown evidence of pulsed evolution in Escherichia coli cell size (12). However, on the macroevolutionary time scale, the role of pulsed evolution in microbial trait evolution remains largely unknown. Although it is well known that there are large trait changes between bacterial clades (e.g., the genomic GC% of high-GC versus low-GC Gram positives, and AT (adenine-thymine)-rich obligate intracellular bacteria versus their free-living relatives) (13, 14), it is unclear whether these large trait changes arose gradually or rapidly by jumps during the time the clades diverged from each other. Compared to animals and plants, bacteria and archaea reproduce asexually and have relatively large population sizes, high dispersal rates, and short generation times. Another salient feature unique to microbes is that their genomes can often leap by large-scale horizontal gene transfers (HGTs) (15), which obviously will have an impact on the tempo and mode of evolution. Given these unique features, a central question is whether the tempo and mode of microbial trait evolution are similar to those of eukaryotes and whether pulsed evolution is a universal theme across the tree of life, and, if so, to what extent pulsed evolution contributes to microbial trait evolution.

RESULTS

Gradual evolution does not explain microbial genomic trait evolution

We downloaded 10,616 and 263 complete bacterial and archaeal genomes, respectively, from the National Center for Biotechnology Information (NCBI) RefSeq database, from which we reconstructed genome trees and selected 6668 and 247 representative genomes that passed quality control (Materials and Methods). For each representative genome, we calculated four genomic traits [genomic GC%, rRNA GC%, genome size, and the average nitrogen atoms per residual side chain (N-ARSC)], all of which show strong phylogenetic signals (Pagel’s λ > 0.99). Using phylogenetically independent contrast (PIC) of tip pairs, we found that, although the four genomic traits are significantly correlated as previously reported (10, 16), the correlation appears to be weak (fig. S1), with the proportion of variance explained (PVE) by the other traits being less than 13.5% for all four traits. Therefore, to capture the possible variation in the tempo and mode of evolution, we chose to test each of the four traits separately. Notably, the PIC distributions of these traits in bacteria drastically deviate from the normal distribution expected by the Brownian motion (BM) model of gradual evolution (Fig. 1, A to D; two-sided Kolmogorov-Smirnov test, P < 0.001 for all four traits). Specifically, all PIC distributions exhibit a strong leptokurtic (heavy-tailed) pattern with a positive excess kurtosis ranging from 5.79 to 13.47, indicating that extremely rapid large trait changes (pulses) occur more frequently than expected by the BM model. For archaea, the PIC distribution also deviates from the normal expectation (fig. S2, A to D; two-sided Kolmogorov-Smirnov test, P < 0.001, P = 0.024, P = 0.018, and P = 0.155 for rRNA GC%, genomic GC%, genome size, and N-ARSC, respectively), with the excess kurtosis ranging from 1.47 to 7.58. Although inconsistent with BM, such a heavy-tailed pattern can be explained by pulsed evolution.
Extremely rapid large trait changes (|PIC| > 3) take place more frequently than expected by the normal distribution (0.27%) throughout the bacterial evolutionary history (fig. S3), suggesting repeated episodes of pulsed trait evolution.
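The contrast between the normal expectation and a heavy-tailed pulsed process can be illustrated with a small simulation. The sketch below is not the analysis pipeline used in this study. It draws trait changes over a single hypothetical branch length under BM and under a compound Poisson process with normal jumps, then compares the excess kurtosis and the fraction of standardized changes beyond 3. All parameter values (`t`, `sigma2_bm`, `lam`, `sigma2_jump`, `background`) are illustrative assumptions.

```python
import math
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis (about 0 for a normal distribution)."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z**4) - 3.0)

def tail_fraction(x, cutoff=3.0):
    """Fraction of standardized values with absolute value above the cutoff."""
    return float(np.mean(np.abs(x / x.std()) > cutoff))

rng = np.random.default_rng(0)
n = 200_000   # number of simulated trait changes (illustrative)
t = 0.01      # branch length in substitutions per site (hypothetical)

# Gradual evolution (BM): changes are normal with variance sigma2_bm * t.
sigma2_bm = 1.0
bm = rng.normal(0.0, math.sqrt(sigma2_bm * t), n)

# Pulsed evolution: a Poisson number of jumps per branch, each jump normal
# with variance sigma2_jump, plus a small normal background variance.
lam, sigma2_jump, background = 5.0, 0.2, 1e-4
k = rng.poisson(lam * t, n)
pulsed = rng.normal(0.0, 1.0, n) * np.sqrt(k * sigma2_jump + background)

# Normal expectation for |Z| > 3, about 0.27%.
normal_tail = math.erfc(3.0 / math.sqrt(2.0))

print(f"BM:     excess kurtosis {excess_kurtosis(bm):.2f}, tail {tail_fraction(bm):.4f}")
print(f"Pulsed: excess kurtosis {excess_kurtosis(pulsed):.2f}, tail {tail_fraction(pulsed):.4f}")
print(f"Normal expectation for the tail: {normal_tail:.4f}")
```

Because most simulated branches carry no jump while a few carry a large one, the pulsed sample is strongly leptokurtic, which is the qualitative signature described above.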

Fig. 1. Pulsed evolution models fit bacterial trait evolution better than the BM model and the variable rate model.

(A to D) PIC distributions (black bars) deviate significantly from the normal distribution of the BM model (blue line). The pulsed evolution models that include two or three Poisson processes (PE2 or PE3, magenta line) greatly improve the fit to the overall PIC distributions. The variable rate model (cyan line) also improves the fit to the overall PIC distribution. Square root transformation is applied to the y axis (density) to better show the deviation in the frequency of large PICs. (E to H) Patterns of bacterial trait changes at different branch lengths. Trait changes derived from the bacterial phylogeny are shown in black dots. Trait differences between genomes separated by zero branch length are shown in blue dots. The expected 95% confidence intervals of the models are shown in colored lines (blue for the BM model, magenta for the pulsed evolution model, and cyan for the variable rate model). Pseudo-log transformation is applied to the y axis (trait change) to better show the trend of trait change in short branches.

Modeling microbial genomic trait evolution

When plotted against the branch length, the trait changes between two sister nodes in the bacterial phylogeny display a “blunderbuss pattern” (Fig. 1, E to H). It starts with a period of stationary fluctuations where trait changes are bounded and the variance does not accumulate over time. Segmented linear regression analysis indicates that this phase of stasis lasts until the branch length reaches 0.001 substitutions per site for rRNA GC% (fig. S4). On longer time scales, the stasis yields to a pattern of increasing divergence over time. The archaeal traits display similar patterns (fig. S2, E to H). This blunderbuss pattern, first observed in the evolution of vertebrate body size, is a signature of pulsed evolution (6). For rRNA GC%, we observed a second spike in the trait divergence rate at 0.025 substitutions per site (fig. S4), indicating a change of evolutionary tempo at this point.

To formally test whether pulsed evolution explains the patterns, we model the trait change between two sister nodes using the Lévy process (8). More specifically, we model the trait change as the sum of three independent stochastic variables: pulsed evolution, gradual evolution, and time-independent trait variation. We assume that pulsed evolution occurs at a constant rate relative to the molecular divergence and the jump size follows a normal distribution with a mean of zero. As a result, the pulsed evolution is modeled as a compound Poisson process with normal jumps, with parameters λ and σ² denoting the frequency (number of expected jumps per lineage per unit branch length) and the magnitude (variance of trait change) of the jumps, respectively. We model gradual evolution using the classic BM model with a single parameter $σ_{BM}^{2}$ denoting the rate of the gradual trait change. Meanwhile, we observed trait variation between genomes separated by zero branch length, indicating the presence of time-independent variation in our phylogeny. Because this variation follows a leptokurtic distribution, we model the time-independent variation with the Laplace distribution with one single parameter ε denoting its variance for simplicity and convenience. It should be noted that a jump in the genomic trait may be coupled with an increase in the molecular divergence rate, especially for those traits affecting protein sequences (e.g., genomic GC% and N-ARSC). However, such correlation between the molecular branch length and the trait change will only reduce the signal of pulsed evolution, as the increased branch length provides greater power for gradual evolution to explain the trait variation (fig. S5).
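The three-component model described above can be sketched as a simulation. This is an illustration under stated assumptions, not the ML implementation detailed in Supplementary Text. The function names and parameter values are hypothetical, and a single Poisson process is used for brevity. Since the three components are independent, the variance of the trait change over a branch of length t should be λσ²t + σ²_BM·t + ε, which the sketch checks numerically.

```python
import numpy as np

def simulate_trait_change(t, lam, sigma2, sigma2_bm, eps, n, rng):
    """Draw n trait changes over branch length t as the sum of:
    - pulsed evolution: Poisson(lam * t) jumps, each N(0, sigma2)
    - gradual evolution: BM with rate sigma2_bm, i.e. N(0, sigma2_bm * t)
    - time-independent variation: Laplace with variance eps
    """
    k = rng.poisson(lam * t, size=n)
    # The sum of k independent N(0, sigma2) jumps is N(0, k * sigma2).
    jumps = rng.normal(size=n) * np.sqrt(k * sigma2)
    gradual = rng.normal(size=n) * np.sqrt(sigma2_bm * t)
    # A Laplace distribution with scale b has variance 2 * b**2.
    noise = rng.laplace(0.0, np.sqrt(eps / 2.0), size=n)
    return jumps + gradual + noise

def expected_variance(t, lam, sigma2, sigma2_bm, eps):
    """Variance of the summed process; the components are independent."""
    return lam * sigma2 * t + sigma2_bm * t + eps

rng = np.random.default_rng(1)
# Hypothetical parameter values chosen only for illustration.
params = dict(t=0.05, lam=20.0, sigma2=0.01, sigma2_bm=0.02, eps=1e-4)
sample = simulate_trait_change(n=400_000, rng=rng, **params)

print(f"simulated variance: {sample.var():.5f}")
print(f"expected variance:  {expected_variance(**params):.5f}")
```

Only the ε term survives as t approaches zero, which corresponds to the variation observed between genomes separated by zero branch length.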

The changing tempos revealed by segmented linear regression suggest that one Poisson process may not adequately describe the patterns of pulsed evolution, prompting us to add multiple Poisson processes (with different jump magnitudes) to our modeling. Therefore, using the framework described above, we tested seven different models (Table 1). The BM model delineates gradual evolution with a constant rate, while the VRG (variable rate model with Gamma distribution) model describes gradual evolution with continuous variable rates. The PE1, PE2, and PE3 models describe pulsed evolution with one, two, or three Poisson processes, respectively. The PE(n) + BM models represent trait evolution with both pulsed and gradual evolution. Details of these models and the maximum likelihood (ML) framework are provided in Supplementary Text. Using simulated data, we show that our ML framework can distinguish the BM, continuous variable rate, and pulsed evolution models (table S1) and capture the frequency and magnitude of jumps (table S2). Among pulsed evolution models, our ML framework can distinguish models with one Poisson process from those with more than one Poisson process, but it favors models with a BM component and tends to underestimate the contribution of pulsed evolution (table S1). This is because it is difficult to distinguish between frequent jumps and gradual evolution on long branches (fig. S5).

Models	Free parameters
Brownian motion (BM)	$σ_{BM}^{2}, ε$
Variable rate model with Gamma distribution (VRG)	$p, k, θ, ε$
One Poisson process (PE1)	$λ_{1}, σ_{1}^{2}, ε$
One Poisson process and Brownian motion (PE1 + BM)	$σ_{BM}^{2}, λ_{1}, σ_{1}^{2}, ε$
Two Poisson processes (PE2)	$λ_{1}, σ_{1}^{2}, λ_{2}, σ_{2}^{2}, ε$
Two Poisson processes and Brownian motion (PE2 + BM)	$σ_{BM}^{2}, λ_{1}, σ_{1}^{2}, λ_{2}, σ_{2}^{2}, ε$
Three Poisson processes (PE3)	$λ_{1}, σ_{1}^{2}, λ_{2}, σ_{2}^{2}, λ_{3}, σ_{3}^{2}, ε$

Table 1. The seven trait evolution models tested in this study.

Microbial trait evolution is dominated by frequent and rare jumps

For the four traits that we have examined in bacteria and archaea, the best model is always the one with a pulsed evolution component. The relative support for the gradual evolution models is marginal, with Akaike information criterion (AIC) weights for the BM model < 0.5% and those for the VRG model < 2.1% (Table 2). Both the pulsed evolution and VRG models fit the overall PIC distributions better than the BM (Fig. 1, A to D, and fig. S2, A to D). The pulsed evolution model also fits the patterns of genomic GC% and rRNA GC% changes at different branch lengths better than the BM and VRG models (Fig. 1, E and F, and fig. S2, E and F). The strong support by AIC and improved fit to the PIC pattern across different branch lengths suggest that pulsed evolution is present in both bacterial and archaeal genomic traits. To test the prevalence of pulsed evolution in bacteria, we separately fitted our models in 17 bacterial families that each contained at least 100 genomes. We found that trait evolution in 76.5, 88.2, 52.9, and 23.5% of tested families was best explained by a model with a pulsed evolution component (PE1, PE1 + BM, PE2, PE2 + BM, or PE3) for rRNA GC%, genomic GC%, genome size, and N-ARSC, respectively, indicating that pulsed evolution is prevalent in bacteria (table S3). To examine the effect of sample size on the power of detecting pulsed evolution, we simulated trait evolution under the pulsed evolution model and conducted model selection on the simulated data. Our simulation shows that, when the number of genomes decreases, the power to detect pulsed evolution also decreases (table S4), suggesting that we might have underestimated the prevalence of pulsed evolution in the 17 bacterial families. We did not test the prevalence of pulsed evolution in archaea because of the insufficient number of archaeal genomes at the lower taxonomic levels.

Domain	Trait	BM	VRG	PE1	PE1 + BM	PE2	PE2 + BM	PE3
Bacteria	rRNA GC%	−38,431	−41,519	−40,747	−41,171	−41,591	−41,589	−41,587
	Genomic GC%	−28,539	−30,432	−30,921	−31,171	−31,299	−31,297	−31,295
	Genome size	−15,572	−15,990	−15,929	−16,022	−16,075	−16,090	−16,107
	N-ARSC	−41,944	−42,092	−42,123	−42,214	−42,214	−42,218	−42,217
Archaea	rRNA GC%	−483	−601	−606	−636	−642	−640	−638
	Genomic GC%	−440	−469	−488	−503	−504	−502	−500
	Genome size	−348	−351	−359	−357	−355	−353	−351
	N-ARSC	−1202	−1205	−1212	−1211	−1210	−1208	−1206

Table 2. AIC values for each model fitted for bacterial and archaeal trait evolution.

The AIC values for the best model and models that are not significantly inferior (AIC change < 2) in each trait are in bold.
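AIC weights can be derived from the values in Table 2 with the standard formula, in which each model's weight is proportional to exp(−ΔAIC/2) relative to the best model. The sketch below applies it to the archaeal genome size row, where the gradual models receive their strongest relative support. Because the tabulated AIC values are rounded, the resulting weights are approximate, and the helper name `aic_weights` is ours.

```python
import math

def aic_weights(aic):
    """Akaike weights: exp(-delta/2) for each model, normalized to sum to 1."""
    best = min(aic.values())
    rel = {model: math.exp(-(value - best) / 2.0) for model, value in aic.items()}
    total = sum(rel.values())
    return {model: r / total for model, r in rel.items()}

# AIC values for archaeal genome size, copied from Table 2 (rounded).
archaea_genome_size = {
    "BM": -348,
    "VRG": -351,
    "PE1": -359,
    "PE1 + BM": -357,
    "PE2": -355,
    "PE2 + BM": -353,
    "PE3": -351,
}

weights = aic_weights(archaea_genome_size)
for model, w in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{model:9s} {100 * w:6.2f}%")
```

Even in this least favorable row, the BM and VRG weights stay below the 0.5% and 2.1% bounds quoted in the text.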

Using parameters of the best models (Table 3), we estimated the relative contribution of each compound Poisson process. The variable ε represents the trait variance in the initial stasis phase (Fig. 1). Its estimated value approximates the intraspecific trait variation between genomes with identical marker gene alignments (i.e., zero branch length) and therefore is used as the baseline. The jumps vary greatly in their frequencies and magnitudes but can be roughly classified into two types: small and frequent, or large and rare (Table 3). For example, for rRNA GC%, rare jumps (1.96 jumps per lineage per unit branch length) are extremely large in magnitude, as the SD of trait change introduced by one rare jump is approximately 60 times $\sqrt{ε}$, or roughly equivalent to that introduced by 700 million years (0.35 substitutions per site) of gradual evolution under the BM model, and approximately corresponds to 5.8°C change in the optimal growth temperature. In comparison, the frequent jumps (118 jumps per lineage per unit branch length) are 60 times more frequent, but their sizes are only about three times $\sqrt{ε}$. In terms of the absolute change in the trait value, a rare jump between two closely related genomes can cause an rRNA stem GC% change of up to 1.8%, while under the gradual BM model, the expected change in rRNA stem GC% will only be about 0.1%. Overall, rare jumps predominate in trait evolution as they contribute more than 74% of variation in each trait over the whole phylogeny. Similarly, pulsed evolution also predominates in archaea as the PE1 and PE2 models are the best models in all archaeal traits (Table 2). Because of the limited number of archaeal genomes, we cannot robustly estimate the parameters of each jump process in archaea.

Trait	Jump type	Jump rate (lineage⁻¹ unit branch length⁻¹)	Relative jump size	Jump contribution	Time-independent variance
rRNA GC%	Rare	1.96 (1.62–2.79)	57.8 (50.2–63.6)	85.9% (83.8–88.3%)	1.81 (1.43–2.32) × 10⁻⁶
rRNA GC%	Frequent	118 (84.8–150)	3.0 (2.6–3.6)	14.1% (11.7–16.2%)	1.81 (1.43–2.32) × 10⁻⁶
Genomic GC%	Rare	6.31 (5.42–7.34)	34.2 (31.0–37.3)	92.7% (91.4–93.7%)	1.36 (1.12–1.70) × 10⁻⁵
Genomic GC%	Frequent	167 (101–261)	1.9 (1.4–2.4)	7.3% (6.3–8.6%)	1.36 (1.12–1.70) × 10⁻⁵
Genome size	Super rare	0.169 (0.06–0.26)	22.1 (19.3–35.9)	38.9% (30.3–49.1%)	1.40 (1.32–1.55) × 10⁻³
Genome size	Rare	5.89 (3.99–11.8)	3.7 (2.7–5.1)	39.1% (31.0–49.9%)	1.40 (1.32–1.55) × 10⁻³
Genome size	Frequent	115 (48.3–271)	0.6 (0.4–1.0)	22.0% (14.8–31.5%)	1.40 (1.32–1.55) × 10⁻³
N-ARSC	Super rare	0.367 (0.11–3.85)	7.8 (6.1–11.1)	19.8% (10.4–67.3%)	3.54 (3.14–3.80) × 10⁻⁵
N-ARSC	Rare	7.82 (5.06–15.5)	2.8 (2.2–3.7)	54.1% (33.6–72.1%)	3.54 (3.14–3.80) × 10⁻⁵
N-ARSC	Frequent	634 (274–1610)	0.2 (0.1–0.3)	26.1% (17.1–33.8%)	3.54 (3.14–3.80) × 10⁻⁵

Table 3. Model statistics of pulsed evolution in different bacterial traits.

The 95% confidence interval for each model statistic is listed in the parentheses after the statistic.
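The jump contributions in Table 3 can be approximated from the jump rates and relative jump sizes. For a compound Poisson process with normal jumps, the variance accumulated per unit branch length is the rate multiplied by the jump variance (λσ²). Whether the tabulated contributions were computed in exactly this way is our assumption. The sketch below applies it to the rounded rRNA GC% point estimates, with the hypothetical helper `jump_contributions`.

```python
def jump_contributions(processes):
    """Share of jump variance per process, assuming variance per unit
    branch length equals rate * (relative jump size)**2 for each process."""
    variance = {name: rate * size**2 for name, (rate, size) in processes.items()}
    total = sum(variance.values())
    return {name: v / total for name, v in variance.items()}

# (jump rate, relative jump size) for rRNA GC%, copied from Table 3.
rrna_gc = {"rare": (1.96, 57.8), "frequent": (118.0, 3.0)}

shares = jump_contributions(rrna_gc)
for name, share in shares.items():
    print(f"{name:9s} {100 * share:5.1f}%")
```

Under this assumption the rare jumps account for roughly 86% of the jump variance in rRNA GC%, close to the 85.9% listed in Table 3. The same calculation on the other traits agrees less closely, which may reflect rounding of the tabulated estimates.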

To evaluate the effect of the tree topology on our model fitting, we fitted models on the genome tree of the family Enterobacteriaceae (with 748 genomes) made using either FastTree or RAxML. We found that the fitted model parameters are highly similar using these two trees (table S5). We also validated our fitted model with the bacterial phylogeny downloaded from the Genome Taxonomy Database (GTDB) (17) and found that pulsed evolution models with at least two Poisson processes are also the best models (table S6). Because all the models tested in this study are time reversible, rooting of the phylogeny at different points does not affect the results.

Rare jumps are correlated between traits and with cladogenesis in bacteria

We use simulation to test whether we can detect specific jumps along the phylogeny using the posterior probability estimated under pulsed evolution models. Because pulsed evolution models with at least two Poisson processes are the best models for bacterial genomic trait evolution, we simulated trait evolution under the PE2 model. We found that posterior probability calculated under the pulsed evolution model can predict both frequent and rare jumps very well, with the receiver operating characteristic (ROC) curves having an average area under curve (AUC) of 0.97 for both frequent and rare jumps (fig. S6). Using a posterior probability cutoff of 0.9, we can achieve a specificity of 0.99 for detecting both frequent and rare jumps. In addition, we can detect rare jumps on short branches (defined as branches with a prior probability of having at least one jump < 0.1) with a sensitivity of 0.6. As a comparison, we also tested the performance of BayesTraits, which assumes the continuous variable rate model VRG. BayesTraits failed to accurately capture the frequent and rare jumps, yielding an average AUC below 0.70 (fig. S6). This shows that the continuous variable rate model is inadequate to capture the mode of pulsed evolution.
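The prior and posterior probabilities of a jump on a branch can be sketched for a single Poisson process. The prior probability of at least one jump on a branch of length t is 1 − exp(−λt), and the posterior follows from Bayes' rule over the number of jumps. This is a simplified illustration, not the procedure used in this study: the time-independent variation is treated as normal with variance `base_var` rather than Laplace, gradual evolution is omitted, and the function names and example values are hypothetical.

```python
import math

def normal_pdf(x, var):
    """Density of N(0, var) at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def poisson_pmf(k, mu):
    """Probability of k events under a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu**k / math.factorial(k)

def prob_at_least_one_jump(x, t, lam, sigma2, base_var, k_max=50):
    """Return (prior, posterior) probability of at least one jump on a branch
    of length t, given an observed trait change x.

    Assumes k jumps contribute N(0, k * sigma2) and that the baseline
    variation is N(0, base_var), a simplification of the Laplace term.
    """
    mu = lam * t
    prior = 1.0 - math.exp(-mu)
    like = [poisson_pmf(k, mu) * normal_pdf(x, k * sigma2 + base_var)
            for k in range(k_max + 1)]
    posterior = 1.0 - like[0] / sum(like)
    return prior, posterior

# Trait changes expressed in units of sqrt(eps), so base_var = 1.
# Rate and relative size follow the rare rRNA GC% jumps in Table 3;
# the branch length and the observed changes are hypothetical.
lam, sigma2, t = 1.96, 57.8**2, 0.01
prior, post_large = prob_at_least_one_jump(20.0, t, lam, sigma2, base_var=1.0)
_, post_small = prob_at_least_one_jump(0.5, t, lam, sigma2, base_var=1.0)

print(f"prior probability of a jump: {prior:.4f}")
print(f"posterior, large change:     {post_large:.4f}")
print(f"posterior, small change:     {post_small:.6f}")
```

On a short branch the prior is low, so a jump is called only when the observed change is far larger than the baseline variation can explain.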

Using a posterior probability cutoff of 0.9, we mapped the rare jumps of the genomic traits onto the bacterial phylogeny. We found that jumps occurred throughout the phylogeny (Fig. 2), again indicating that pulsed evolution is prevalent in bacterial evolution history. We detect recent rare jumps in genome size that happen on short branches separating recently diverged species. These jumps correspond to events of very recent genome reduction and expansion (table S7). We also detect rare jumps in more ancient branches. Some of the rare jumps are associated with known key evolutionary adaptations, validating our predictions. For instance, a classic example of adaptation to endosymbiosis occurred within the family Enterobacteriaceae, in the lineage leading to a clade of insect endosymbionts that includes the genera Buchnera, Wigglesworthia, and Candidatus Blochmannia (18). Our model detects large rare jumps at the base of the clade in the genome size and genomic GC% (posterior probability > 0.9; Fig. 2). The recently described order Candidatus Nanopelagicales within the phylum Actinobacteria makes up the most abundant free-living bacteria in fresh water. Nanopelagicales have adapted to live in the nutrient-poor environment by streamlining their genomes (19). Compared to its high-GC Gram-positive sister clade, Nanopelagicales has markedly reduced genome size (~1.4 Mbp) and genomic GC% (~48%). We detect large rare jumps in all genomic traits at the base of the order with extremely high confidence (posterior probability >0.99), suggesting that the genomic streamlining process happened not gradually but by jumps. Similar patterns have also been observed in the branch leading to the most abundant free-living marine bacteria Pelagibacterales and the intracellular bacteria Rickettsiales and Holosporales. 
Our model also predicts large rare jumps at higher taxonomic levels such as those at the base branch leading to the α-, β-, γ-, and δ-proteobacteria (posterior probability > 0.96) and the branch that separates the γ-proteobacteria from the rest of the proteobacteria (posterior probability > 0.99). Our results suggest that these key evolutionary adaptations evolved in rapid bursts instead of through slow divergence of species over prolonged periods of time as proposed by the gradual evolution model.

Fig. 2. Rare jumps are widely distributed throughout the bacterial phylogeny.

For clarity, clades have been collapsed at the taxonomic rank order, and therefore, the vast majority of the short branches in the tree are not shown in the figure. A collapsed order is represented by a gray circle at the tip whose diameter represents the number of genomes in the order. Colored dots are placed on branches where the posterior probability of having at least one rare or super rare jump event is greater than 0.9. Arrows point to branches leading to (1) the order *Candidatus* Nanopelagicales; (2) the α-, β-, γ-, and δ-proteobacteria; (3) the orders Pelagibacterales, Rickettsiales, and Holosporales; (4) γ-proteobacteria; and (5) the genera *Buchnera*, *Wigglesworthia*, and *Candidatus* Blochmannia within the family Enterobacteriaceae. The PVC group includes the phyla Planctomycetota, Verrucomicrobiota, and Chlamydiota. The FCB group includes the phyla Fibrobacterota, Chlorobiota, and Bacteroidota.

We tested the pairwise correlation of the occurrence of rare jumps between traits using the posterior probability of jumps. We find that the occurrence of rare jumps is significantly positively correlated between all pairs of traits (P < 0.001; table S8), except between genome size and N-ARSC (P = 0.060; table S8). Because frequent jumps are saturated in most parts of the phylogeny, we do not have enough power to test the correlation of the occurrence of frequent jumps between traits.

Next, we tested whether jumps are correlated with cladogenesis in bacteria by comparing the predicted frequency of jumps to the expected frequency, for which we assume no correlation of jumps with cladogenesis (the null hypothesis). For example, if jumps happen significantly more frequently between two congener sister nodes than expected, then jumps are considered correlated with the speciation event (cladogenesis at the species level). For frequent jumps, simulations indicated that we lacked the statistical power to reject the null hypothesis at every taxonomic level, and therefore, we excluded them from this analysis. For rare and super rare jumps, we tested their correlation with cladogenesis from the species to order levels. We found that rare and super rare jumps occur more frequently than expected for all traits at the genus, family, and order levels, except for N-ARSC at the genus level (Table 4). This increase in frequency is significant for rRNA GC%, genomic GC%, and genome size at the genus and family levels (P ≤ 0.050) and for rRNA and genomic GC% at the order level (P < 0.001). We found that rare and super rare jumps happen less frequently than expected for all traits at the species level, although it is significant only for ribosomal GC% (P < 0.001). Our results suggest that rare and super rare jumps are correlated with cladogenesis at higher taxonomic ranks.

Trait	Species	Genus	Family	Order
Ribosomal GC%	−2.3%* (P < 0.001, β > 0.999)	+9.5%* (P < 0.001, β > 0.999)	+17.1%* (P < 0.001, β = 0.985)	+21.3%* (P < 0.001, β = 0.870)
Genomic GC%	−2.1% (P = 0.060, β > 0.999)	+8.1%* (P < 0.001, β = 0.995)	+8.9%* (P < 0.001, β = 0.610)	+11.5%* (P < 0.001, β = 0.205)
Genome size	−1.2% (P = 0.150, β > 0.999)	+3.5%* (P = 0.050, β > 0.999)	+4.9%* (P < 0.001, β > 0.999)	+4.1% (P = 0.075, β > 0.999)
N-ARSC	−0.2% (P = 0.630, β > 0.999)	−1.8% (P = 0.240, β > 0.999)	+4.1% (P = 0.06, β = 0.990)	+3.3% (P = 0.290, β = 0.770)

Table 4. Differences in the percentage of contrasts with at least one rare or super rare jump between those inferred from the empirical data and the expectation from the null hypothesis.

Significant differences are marked with asterisks. P values and power (β) are listed in parentheses.

Jumps in bacterial traits can be mediated by HGT

HGT is one of the most important processes in bacterial and archaeal evolution. To determine whether the jumps identified in this study could be due to HGT, we manually examined the top candidates of rare jumps in rRNA GC% (posterior probability > 0.9 and prior probability < 0.1) and found clear evidence of HGT. For example, we detected a rare, large jump between two strains of the obligate whitefly endosymbiont Candidatus Portiera aleyrodidarum. Figure S7 shows that strain China 1 acquired nearly one-third of its 16S and 23S rRNA genes from the lineage of Candidatus Hamiltonella defensa, which is also an endosymbiont in whiteflies but belongs to a different order. GC% of the acquired 16S rRNA fragment increased from 46 to 52%, while GC% of the acquired 23S rRNA fragments increased from 44 to 51%, leading to a jump in the rRNA stem GC%.

DISCUSSION

The central question that we try to address in this study is the tempo and mode of microbial trait evolution: whether the traits evolve mainly by gradual or pulsed evolution. Large trait differences between bacterial lineages are well known (13, 14), but it is less clear whether these large trait changes arose gradually over time or rapidly by jumps (the mode). Using an ML framework, we explicitly test the mode of genomic trait evolution in bacteria and archaea, and we show that pulsed evolution explains the patterns significantly better than both the constant and variable rate gradual evolution models. Our analysis suggests that pulsed evolution is not only present but also prevalent and dominant in microbial genomic trait evolution.

Microbes are known for rapid evolution. Why are these genomic traits constrained for millions of years before they diverge? The stasis at the species level can be explained by stabilizing selection that eliminates variants falling outside of a stable niche (20). Alternatively, it can be maintained by gene flow, as suggested by Futuyma’s ephemeral divergence theory (21). Futuyma proposes that novel adaptive trait variation arises frequently in local populations, but the spatial and temporal mosaic nature of niches prevents such local adaptations from spreading to the entire species because they are wiped out by the gene flow from the prevailing intervening ancestral populations. As a result, trait changes perish and do not accumulate over time, resulting in stationary fluctuations, until speciation interrupts the gene flow. Although reproducing asexually, microbes do exchange genes through homologous recombination, and there is evidence that gene flow plays a critical role in bacterial speciation at least under certain conditions (22–24). The transient trait variation in the initial stasis phase when jumps are absent (the ε term in our model) approximately matches the intraspecific trait variation.

At longer time scales or higher taxonomic levels, trait evolution can be constrained through stabilizing selection exerted by the adaptive zone (25), defined as a set of ecological niches to which a group of species is adapted (2). This will generate the pattern of phylogenetic conservatism where organisms in a clade tend to have similar traits (synapomorphy) and occupy similar habitats. Accordingly, both genome analyses and ecological studies support that ecological coherence exists at higher taxonomic levels in bacteria (26). For example, different bacterial clades have their unique set of genes (27), and analysis of thousands of cultured microbial strains showed that strains related at the genus, family, or order level occupy the same habitat more frequently than expected by chance (28).

Using simulation, we show that our ML framework can detect jumps with extremely high specificity (0.99) when a posterior probability cutoff of 0.9 is used to predict jumps. On the other hand, the continuous variable rate model struggles to capture the mode of pulsed evolution and performs poorly when benchmarked using the ROC curve. We detected two types of jumps in one dataset: small frequent jumps and large rare jumps. This is possible because the large bacterial dataset spans a wide range of macroevolutionary time scales. For example, the bacterial genome tree in our study has a total branch length of 442.9 substitutions per site. Even for extremely rare jumps (e.g., genome size jumps with a rate of 0.17 jumps per lineage per unit branch length), about 75 events are still expected in the entire phylogeny. On the other hand, the resolution of our bacterial genome tree is 5 × 10⁻⁵ substitutions per site, meaning that we can detect jumps that occur as frequently as 20,000 jumps per lineage per unit branch length on average. The large difference in the frequency and size of the jumps suggests that they represent different kinds of evolutionary events. Although our modeling does not stipulate the coupling of cladogenesis and pulsed evolution (as in the classical punctuated equilibrium theory), the rate of the frequent jumps in bacteria (115 to 634 jumps per lineage per unit branch length or 0.06 to 0.32 jumps per lineage per million years) approximates the recently estimated bacterial speciation rate (0.03 to 0.05 speciation per lineage per million years) where species is defined as having 99% identical 16S rRNAs (29), suggesting that frequent jumps and speciation events may be correlated. Two features of the rare jumps fit the description of quantum evolution. First, the rare jumps are fairly large in magnitude, most likely resulting from shifting between major adaptive zones. Second, our test shows that rare jumps happen less frequently than expected at the species level but significantly more frequently than expected at higher taxonomic levels (genus, family, and order), suggesting that there is a correlation between rare jumps and the origination of higher taxa. Furthermore, some of the predicted rare jumps coincide with known major evolutionary adaptations in bacterial evolution history. A key insight from this observation is that major evolutionary adaptations in bacteria and the origination of major bacterial lineages may happen in quick bursts (quantum evolution) instead of through slow divergence of species over prolonged periods of time (gradual evolution) (30), which is consistent with earlier findings of rapid expansion of major microbial lineages (31, 32). We found that rare jumps of the four traits tend to happen together.
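The two detection limits quoted in this paragraph follow from simple arithmetic on the tree dimensions. The short sketch below reproduces them from the numbers stated in the text; the variable names are illustrative only and are not taken from the authors' scripts.

```python
# Values quoted in the Discussion
total_branch_length = 442.9   # substitutions per site, summed over the bacterial tree
rare_jump_rate = 0.17         # genome size jumps per lineage per unit branch length
tree_resolution = 5e-5        # resolution of the tree, substitutions per site

# Expected number of rare jumps across the whole phylogeny
expected_rare_jumps = total_branch_length * rare_jump_rate

# Highest jump rate that still averages one jump per resolvable branch
max_detectable_rate = 1.0 / tree_resolution

print(round(expected_rare_jumps, 1))  # 75.3
print(round(max_detectable_rate))     # 20000
```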

Microbial genomes are highly dynamic (33, 34). They can change by mutation, gene loss, gene duplication, and HGT. Whatever the mechanism is, our study suggests that large changes happen in episodes of bursts rather than gradually and slowly. These large changes are not due to the simple gain and loss of plasmids, as we have excluded plasmids in our study. Chromosomes are in constant exchange with phages, plasmids, and other mobile elements and can change by “quantum leaps” in the form of genomic islands (15). It is also well known that rRNA genes can be horizontally transferred (35), which we show in this study can result in an instant jump in rRNA GC%. In contrast to animals and plants, HGT plays an important role in the evolution of bacteria and archaea, therefore providing an additional avenue for microbial pulsed evolution. It is worth pointing out that jumps in our model represent trait changes that persist over time, not the processes that drive the changes. The rarity of detected jumps does not mean that the evolutionary processes (e.g., selection and population bottleneck) that drive the jumps are rare. It merely means that the success rate of such jumps is low. The rarity of jumps can result from adaptation to a large environmental shift that happens infrequently, or it can be a manifestation of multiple frequent small jumps occurring in quick succession, which is also rare.

In conclusion, our modeling of phylogenetic comparative data shows that pulsed evolution is both prevalent and dominant in bacterial and archaeal genomic trait evolution. The signatures of pulsed evolution detected in this study are consistent with both the punctuated equilibrium and quantum evolution theories. More broadly, our results suggest that pulsed evolution is the rule rather than the exception across the tree of life, despite the drastically different population genetic properties of micro- and macroorganisms.

MATERIALS AND METHODS

Bacterial and archaeal phylogeny and genomic traits

We downloaded 10,616 complete bacterial genomes and 263 complete archaeal genomes from the NCBI RefSeq database on 6 September 2018 (table S9). From each genome, we identified either 31 bacterial or 104 archaeal protein-coding marker genes using AMPHORA2 (36) with the default options and constructed a bacterial and an archaeal genome tree based on the concatenated and trimmed protein sequence alignment of the marker genes. We reconstructed the archaeal genome tree using RAxML (version 8.2.11) (37) with the option -m PROTCATLG. Because the bacterial dataset is so large, it was impractical to build the bacterial genome tree using RAxML. Instead, we inferred the bacterial genome tree using FastTree (version 2.1.11) (38) with the options -wag -gamma. For better resolution, we reoptimized the branch lengths of the genome trees with the DNA sequence alignments of the marker genes using RAxML with the option -m GTRGAMMA. We removed genomes with identical alignments, extremely long branches, ambiguous bases, or unreliable annotations from the genome trees. For each of the 6668 bacterial and 247 archaeal genomes that remained, we calculated four traits: the rRNA stem GC% (rRNA GC%), genomic GC%, genome size (excluding plasmids), and the average N-ARSC. We transformed these traits (logit transformation for rRNA GC% and genomic GC%; log transformation for genome size and N-ARSC) to make them comply with the assumption of continuous trait evolution. For conversion from rRNA GC% to the optimal growth temperature, we used the empirical formula determined by Wang et al. (9):

T_{optimal} (° C) = 3.75 \times {GC}_{rRNA} (%) - 216.27

(1)
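As an illustration of the trait transformations and of Eq. 1, the following minimal sketch may help. The function names are hypothetical, and the logit is assumed to be applied to GC content expressed as a fraction between 0 and 1; the text does not state which scale was used.

```python
import math

def logit(p):
    """Logit transform of a proportion strictly between 0 and 1 (assumed GC fraction)."""
    return math.log(p / (1.0 - p))

def optimal_growth_temperature(rrna_gc_percent):
    """Eq. 1: optimal growth temperature (deg C) from rRNA stem GC content (%)."""
    return 3.75 * rrna_gc_percent - 216.27

# Example: an rRNA stem GC content of 68% maps to roughly 38.7 deg C
t_opt = optimal_growth_temperature(68.0)
```

Because Eq. 1 is linear, a jump of one percentage point in rRNA GC% corresponds to a shift of 3.75 °C in the predicted optimal growth temperature.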

To test the effect of phylogeny uncertainty on our model fitting, we downloaded the bacterial reference phylogeny (release 202) from the GTDB (17). In addition, we inferred the phylogeny for the bacterial family Enterobacteriaceae using both FastTree and RAxML.

Calculating PIC with time-independent variation

PIC assumes a BM model in which trait variance increases linearly with time (39). However, we observed variation in trait values between genomes that are separated by zero branch length (Fig. 1). Therefore, we introduce time-independent variation into the BM model and denote its variance with ε. When time-independent variation is normally distributed, the PIC between a pair of sister tips is calculated as

PIC = \frac{x_{1} - x_{2}}{\sqrt{(l_{1} + l_{2}) σ_{BM}^{2} + ε}}

(2)

where x₁ and x₂ are the trait values of the tips, l₁ and l₂ are their branch lengths to the parent node, and σ_{BM}^{2} is the rate of BM. The uncertainty of the parent node’s trait value is calculated as

ε_{0} = \frac{(l_{1} σ_{BM}^{2} + 0.5 ε) (l_{2} σ_{BM}^{2} + 0.5 ε)}{(l_{1} + l_{2}) σ_{BM}^{2} + ε}

(3)
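Equations 2 and 3 can be sketched directly in code. This is an illustrative transcription of the two formulas, not the authors' implementation; the function and argument names are assumptions.

```python
import math

def pic_with_tiv(x1, x2, l1, l2, sigma2_bm, eps):
    """Eq. 2: contrast between sister tips, standardized by BM variance plus
    time-independent variance eps."""
    return (x1 - x2) / math.sqrt((l1 + l2) * sigma2_bm + eps)

def parent_uncertainty(l1, l2, sigma2_bm, eps):
    """Eq. 3: variance of the reconstructed parent value; each tip carries
    half of the time-independent variance."""
    v1 = l1 * sigma2_bm + 0.5 * eps
    v2 = l2 * sigma2_bm + 0.5 * eps
    return v1 * v2 / ((l1 + l2) * sigma2_bm + eps)
```

With ε > 0 the contrast stays finite even when both branch lengths are zero, which is the situation in Fig. 1 that motivated adding the term.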

Calculating pseudo-PIC under pulsed evolution model

When the trait change over a branch is not normally distributed, the assumption that the PIC follows the standard normal distribution does not hold. To address this violation, we calculate the pseudo-PIC, an analog of the PIC under non-BM models. Specifically, the pseudo-PIC satisfies the equation

Ψ_{norm} ({PIC}_{pseudo}) = Ψ_{m} (x, l)

(4)

where Ψ_norm is the standard normal cumulative probability function and Ψ_m(x, l) is the cumulative probability function of trait change x under the non-BM model given the branch length l and its model parameters.
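Solving Eq. 4 amounts to a quantile mapping: pseudo-PIC = Ψ_norm⁻¹(Ψ_m(x, l)). The sketch below illustrates this under a strong assumption that is not spelled out in this section, namely that the non-BM model is a single compound Poisson process with zero-mean normal jumps added to BM and time-independent variation, so that Ψ_m is a Poisson-weighted mixture of normal distributions. The truncation at `k_max` terms and all names are illustrative choices.

```python
import math
from statistics import NormalDist

def pulsed_cdf(x, l, lam, sigma2_jump, sigma2_bm, eps, k_max=50):
    """Assumed CDF of trait change x over branch length l (requires eps > 0):
    a mixture over the number of jumps k ~ Poisson(lam * l)."""
    total = 0.0
    weight_sum = 0.0
    for k in range(k_max + 1):
        weight = math.exp(-lam * l) * (lam * l) ** k / math.factorial(k)
        variance = l * sigma2_bm + k * sigma2_jump + eps
        total += weight * NormalDist(0.0, math.sqrt(variance)).cdf(x)
        weight_sum += weight
    # Renormalize to compensate for truncating the Poisson sum
    return total / weight_sum

def pseudo_pic(x, l, lam, sigma2_jump, sigma2_bm, eps):
    """Eq. 4: the standard normal quantile of the model CDF evaluated at x."""
    p = pulsed_cdf(x, l, lam, sigma2_jump, sigma2_bm, eps)
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep inv_cdf within its open domain
    return NormalDist().inv_cdf(p)
```

When the jump rate is zero, the mixture collapses to a single normal distribution and the pseudo-PIC reduces to the ordinary PIC of Eq. 2. For branches where `lam * l` is large, `k_max` would need to be raised.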

Testing the pairwise correlation between the four genomic traits

To remove dependence among extant trait values due to shared ancestry, we selected all 2003 tip pairs in the bacterial genome tree and calculated their PICs for each trait. Using the PICs, we calculated Pearson correlation coefficient r and coefficient of determination R² for each trait pair. To reduce the bias in correlation introduced by non-normality of the PICs, we also calculated the above statistics for each trait pair using pseudo-PICs that are normally distributed under the pulsed evolution model.

Quantifying the frequency of extreme trait changes in bacterial evolution history

We tested whether extremely rapid trait changes happen throughout the bacterial evolutionary history. We calculated the relative distance from the root (last common ancestor of bacteria) to a node i as

{\hat{d}}_{root} = \frac{d_{root}}{d_{root} + \bar{d_{tip}}}

(5)

where d_root is the branch length from node i to the root and \bar{d_{tip}} is the average branch length from node i to all its descendant tips. It should be noted that a PIC at node i measures the trait difference between its two immediate descendant nodes. We binned the PICs by the relative distance of their nodes to the root into seven bins with exponentially distributed boundaries and calculated the frequency of extremely rapid trait changes (|PIC| ≥ 3) for each bin.
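The binning step can be sketched as follows. The bin boundaries used in the study are not given in this section, so the `boundaries` argument is left to the caller; the function names are hypothetical.

```python
def relative_root_distance(d_root, mean_d_tip):
    """Eq. 5: 0 at the root, approaching 1 for nodes close to the tips."""
    return d_root / (d_root + mean_d_tip)

def extreme_pic_frequency(pics, rel_distances, boundaries, threshold=3.0):
    """Fraction of contrasts with |PIC| >= threshold in each bin.

    boundaries is an increasing list of bin edges; bins are half-open
    [lo, hi), except the last bin, which also includes its upper edge.
    Empty bins yield None.
    """
    n_bins = len(boundaries) - 1
    counts = [0] * n_bins
    extremes = [0] * n_bins
    for pic, d in zip(pics, rel_distances):
        for b in range(n_bins):
            lo, hi = boundaries[b], boundaries[b + 1]
            if lo <= d < hi or (b == n_bins - 1 and d == hi):
                counts[b] += 1
                extremes[b] += abs(pic) >= threshold
                break
    return [e / c if c else None for e, c in zip(extremes, counts)]
```

Under the standard normal expectation, |PIC| ≥ 3 should occur in about 0.27% of contrasts, so bins with a much higher frequency indicate an excess of extreme changes.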

Segmented linear regression of absolute trait change on branch length

To analyze the trend of trait change, we applied segmented linear regression of the absolute trait changes over branch length as described by Uyeda et al. (6). For each trait, we added a small fixed value (0.001) to the absolute trait changes and log-transformed them to obtain an approximately normal distribution. To account for uncertainty introduced by ancestral state reconstruction, we adjusted the branch lengths as described by Felsenstein (39) and log-transformed them as well. When regressing the log-transformed absolute trait changes against the log-transformed adjusted branch lengths, we constrained the slope of the first segment to be zero (to capture the stasis) and allowed the slopes of the remaining regression lines to change at certain break points, but the regression lines had to be continuous (connected). We compared linear regression models with one or two break points, selected the one with the lowest AIC, and used the break points to mark the transitions between different evolution tempos.

Evaluating pulsed evolution in bacteria and archaea

Using the ML method, we tested seven models of trait evolution (Table 1). For pulsed evolution models with more than one compound Poisson process, we restricted the variances of jump sizes between any two jumping processes to be at least threefold different. We fitted the models to trait changes between sister nodes given their branch lengths. For internal nodes, we reconstructed their trait values with Felsenstein’s method (39) but took time-independent variation into account. Because contrasts of zero are excessively common in rRNA GC%, we removed them for this trait to avoid biased model fitting for pure pulsed evolution (i.e., PE1, PE2, and PE3) models. For all other traits, contrasts of zero were included in the fitting. We calculated confidence intervals for model parameters and statistics by bootstrapping with 50 replicates. We selected the best model using AIC.

We estimated two parameters for each compound Poisson process: frequency λ_i and variance of jump sizes σ_{i}^{2}, where i is the rank of the Poisson process. For further evaluation of pulsed evolution, we calculated the contribution and the relative jump size of each compound Poisson process in pulsed evolution. The contribution of a Poisson process (as PVE) was calculated by

PVE_{i} = \frac{λ_{i} σ_{i}^{2}}{σ_{BM}^{2} + \sum_{j = 1}^{n} λ_{j} σ_{j}^{2}}

(6)

where i and j are the ranks of Poisson processes and n is the total number of Poisson processes in the model. The relative jump size was calculated as {\hat{σ}}_{i} = \frac{σ_{i}}{\sqrt{ε}}. To roughly compare the overall rate of the frequent jumps to the bacterial speciation rate estimated in million years (29), we calculated the phylogenetically weighted average branch length of all tips to the root in the tree and then calibrated time assuming that the average branch length is equivalent to 3.5 billion years of evolution (40, 41).
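A minimal sketch of Eq. 6 and of the relative jump size follows. The parameter values in the example are invented for illustration and are not estimates from this study; the function names are likewise hypothetical.

```python
import math

def pve(lams, sigma2_jumps, sigma2_bm):
    """Eq. 6: share of the per-unit-branch-length variance contributed by each
    compound Poisson process (rate lam_i, jump variance sigma2_i)."""
    contributions = [lam * s2 for lam, s2 in zip(lams, sigma2_jumps)]
    total = sigma2_bm + sum(contributions)
    return [c / total for c in contributions]

def relative_jump_size(sigma_jump, eps):
    """Jump standard deviation in units of the time-independent standard deviation."""
    return sigma_jump / math.sqrt(eps)

# Made-up example: one frequent small-jump process, one rare large-jump process
shares = pve(lams=[100.0, 1.0], sigma2_jumps=[0.01, 2.0], sigma2_bm=1.0)
```

In this made-up example the frequent process contributes 1 unit of variance, the rare process 2 units, and BM 1 unit, so the two PVE values are 0.25 and 0.5 and the remaining 0.25 is attributable to gradual change.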

Evaluating the power of detecting jumps in pulsed evolution

To evaluate the power of detecting jumps and distinguishing various models tested in this study, we simulated trait evolution under the seven models in Table 1 with the parameters in table S2 and fitted these models to the simulated data. We simulated 20 replicates for each model, selected the best models on the basis of AICs of fitted models, and counted the frequency that each model was selected as the best model. To determine the effect of sample size on model fitting, from the data simulated using the PE2 model, we randomly sampled 2250, 750, 250, and 100 pairs of sister nodes without replacement from the full bacterial phylogeny that contains 6667 sister pairs. We did model selection as described above and counted the frequency that each model was selected as the best model.

Identifying rare jumps using pulsed evolution model and BayesTraits

We evaluated the performance of using the pulsed evolution model and BayesTraits (42) to detect jumps in simulation. Using 10 replicates of data simulated under the PE2 model, we calculated the posterior probability of having at least one jump between sister nodes as described in Supplementary Text. Using different posterior probability cutoff values, we plotted the ROC curve and calculated its AUC. To test whether the continuous variable rate VRG model can also capture jumps, we applied BayesTraits to estimate the relative rate of evolution for each branch and used a cutoff of the relative rate to predict jumps. Similarly, we used different cutoffs to determine the ROC and AUC of BayesTraits. BayesTraits does not model time-independent trait variation directly. To eliminate the effect of time-independent variation on its rate estimation, for each tip branch in the phylogeny, we added a branch length so that the process variance over the added branch length is equal to half of the time-independent variation between tips. We then applied BayesTraits on the adjusted phylogeny with the VRG model and Bayesian method enabled (modes 7 and 2, respectively).
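The benchmarking quantities described above can be sketched without external libraries. This is an illustrative outline, not the evaluation code used in the study: AUC is computed as the probability that a contrast with a true jump receives a higher score than one without, a jump is assumed to be predicted when the score is at least the cutoff, and the tip extension solves σ² Δl = ε/2 for the added branch length Δl. All names are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def specificity(scores, labels, cutoff):
    """Fraction of true non-jumps whose score falls below the cutoff."""
    neg = [s for s, y in zip(scores, labels) if not y]
    return sum(s < cutoff for s in neg) / len(neg)

def tip_branch_extension(eps, sigma2):
    """Branch length to add to each tip so that the process variance over it
    equals half of the time-independent variance eps."""
    return 0.5 * eps / sigma2
```

The same `roc_auc` function applies to both methods: posterior jump probabilities serve as scores for the pulsed evolution model, and relative branch rates serve as scores for BayesTraits.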

Testing the correlation between rare jumps and cladogenesis in bacteria

We identified all contrasts between two congener sister nodes. Using a posterior probability threshold of 0.75, we calculated the frequency of having at least one rare jump in these contrasts for each trait (predicted frequency). We computed the expected distribution of this frequency through simulations using the estimated model parameters of pulsed evolution under the null hypothesis that there is no correlation between jumps and cladogenesis. By comparing the predicted frequency to the expected distribution of the null hypothesis, we calculated the two-sided P value of the null hypothesis being true at the species level. We repeated the same statistical test at the genus, family, and order levels.

Testing the correlation between rare jumps of four traits in bacteria

For each pair of traits, using a posterior probability threshold of 0.75, we calculated the frequency of sister-node pairs between which rare jumps occur either for both traits or for neither (F_empirical). To test whether the jumps co-occur, we compared this frequency to a null distribution. To generate the null distribution, we simulated trait evolution independently for each trait over the bacterial phylogeny using the best model estimated for each trait and repeated the simulation 1000 times. The P value of this test was calculated as the probability that F_empirical fell outside the interval between the 2.5th and 97.5th percentiles of the null distribution.
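The concordance statistic and its comparison to a simulated null can be sketched as below. The two-sided empirical P value shown here (twice the smaller tail fraction) is one common convention and is an assumption; the text does not give the exact formula, and the function names are hypothetical.

```python
def concordance_frequency(jumps_a, jumps_b):
    """Fraction of sister pairs where both traits jump or neither does.

    jumps_a and jumps_b are equal-length sequences of booleans, one entry
    per sister pair, flagging a predicted rare jump for each trait.
    """
    return sum(a == b for a, b in zip(jumps_a, jumps_b)) / len(jumps_a)

def two_sided_empirical_p(observed, null_values):
    """Two-sided P value of an observed statistic against simulated null values."""
    n = len(null_values)
    lower = sum(v <= observed for v in null_values) / n
    upper = sum(v >= observed for v in null_values) / n
    return min(1.0, 2.0 * min(lower, upper))
```

In practice `null_values` would hold the concordance frequencies from the 1000 independent simulations, and an observed frequency far in the upper tail would indicate that rare jumps of the two traits co-occur more often than expected by chance.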

Acknowledgment

Funding: The authors acknowledge that they received no funding in support of this research.

Author contributions: Y.G. and M.W. conceived the study, designed the analyses, and wrote the manuscript. Y.G. wrote the codes and performed the data analysis.

Competing interests: The authors declare that they have no competing interests.

Data and materials availability: All data and scripts needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials.

Supplementary Materials

This PDF file includes:

Supplementary Text

Figs. S1 to S7

Other Supplementary Material for this manuscript includes the following:

Tables S1 to S9

REFERENCES AND NOTES

1. N. Eldredge, S. J. Gould, in Models in Paleobiology, T. J. M. Schopf, Ed. (Cooper & Co, 1972), pp. 82–115.

2. G. G. Simpson, Tempo and Mode in Evolution (Columbia University Press, 1944).

3. S. M. Stanley, A theory of evolution above the species level. Proc. Natl. Acad. Sci. U.S.A. 72, 646–650 (1975).

4. J. B. C. Jackson, A. H. Cheetham, Phylogeny reconstruction and the tempo of speciation in cheilostome Bryozoa. Paleobiology 20, 407–423 (1994).

5. G. Hunt, M. A. Bell, M. P. Travis, Evolution toward a new adaptive optimum: Phenotypic evolution in a fossil stickleback lineage. Evolution 62, 700–710 (2008).

6. J. C. Uyeda, T. F. Hansen, S. J. Arnold, J. Pienaar, The million-year wait for macroevolutionary bursts. Proc. Natl. Acad. Sci. U.S.A. 108, 15908–15913 (2011).

7. P. Duchen, C. Leuenberger, S. M. Szilágyi, L. Harmon, J. Eastman, M. Schweizer, D. Wegmann, Inference of evolutionary jumps in large phylogenies using Lévy processes. Syst. Biol. 66, 950–963 (2017).

8. M. J. Landis, J. G. Schraiber, Pulsed evolution shaped modern vertebrate body sizes. Proc. Natl. Acad. Sci. U.S.A. 114, 13224–13229 (2017).

9. H.-C. Wang, X. Xia, D. Hickey, Thermal adaptation of the small subunit ribosomal RNA gene: A comparative study. J. Mol. Evol. 63, 120–126 (2006).

10. F. M. Lauro, D. McDougald, T. Thomas, T. J. Williams, S. Egan, S. Rice, M. Z. DeMaere, L. Ting, H. Ertan, J. Johnson, S. Ferriera, A. Lapidus, I. Anderson, N. Kyrpides, A. C. Munk, C. Detter, C. S. Han, M. V. Brown, F. T. Robb, S. Kjelleberg, R. Cavicchioli, The genomic basis of trophic strategy in marine bacteria. Proc. Natl. Acad. Sci. U.S.A. 106, 15527–15533 (2009).

11. C. A. Martinez-Gutierrez, F. O. Aylward, Strong purifying selection is associated with genome streamlining in epipelagic Marinimicrobia. Genome Biol. Evol. 11, 2887–2894 (2019).

12. S. F. Elena, V. S. Cooper, R. E. Lenski, Punctuated evolution caused by selection of rare beneficial mutations. Science 272, 1802–1804 (1996).

13. N. A. Moran, Accelerated evolution and Muller’s rachet in endosymbiotic bacteria. Proc. Natl. Acad. Sci. U.S.A. 93, 2873–2878 (1996).

14. J. O. Andersson, S. G. Andersson, Genome degradation is an ongoing process in Rickettsia. Mol. Biol. Evol. 16, 1178–1191 (1999).

15. E. A. Groisman, H. Ochman, Pathogenicity islands: Bacterial evolution in quantum leaps. Cell 87, 791–794 (1996).

16. G. Xun, H.-E. David, L. Wen-Hsiung, in Mutation and Evolution, Contemporary Issues in Genetics and Evolution, R. C. Woodruff, J. N. Thompson, Eds. (Springer, 1998), pp. 383–391.

17. D. H. Parks, M. Chuvochina, C. Rinke, A. J. Mussig, P.-A. Chaumeil, P. Hugenholtz, GTDB: An ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Res. 50, D785–D794 (2021).

18. J. P. McCutcheon, N. A. Moran, Extreme genome reduction in symbiotic bacteria. Nat. Rev. Microbiol. 10, 13–26 (2012).

19. S. M. Neuenschwander, R. Ghai, J. Pernthaler, M. M. Salcher, Microdiversification in genome-streamlined ubiquitous freshwater Actinobacteria. ISME J. 12, 185–198 (2018).

20. B. Charlesworth, R. Lande, M. Slatkin, A neo-Darwinian commentary on macroevolution. Evolution 36, 474–498 (1982).

21. D. J. Futuyma, Evolutionary constraint and ecological consequences. Evolution 64, 1865–1884 (2010).

22. C. Fraser, W. P. Hanage, B. G. Spratt, Recombination and the nature of bacterial speciation. Science 315, 476–480 (2007).

23. J. Friedman, E. J. Alm, B. J. Shapiro, Sympatric speciation: When is it possible in bacteria? PLOS ONE 8, e53539 (2013).

24. L.-M. Bobay, H. Ochman, Biological species are universal across life’s domains. Genome Biol. Evol. 9, 491–501 (2017).

25. S. Estes, S. J. Arnold, Resolving the paradox of stasis: Models with stabilizing selection explain evolutionary divergence on all timescales. Am. Nat. 169, 227–244 (2007).

26. L. Philippot, S. G. E. Andersson, T. J. Battin, J. I. Prosser, J. P. Schimel, W. B. Whitman, S. Hallin, The ecological coherence of high bacterial taxonomic ranks. Nat. Rev. Microbiol. 8, 523–529 (2010).

27. N. Segata, L. Waldron, A. Ballarini, V. Narasimhan, O. Jousson, C. Huttenhower, Metagenomic microbial community profiling using unique clade-specific marker genes. Nat. Methods 9, 811–814 (2012).

28. C. V. Mering, P. Hugenholtz, J. Raes, S. G. Tringe, T. Doerks, L. J. Jensen, N. Ward, P. Bork, Quantitative phylogenetic assessment of microbial communities in diverse environments. Science 315, 1126–1130 (2007).

29. S. Louca, P. M. Shih, M. W. Pennell, W. W. Fischer, L. W. Parfrey, M. Doebeli, Bacterial diversification through geological time. Nat. Ecol. Evol. 2, 1458–1467 (2018).

30. E. Mayr, Speciation and macroevolution. Evolution 36, 1119–1132 (1982).

31. L. A. David, E. J. Alm, Rapid evolutionary innovation during an Archaean genetic expansion. Nature 469, 93–96 (2011).

32. S. Nelson-Sathi, F. L. Sousa, M. Roettger, N. Lozada-Chávez, T. Thiergart, A. Janssen, D. Bryant, G. Landan, P. Schönheit, B. Siebers, J. O. McInerney, W. F. Martin, Origins of major archaeal clades correspond to gene acquisitions from bacteria. Nature 517, 77–80 (2015).

33. H. Ochman, L. M. Davalos, The nature and dynamics of bacterial genomes. Science 311, 1730–1733 (2006).

34. E. V. Koonin, Y. I. Wolf, Genomics of bacteria and archaea: The emerging dynamic view of the prokaryotic world. Nucleic Acids Res. 36, 6688–6719 (2008).

35. K. Kitahara, K. Miyazaki, Revisiting bacterial phylogeny: Natural and experimental evidence for horizontal gene transfer of 16S rRNA. Mob. Genet. Elem. 3, e24210 (2013).

36. M. Wu, A. J. Scott, Phylogenomic analysis of bacterial and archaeal sequences with AMPHORA2. Bioinformatics 28, 1033–1034 (2012).

37. A. Stamatakis, RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014).

38. M. N. Price, P. S. Dehal, A. P. Arkin, FastTree 2 – approximately maximum-likelihood trees for large alignments. PLOS ONE 5, e9490 (2010).

39. J. Felsenstein, Phylogenies and the comparative method. Am. Nat. 125, 1–15 (1985).

40. P. P. Sheridan, K. H. Freeman, J. E. Brenchley, Estimated minimal divergence times of the major bacterial and archaeal phyla. Geomicrobiol. J. 20, 1–14 (2003).

41. H. C. Betts, M. N. Puttick, J. W. Clark, T. A. Williams, P. C. J. Donoghue, D. Pisani, Integrated genomic and fossil evidence illuminates life’s early evolution and eukaryote origin. Nat. Ecol. Evol. 2, 1556–1562 (2018).

42. C. Venditti, A. Meade, M. Pagel, Multiple routes to mammalian diversity. Nature 479, 393–396 (2011).

Published In

Science Advances

Volume 8 | Issue 28
July 2022

Copyright

Copyright © 2022 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works. Distributed under a Creative Commons Attribution NonCommercial License 4.0 (CC BY-NC).

https://creativecommons.org/licenses/by-nc/4.0/

This is an open-access article distributed under the terms of the Creative Commons Attribution-NonCommercial license, which permits use, distribution, and reproduction in any medium, so long as the resultant use is not for commercial advantage and provided the original work is properly cited.

Submission history

Received: 9 November 2021

Accepted: 2 June 2022

Authors

Affiliations

Yingnan Gao https://orcid.org/0000-0002-0960-4519

Department of Biology, University of Virginia, Charlottesville, VA 22094, USA.

Roles: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing - original draft, and Writing - review & editing.

Martin Wu^* https://orcid.org/0000-0003-3093-4077 [email protected]

Department of Biology, University of Virginia, Charlottesville, VA 22094, USA.

Roles: Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Supervision, Validation, Writing - original draft, and Writing - review & editing.

Notes

Corresponding author. Email: [email protected]

Cite as

Y. Gao, M. Wu, Microbial genomic trait evolution is dominated by frequent and rare pulsed evolution. Sci. Adv. 8, eabn1916 (2022). DOI: 10.1126/sciadv.abn1916

Cited by

- Wenkai Teng, Bin Liao, Mengyun Chen, Wensheng Shu, Genomic Legacies of Ancient Adaptation Illuminate GC-Content Evolution in Bacteria, Microbiology Spectrum, 11, 1, (2023). https://doi.org/10.1128/spectrum.02145-22
- John E. Hallsworth, Zulema Udaondo, Carlos Pedrós‐Alió, Juan Höfer, Kathleen C. Benison, Karen G. Lloyd, Radamés J. B. Cordero, Claudia B. L. de Campos, Michail M. Yakimov, Ricardo Amils, Scientific novelty beyond the experiment, Microbial Biotechnology, (2023). https://doi.org/10.1111/1751-7915.14222
