Computational prediction and experimental evolution of non‐native carbon source utilizations
Based on our knowledge of underground metabolism, we utilized a genome‐scale model of
E. coli metabolism that includes a comprehensive network reconstruction of underground metabolism (Notebaart
et al,
2014) to test our ability to predict evolutionary adaptation to novel (non‐native) carbon sources. This model was previously shown to correctly predict growth on non‐native carbon sources if a given enabling gene was artificially overexpressed in a growth screen (Notebaart
et al,
2014). This previous work identified a list of ten carbon sources that the native
E. coli metabolic network is not able to utilize for growth in simulations but that can be utilized for growth
in silico with the addition of a single underground reaction (
Appendix Table S1). Based on this list—as well as substrate cost, availability, and solubility properties to maximize compatibility with our laboratory evolution procedures—we selected seven carbon sources (D‐lyxose, D‐tartrate, D‐2‐deoxyribose, D‐arabinose, ethylene glycol, m‐tartrate, monomethyl succinate) that cannot be utilized by wild‐type
E. coli MG1655 but are predicted to be growth‐sustaining carbon sources after adaptive laboratory evolution.
Next, we initiated laboratory evolution experiments to adapt
E. coli to these non‐native carbon sources. Adaptive laboratory evolution experiments were conducted in two distinct phases: first, a “weaning/dynamic environment” (Copley,
2000; Mortlock,
2013) stage during which cells acquired the ability to grow solely on the non‐native carbon sources and, second, a “static environment” (Barrick & Lenski,
2013) stage during which a strong selection pressure was placed to select for the fastest growing cells on the novel carbon sources (Fig
1A).
During the “weaning/dynamic environment” stage of laboratory evolution experiments (Fig
1A, see
Materials and Methods),
E. coli was successfully adapted to grow on five non‐native substrates individually in separate experiments. Duplicate laboratory evolution experiments were conducted in batch growth conditions for each individual substrate and in parallel on an automated adaptive laboratory evolution (ALE) platform using a protocol that uniquely selected for adaptation to conditions where the ancestor (i.e., wild type) was unable to grow (Fig
1A; LaCroix
et al,
2015). In the weaning phase,
E. coli was dynamically weaned off of a growth‐supporting nutrient (glycerol) onto the novel substrates individually (Fig
1A,
Appendix Table S2). A description of the complex passage protocol is given in the Fig
1 legend and expanded in the methods for both phases of the evolution. This procedure successfully adapted
E. coli to grow on five out of seven non‐native substrates, specifically, D‐lyxose, D‐2‐deoxyribose, D‐arabinose, m‐tartrate, and monomethyl succinate. Unsuccessful cases could be attributed to various experimental and biological factors such as experimental duration limitations, the requirement of multiple mutation events, or stepwise adaptation events, as observed in an experiment evolving
E. coli to utilize ethylene glycol (Szappanos
et al,
2016).
The “static environment” stage of the evolution experiments consisted of serially passing cultures in the early exponential phase of growth in order to select for cells with the highest growth rates (Fig
1A). Cultures were grown in a static media composition environment containing a single non‐native carbon source. Marked and repeatable increases in growth rates on the non‐native carbon sources were observed in as few as 180–420 generations (
Appendix Table S1). Whole‐genome sequencing of clones was performed at each distinct growth rate “jump” or plateau during the static environment phase (see arrows in Fig
1B,
Appendix Fig S1). Such plateaus represent regions where a causal mutation has fixed in a population and it was assumed that the mutation(s) enabling the jump in growth rate were stable and maintained throughout the plateau region (LaCroix
et al,
2015). Thus, clones were isolated at any point within this plateau region where frozen stock samples were available (LaCroix
et al,
2015).
Modeling with underground metabolism accurately predicted key genes mutated during laboratory evolution experiments
To analyze genotypic changes underlying the nutrient utilizations, clones were isolated and sequenced shortly after an innovative growth phenotype was achieved; mutations were identified (see
Materials and Methods) and analyzed for their associated causality (Fig
1B,
Appendix Fig S1,
Dataset EV1). Strong signs of parallel evolution were observed at the level of mutated genes in the replicate evolution experiments (Fig
1B,
Appendix Fig S1, Table
1,
Dataset EV1). Such parallelism provided evidence of the beneficial nature of the observed mutations and is a prerequisite for predicting the genetic basis of adaptation (Bailey
et al,
2015). Mutations detected in the evolved isolated clones for each experiment demonstrated a striking agreement with such predicted “underground” utilization pathways (Notebaart
et al,
2014). Specifically, for four out of the five different substrate conditions, key mutations were linked to the predicted enzyme with promiscuous activity, which would be highly unlikely by chance (
P < 10
−8, Fisher's exact test; Table
1,
Appendix Fig S2). Not only were the specific genes (or their direct regulatory elements) mutated in four out of five cases, but few additional mutations (0–2 per strain,
Dataset EV1) were observed directly following the weaning phase, indicating that the innovative phenotypes observed required a small number of mutational steps and the method utilized was highly selective. For the one case where the prediction and observed mutations did not align—D‐arabinose—a detailed inspection of the literature revealed existing evidence that three
fuc operon‐associated enzymes can metabolize D‐arabinose—FucI, FucK, and FucA (LeBlanc & Mortlock,
1971). The mutations observed in the D‐arabinose evolution experiments after the weaning stage were in the
fucR gene (Table
1), a DNA‐transcriptional activator associated with regulating the expression of the transcription units
fucAO and
fucPIK (Podolny
et al,
1999). Thus, it was inferred that the strains evolved to grow on D‐arabinose in our experiments were utilizing the
fuc operon‐associated enzymes to metabolize D‐arabinose in agreement with prior work (LeBlanc & Mortlock,
1971). In this case, the genome‐scale model did not identify the promiscuous reactions responsible for growth on D‐arabinose because the promiscuous (underground) reaction database was incomplete (see section “Mutations in regulatory elements linked to increased expression of underground activities: D‐arabinose evolution” for more details on D‐arabinose metabolism).
In general, key mutations observed shortly after strains achieved reproducible growth on the non‐native substrate could be categorized as regulatory (R) or structural (S) (Table
1). Of the fifteen mutation events outlined in Table
1, eleven were categorized as regulatory (observed in all five successful substrate conditions) and four were categorized as structural (three of five successful substrate conditions). For D‐lyxose, D‐2‐deoxyribose, and m‐tartrate evolution experiments, mutations were observed within the coding regions of the predicted genes, namely yihS, rbsK, and dmlA (Table
1,
Appendix Fig S1). Regulatory mutations occurring in transcriptional regulators or within intergenic regions—likely affecting sigma factor binding and transcription of the predicted gene target—were observed for D‐lyxose, D‐2‐deoxyribose, m‐tartrate, and monomethyl succinate (Table
1). Observing more regulatory mutations is broadly consistent with previous reports (Mortlock,
2013; Toll‐Riera
et al,
2016). The regulatory mutations were believed to increase the expression of the target enzyme, thereby increasing the dose of the typically low‐level side activity (Guzmán
et al,
2015). This observation is consistent with “gene sharing” models of promiscuity and adaptation where diverging mutations that alter enzyme specificity are not necessary to acquire the growth innovation (Piatigorsky
et al,
1988; Guzmán
et al,
2015). Furthermore, although enzyme dosage could also be increased through duplication of genomic segments, this scenario was not commonly observed shortly after the weaning phase of our experiments. The one exception was observed in the D‐2‐deoxyribose evolution experiment where two large duplication events (containing 165 genes (
yqiG‐
yhcE) and 262 genes (
yhiS‐
rbsK), respectively) were observed (
Appendix Fig S3). Notably, one of these regions did include the
rbsK gene with the underground activity predicted to support growth on D‐2‐deoxyribose (Table
1).
To identify the causal mutation events relevant to the observed innovative nutrient utilization phenotypes, each key mutation (Table
1) was introduced into the ancestral wild‐type strain using the genome engineering method pORTMAGE (Nyerges
et al,
2016). This genome editing approach was performed to screen for mutation causality (Herring
et al,
2006) on all novel substrate conditions, except for monomethyl succinate, which only contained a single mutation (Table
1). Individual mutants were isolated after pORTMAGE reconstruction, and their growth was monitored in a binary fashion on the growth medium containing the non‐native substrate over the course of 1 week. These growth tests revealed that single mutations were sufficient for growth on D‐lyxose, D‐arabinose, and m‐tartrate (
Appendix Table S3). Interestingly, in the case of D‐2‐deoxyribose, an individual mutation (either the RbsK N20Y or the
rbsR insertion mutation) was not sufficient for growth, thereby suggesting that the mechanism of adaptation to this substrate was more complex. To address this, a pORTMAGE library containing the RbsK N20Y and
rbsR insertion mutations individually and in combination was grown on three M9 minimal medium + 2 g l
−1 D‐2‐deoxyribose agar plates alongside a wild‐type MG1655 ancestral strain control. The large duplications in the D‐2‐deoxyribose strain (Table
1) could not be reconstructed due to the limitations of the pORTMAGE method. After 10 days of incubation, visible colonies could be seen resulting from the reverse engineered library, but not from the wild‐type strain (
Appendix Fig S4A). Subsequently, 16 colonies were chosen and colony PCR was performed to sequence the regions of rbsK and
rbsR where the mutations were introduced (
Appendix Fig S4B). All 16 colonies sequenced contained both the RbsK N20Y and
rbsR insertion mutations. Fifteen of the 16 colonies showed an additional mutation at RbsK residue Asn14—7 colonies showed a AAT to GAT codon change resulting in an RbsK N14D mutation and 8 colonies showed a AAT to AGT codon change resulting in an RbsK N14S mutation. The Asn14 residue has been previously associated with ribose substrate binding of the ribokinase RbsK enzyme (Sigrell
et al,
1999). Only one of the 16 colonies sequenced did not acquire the residue 14 mutation, but instead acquired a GCA to ACA codon change at residue Ala4 resulting in an RbsK A4T mutation. It is unclear if the additional mutations occurred spontaneously during growth prior to plating, but it is possible that these Asn14 and Ala4 residue mutations were introduced at a low frequency during MAGE‐oligonucleotide DNA synthesis (< 0.1% error rate at each nucleotide position) (Isaacs
et al,
2011; Nyerges
et al,
2018). In either case, these results suggested that the observed mutations in rbsK and rbsR enabled growth on the non‐native D‐2‐deoxyribose substrate and that there was a strong selection pressure on the ribokinase underground activity. Further, there were multiple ways to impact
rbsK, as both duplication events and structural mutations (Table
1) or multiple structural mutations were separately observed in strains which grew solely on D‐2‐deoxyribose. Overall, these causality assessments support the notion that underground activities can open short adaptive paths toward novel phenotypes and may play prominent roles in innovation events.
Structural mutations linked to shifting substrate affinities: D‐lyxose evolution
A clear example of mutations involved with optimization was those acquired during the D‐lyxose experiments that were linked to enhancing the secondary activity of YihS (Tables
1 and
2). Structural mutations were hypothesized to improve the enzyme side activity to support the growth optimization state, and this effect was experimentally verified. The effects of structural mutations on enzyme activity were examined for the YihS isomerase enzyme that was mutated during the D‐lyxose evolution (Fig
1B, Table
1). The activities of the wild‐type YihS and three mutant YihS enzymes (YihS R315S, YihS V314L + R315C, and YihS V314L + R315S) were tested
in vitro. A cell‐free
in vitro transcription and translation system (Shimizu
et al,
2001; de Raad
et al,
2017) was used to express the enzymes and examine conversions of D‐mannose to D‐fructose (a primary activity; Itoh
et al,
2008) and D‐lyxose to D‐xylulose (side activity) (Fig
2A,
Appendix Fig S5). The ratios of the turnover rates of D‐lyxose to the turnover rates of D‐mannose were calculated and compared (Fig
2B). Although the single‐mutant YihS enzyme did not show a significant change compared to wild type, the double‐mutant YihS enzymes showed approximately a 10‐fold increase in turnover ratio of D‐lyxose to D‐mannose compared to wild type (
P < 0.0003, ANCOVA). These results suggest that the mutations shifted the affinity toward the innovative substrate (enzyme side activity), while still retaining an overall preference for the primary substrate, D‐mannose (ratio < 1). This is in agreement with “weak trade‐off” theories of the evolvability of promiscuous functions (Khersonsky & Tawfik,
2010) in that only a small number of mutations are sufficient to significantly improve the promiscuous activity of an enzyme without greatly affecting the primary activity.
Mutations in regulatory elements linked to increased expression of underground activities: D‐arabinose evolution
An important growth rate optimizing mutation was found in the D‐arabinose experiments and occurred as a result of an
araC gene mutation, a DNA‐binding transcriptional regulator that regulates the
araBAD operon involving genes associated with L‐arabinose metabolism (Bustos & Schleif,
1993). Based on structural analysis of AraC (Fig
3A), the mutations observed in the two independent parallel experiments likely affect substrate binding regions given their proximity to a bound L‐arabinose molecule (RCSB Protein Data Bank entry 2ARC; Soisson
et al,
1997), possibly increasing its affinity for D‐arabinose. Expression analysis revealed that the
araBAD transcription unit associated with AraC regulation (Gama‐Castro
et al,
2016) was the most highly upregulated set of genes (expression fold increase ranging from approximately 45–65× for Exp 1 and 140–200× for Exp 2,
q < 10
−4, FDR‐adjusted
P‐value) in both experiments (Fig
3B). Further examination of these upregulated genes revealed that the ribulokinase (AraB) has a similar kcat on four 2‐ketopentoses (D/L‐ribulose and D/L‐xylulose) (Lee
et al,
2001) despite the fact that
araB is consistently annotated to only act on L‐ribulose (EcoCyc) (Keseler
et al,
2013) or L‐ribulose and D/L‐xylulose (BiGG Models; King
et al,
2016). It was thus reasoned that AraB was catalyzing the conversion of D‐ribulose to D‐ribulose 5‐phosphate in an alternate pathway for metabolizing D‐arabinose (Fig
3C) and this was further explored.
The role of the AraB pathway in optimizing growth on D‐arabinose was analyzed both computationally and experimentally. Parsimonious flux balance analysis (pFBA; Feist & Palsson,
2010; Lewis
et al,
2010) simulations demonstrated that cell growth with AraB had a higher overall metabolic yield than growth with FucK (in simulations where only one of the two pathways was active,
Appendix Fig S6). This supported the hypothesis that mutants with active AraB can achieve higher growth rates than those in which it is not expressed. This simulation result signaled the possibility of a growth advantage for using the AraB pathway and thus was explored experimentally. Experimental growth rate measurements of clones carrying either an
fucK knockout or
araBAD gene knockouts showed that the FucK enzyme activity was essential for growth on D‐arabinose for all strains analyzed (strains isolated after initial growth on the single non‐native carbon source and strains isolated at the end of the static environment phase) (Fig
3D,
Appendix Table S4). However, removal of
araB from endpoint strains reduced the growth rate to the approximate growth rate of the initially adapted strain (Fig
3D). This finding suggested that the proposed AraB pathway (Fig
3C) was responsible for enhancing the growth rate and therefore qualified as fitness optimization.
Putting these computational and experimental results in the context of previous work, a similar pathway has been described in mutant
Klebsiella aerogens W70 strains (St Martin & Mortlock,
1977). It was suggested that the D‐ribulose‐5‐phosphate pathway (i.e., the AraB pathway) is more efficient for metabolizing D‐arabinose than the D‐ribulose‐1‐phosphate pathway (i.e., the FucK pathway), because the FucK pathway requires that three enzymes (FucI, FucK, and FucA) recognize secondary substrates (St Martin & Mortlock,
1977). The conclusion of St Martin and Mortlock supports the role of the mutations observed here in
araC. In summary, enzymatic side activities of both the
fuc operon (innovative mutations) and
ara operon (optimizing mutations) encoded enzymes were important for the adaptation to efficiently metabolize D‐arabinose.
Computational and expression analyses suggested that a similar mechanism of amplification of growth‐enhancing promiscuous activities played a role in the m‐tartrate optimization regime. Similar to the D‐arabinose experiments, both independent evolutions on m‐tartrate possessed a mutation in the predicted transcription factor,
ygbI. This mutation was associated with the overexpression of a set of genes (
ygbI, ygbJ, ygbK, ygbL, ygbM, and ygbN) with likely promiscuous activity (
Appendix Supplementary Text and
Appendix Fig S7). Further experiments, however, are required to better elucidate the mechanism and involvement of
ygb operon‐associated enzymes in the metabolism of m‐tartrate.
Genome‐scale modeling suggests a role of segmental genome duplication and deletion in adaptation: D‐2‐deoxyribose and D‐lyxose evolutions
Large genome duplications and deletions were observed in the D‐lyxose and D‐2‐deoxyribose evolution experiments. These events were examined using a genome‐scale metabolic model to understand their potential impact on strain fitness. First, we considered whether the large deletion event in the D‐2‐deoxyribose evolution Exp. 1 (Table
2,
Appendix Fig S8) contained genes involved in metabolism. The 171 deleted genes were compared to those genes included in the genome‐scale model of metabolism used in this study. It was found that 44 metabolic genes were located in this region of deletion (
Dataset EV3). Flux variability analysis (FVA; Mahadevan & Schilling,
2003) simulations revealed that none of the 44 genes are individually necessary for optimal growth under these conditions (
Dataset EV3). In fact, all 44 genes can be deleted at once from the genome‐scale model without affecting the simulated growth rate. It was also interesting to note that 18 of the 44 genes were highly expressed in the initially evolved population after weaning and thus significantly down‐regulated (log
2(fold change) < −1,
q‐value < 0.025) by the large deletion event in the endpoint evolution population (
Dataset EV3 and
Appendix Fig S9). These observations are in agreement with previously reported findings that cells acquire mutations that reduce the expression of genes not required for growth during evolution and thus allow the cell to redirect resources from production of unnecessary proteins to increasing growth functions (Utrilla
et al,
2016). Furthermore, there was an additional mutation observed at the same time as the large deletion, namely a smaller 902 bp deletion spanning a major part of the rbsB gene and into the intergenic region upstream of rbsK (Table
2,
Dataset EV1, and
Dataset EV2). The perceived impact of this deletion was to further increase the expression of rbsK, the gene associated with the underground activity required for growth on D‐2‐deoxyribose. The concept of removing enzymatic activities, and potentially multiple simultaneously, to increase fitness is an interesting avenue which, in this case, necessitates a significant number of additional experiments to confirm given the multiple genes affected.
While the YihS structural mutations appeared to be the primary mutations responsible for optimizing growth on D‐lyxose (Fig
2), a genome duplication event observed in Exp. 2 could play a role in improving the growth rate (Fig
1B, Table
2). The genome duplication event spanned a 131 kilobase pair region (
Appendix Fig S10A) resulting in significant up‐regulation of 76 genes (
Appendix Fig S10B). Included in this gene set were
pyrE and
xylB, two genes identified by modeling as important for metabolizing D‐lyxose. The first gene,
pyrE, could enhance growth by increasing nucleotide biosynthesis (Conrad
et al,
2009), and this gene is important for achieving optimal growth in genome‐scale model simulations (
Appendix Fig S10C). The
pyrE gene might have also played a role in improving growth fitness in the m‐tartrate evolution experiments where intergenic mutations upstream of the
pyrE gene were observed in both replicate evolving endpoint populations (Table
2,
Appendix Supplementary Text and Fig S7C, and
Dataset EV2). Another gene in the large duplication event was
xylB, encoding a xylulokinase, which might be catalyzing the second step in the metabolism of D‐lyxose (
Appendix Fig S10D). Simulating increased flux through the xylulokinase reaction in an approach similar to a phenotypic phase plane analysis (Ibarra
et al,
2003) improved the growth rate on D‐lyxose (
Appendix Fig S10C). Thus, increased expression of
xylB and
pyrE as a result of the duplication event in the Exp. 2 endpoint strains could be important for enhancing growth on the non‐native substrate D‐lyxose. While follow‐up experiments over‐expressing these genes individually are necessary to conclusively establish the causal role of increased
pyrE and
xylB expression, this study provides a high‐level picture of the complex mechanisms at work in adaptation to new carbon sources, from structural and regulatory mutations to large‐scale deletions and duplications.