Main

Change in the functions and properties of proteins is a fundamental process in the evolution of life1. So far, most functional studies of natural protein evolution have focused on changes in substrate specificity2,3 or mechanism3, and were mostly performed on a narrow phylogenetic sampling. Unlike changes in protein function, metal-preference evolution can reflect adaptations to changes within an existing or new niche (for example, metal availability) but with retention of the ancestral function (for example, superoxide dismutation). Many metalloenzymes are highly metal specific4,5, especially those that use their co-factor to catalyse redox transformations, and loading of metalloproteins with suboptimal metals decreases their activity and fitness of their host organism6,7,8. Mis-metalation is a major mechanism in metal toxicity7, neurological disorders9 and nutritional immunity6, and is a promising tool in combating pathogenic microbes10. As it is possible to experimentally evaluate environmental, cellular and protein-bound metal abundance, metal co-factor utilization provides an important and tractable system for studying natural protein evolution.

Evidence shows that metal specificities within protein families have frequently changed during their evolutionary history11. These changes have been important in the evolution of commensals into pathogens as they adapt to metal restriction in the host12,13, and in the adaptation of organisms to changes in terrestrial conditions such as altered elemental bioavailability11,14 induced by oxygenation of the atmosphere and oceans by the emergence of photosynthesis. In contrast to metal-specific proteins, metal-interchangeable (cambialistic) proteins can function using alternative metals15,16. Only a few metalloenzymes have been rigorously demonstrated to be cambialistic in vivo, including some iron (Fe)- or manganese (Mn)-dependent superoxide dismutases (SODs; SodFMs)16; metalloenzymes that play an important role in cellular reactive oxygen species defence17.

In this Article, we sought to investigate the mechanisms and extent of metal-preference evolution across the tree of life in a widely distributed and highly conserved metalloenzyme family. To study this process, we leveraged the existence of naturally Fe-specific, Mn-specific and flexible Fe/Mn-cambialistic SodFM homologues, which differentially utilize these metals to catalyse the detoxification of superoxide.

SodFM metal preference is not phylogenetically constrained

Bioinformatic analyses of 3,058 genomes sampled across the tree of life (146 eukaryotic, 2,613 bacterial and 281 archaeal genomes) showed that SodFMs are the most widely distributed and most highly conserved of the canonical superoxide detoxification enzymes (Extended Data Fig. 1a,b). Phylogenetics (Extended Data Fig. 2b,d), sequence (Extended Data Fig. 3a) and structural comparisons (Extended Data Fig. 2a,c,e) enabled us to subdivide the SodFM protein family into five main subfamilies: SodFM1–5 (Fig. 1a–c and Extended Data Fig. 2). Members of the two main subfamilies, SodFM1 and SodFM2, are commonly annotated in databases as being either Mn or Fe specific, respectively14. To test this, we heterologously expressed 64 representative SodFMs (Extended Data Fig. 4) sampled from across their protein family tree (Fig. 2a) and characterized their activity with Fe and Mn. This revealed that neither subfamily was homogeneous for metal utilization (Fig. 2a) and should no longer be referred to as Fe or Mn specific on the basis of homology searches alone. The robustness of the SodFM1 and SodFM2 groupings (Extended Data Fig. 3), and their wide distribution across the tree of life (Fig. 1a), is consistent with the hypothesis that the split between the SodFM1 and SodFM2 subfamilies is ancient14. On the basis of the distribution of the different metal preferences across their protein trees, ancestral sequence reconstruction (Fig. 2a) and estimations of ancestral metal utilization (Extended Data Fig. 2f,g), it seems that the ancient last common ancestors (LCAs) of SodFM1s and SodFM2s were probably Mn and Fe specific, respectively, but these ancestral states were later changed multiple independent times throughout the SodFM family’s evolution (Fig. 2a).

Fig. 1: Evolutionarily distinct SodFM subfamilies have wide and variable distribution across the tree of life.
figure 1

a, Pattern of variable distribution of the five SodFM subfamilies (Extended Data Fig. 2) observed across the tree of life. The coloured semi-circles represent the distribution of SodFM1s (1,105 sequences, orange), SodFM2s (823 sequences, blue), SodFM3s (359 sequences, red), SodFM4s (354 sequences, brown) and SodFM5s (179 sequences, orange), mapped onto a phylogeny of 146 eukaryotes, 2,577 bacteria and 288 archaea. The presence of species with more than a single representative of a particular SodFM subfamily (black, Supplementary Data 18) reveals multiple independent lineage-specific SodFM1–5 subfamily expansions. b, Tables containing numbers and mean percentage of protein sequence identity of the SodFM1–5 subfamily members found within the analysis of 3,011 genomes sampled across the tree of life (a). c, Sequence logos illustrate frequencies of key residues found across the analysed SodFMs, including four universally conserved metal-coordinating residues, three histidines and a single aspartate (green triangles). The key distinction among SodFM1–5 subfamilies is the identity and position of a water-coordinating residue (orange triangles) spatially located within the enzymes’ catalytic centres: SodFM1/5s have a C-terminal Gln, SodFM2s have an N-terminal Gln, and most SodFM3–4s (75% sequences) have a C-terminal His and the rest of the SodFM3s and SodFM4s (25% each group) contained C-terminal Gln instead of His, often reflecting secondary His/Gln switches (Extended Data Fig. 2a). The key distinction between SodFM3s and SodFM4s is the loss of the conserved C-terminal His in the HXXXHH motif in SodFM4s. The key distinction between SodFM1s and SodFM5s is that the studied representatives of the former are homodimeric, whereas those of the latter are tetrameric (Extended Data Figs. 2a and 3a). Residue XD-2 represents an important determinant of Fe/Mn metal preference (pink triangle).

Fig. 2: Enzymes with diverse metal preferences can be found across the SodFM subfamilies.
figure 2

a, The aCR values for the 64 characterized SodFMs were mapped onto the protein tree inferred from 2,688 SodFM sequences identified across the tree of life (Fig. 1a). A range of aCRs, from higher Fe preference (aCR >2, red), through cambialistic (0.5 < aCR < 2, green), to higher Mn preference (aCR <0.5, orange), were found across the SodFM subfamilies. Sensitivity to peroxide inhibition (+) was detected for Fe-preferring and Fe-loaded cambialistic enzymes, but not in the Mn-preferring enzymes, which were resistant to peroxide treatment (−). Identity of the metal’s two key secondary coordination sphere residues, namely water-coordinating Gln/His and the XD-2 residue located close to the Asp ligand (Fig. 1c), are displayed next to aCRs. b, Distribution of the aCRs for the SodFM subfamilies is presented as box and whiskers plots (left), and the frequencies of different amino acid residues found at the XD-2 position are presented as pie charts (right). SodFM1s and SodFM5s were grouped together here as they both contained C-terminal Gln and displayed similar metal preference and XD-2 distribution patterns (SodFM5s XD-2, G: 89%; A: 7%; T: 1.7%). The high proportion of Mn-preferring SodFM1s and Fe-preferring SodFM2s, and the predicted Gln/His and the XD-2 residues in the reconstructed sequences of the ancestral nodes (a, dashed outlines), are consistent with the hypothesis that the LCAs of these two subfamilies were Mn and Fe preferring, respectively (Extended Data Fig. 2f,g). c, Distribution of the aCR values for SodFMs (top) with different water-coordinating residues (C-terminal Gln, N-terminal Gln or C-terminal His) and different XD-2 residues (bottom) are displayed as box and whiskers plots on a logarithmic scale, colour coded to reflect apparent metal specificities (Mn: orange; cambialistic (camb): green; Fe: red). Amino acid residues were colour coded to enable comparisons among a, b and c.

Uncovering untapped evolutionary diversity of SodFMs

We also identified three further subfamilies of distinct SodFMs, SodFM3–5 (Extended Data Fig. 2). SodFM3s and SodFM4s seem to be restricted mostly to prokaryotes as none were identified in our sampling of eukaryotes (Fig. 1b). SodFM4s are the most divergent of all SodFMs (Fig. 1b,c and Extended Data Fig. 2b) but are largely uncharacterized, with only a single available crystal structure (Extended Data Fig. 2a). Like the SodFM1s and SodFM2s, metal preference is not conserved across the SodFM3 and SodFM4 subfamilies (Fig. 2a). One of the key structural distinctions between these subfamilies is that SodFM1/2s are homodimeric and SodFM3/4s are homotetrameric (Extended Data Fig. 2a). The subfamily of SodFM5s provides an interesting intermediate group of tetrameric enzymes (Extended Data Fig. 2a) with some sequence features, including the frequency of XD-2 residue (second residue towards the N-terminus from Asp metal ligand), more akin to those of dimeric SodFM1s (Figs. 1c and 2b). SodFM5s form a sister group to SodFM3s in phylogenetic trees (Extended Data Fig. 2b,e) and structural comparisons (Extended Data Fig. 2c,d), and consist mostly of eukaryotic mitochondrial SodFMs (for example, human mitochondrial Mn-SOD) and a few bacterial sequences including that of Parachlamydiaacanthamoebae (Fig. 2a). Grouping of the highly distinct mitochondrial SODs with sequences from Chlamydia could represent functional convergence, eukaryote to prokaryote lateral gene transfer (LGT) or be evidence of ancient LGT between the bacterial ancestor of mitochondria and that of Chlamydia.

SodFM metal preference is continuous rather than discrete

Within subfamilies SodFM1, SodFM2 and SodFM4, we identified examples of cambialistic enzymes12,18, and four subfamilies (SodFM1–4) had members that exhibited multiple different metal utilization preferences (Fig. 2a). We thus leveraged our large set of natural enzymes sampled from across the tree of life (Fig. 2a) to investigate if metal preference is a discrete three-state property (Fe, Mn or cambialistic), or rather forms a range of values between perfect Fe and Mn specificities. Although the term ‘metal-specificity’ is commonly used in reference to the disproportional activity of a SOD enzyme with Fe and Mn, the cambialistic SODs are inherently non-specific for either metal as they can use both13. Across the tested SodFMs, metal preferences, represented by approximate cambialism ratios (aCR, equal to the Fe-dependent activity divided by the Mn-dependent activity), formed a range of values (Fig. 2a–c) between higher Fe preference (aCR >2) and higher Mn preference (aCR <0.5), with perfect cambialism at the midpoint (aCR of 1), indicating that metal preference is best thought of as a continuum. We propose the term ‘metal preference’ instead of ‘metal specificity’ to better reflect the observation that most SodFMs are not completely metal specific. Note that the terms metal preference and metal specificity reflect the relative activity levels with alternative metal co-factors, and should not be confused with metal selectivity (metal-binding preference19), which refers to differences in the affinity of metal binding20.

Secondary sphere is key to SodFM metal-preference evolution

To test if aCR values matched with the identity of previously identified residues spatially located within the metal’s secondary coordination sphere, XD-2 (refs. 13,21,22) and HCterm/QCterm/QNterm23,24,25,26,27 (Fig. 1c), we analysed the distribution of aCRs with respect to the residue found at these positions across SodFMs found in nature (Fig. 2). Higher Fe preference was usually associated with the presence of two alternative water-coordinating residues, HCterm (69% FeSODs, 9/13 enzymes) or QNterm (72% FeSODs, 18/25 enzymes), whereas the enzymes containing QCterm were mostly Mn preferring (78% MnSODs, 21/27 enzymes). At the position XD-2, the two smallest amino acids, glycine (GD-2) or alanine (AD-2) (Fig. 2b), were frequently found in SodFMs with higher Mn preference (Fig. 2c). More cambialistic outliers included the Bacteroidesfragilis (aCR 0.98) and Coprobacterfastidiosus (aCR 0.99) SodFM2s, which seem to have evolved higher Mn activity from their most likely Fe-preferring ancestor. This confirmed that the presence of GD-2 and AD-2 within the context of otherwise predominantly Fe-preferring SodFMs can indicate potential evolutionary metal-preference modulation events. In contrast, either threonine (TD-2) or valine (VD-2) at this site (Fig. 2b) were usually found in SodFMs with higher Fe preference (Fig. 2c). The only two non-Fe-preferring outliers were two Bacteroidales SodFM1s from Anaerophagathermohalophila (TD-2; aCR 0.87) and Bacteroidetes RBG_13_43_22 (TD-2; aCR 0.46). However, these two SodFM1s have probably evolved from highly Mn-preferring ancestors towards increased activity with Fe. This suggests that mutations of XD-2 and HCterm/QCterm/QNterm residues were involved in the identified evolutionary changes of metal preference. Importantly, these isozymes also demonstrated that understanding an enzyme’s evolutionary history can aid predictions of their properties.

Switched metal preference evolved multiple independent times

To better understand the context of metal-preference evolution, we mapped the distribution of the identified residues and measured aCRs onto the trees of life (Fig. 3a–e and Extended Data Figs. 5 and 6). Clear patterns of metal-preference modulation were identified during the evolution of bacterial lineages important for human health including Bacteroidales (Fig. 3a and Extended Data Fig. 5), Mycobacteria (Fig. 3d) and Staphylococci (Fig. 3e); and enigmatic uncultured Wolfebacteria (Fig. 3c) from bacterial candidate phyla radiations (CPR). In human-pathogenic Mycobacteria, acquisition of more Fe-preferring tetrameric SodFM3s has occurred at least twice (Fig. 3d), whereas a single neofunctionalization event could be identified in the ancestor of Staphylococcusaureus and Staphylococcusargenteus (Fig. 3e). The largely anaerobic Bacteroidales are a taxonomic group in which metal preferences have frequently switched. We identified at least two independent emergences of cambialism in this group, once from Fe-SodFM2 and a second time from Mn-SodFM1 ancestors. CPR Wolfebacteria represented a striking and highly unusual emergence of dimeric (Extended Data Fig. 7) Fe-preferring SodFM1 containing SodFM3/4-like (Fig. 3c) HCterm and VD-2 residues (Fig. 1c). Importantly, our extensive analysis of SodFM metal utilization evolution provided further evidence demonstrating that the identity of XD-2 and HCterm/QCterm/QNterm combined with phylogenetic analysis can enable predictions of their metal preferences more accurate than those on the basis of sequence analysis or phylogenetics alone.

Fig. 3: SodFM metal preference and peroxide sensitivity have been changed independently many times within different evolutionary contexts.
figure 3

ae Distribution of SodFM1–4 subfamilies (orange–brown heat maps in a and b, or coloured circles in ce) was mapped onto the species trees of Bacteroidota (a), Proteobacteria (b), CPR Wolfebacteria (c), Mycobacteriaceae (d) and Staphylococcaceae (e). In Gram-negative bacteria, SodFM repertoires consisted mainly of SodFM2s in Proteobacteria (b) and SodFM1s in Bacteroidota (a). Within the Bacteroidota, a subset of the Bacteroidales have switched to SodFM2 (a). SodFM1s and SodFM4s were found in uncultured CPR Wolfebacteria (c). In Gram-positive Mycobacteriaceae only SodFM3s were identified (d), whereas Staphylococcaceae contained only SodFM1s (e). The identities of the water-coordinating Gln/His, the residue XD-2, and verified aCR and peroxide inhibition for the SodFMs characterized in this study were mapped onto the trees and annotated as previously described (Fig. 2). Inferred multiple independent evolutionary shifts towards higher Fe preference and peroxide sensitivity (red circles), higher Mn preference and peroxide resistance (orange circles), emergences of cambialism (green circles) and likely SodFM repertoire changes via LGT (magenta circles) were annotated onto the trees (ae). f, The average phylogenetic distances between the inferred LCA nodes (coloured circles in ae) and their descendent tree tips were calculated and presented on a colour-coded scale, where the minimum value represents the extant homologues (tree tips) and the maximum value represents the average distances of all studied archaea, bacteria and eukaryotes from their LCA (universal LCA). Emergence of alphaproteobacterial SodFM2.4 was the most ancient and that of the cambialistic SodFM1 within staphylococci the most recent of the identified metal-preference evolutionary modulation events. Scale bars represent the number of substitutions per site.

Ancient SodFM metal-preference switch in Alphaproteobacteria

Based on predictions derived from our analysis, we identified a distinct group of Mn-preferring SodFM2s in Alphaproteobacteria (SodFM2.4; Fig. 3b and Extended Data Figs. 6 and 8) constituting a major component of marine and terrestrial ecosystems. On the basis of protein trees (Fig. 2a and Extended Data Fig. 6b), species trees (Fig. 3b and Extended Data Fig. 6a), amino acid correlation analysis (Extended Data Fig. 8) and distribution of the key amino acid residues (Extended Data Figs. 6b and 8), SodFM2.4s have evolved from ancestral Fe-preferring SodFM2 (Extended Data Figs. 6b and 8) in the LCA of all Alphaproteobacteria following the split from Rickettsiales (Extended Data Fig. 6a), and represent the most ancient identified evolutionary metal-preference switch (Fig. 3f) other than that at the split between SodFM1s and SodFM2s (Fig. 2a). Later, five divergent SodFM2.4 sequences reacquired more typical SodFM2 residues, including T/V/CD-2 in place of the SodFM2.4 hallmark residues GD-2, and display Fe preference (for example, Methylocapsaacidiphila; Fig. 3b and Extended Data Fig. 6a), suggesting they have evolutionarily reverted to their ancestral metal preference on at least three independent occasions (Fig. 3b and Extended Data Figs. 6a,b and 8a). Extensive sequence analysis-driven mutagenesis of Agrobacteriumtumefaciens SodFM2.4 (Extended Data Fig. 9m) revealed that, compared with changing from preferential utilization of a single metal co-factor to cambialism, as observed in S.aureus and Bacteroidetes (Fig. 3a,e), reversion from high Mn to high Fe preference may require more than a single mutation. It is therefore striking that SodFM2.4s from multiple species have evolved back to the ancestral Fe preference (Extended Data Fig. 6) rather than acquiring a canonical Fe-preferring SodFM2 via LGT.

SodFM metal preference co-evolves with peroxide sensitivity

In addition to higher Mn preference, similarly to MnSodFM1s, MnSodFM2.4s were resistant to hydrogen peroxide inhibition (Extended Data Figs. 4c,k and 8a). It is well established that metal preference is strongly associated with the sensitivity of SodFMs to peroxide28. Mn-preferring and Mn-loaded cambialistic SODs tend to be resistant, whereas the activity of Fe-preferring and Fe-loaded cambialistic SODs is typically inhibited by peroxide. Results from our characterization of isozymes sampled across the tree of life (Fig. 2a and Extended Data Fig. 4) showed that this biochemical feature was universally conserved across all studied SodFMs, consistent with it co-evolving with metal preference. Consequently, as in the case of metal preference, peroxide inhibition is not constrained to phylogenetic subfamilies, but has changed multiple independent times. This suggests that, in addition to adapting to changing metal bioavailability, metal preference could evolve in response to high concentrations of hydrogen peroxide, which is often used as a potent anti-microbial agent, both artificially in healthcare and by living organisms in nature.

SodFMs vary in susceptibility to metal-preference modulation

To verify our hypothesis that SodFMs can evolve their metal preference and peroxide inhibition by mutating secondary coordination sphere residues XD-2 and HCterm/QCterm/QNterm, we performed biochemical characterization of 50 synthetically generated mutants of the natural enzymes. Mutations were designed to alter metal preference by introducing residues from isozymes with opposing preference, guided by our analysis of the frequencies of the amino acids in these positions in natural enzymes (Fig. 2).

Of all tested mutants, 35 (70%) exhibited a significant shift in aCR value, indicating change in specific activity with one metal relative to that with the other metal. Of these 35 mutants, a total of 26 variants (52% of all mutants) displayed changes in the preferred metal relative to the wild type (WT) (Extended Data Fig. 9 and Supplementary Data 8), further supporting the significance of these residues in metal-preference determination. Decreased expression levels were observed for nine variants (18% of all mutants), suggesting that mutation of these residues can also have deleterious effects on protein stability and/or folding. No significant effects were observed in 15 (30%) mutants (Extended Data Fig. 9 and Supplementary Data 8), thus mutagenesis of XD-2 and HCterm/QCterm/QNterm does not always result in metal preference change. This indicates the importance of other regions of the protein fold for metal-preference modulation by XD-2 and HCterm/QCterm/QNterm mutagenesis. This is particularly evident in specialized metal-preferring SODs such as Escherichiacoli SodA, which does not acquire Fe when homologously overexpressed in our E.coli strain in aerobic conditions (Extended Data Fig. 10f); and in Bacillusanthracis SodA2 (Fe-preferring SodFM1) or A.tumefaciens Mn-preferring SodFM2.4, where mutagenesis of XD-2 had no effect on their metal preference, although it played a clear role in the metal preference switch of their ancestor. This suggests that, following the metal-preference switch, subsequent changes in other regions of the fold can de-correlate later effects of the XD-2 and HCterm/QCterm/QNterm mutations from their initial effects, a process termed epistatic drift29. Alternatively, metal-preference changes via XD-2 and HCterm/QCterm/QNterm in some SodFMs could require preceding changes in other regions of the fold (intramolecular pre-adaptations).

Regardless of the underlying evolutionary mechanisms, the residues XD-2 and HCterm/QCterm/QNterm constitute a strong indicator of metal preference in natural enzymes (Fig. 2). Although the susceptibility of SodFMs to metal-preference modulation by mutating these residues was not universal, the majority of variants in which these residues were mutated exhibited significant modulation of metal utilization.

Cambialism can be a positively selected property in nature

Next, we leveraged the cambialistic SODs from S.aureus and B.fragilis to investigate the specific role of the XD-2 position in determining metal preference (Fig. 4a,b and Extended Data Fig. 9a–d). We investigated multiple variants in which their XD-2 residues were mutated (Extended Data Fig. 9a–d), guided by our analysis of which amino acid appears at this position in natural SodFM sequences (Fig. 2b). In the S.aureus cambialistic SodFM1, the magnitude of aCR change was dependent on which amino acid was introduced at XD-2 (Fig. 4a), and was consistent with the correlation between the aCR and XD-2 residues of SodFMs sampled across the tree of life (Fig. 2c). Importantly, while three S.aureus mutants shifted back towards the ancestral Mn preference, three mutants showed greater Fe preference (Fig. 4a). The shift towards Fe preference was observed for the two amino acids (Thr and Val) most often present in Fe-preferring SODs, while a strong change towards the ancestral Mn specificity was observed for the two amino acids (Ala and Gly) most commonly found in Mn-preferring SODs. This capacity for bidirectionally changing metal preference in S.aureus SodM was unique across the tested SODs, suggesting it can provide a powerful model system for future studies of how metal preference is determined. This contrasted with the B.fragilis cambialistic SodFM2 (Fig. 4b and Extended Data Fig. 9c,d) where the majority of mutants, regardless of the amino acid introduced at the XD-2 position, were shifted towards the ancestral Fe preference. This can probably be explained by the structure of the amino acid (glycine) found in B.fragilis XD-2 position (Fig. 4e), which has no sidechain, thus replacement with any sidechain found in other amino acids affects its local environment and influences metal preference (Fig. 4b). Therefore, our results suggest that the S.aureus cambialistic SodFM1 has potential for evolution towards higher Fe preference, whereas the B.fragilis cambialistic SodFM2 has reached the highest possible level of Mn preference via mutagenesis of the XD-2 residue alone. Crucially, the cambialistic sequence is conserved in all analysed S.aureus and S.argenteus genomes, despite our observation that only a single XD-2 mutation is required to switch S.aureus cambialistic SodFM1 into an Fe-preferring SOD with substantially increased Fe-dependent activity, providing evidence for the selection of cambialism rather than Fe preference in S.aureus. We thus propose that cambialism need not strictly be an evolutionary intermediate stage between two metal specificities but rather the desired property that can be selected for in nature, probably in this case for circumventing nutritional immunity12.

Fig. 4: Functional studies reveal multiple molecular pathways to metal-preference modulation in SodFM metalloenzymes.
figure 4

a,b, Residues at the XD-2 position in cambialistic S.aureus SodFM1 (LD-2, a) and B.fragilis SodFM2 (GD-2, b) were mutated to those found commonly at this position in natural SodFMs (Fig. 2b). The average specific activities of the WTs and mutants loaded with either Mn (x axis) or Fe (y axis) were plotted as colour-coded circles. In S.aureus (a), residues most commonly found in Mn-preferring SODs, AD-2 and GD-2 (Fig. 2a,b), shifted metal preference towards Mn (orange circles), whereas those found mainly in Fe-preferring SODs, VD-2 and TD-2 (Fig. 2a,b), shifted towards higher Fe preference (red circles). In the cambialistic SodFM2 from B.fragilis (b), all mutants displayed lower activity with Mn and the majority had higher activity with Fe, representing change towards its most likely ancestral Fe-preferring phenotype. c, For Fe-preferring CPR Wolfebacteria SodFM1 (CPR-His), mutations in both the XD-2 and water-coordinating H/Q residues were necessary to shift towards the inferred ancestral higher Mn activity. The likely ancestral phenotype was proxied by the closely related SodFM1 (Fig. 2a) from more basal Wolfebacteria (Fig. 3c). g,h, Single mutants, but not double mutants, shifted metal preference of N.alba SodFM3 (g, Q/H mutant) and Woesearchaeota SodFM4 (h, VD-2A). df, Crystal structures of S. aureus SodFM1 LD-2G (d), B. fragilis SodFM2 GD-2T (e) and CPR-His SodFM1 VD-2G H/Q double mutant (f) were analyzed to investigate structural effects of these mutations. The structures in e and f were solved in this study. Cartoon representation of the active sites were superimposed onto corresponding WT. Apart from the mutated residues, there were no significant changes to the position of metal (manganese, blue, and iron, orange spheres), catalytic water molecules (red spheres) and primary metal coordination bonds (dashed lines) within the resolution limit of the solved structures (backbone root mean squared deviation (RMSD) of 0.4 Å, and metal primary coordination sphere RMSD below 0.1 Å). i, Plot represents fold changes in activity with Mn and Fe of the tested SodFM mutants relative to the WTs (Supplementary Data 8), where arrows indicate significant increase in activity with only one metal (Fe, red; Mn, orange), or decline in activity with both metals (grey). Neither of the increases in activity with both metals in a single mutant (brown) were significant. The fold changes in activity are represented on a log2 scale.

Multiple evolutionary pathways can modulate metal preference

To further test the wider evolutionary significance of the secondary coordination sphere residues, we investigated double mutant variants combining mutations at positions XD-2 and the water-coordinating Q/HCterm. Our characterization of these variants revealed multiple evolutionary pathways towards metal-preference switching. In some cases, such as Nesterenkoniaalba SodFM3 Q/H, a single residue has a major impact on the preferred metal co-factor (Fig. 4g and Extended Data Fig. 9k,l), while in others, such as CPR SodFM1 H/Q, mutation at the same position had a negligible effect (Fig. 4c and Extended Data Fig. 9e,f). In some, such as CPR SodFM1 H/Q VD-2G, only a cumulative effect of the two mutations affected the metal-preference (Fig. 4c and Extended Data Fig. 9e,f), whereas no significant effect was observed in either single or double mutants of nanoarchaeon Nanopusillusacidilobi SodFM4 (Extended Data Fig. 9i,j). The latter observation suggests that additional loci in the SodFM sequences must also regulate metal preference, which await identification. SodFM metal preference is probably modulated via fine tuning of the metal co-factor’s redox potential13,26,27,30. Interestingly, in all XD-2 and Q/HCterm mutants with significantly changed metal preference, activity with at least one metal decreased (Fig. 4i). Significantly increased activity with both metals at the same time was not observed in any of these mutants. This suggests that a substantial and concomitant increase in activity with both metals via XD-2 and HCterm/QCterm/QNterm mutagenesis is impossible due to biophysical constraints. This is consistent with the ‘redox tuning’ hypothesis31 under the assumption that structural changes that modulate the reduction potential of one metal should also change the reduction potential of the alternative metal within the same active site. This also indicates that SodFMs have evolved to maximize their metal-specific activity via XD-2 and Q/H mutagenesis, but within limitations enforced by likely epistatic effects32 of other sites within their protein structure. Importantly, consistent with previous reports13,26,27,33, comparison of four protein crystal structures of the WT and mutant SodFMs we determined here (Fig. 4e,f and Extended Data Fig. 7) indicated that metal-preference modulation by XD-2 and Q/H occurs without any significant changes to the secondary metal coordination sphere within the resolution limits of the solved structures. This provides further support for ‘redox tuning’ via subtle structural changes13,34. Better understanding of the residues that determine metal preference and its evolvability35 in SodFMs can enable development of models describing how evolutionary changes in metalloprotein structure can shape their biophysical properties.

Discussion

Despite substantial and continuing progress in our understanding of protein functions and the evolutionary processes that shape them, a vast gap exists between the number of known natural protein sequences and the number of those that have been investigated experimentally. Bridging this gap is required to further our understanding of the fundamental mechanisms involved in the functioning and evolution of all living systems, and to apply this knowledge in medicine, biotechnology and synthetic biology.

Although SodFMs are a well-studied protein family, with over 50 WT protein crystal structures available from various organisms, we were able to uncover additional diversity via application of bioinformatics and phylogenetics on a dataset of genomes sampled across the tree of life. This included the novel categorization of the SodFM protein family into five distinct subfamilies, the first comparison of multiple members of the largely unstudied group of SodFM4s, and identification of atypical isozymes such as the Fe-preferring SodFM1 from CPR Wolfebacterium. It also enabled us to identify multiple independent evolutionary events, both ancient and more recent, in which their key biophysical property of Fe/Mn metal preference was switched. S.aureus cambialistic SodFM1, involved in evasion of the mammalian immune response12,13, has emerged from the most recent of these switches, and is susceptible to modulation of its metal preference in both directions. This contrasted with the Agrobacterium Mn-preferring SodFM2.4, which descended from the most ancient identified metal-preference switch, where modulation via mutagenesis of the identified key residues was no longer possible. This is probably due to epistatic drift, which can de-correlate the later effects of mutations from their initial effects29. We also demonstrated that Agrobacterium Mn-SodFM2.4, and all other tested SodFMs active with Mn, are resistant to hydrogen peroxide inhibition, which could provide additional selective advantage, for example, to organisms involved in peroxide scavenging within microbial communities36,37,38.

Metal preferences of SodFMs are commonly misannotated in databases and cannot be predicted by sequence analysis or phylogenetics alone, but our application of both methodologies in combination improved prediction accuracy. Many other protein families include metalloproteins, and although metal preference has been established for some members of some families, its diversity has probably been frequently underappreciated. For example, the unexpected discovery of copper-only SODs, devoid of the requirement for zinc, among the Cu/Zn-dependent SOD family requires reassessment of the annotation of these distinct superoxide detoxification enzymes39,40. Similarly, a carbonic anhydrase isozyme from marine diatoms has evolved to utilize cadmium, in place of the usual Zn, as an adaptation to oceanic Zn deficiency41. In SodFMs, metal preference can be changed within short evolutionary timeframes. Although the selection pressures driving their evolution remain to be evaluated, we hypothesize that environmental conditions affecting intracellular metal homeostasis are probably frequent drivers. The same selection pressures would affect other metalloproteins, making the groups of organisms with signatures of evolutionary SodFM metal preferences changes, such as Alphaproteobacteria and Bacteroides, prime candidates for investigations of metal-preference evolution in other protein families. This could have particularly important implications in healthcare as metal homeostasis and correct protein metalation are important in human disease9 and in microbial infections6, and have been suggested to be promising anti-microbial therapeutic targets10. Furthermore, the ability to engineer cambialistic synthetic proteins can enable the utilization of different metals to perform the same activity, probably providing economic and environmental benefits in biotechnological applications.

The existence of Fe-preferring, Mn-preferring and cambialistic SodFMs is well established, and their metal-preference changes were previously hypothesized as being related to ancient geological changes such as Earth’s atmosphere oxygenation11,14 and adaptation of pathogens to nutritional immunity12,13. Here we identified multiple additional metal-preference changes and demonstrated that, rather than switching between three discrete states of metal specificity, SodFM metal preference evolves by sliding on a continuous scale of cambialism ratio (CR). This evolutionary fine tuning probably occurs through the proposed biophysical mechanism of redox tuning31, a hypothesis supported by our biochemical and structural analysis of natural SodFMs and their metal-preference switched mutants. It was previously hypothesized that ancient ancestral SodFMs were cambialistic14,42, but in light of our data, we hypothesize that the ancient SodFMs would have been susceptible to metal-preference modulation, similarly to the extant homologues. A physiological role has only been previously demonstrated for one cambialistic SodFM, in S.aureus pathogenesis. Yet our data also indicate that on modern Earth, cambialism can be more common and more important than previously anticipated as, for example, in the Bacteroidales, cambialism has evolved multiple independent times from different ancestral backgrounds. However, the wider biological significance and frequency of cambialism and evolutionary metal-preference modulation in other protein families remains to be explored. Metals are indispensable for modern cellular life, probably played important catalytic functions at its origin, and their bioavailability has changed dramatically over the Earth’s history. However, it is not always clear why one metal was selected over another to play a particular biological role or whether the selected metal can be easily changed for another metal with similar biophysical properties in the course of evolution. Our findings suggest that in the case of SodFMs, the preference for one of the two possible metal co-factors can and has been modulated multiple independent times in the course of the SodFM protein family’s evolution, and remains susceptible to further changes in the extant family members.

Methods

Generation of ΔsodAΔsodB expression strains of E. coli BL21(DE3) and pLysSRARE2 (DE3) pLysS

The sodFM gene knockout (KO) strain of E.coli BL21(λ DE3) was generated following a standard protocol43 with minor modifications. First, the sequences of sodA and sodB genes in the BL21 genome (GCF_000022665.1, ASM2266v1) were compared with that of the E.coli K-12 strain (GCF_000005845.2, ASM584v2) where successful gene KO strains were previously generated44. Specificity of designed primers with homology to the targeted genes was verified using local BlastN (-task blastn-short, e-value 0.1)45 against the whole BL21 genome. E. coli BL21(λDE3) cells expressing λ Red recombinase (BL21-RED) from pKD46 plasmid were induced with 10 mM arabinose at OD600 of 0.2 and made electrocompetent at OD600 of 0.6 by three washes in ice cold 10% (vol/vol) glycerol. A kanamycin resistance cassette with recombination overhangs was amplified from pKD4 plasmid template with Q5 Hot Start DNA Polymerase (NEB), purified with Monarch DNA gel extraction kit (NEB), and electroporated (500 ng DNA, 250 ng μl−1) into 90 μl electrocompetent BL21-RED cells at 2.5 kV, 200 Ohm, 0.025 F for 4.5 ms (BTX ECM 630) in 0.1 cm electroporation cuvettes (VWR), followed by 2 h recovery in super optimal broth with catabolite repression (SOC) medium. All bacterial cultures (except recovery following electroporation in SOC medium) were grown in Luria-Bertani medium supplemented with 0.2% (wt/vol) glucose, with 1.5% (wt/vol) agar for solid medium. The remaining steps of the protocol were performed as described in the original publication43. The BL21(λDE3) ΔsodB mutant was used for subsequent generation of the BL21(λDE3) ΔsodAΔsodB double mutant following the same protocol. Correct antibiotic resistance cassette insertions were verified using diagnostic polymerase chain reaction (PCR) with OneTaq Quick-Load 2× Master Mix (NEB). The gene deletion was verified using PCR with sodA(B)_screen_F(R) primers complementary to regions upstream and downstream from the deleted loci, and its phenotype confirmed using the in-gel SOD activity assay on soluble protein extracts from the WT and KO strains (Extended Data Fig. 10c). The BL21(DE3) pLysSRARE2 ΔsodAΔsodB strain was constructed by transforming the BL21(λDE3) ΔsodAΔsodB with pLysSRARE2 plasmid isolated from the commercially available ROSETTA2 (λDE3) pLysS strain (Novagen). All primers used for KO strain generation and verification were synthesized by Eurofins Genomics, with sequences provided in Supplementary Data 1.

SOD genes synthesis, cloning and site-directed mutagenesis

Amino acid sequences of SodFMs of interest identified in bioinformatic analyses were used to generate gene sequences codon optimized for expression in Escherichiacoli B using commercial Codon Optimization Tool (IDT). Predicted N-terminal targeting sequences identified with SignalP5 (ref. 46), TargetP 2.0 (ref. 47), ChloroP48 and Phobius49, and fusion domains identified with InterProScan 5 (ref. 50), were manually verified in protein alignments as not being a part of the structural and functional conserved core of the SOD fold, and were excluded from the construct sequences. Sequence extensions complementary to the cloning site within pET-22b(+) (Novagen) were added to each codon-optimized gene and resulting constructs (Supplementary Data 2) were synthesized as gBlocks gene fragments (IDT). The amino acid sequences of the 82 WT SOD constructs investigated in this study are listed in Supplementary Data 3. The synthetic constructs were used directly for enhanced Gibson assembly cloning51,52 into the pET-22b expression vector backbone. Ratios of 4 fmol plasmid (15 ng): 24 fmol insert (usually 10 ng) in DNA mix, and 2.5 μl of the DNA mix: 7.5 μl enzyme mix were used for the enhanced Gibson assembly. Site-directed mutagenesis of the SodFMs in pET-22b was performed using Q5 Hot Start DNA Polymerase (NEB) and the KLD kit (NEB) following the manufacturer’s protocol with the single exception of using 5 μl instead of 10 μl final reaction volume. Sequences of all primers used in site-directed mutagenesis are provided in Supplementary Data 1. All plasmid propagations steps were performed in E.coli DH5α.

Expressing SodFMs in the presence of Fe or Mn in 24-well culture plates

For screening of soluble expression and aCR verification experiments, E. coli BL21(λDE3) ΔsodAΔsodB cells transformed with pET-22b SOD expression constructs were cultured in 4–5 ml M9 medium (0.1 mM CaCl2, 2 mM MgSO4, 0.4% glucose, 0.2% casamino acids, 1 mM thiamine, 12.8 g l−1 Na2HPO4 × 7H2O, 3 g l−1 KH2PO4, 0.5 g l−1 NaCl and 1 g l−1 NH4Cl) in 24-well culture plates (Whatman Uniplate) at 37 °C with 180 r.p.m. orbital shaking. Pre-cultures were grown for 16 h in M9 medium containing 100 µg ml−1 ampicillin without metal supplementation. Aliquots (200 μl) of the pre-cultures were used to inoculate 4 ml M9 medium with ampicillin (100 μg ml−1) supplemented with either 500 μM MnCl2 or 100 μM Fe(NH4)2SO4 and cultured until OD600 reached 0.4–0.6. At that point, additional Mn (1 mM final concentration) or Fe (200 μM final concentration) and isopropyl β-d-1-thiogalactopyranoside (1 mM final concentration) were added, and culture was continued for 4 h. Cultures in Fe- or Mn-supplemented media were always performed in separate 24-well plates to avoid metal cross-contamination. The final OD600 was measured, and the 24-well culture plates were centrifuged for 10 min at 4,000g. Spent media were removed, and cell pellets were resuspended in cell lysis solution (20 mM Tris pH 7.5, 100 μg ml−1 lysozyme, 10 μg ml−1 DNAse I, 1× Roche cOmplete ethylenediaminetetraacetic acid (EDTA)-free protease inhibitors), volume normalized on the basis of their final OD600 measurements and transferred to 1.5 ml Eppendorf tubes. E.coli cells were lysed by 10 s sonication (MSE Soniprep 150 with exponential probe at 10–14 μm amplitude) in a cold room and centrifuged for 20 min at 21,100g. The same protocol was followed for BL21(DE3) pLysSRARE2 ΔsodAΔsodB except for 16 h expression at 16 °C in M9 media containing 100 μg ml−1 ampicillin and 30 μg ml−1 chloramphenicol. The supernatants (soluble protein extracts) containing the expressed SodFMs were used in the subsequent bicinchoninic acid protein concentration assay (Pierce), liquid and in-gel SOD activity assays, and sodium dodecyl sulfate–polyacrylamide gel electrophoresis (SDS–PAGE). Metal purity of culture media were verified using inductively coupled plasma mass spectrometry (ICP–MS).

Protein expression and purification

Proteins were expressed in E.coli BL21(DE3) ΔsodAΔsodB in 1 l M9 medium supplemented with 200 μM Fe(NH4)2SO4 or 1 mM MnCl2 and induced with 1 mM isopropyl β-d-1-thiogalactopyranoside for 4 h at 37 °C with 220 r.p.m. orbital shaking in 2 l baffled flasks. Bacterial cells were lysed by sonication of washed cell pellets in 20 mM Tris pH 7.5, 1× cOmplete EDTA-free protease inhibitor (Roche), 100 μg ml−1 lysozyme, 10 μg ml−1 DNAse I, followed by centrifugation at 19,000 g at 4 °C. Cleared cell lysates were subjected to chromatographic separation using an ÄKTA purification system (Cytiva). Standard protocol for most characterized recombinant SOD proteins consisted of purification using anion exchange chromatography (HiTrap Q HP column, Cytiva) in 20 mM Tris pH 7.5 buffer with 0–1 M NaCl linear gradient elution, and subsequent size exclusion chromatography in 20 mM Tris pH 7.5, 150 mM NaCl buffer on a Superdex 200 16/600 (Cytiva) column. For purification of B.fragilis SodFM2 and its mutants, 20 mM Tris pH 8.5 was used. The E.coli SodFM1 (SodA) was collected in the flowthrough during anion exchange chromatography and was subsequently subjected to additional cation exchange chromatography (5 ml SP FF column (Cytiva) in 20 mM MES pH 5.5, 0–1 M NaCl linear gradient elution), where it was also collected in the flowthrough. Metal content of protein extraction buffers as well as metal loading of the purified protein preparations were verified using ICP–MS.

Analytical size exclusion chromatography using a Superdex 200 10/300 increase column (Cytiva) was performed to estimate the molecular weight of protein preparations in non-denaturating conditions based on elution volume. The column was calibrated using a mixture of molecular weight standards ranging from 1,350 to 670,000 Da (Bio-Rad) and blue dextran (2,000 kDa, Sigma) in 20 mM Tris pH 7.5, 150 mM NaCl buffer.

SOD activity assays and (a)CR calculations

SOD activity was assessed quantitatively using the riboflavin-nitro blue tetrazolium (NBT) liquid assay53 with minor modifications. The assay was performed in 96-well plates, in which 20 μl of a sample was mixed in each well with 180 μl assay solution (10 mM methionine, 1.4 μM riboflavin, 66 μM NBT and 10 μM EDTA in 50 mM potassium phosphate buffer, pH 7.8), immediately before 10 min incubation on a white light box followed by immediate measurement of absorbance at 560 nm. Serial twofold dilutions of tested samples were assayed to identify enzyme concentrations within the linear range of the assay that were close to 1 U in reference to a standard curve obtained using commercial bovine SOD standard (S5639, Sigma). To calculate specific SOD activities, the assay was performed in triplicate using samples diluted to the concentrations within the assay’s linear range. CRs were calculated as a ratio of SOD activities of the verified Fe-loaded protein preparation divided by that of the verified Mn-loaded protein preparation. aCRs were calculated the same way but using SOD activity values of soluble protein extract samples from the 24-well culture experiments after lysis and centrifugation. CR values >0.5 were considered as evidence of higher Fe preference, CR <0.5 as evidence of higher Mn preference and enzymes with 0.5 < CRs < 2 were considered cambialistic.

For qualitative SOD activity assessment, approximately 15 μg (determined by bicinchoninic acid assay) of soluble protein extracts were separated by 15% (wt/vol) acrylamide native polyacrylamide gel electrophoresis (PAGE) and assayed using an in-gel assay as described previously13. The in-gel peroxide inhibition assay was performed as described previously13. For protein staining, samples were resolved on 15% (wt/vol) acrylamide SDS-PAGE gels and stained with Quick Coomassie (Protein Ark). All gels were imaged using a ChemiDoc imaging system (Bio-Rad), using the same settings (including exposure time) for all gels compared within a single experiment. Band intensity was quantified with Image Lab 6.1 software (Bio-Rad) using a rectangle volume tool to outline activity bands with local background subtraction. aCRs from the in-gel assay were calculated using the measured band intensities following the formula: aCR = (intFe − intFeH2O2)/intMnH2O2, where intFeH2O2 is band intensity of the Fe-loaded form following peroxide inhibition, intFe refers to the band intensity of Fe samples in non-peroxide-treated gels, and intMnH2O2 is the band intensity of the Mn-loaded form following peroxide inhibition. Data available in Supplementary Data 8, 9 and 10.

Metal concentration verification using ICP–MS

Aliquots of protein extraction buffers or purified protein samples (20 μM) in 20 mM Tris pH 7.5, 150 mM NaCl were each diluted to 5 ml with 2% HNO3 for elemental analysis. Elemental composition of the resulting acid solutions were quantified using an iCAP RQ ICP–MS instrument (Thermo Fisher Scientific) according to the manufacturer’s specifications through comparison with matrix-matched elemental standard solutions, using In and Ir (10 μg l−1) as internal standards (University of Plymouth Enterprise Ltd).

Protein crystallography

Purified and concentrated protein samples were subjected to crystallization with commercially available matrix screens: PACT, JCSG+, Structure, Morpheus (Molecular Dimensions) and Index (Hampton Research) in 96-well Swissci MRC crystallization plates (Molecular Dimensions) using a Mosquito liquid handling robot (TTP Labtech), with the sitting drop vapour-diffusion method at 20 °C. CPR-His Wolfebacteria SodFM1 crystallized in 100 mM sodium acetate trihydrate pH 4.5 and 25% wt/vol polyethylene glycol 3350; CPR-His Wolfebacteria SodFM1 mutant crystallized in 200 mM sodium chloride, 100 mM sodium acetate pH 5.0 and 20% wt/vol polyethylene glycol 6000; B. fragilis SodFM2 crystallized in 200 mM sodium chloride, 100 mM Tris pH 8.5 and 25% wt/vol polyethylene glycol 3350; and B. fragilis SodFM2 mutant crystallized in 10 mM zinc chloride, 100 mM sodium acetate pH 5.0 and 20% wt/vol polyethylene glycol 6000. All crystals were cryo-protected with the addition of 20% polyethylene glycol 400 to the crystallization condition. Diffraction data were collected on the I03 and I24 beamlines at the Diamond Light Source Synchrotron (Didcot, UK) at 100 K. Data processing and refinement statistics are given in Supplementary Data 16. Data were indexed and integrated with the pipeline Xia2 (ref. 54) with either 3dii XDS55 or DIALS56 and scaled with Aimless57. Space group determination was confirmed with Pointless58. Five per cent of observations were randomly selected for the Rfree set. The phase problem was solved by molecular replacement using Phaser59. The 1UES was used as a search model in molecular replacement for B. fragilis SodFM2 structures, whereas 1IX9 and 1GV3 were used as search models in molecular replacement for CPR-His SodFM1 and its mutant, respectively. Initial model building was done with CCP4build task on CCP4cloud60. The model were refined with Refmac61 and manual model building was done with Coot62 (Supplementary Data 16). Models were validated with Molprobity63 and Coot. Structural representations were prepared with PyMOL (Schrödinger, LLC).

Structural comparison was performed for global secondary structure matching in Coot and for least square comparison (LSQ-kab) of metal-binding residues in CCP4i. To visualize and compare electron density between structures, the 2Fo-Fc map was displayed at 1σ level.

Bioinformatic protein family identification and characterization

Our previously described dataset of genomes sampled across the tree of life13 was expanded with additional Bacteroidales and Alphaproteobacteria genomes to improve sampling of the lineages identified as metal-preference switching hotspots. The final dataset contained 3,058 genomes including 2,613 bacteria, 281 archaea and 146 eukaryotes (full list in Supplementary Data 4). Members of the SOD protein families were identified using PFAM64 hidden Markov model (HMM) profiles (Sod_Fe_C, PF02777.21; Sod_Fe_N, PF00081.25; Sod_Cu, PF00080.23; SodNi, PF09055.14) as queries against a local database of 10,304,216 protein sequences encoded within all 3,058 analysed genomes in hmmsearch (-E 1e-5) profile search implemented in HMMER3.3 (ref. 65). Superoxide reductase (SOR) protein family members were identified with hmmersearch using PFAM64 profiles (Ferric_reduct, PF01794; and Desulfoferrodox, PF01880), as well as BlastP (e-value threshold of 0.005)45 searches using sequences of structurally characterized SORs (protein data bank (PDB): 1DO6, and 1DFX). Data available in Supplementary Data 17.

The identified 2,820 SodFM homologues (including 116 sequences with solved crystal structures available in the PDB) were aligned in MAFFT66 with a set gap opening threshold (–op 2), and trimmed with trimAL67 with a set gap threshold (−gt 0.1) to a final alignment length of 244 positions (Supplementary Data 5). The SodFM phylogeny (Supplementary Data 6) was generated with 1,000 ultrafast bootstrap68 replicates in IQ-TREE69, with the best fitting phylogenetic model WAG + R10 selected with ModelFinder70 according to Bayesian information criterion implemented in IQ-TREE. Available solved SodFM crystal structures (listed in Supplementary Data 1115) were downloaded from the PDB;71 aligned, manually inspected and displayed in PyMOL (Schrödinger, LLC.); protein 3D structural comparisons were performed using the ‘All against all’ function implemented in DaliLiteV5.1 on a local workstation72. Groups of correlated amino acid residues within SodFM alignments were identified using an amino acid correlation algorithm implemented in PFstats73.

Generation of the tree of life

HMM profiles were generated from a curated set of 21 single-copy orthologues shared between bacteria, archaea and eukaryotes74. Sequences of the orthologues were downloaded from the supplementary material of the previous study of the universally conserved orthologues74, aligned with MAFFT66, trimmed with trimAL67 (gappyout mode) and used for the generation of HMM profiles in HMMER3.365. The profiles were subsequently used in hmmsearch (-E 1e-5) profile search65 against the local database of 10,304,216 protein sequences encoded within 3,058 genomes sampled from across the three domains of life. Each of the identified orthologue groups were aligned with MAFFT66, trimmed with trimAL67 (gappyout mode) and used for the generation of phylogenies in IQ-TREE69 under LG + G4 model with 1,000 ultrafast bootstraps68. All alignments and phylogenies were manually inspected, eukaryotic organellar sequences were filtered out on the basis of the tree topologies and verified with BLASTP searches against the nr database, and seven widely distributed orthologous groups were selected for the final analysis. The final dataset contained the following eukaryotic (prokaryotic) orthologues: Ribosomal 40S S3 (Ribosomal 30S S3), Ribosomal 40S S15 (Ribosomal 30S S19), Ribosomal 60S L23 (Ribosomal 50S L14), Ribosomal 40S S11/14 (Ribosomal 30S S11), Ribosomal 40S S5/S7 (Ribosomal 30S S7), Ribosomal 40S S0/S2 (Ribosomal 30S S2) and RNA polymerase II subunit 3 (DNA-directed RNA polymerase subunit D). The trimmed alignments of the seven orthologues groups were concatenated with module Nexus of the Biopython package75 to a final alignment of 1,097 amino acid sites, and was used to generate the tree of life (Supplementary Data 7) with 1,000 ultrafast bootstrap68 replicates in IQ-TREE69 under LG + G4 model.

Ancestral sequence prediction and ancestral aCR state estimation

Ancestral sequence reconstruction (ASR) was used to infer the most likely residues at the identified metal-preference determination positions (XD-2 and H/QCterm or QNterm) for the key ancestral nodes of SodFM1, SodFM2, SodFM3, SodFM4 and SodFM5 LCAs.

ASR was performed with the maximum likelihood phylogeny of 2,820 SodFMs (Supplementary Data 6) and the alignment used for the generation of the phylogeny (Supplementary Data 5). The reconstruction was performed using IQ-TREE69 (-asr option) with the best fitting model used for the tree reconstruction (WAG + R10), and FastML76 with WAG model. Both approaches gave similar results with only a second decimal place variation in posterior probability value between the two methods. Marginal and joint reconstruction in FastML predicted the same residues at the analysed positions of interest.

The possible ancestral aCR values for the key nodes of SodFM1 and SodFM2 ancestors were estimated using an IQ-TREE69 maximum likelihood phylogeny of a representative sample of SodFMs with experimentally verified aCR values. The ancestral aCRs were estimated from the consensus phylogeny (of 1,000 ultrafast bootstrap68 replicates) with a maximum likelihood approach implemented in fastAnc function of phytools package77 and a Bayesian Markov chain Monte Carlo (MCMC) approach implemented in anc.Bayes77 from the same package. To account for the uncertainty of the phylogenetic reconstruction, additional Bayesian estimation was performed in BayesTraits V4.0.0 (continuous random walk model A and MCMC analysis)78,79 using 20,000 locally optimal trees (IQ-TREE -wt option)69.

Other bioinformatics methods

Mean sequence identities were calculated as an average of trimAL67 pairwise identity comparisons (-sident) on MAFFT66 alignments trimmed with trimAL67 (gappyout mode). Sequence logos were generated with WebLogo380 and displayed so that the total height of each residue corresponds to the information content of the residue in the given position, and the relative size of multiple letters within each position corresponds to their relative frequency at this position. The phylogenetic trees were visualized and annotated with the protein presence/absence in the analysed genomes or with co-evolving amino acid residues using GraPhlAn81. Multiple sequence alignments were visualized and annotated using Jalview82. Phylogenetic trees were visualized and annotated using FigTree (http://tree.bio.ed.ac.uk/software/figtree/), Archaeopteryx83.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.