
Volume 584, Issue 2 p. 252-264
Review

Deciphering synonymous codons in the three domains of life: Co-evolution with specific tRNA modification enzymes

Henri Grosjean (corresponding author)

Université Paris-Sud, CNRS, UMR8621, Institut de Génétique et de Microbiologie, Orsay F-91405, France

Fax: +33 (0) 169154629.

Valérie de Crécy-Lagard

Department of Microbiology and Cell Science, University of Florida, P.O. Box 110700, Gainesville, FL 32611-0700, USA

Fax: +1 352 392 5922.

Christian Marck

Institut de Biologie et de Technologies de Saclay (iBiTec-S) Bât 144, CEA/Saclay, F-91191 Gif-sur-Yvette Cedex, France

Fax: +33 (0) 169084712.
First published: 19 November 2009
Citations: 218
Abbreviations and symbols: COG, Cluster of Orthologous Groups; HGT, horizontal gene transfer; N stands for any of the four bases. A number after a letter indicates a position: C34 in C34AU means cytidine in position 34 of anticodon CAU in the tRNA sequence, and G3 in AUG3 means guanosine in position 3 of codon AUG in mRNA. Amino acids are indicated using the standard three-letter code. The common names of modified nucleosides indicated in the text by their conventional symbols can be found at http://www.modomics.genesilico.pl.

Abstract

The strategies organisms use to decode synonymous codons in cytosolic protein synthesis are not uniform. The complete isoacceptor tRNA repertoire and the type of modified nucleoside found at the wobble position 34 of their anticodons were analyzed in all kingdoms of life. This led to the identification of four main decoding strategies that are diversely used in Bacteria, Archaea and Eukarya. Many of the modern tRNA modification enzymes acting at position 34 of tRNAs are present only in specific domains and obviously have arisen late during evolution. In an evolutionary fine-tuning process, these enzymes must have played an essential role in the progressive introduction of new amino acids, and in the refinement and standardization of the canonical nuclear genetic code observed in all extant organisms (functional convergent evolutionary hypothesis).

1 Introduction

During protein synthesis, the successive mRNA codons direct the synthesis of the polypeptide. The correspondence between each of the 64 (4³) possible codon triplets and the 20 amino acids of the cellular proteome (22 in some organisms) is called the genetic code. Of these 64 codons, 61 are usually sense codons. Most of these are organized in so-called degenerate codon family boxes where synonymous triplets code for the same amino acid. Only a few codons are unassigned, usually UAA, UAG and UGA, which are used as termination codons (Fig. 1A).
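The counts quoted above can be checked with a short sketch. The Python snippet below is an illustration only, not part of the original analysis: it builds the standard code table from the conventional UCAG ordering and tallies codons, stop codons and the boxes in which all four codons are synonymous. The names `BASES`, `AMINO_ACIDS` and `STANDARD_CODE` are chosen here for illustration.

```python
from itertools import product

BASES = "UCAG"  # conventional row/column order of the standard code table

# One-letter amino acids in UCAG x UCAG x UCAG order; '*' marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

STANDARD_CODE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

stop_codons = sorted(c for c, aa in STANDARD_CODE.items() if aa == "*")
sense_codons = [c for c, aa in STANDARD_CODE.items() if aa != "*"]

# A "box" groups the four codons that share the same first two bases.
four_codon_boxes = [
    b1 + b2
    for b1, b2 in product(BASES, repeat=2)
    if len({STANDARD_CODE[b1 + b2 + b3] for b3 in BASES}) == 1
]

print(len(STANDARD_CODE), len(sense_codons), stop_codons)
print(four_codon_boxes)
```

Under the standard code this gives 64 codons, 61 sense codons, the three stop codons UAA, UAG and UGA, and eight unsplit 4-codon boxes.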

Fig. 1.
Part A – Standard genetic code. The Standard Genetic Code (so-called “Universal”) used by most organisms of the three kingdoms to encode the proteins of their genome is presented in the traditional way (16 boxes, each including four codon/anticodon combinations sharing the same first two bases of the codon). The first, second and third bases of the codon are shown in blue, red and cyan, respectively. Light blue background denotes the “4-codon” boxes (all four codons code for the same amino acid), all located in the left part except the Arg CGN and Gly boxes. They all involve at least one G:C base pairing (indicated by dashed axes in A). Green background denotes the case where an amino acid is encoded by a single NNG3 codon corresponding to a single C34NN anticodon; in this case this tRNA is compulsory since it cannot be spared (replaced by its cognate U34-containing tRNA). Yellow background indicates the three stop codons. Bold boxes and italics show the special decoding modes arising in the Leu CUN, Ile/Met, Cys/Trp and Arg CGN boxes (discussed in the text and illustrated in Fig. 3B). Part B – The anticodon/codon pairing. Bases 36 and 35 of the anticodon are paired to bases 1 (blue) and 2 (red) of the codon, respectively, through Watson–Crick (WC) base pairing. The first base of the anticodon (cyan), the “wobble” base (position 34, often modified), is paired to position 3 of the codon either with Watson–Crick base pairing or U:G, G:U or I:C pairing. The wobble base 34 and the “dangling” base 37 are often modified while bases 32, 38 and 39 are occasionally modified. Part C – Phylogenetic distribution of modified nucleosides at wobble position 34 of the anticodon. All symbols are those conventionally used in the scientific literature. The chemical adducts are indicated in Fig. 2. 
The corresponding full scientific names and chemical characteristics can be found in [136] and at: http://www.modomics.genesilico.pl, while information on corresponding enzymes can be found at: http://www.theseed.uchicago.edu/FIG/index.cgi/. In red are the modified nucleosides found in N34 of tRNAs mostly involved in decoding two synonymous codons (duet decoding boxes), in blue are those found in N34 of tRNAs mostly involved in decoding four synonymous codons (quartet decoding boxes), and in black are those found in N34 of tRNAs involved in decoding the single codon AUA for Ile. For details see text. When a symbol like ‘m’ for methyl, or ‘cm’ for carboxymethyl, or ‘s2’ for thio is in brackets, it means that depending on the tRNA, the N34 compound may exist with or without this adduct on the parent nucleoside. Modified nucleosides present in tRNAs of eukaryotic organelles (mitochondria and chloroplasts) are not mixed with those found in the eukaryal cytoplasmic tRNAs, as they have a bacterial origin (endosymbiosis). Only a few tRNAs from Archaea have been sequenced or analyzed so far for their modified nucleoside content, therefore information concerning this domain is incomplete. U? and C? are modified uridine(s) and cytidine(s) of yet unknown structures (see also Ref. [64]).

Central to the decoding process is the interaction between a specific codon triplet in mRNA (bases in positions 1, 2 and 3) and the three anticodon bases of the cognate aminoacyl-tRNA (bases in positions 36, 35 and 34, respectively, all three belonging to a seven-nucleotide anticodon loop – Fig. 1B). In this codon:anticodon recognition process, the 1st and 2nd bases of the codon and the 3rd and 2nd bases of the anticodon interact following the Watson–Crick pairing rules (A:U, U:A, G:C, C:G). In contrast, the interaction between the 3rd base of the codon and the 1st base of the anticodon (position 34, the so-called wobble base) is more ‘relaxed’, and non-standard base pairings (as initially proposed by Crick [1]), or base oppositions are permitted (reviewed in [2, 3]). As a result, a given tRNA may read more than one synonymous codon and therefore fewer than 61 anticodons are needed to decode the 61 sense codons in mRNAs. However, the type and number of spared tRNAs harboring a given anticodon vary with the synonymous decoding box and the organism considered (see examples in [4]). In addition, the modified nucleosides found at the ‘wobble’ base-34 in orthologous tRNA species are often different in Bacteria, Eukarya and Archaea (Figs. 1 and 2, Refs. [5-7]). How extant organisms accurately decode the same 61 sense codons into proteins containing the same 20 amino acids clearly differs according to their evolutionary history.
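The pairing geometry described here (anticodon positions 36, 35 and 34 facing codon positions 1, 2 and 3) can be sketched as a small lookup. The snippet below is a deliberately simplified illustration: the `WOBBLE` table encodes only the classical pairings named in this review (C34:G3, G34:C3/U3, U34:A3/G3, I34:C3/U3/A3) and ignores the effect of nucleoside modifications, which the text shows to be decisive in practice. The function name `codons_read` is hypothetical.

```python
# Watson-Crick partners, used for anticodon positions 36 and 35.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Third codon bases readable by each wobble base 34 (simplified assumption:
# unmodified-style classical wobble rules; A34 is omitted because it is
# generally deaminated to inosine, written "I").
WOBBLE = {
    "C": "G",
    "G": "CU",
    "U": "AG",
    "I": "CUA",
}


def codons_read(anticodon: str) -> list[str]:
    """Return the codons read by an anticodon written 5'->3' as N34 N35 N36."""
    n34, n35, n36 = anticodon
    first = WATSON_CRICK[n36]   # codon position 1 pairs with anticodon position 36
    second = WATSON_CRICK[n35]  # codon position 2 pairs with anticodon position 35
    return [first + second + third for third in WOBBLE[n34]]


print(codons_read("GAA"))  # Phe tRNA: ['UUC', 'UUU']
print(codons_read("CAU"))  # Met tRNA: ['AUG']
print(codons_read("ICG"))  # Arg tRNA with I34: ['CGC', 'CGU', 'CGA']
```

Because position 34 is looked up in a separate table, alternative wobble rules (for example a modified U34 restricted to A3-ending codons) can be tried by editing `WOBBLE` alone.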

Fig. 2.
Summary of all types of chemical adducts found at the wobble position-34 in naturally occurring RNAs (cytoplasmic and mitochondrial). In each box are the various types of chemical groups that can be enzymatically attached to selected atoms of a pyrimidine ring (left part) or of a purine ring (right part) at the wobble position during maturation of RNA precursors in Bacteria, Eukarya or Archaea. For G-transglycosylation of deazaguanine derivatives (with chemical group attached to C7 instead of N7 as in guanine) and further hexosylations of the G-derivatives, see [87] and Supplemental Fig. SS5. All other information is as in Fig. 1C.

2 Methodology and goals

In this review, we examine the anticodon usage in cytoplasmic tRNAs in different organisms belonging to the three biological domains. The information on the repertoire of cytoplasmic tRNAs was derived from mining nuclear genes coding for isoacceptor tRNAs harboring distinct anticodons as described in previous analyses (50 genomes from Bacteria, Archaea and Eukarya [4], 12 hemiascomycetes genomes + Schizosaccharomyces pombe, see Supplementary data SS3 and Ref. [8], and 17 genomes of Mollicutes [9]). The generality of our conclusions was verified by consulting two additional tDNA databases: the Genomic tRNA Database GtRNA-DB [10] that contains more than 800 genomes (40 Eukarya, 54 Archaea and 749 Bacteria) and tRNADB-CE [11], which contains 737 complete and 616 draft genomes of Bacteria and Archaea. These databases include unconventional tRNA genes recently discovered: split tRNA genes (the two halves of the gene are separated in the genome) [12], circularly permuted tRNA genes (the 3′ half of the gene lies upstream of the 5′ half in the genome) [13] and archaeal tRNA genes containing up to three introns [14]. The presence of modified nucleotides at the wobble position 34 of the anticodon was also examined. This information was compiled from different sources ([15, 17-19], http://www.modomics.genesilico.pl). Unfortunately, the identity of modified nucleotides for complete tRNA sets is known only in a few model organisms (mainly Escherichia coli, Saccharomyces cerevisiae, Haloferax volcanii and Mycoplasma capricolum). However, once the sequence of a gene coding for a given tRNA modification enzyme has been identified (and experimentally confirmed) in one organism, the presence of a given modification in a specific tRNA can be inferred from the presence of the corresponding modification enzyme sequence in a given genome ([20, 21], see also SEED database at http://www.theseed.uchicago.edu/FIG/index.cgi). 
These sequence-based predictions are quite accurate when genomes of closely related organisms are analyzed but are riskier when comparing evolutionarily distant organisms (see for example [21-24]).

The focus of this review is on mRNA decoding on the cytosolic ribosomes. Indeed, notable deviations from the ‘standard’ nuclear genetic code have been recorded in the translation of organelle mRNAs (for details see for example [25-27]). Importantly, as the majority of organisms analyzed are free-living, with only a few having a parasitic lifestyle (Mollicutes and Encephalitozoon cuniculi, for example), none appear to import extragenetic tRNA gene products, except if they are infected with specific viruses or transformed with plasmids carrying tRNA genes (a situation not considered here).

2.1 The special case of the two initiator/elongator tRNAsMet

Decoding by cytoplasmic tRNAMet is a special case that will be discussed first. Nuclear genomes of all organisms always contain genes coding for two different types of tRNA harboring a C34AU anticodon (“eMet” and “iMet” in the Ile/Met split decoding box – Figs. 1 and 3B and Supplementary Fig. SS1, part A). One gene (often in multicopy) encodes the regular elongator tRNAeMet used to incorporate internal methionines; the other encodes the initiator tRNAiMet used to initiate the synthesis of the polypeptide chain. Although both cytoplasmic tRNAs are charged by the same cognate Met-tRNA synthetase (MetRS), their sequences are markedly different. Indeed, the specific role of tRNAiMet is highlighted by the conservation of a number of sequence features that make it the most widely conserved tDNA throughout the three kingdoms of life (for details refer to Fig. 4 in [4]). Therefore, the actual number of different codons the tRNA set of any cell has to read in cytoplasmic mRNAs is not 61 (=64 − 3 stop codons) but 61 + 1, as the initiator and elongator AUG codons are read by two different tRNAs. Thus here, the same codon AUG3 is used for two different purposes, yet for the same amino acid.

Fig. 3.
Cytoplasmic decoding strategies in Eukarya, Archaea and Bacteria. Light blue, green and yellow background colors have the same meaning as in Fig. 1A. Missing (spared) tRNAs are symbolized by small colored rectangles (blue, red, yellow and green for A34, G34, U34 and C34 spared tRNAs, respectively). The same conventional colored symbols are used in Supplemental Figs. SS1–SS4. Part A – The four major tRNA sparing strategies. This part illustrates the decoding (which tRNA reads which codon in a given box, the first two bases of the anticodon/codon pairing being given) of the “standard boxes” of the Genetic Code (see Fig. 1A): that is, all boxes except Ile/Met and Cys/Stop/Trp and other changes related to genetic code alterations. Arrows symbolize the pairing of the first base of the anticodon (position 34) with the third base of the codon (position 3). A horizontal arrow denotes a Watson–Crick base pairing while an oblique arrow denotes a wobble base pairing (e.g. G34:U3). The wobble base pairs involving A34 (in fact I34 in the mature tRNA) and A3-ending codons are drawn as dashed arrows to indicate that this base pairing is inefficient during translation (details in text). Note that Eukarya differ from Archaea and Bacteria in the way they decode the 4-codon boxes (in blue background in Fig. 1A). Indications #1, #2, #3 and #4 refer to the four major sparing strategies (see corresponding sections in text). Sparing strategy #4 applies only to the Arg-CGN3 box (in most bacteria and hemiascomycete yeasts) and to the special Leu/Ser decoding boxes in hemiascomycete yeasts of the Candida clade. Part B – Special decoding boxes. While the decoding of the Ile/Met-AUN3 box follows a regular decoding pattern in Eukarya, Archaea and Bacteria use a third type of (C34AU) anticodon tRNA to read the AUA3 Ile-codon. 
In Bacteria, a modification of C34 performed by the modification enzyme TilS switches both the anticodon/codon recognition and the tRNA-synthetase recognition of this tRNA (shown as a red arrow, see text for details). A similar mechanism is hypothesized in Archaea. In the Cys/Stop/Trp UGN3-box, (a) stands for Mycoplasma; in the Tyr/Gln-box, (b) indicates ciliate protozoan (or Euplotes) and (c) Tetrahymena.

3 Major decoding strategies of synonymous codons: universal and idiosyncratic features

The main strategies used by contemporary organisms to read the 61 + 1 sense codons of cytoplasmic mRNAs are summarized in Fig. 3A. The special decoding modes used in certain organisms for decoding Ile/Met (codons AUN3), Cys/Stop/Trp (UGN3), Tyr/Stop > Gln (UAN3) and Arg (CGN3), as well as one remarkable exception in the CUN3 decoding box in certain yeasts of the Candida clade, are shown in Fig. 3A and B. These rules are deduced from comparison of the complete set of nuclear tRNA genes corresponding to tRNA isoacceptors harboring a distinct anticodon in genomes of more than ∼500 organisms spanning Bacteria (abbreviated as ‘B’ in Fig. 3), Eukarya (abbreviated as ‘E’) and Archaea (abbreviated as ‘A’). Specific examples are given in Fig. SS1, parts A + B (see also Refs. [4, 8, 9]).

3.1 Sparing strategy #1: depletion of A34-or-G34 containing tRNAs

Without exception, a tRNA harboring an anticodon A34NN never co-exists with another tRNA harboring an anticodon G34NN in any of the 16 mixed and unmixed amino acid family boxes. In the case of the 2-synonymous codon family boxes (Fig. 3A and Fig. SS1, parts A + B) corresponding to Phe, Tyr, His, Asn, Asp, Cys and Ser-AGY3 (Y stands for pyrimidine, U or C), a G34-containing tRNA reads both codons NNC3 and NNU3, with a wobble G34:U3 base pairing in the latter case. In the case of the 4-synonymous codon family boxes (Fig. 3A and Fig. SS1, part A) corresponding to Leu-CUN3, Val, Ser-UCN3, Pro, Thr and Ala, a G34-containing tRNA is used to decode pyrimidine-ending codons in Bacteria and Archaea, while in Eukarya an A34-containing tRNA is systematically used instead. In this latter case, A34 in precursor tRNAs is enzymatically deaminated into I34 (I stands for inosine, see below) in the mature and functional tRNAs, thus allowing wobble I34:C3 and I34:U3 base pairings and, exceptionally, I34:A3 base pairing during translation [28]. In Bacteria, A34 is generally found in tRNAArg (Figs. SS1, part B and SS2). It is also enzymatically deaminated into I34 during tRNA maturation. A34 occurs in tRNAs other than tRNAArg only in a few eubacteria (as in tRNAThr of M. capricolum or tRNALeu of M. synoviae). In these cases, it is not deaminated and can decode all four synonymous codons of a family box ([29], reviewed in [30] and see below). In Archaea, A34 is never found, not even in tRNAArg. Only G34-containing tRNAs decode pyrimidine-ending codons in all synonymous decoding boxes in this domain of life (Fig. SS1, parts A + B).

The same A34-or-G34 sparing strategy applies to the Ile decoding box (AUU3/C3, Fig. 3B). A34-containing tRNAIle is always used in Eukarya, while a G34-containing tRNAIle is always used in Bacteria and Archaea (Fig. SS1, part A). Only the Gly pyrimidine-ending codons (GGU3/GGC3) are always read by a G34-containing tRNAGly in all three kingdoms (Fig. SS1, part B). The few exceptions are the Arg (CGN3) and Leu (CUN3) decoding boxes in eubacteria and hemiascomycetes, respectively (see below). Lastly, the avoidance of A34-containing tRNA in all 2-synonymous codon family boxes, over the systematic usage of G34-containing tRNAs results from the impossibility of a wobble base A34 (or I34) to avoid cross-box miscoding of purine-ending codons [28, 31, 32].

Thus, if no other tRNA sparing strategy is used (see below), the minimal number of tRNAs with distinct anticodons an organism requires to decode the 61 + 1 sense codons of their mRNAs and compatible with a proteome using the 20 canonical amino acids (the tRNA repertoire) is 62 − 16 = 46 tRNAs. This is the case in almost all Archaea (as Pyrococcus abyssi and Sulfolobus solfataricus), some Eukarya (as Homo sapiens and Arabidopsis thaliana) and rarely in Bacteria (for example Thermotoga maritima, see also Supplementary Fig. SS1).
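The arithmetic 62 − 16 = 46 can be reproduced from the code table itself. The sketch below is an illustrative check rather than the authors' procedure: it starts from one tRNA per sense codon plus the initiator tRNA, then removes one tRNA for every box in which the NNU and NNC codons are synonymous and can therefore share a single G34 (or A34/I34) tRNA. All variable names are chosen for illustration.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODE = {
    "".join(t): aa for t, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
BOXES = ["".join(p) for p in product(BASES, repeat=2)]

sense_codons = sum(aa != "*" for aa in CODE.values())
codons_to_read = sense_codons + 1  # AUG counts twice: initiator and elongator

# Strategy #1: one tRNA reads both pyrimidine-ending codons of a box.
spared_by_strategy_1 = 0
for box in BOXES:
    nnu, nnc = CODE[box + "U"], CODE[box + "C"]
    if nnu == nnc and nnu != "*":
        spared_by_strategy_1 += 1

repertoire = codons_to_read - spared_by_strategy_1
print(codons_to_read, spared_by_strategy_1, repertoire)  # 62 16 46
```

Every one of the 16 boxes qualifies because, in the standard code, the two pyrimidine-ending codons of a box always specify the same amino acid.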

3.2 Sparing strategy #2: additional depletion of C34-containing tRNAs

tRNAs harboring a C34NN anticodon are very restrictive for reading codons ending only with G3. Therefore, sparing a C34-containing tRNA will require that another tRNA isoacceptor coding for the same amino acid reads the NNG3 codon in addition to its own. The only possibilities are tRNAs that harbor U34 in their anticodons. However, while a G34-containing tRNA can easily read an U3-ending codon by wobbling (see above), U34-containing tRNAs read NNG3 codons less efficiently. The efficacy of U34:G3 wobbling will strongly depend on the presence of chemical adducts on the C5 atom of U34 mediated by specific enzymes ([33], reviewed in [34, 35]). Because these enzymatic modifications of U34 in naturally occurring tRNAs differ in the three biological kingdoms (Fig. 1C and below), the C34-sparing strategy is not equally distributed. It is rarely used in Eukarya and Archaea, while widely used in Bacteria (Fig. 3A, strategy #2; details are in Supplementary Fig. SS1, parts A + B).

There are three cases (out of 4² = 16 possibilities) for which the C34-sparing rule cannot apply. The initiator and elongator tRNA-Met (anticodon C34AU) read exclusively the Met-AUG3 codon. The tRNATrp (anticodon C34CA) reads the unique Trp-UGG3 codon. Finally, the stop codon UAG3 is not recognized by any tRNA but is read by a specific termination factor (indicated by ‘+RF’ in Fig. 3B). Thus, in organisms that are using both ‘A34-or-G34’ and ‘C34-’ sparing strategies, the tRNA repertoire is now 46 − 13 = 33 tRNAs. Depletion of C34-containing tRNA is used in many eubacteria (as Bacillus halodurans), in a few Archaea (as Methanopyrus kandleri) and exceptionally in Eukarya (as S. cerevisiae).

3.3 Sparing strategy #3: total depletion of both tRNA harboring A34-or-G34 and C34

Here one single U34-containing tRNA reads all four synonymous codons, with U34 not posttranscriptionally modified (Fig. 3A, strategy #3). This type of sparing strategy is also known as the ‘two-out-of-three decoding’ [36], ‘4-way wobbling’ [37] or ‘superwobbling’ [38] because the third base of the codon is meaningless in the decoding process. The ability of such tRNAs harboring unmodified U34 to read each of the four codons within a 4-fold degenerate codon box depends on the presence of at least one G or C in positions 35 and/or 36 of the anticodon (corresponding to dashed axes in Fig. 1A), and also on other elements of the anticodon loop, such as (among others) a cytosine in position 32 (C32) of the anticodon loop ([39]). This ‘minimalist’ decoding system is used only in Bacteria, never in Archaea or in Eukarya. It is used mainly for reading codons Val-GUN3, Pro-CCN3 and Ala-GCN3, less frequently for Leu-CUN3, Ser-UCN3, Thr-ACN3 and exceptionally for Gly-GGN3 (Fig. 3A, box #3 and Fig. SS1, parts A + B). Had this third strategy been used in all the 8 theoretical cases (blue background in Fig. 1A), then the tRNA repertoire could be reduced to 33 − 8 = 25 tRNAs. However, the actual lowest number of tRNAs found among all the organisms examined is 28, as in several Mollicutes [9, 40]. These organisms have adapted to specific hosts by massive genome reduction.
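The successive reductions 46 → 33 → 25 follow from two properties of the standard code table, which the sketch below makes explicit. It is an illustration under stated assumptions: the starting value of 46 is taken from Section 3.1, a C34 tRNA is considered sparable only where NNA and NNG are synonymous sense codons, and strategy #3 is applied to every unsplit 4-codon box (the theoretical limit, not an observed organism). Variable names are hypothetical.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODE = {
    "".join(t): aa for t, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
BOXES = ["".join(p) for p in product(BASES, repeat=2)]

# Strategy #2: C34 can be spared only if a U34 tRNA for the same amino acid
# exists, i.e. NNA and NNG are synonymous sense codons.
c34_sparable = [
    b for b in BOXES
    if CODE[b + "A"] == CODE[b + "G"] and CODE[b + "G"] != "*"
]
c34_exceptions = [b + "G" for b in BOXES if b not in c34_sparable]

# Strategy #3: in an unsplit 4-codon box a single U34 tRNA reads all four
# codons, so the remaining G34 tRNA is spared as well.
four_codon_boxes = [
    b for b in BOXES if len({CODE[b + n] for n in BASES}) == 1
]

after_strategy_1 = 46  # value derived in Section 3.1
after_strategy_2 = after_strategy_1 - len(c34_sparable)
after_strategy_3 = after_strategy_2 - len(four_codon_boxes)

print(c34_exceptions)                      # ['UAG', 'UGG', 'AUG']
print(after_strategy_2, after_strategy_3)  # 33 25
```

The three exceptions found by the script are the same three NNG3 codons singled out in Section 3.2: the stop codon UAG, Trp-UGG and Met-AUG.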

Only mitochondrial (and chloroplast) genomes have been shown to encode a lower number of decoder tRNAs with only 22 distinct anticodons [37, 41]. However in mitochondria, the unique tRNAMet has a dual function in initiation and elongation, thus sparing one tRNA. Also, the AUA3 codon is often used as a Met-codon instead of an Ile-codon as in cytoplasmic mRNAs, again sparing one additional tRNAIle. Finally, codons AGA3/G3 can be unassigned (=not used) and therefore do not require a cognate tRNA. In addition, two nuclear encoded tRNAs specific for Lys and Gln, respectively, harboring distinct anticodons from those found in the mitochondria, were shown to be imported from the cytoplasm into S. cerevisiae mitochondria. In extreme cases (such as in trypanosomatids), the whole set of mitochondrial tRNAs is encoded by the nuclear genome and imported into the organelle [42, 43].

3.4 Sparing strategy #4: simultaneous depletion of G34- and U34-containing tRNAs

Except for the stop codon UAA3, for which a cognate U34-containing tRNA is missing, the only tRNAs that contemporary organisms (and organelles) are reluctant to eliminate are U34-containing isoacceptor tRNAs. The main reason is that a G34- or a C34-containing tRNA cannot read A-ending codons efficiently enough. The only cases where the G34- and U34-sparing strategy #4 has nevertheless been used concern the quadruplet decoding box for Arg-CGN3 in most eubacteria and the same quadruplet Arg-CGN3 plus the Leu-CUN3 decoding box in a few hemiascomycetous yeasts (indicated by red and yellow rectangles in Fig. 3A, strategy #4, see also Figs. SS2 and SS3, parts A + B). In these organisms, Arg-codons (CGU3/C3, CGA3) or Leu-codons (CUU3/C3 and CUA3) are read by a single tRNA harboring an anticodon A34CG or A34AG, respectively, where A34 is modified to inosine (I34) in the mature tRNA species. In these cases decoding of the A3-ending codon NNA3 depends on an I34:A3 wobble base pairing [1, 28]. For E. coli tRNAArg it has been experimentally shown that this type of wobbling is very inefficient (dashed arrow in Fig. 3A, decoding strategy #4 [32]). The Arg-codon CGA3 is indeed rarely used in all eubacteria lacking U34-containing tRNAArg, attesting that living cells avoid using this particular codon. Nevertheless, reading Arg-codon CGA (sparing strategy #4) is operational in vivo as the gene encoding the enzyme catalyzing the formation of I34 in tRNAArg is essential in E. coli [44]. The fourth codon, CGG3, of the Arg-quadruplet is read by a second tRNAArg harboring an anticodon C34CG (green background in Fig. 3A and Fig. SS2). In Mollicutes this codon CGG3 is not used in mRNAs (unassigned codon [45]) and consequently the cognate C34-harboring tRNAArg is simply absent from the corresponding genomes (indicated by C3 and G3 in brackets in Fig. 3A, see also in [9]). In several Candida species, the codon CUA3 is read by a tRNALeu harboring an anticodon I34AG (see below). 
In all other organisms, the four Arg-codons CGN3 are usually read by a combination of I34-or-G34-containing tRNAArg, a U34-containing tRNAArg (where U34 is modified to mnm5U in eubacteria and mcm5U in eukaryotes; see below) and also by a third C34-containing tRNAArg (sparing strategies #2 and #1, respectively). This rule applies to all 4-degenerated codon families other than Arg (and Leu in some Candida).

The enzymes catalyzing the deamination of A34 in precursor tRNAArg are the homodimeric TadA in eubacteria and the heteromeric Tad2/Tad3 in eukaryotes (Supplementary Fig. SS2). These belong to a large superfamily of A-to-I (and C-to-U) RNA deaminases/editing enzymes [46, 47]. However, despite their common phylogenetic origin, the bacterial enzyme (as tested with the E. coli enzyme) deaminates exclusively A34 in pre-tRNAArg harboring anticodon A34CG, while eukaryal homologues (as tested with the S. cerevisiae enzyme) have a more relaxed specificity and deaminate all types of A34-containing tRNAs [48, 49]. Accordingly, genes coding for tad2/tad3 are present in genomes of all Eukarya, while genes coding for tadA are present only in the eubacteria that code for a A34-containing precursor tRNAArg and lack U34-containing tRNAArg in their genome. Consistent with the total absence of A34-containing tRNAs in genomes of Archaea, no archaeal homologues of tad2/tad3/tadA genes are found, demonstrating again the subtle interplay between codon usage, tRNA population and the repertoire of tRNA modification enzymes, a trilogy of elements that have shaped the genetic code during evolution.

3.5 The simultaneous depletion of G34- and U34-sparing strategy of the Candida clade in the Leu/Ser box: an exception

The genome sequences of some Candida species (including C. albicans) and Debaryomyces hansenii have revealed one remarkable exception to the ‘almost’ universal nuclear decoding rules explained above. Here the CUG3 codon of the usual 4-codon family box (Leu-CUN3) is assigned as Ser instead of Leu (green background in Fig. 3A, and Supplementary Fig. SS3, parts C + D), thus raising the total number of Ser-codons used to seven and reducing the usual Leu-codons to five (reviewed in [50] and by Manuel Santos in this volume). The resulting split Leu/Ser decoding pattern is similar to the bacterial Arg unsplit CGN3 decoding box (G34 + U34-sparing strategy # 4, Fig. 3A), except that codons ending with G3 now code for a different amino acid (Ser) compared to the three other codons of the same decoding box (Leu). Remarkably, C. albicans tRNALeu (I34AG) possesses an unusual G-residue located in position 32 of anticodon loop while a pyrimidine U32 or C32 is generally present in all other tRNAs examined. The identity of this nucleotide in position 32 (as well as in position 38, opposite N32 in the anticodon loop, Fig. 1B) was demonstrated to influence the discriminatory ability of a given tRNA to read synonymous codons varying by their third base [39, 51]. Thus, the presence of G32 (of different chemical structure than pyrimidine) probably helps C. albicans tRNALeu (anticodon I34AG) to read the rare Leu-codon CUA3, while avoiding ‘cross-box’ decoding of Ser-codon CUG3. This single Ser-codon CUG3 is read by another unusual C34-containing tRNASer harboring a G33, instead of the highly conserved U33, 3′-adjacent to wobble residue C34. In this case, the presence of the unusual G33 in C. albicans tRNASer was shown to decrease the decoding efficiency for cognate codon CUG3, thereby minimizing the negative impact of ‘cross-box’ miscoding of near-cognate Leu-codon CUA3 ([52], see also the ‘Missing triplet hypothesis’ [53]).
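The effect of the CUG3 reassignment on codon counts can be illustrated with a few lines of code. The sketch below is a simplified model and makes one assumption explicit: CUG is treated as coding only for Ser, and nothing else in the table is changed. The table name `candida_code`, the helper `translate` and the toy mRNA are hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
STANDARD_CODE = {
    "".join(t): aa for t, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumption for illustration: only CUG changes, from Leu (L) to Ser (S).
candida_code = dict(STANDARD_CODE, CUG="S")


def translate(mrna: str, code: dict) -> str:
    """Translate an in-frame mRNA string codon by codon ('*' = stop)."""
    return "".join(code[mrna[i:i + 3]] for i in range(0, len(mrna) - 2, 3))


standard_counts = Counter(STANDARD_CODE.values())
candida_counts = Counter(candida_code.values())

print(standard_counts["S"], standard_counts["L"])  # 6 6
print(candida_counts["S"], candida_counts["L"])    # 7 5

toy_mrna = "AUGCUGCUAUAA"  # hypothetical sequence: AUG CUG CUA UAA
print(translate(toy_mrna, STANDARD_CODE))  # MLL*
print(translate(toy_mrna, candida_code))   # MSL*
```

The toy sequence places CUG next to CUA to show why discrimination at the third codon position matters in this split box: the two codons differ only at position 3 yet must yield different amino acids.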

In other hemiascomycetous yeasts (e.g. S. cerevisiae and Candida glabrata), all CUN3 codons correspond to the standard Leu. However, to decode the Leu-pyrimidine-ending codons, a G34-containing tRNALeu is used, as in bacteria, instead of the usual ‘eukaryotic’-type of A34(I34)-containing tRNALeu (Fig. 3A, Fig. SS1, part A and SS3, parts C + D). Depending on the Hemiascomycetes considered, the other two Leu purine-ending codons are read either by a unique tRNALeu harboring a U34AG anticodon, in which U34 is not modified (at least in S. cerevisiae, [54]), or by two tRNALeu, one with a U34AG and the other with a C34AG anticodon (C34 being modified to m5C by Trm4p [55], more details are given in [8]). Inspection of their nucleotide sequences reveals that the G34-containing tRNALeu harbors an unusual C33, 3′ adjacent to the anticodon, while the two other tRNALeu (U34AG and C34AG) harbor the normal anticodon loop ‘kinking’ U33. The observations above underscore the importance of nucleotide sequence, not only of the three bases of the anticodon, but also of nucleotide(s) within the anticodon loop for decoding a split or unsplit CUN3 codon family box (‘Extended anticodon theory’ [56-58]).

4 Special decoding boxes: extreme dependence on wobble base modification enzymes

4.1 Ile/Met (AUN3) decoding box

As stated above, translation of Ile-AUU3/AUC3 codons in cytoplasmic mRNAs is performed by a G34-containing tRNAIle in both eubacteria and archaea (as well as in organelles) and by an I34-containing tRNAIle in all eukaryotes (Fig. 3B and Fig. SS4). Reading selectively the third (but rare) AUA3 codon as Ile in mRNAs without mispairing with the Met-codon AUG3 is trickier. Depending on the organism (or organelle), different strategies have emerged during evolution. In all eukaryotes, a second tRNAIle harboring a wobble U34 is used. In S. cerevisiae, this tRNAIle harbors a peculiar anticodon in which the wobble U34 and third anticodon base U36 are each enzymatically modified into pseudouridine (Fig. 4A, [59]). The gene coding for the enzyme catalyzing these two pseudouridylation reactions (pseudouridine synthase-1, abbreviated as Pus1p [60]) is present in the majority of eukaryotic genomes analyzed (Cluster of Orthologous Groups COG0101).

Fig. 4.
(A–C) Enzymes and proteins involved in specific modification of nucleoside-34 in tRNAs. Symbolism and color code within each decoding box are the same as in Figs. 1 and 3. Outside the boxes are the anticodon sequences with indication of the modified base-34 and/or 35. Along the corresponding arrows are indicated the conventional acronyms of enzymes implicated in the nucleoside modifications. The color-code for the modified nucleosides and names of enzymes is ‘red’ for bacterial and organelle enzymes, ‘blue’ for archaeal enzymes and ‘green’ for cytoplasmic eukaryotic enzymes. In yellow background is information concerning termination (release) factors (‘Mut’ means mutant). All other details are in the text.

Except in a few cases (as in Mycoplasma mobile [9, 61]), a tRNAIle harboring an anticodon U34AU is never found in eubacteria and archaea (yellow rectangle in Figs. 3 and 4A, U34-sparing strategy #4). Instead, a tRNAIle harboring a C34AU anticodon is used (Fig. SS4, [4, 9, 61]). In Bacteria, the wobble C34 of the mature tRNAIle (anticodon C*34AU) is modified to lysidine (C* corresponds to k2C [62]). The enzyme (here designated TilS in Fig. 4A, [63]) catalyzing the formation of k2C34 is present, without exception, in all eubacteria lacking the U34-containing tRNAIle. This enzyme belongs to Cluster of Orthologous Groups COG0037. The chemical structure of the modified C*34 in archaeal tRNAIle is unknown (but definitely distinct from eubacterial k2C, see [64] and several abstracts of the 23rd tRNA workshop, Aveiro, Portugal, January 2010). The gene coding for such an archaeal C34-modification enzyme of tRNAIle (anticodon C*34AU) has still to be identified (indicated by ‘Enz?’ in Fig. 4A). In both eubacteria and archaea, these C34-modifications have a dual function: they switch the amino acid identity of the tRNA from Met to Ile and allow exclusive decoding of the Ile-codon AUA3 (symbolized by a red arrow in Figs. 3 and 4A).

For the single Met-codon (AUG3), a tRNAeMet harboring an anticodon C34AU is always used. Here again, depending on the organism considered, the wobble C34 is enzymatically modified into different C34 derivatives. In a subset of Bacteria, such as Enterobacteriales or Vibrio, a tRNA acetyltransferase (designated TmcA, COG1444, Fig. 4A) catalyzes the formation of N4-acetylcytidine (ac4C [65]), while in cytoplasmic tRNAeMet of most Eukarya (as in mammals, but not in S. cerevisiae and probably other lower eukaryotes), C34 is modified into 2′-O-methylcytidine (Cm34) by a tRNA methyltransferase (Trm7p, Fig. 4A) that also acts on tRNAs specific for Phe, Leu and Trp [66]. In Archaea (as in P. abyssi and Haloferax volcanii), a Cm34 is also found in tRNAeMet. However, here the enzymatic methylation of the ribose of C34 is catalyzed by a different enzymatic system, the box C/D ribonucleoprotein complex (box C/D RNP), which includes the enzyme fibrillarin (aFib [67], Fig. 4A). Both types of C34 modifications (ac4C and Cm) guarantee precise decoding of the Met-AUG codon by strengthening the C:G base-pair interaction [68, 69] and concurrently prevent misreading of the near-cognate Ile-AUA codon (a case of functional convergent evolution). Removal of the acetyl group from ac4C34 in E. coli tRNAeMet allows misreading of the Ile-codon AUA3 in vitro [70].

In mitochondria of vertebrates and insects, where AUA3 and AUG3 codons are both read as Met, the C34 of the mitochondrial tRNAMet is modified into 5-formylcytidine (f5C34 [71]). The gene coding for the mitochondrial formylating enzyme is still not known (designated by ‘Enz?’ in Fig. 4A). Here, the presence of a formyl group on C34 allows the mitochondrial f5C34-containing tRNAMet to decode both codons AUA3 and AUG3 ([71] and references therein). Altogether, these examples illustrate well the concept that partition of the Ile/Met box into a 3/1 decoding box, as in extant free-living organisms, or into a 2/2 decoding box, as in many organelles, clearly depends on ad hoc U34- or C34-modifying enzymatic machineries that evidently emerged independently in different organisms, after the split into the three major biological kingdoms.
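
The partition of the AUA3/AUG3 codons described above can be condensed into a small lookup table. The Python sketch below only restates the rules given in this section and is not a model of translation. The names `Decoder`, `AUN_DECODERS` and `read_codon` are hypothetical, and the system labels are simplifications: ac4C34 is reported for a subset of Bacteria only, Cm34 for most but not all Eukarya, and "mitochondria" stands for mitochondria of vertebrates and insects. The still uncharacterized archaeal C34 modification of tRNAIle is left out, and the AUU3/AUC3 codons are not modeled.

```python
from typing import FrozenSet, List, NamedTuple, Tuple


class Decoder(NamedTuple):
    system: str              # simplified label for the organism group
    wobble: str              # nucleoside found at position 34 of anticodon C34AU
    amino_acid: str          # amino acid carried by the tRNA
    codons: FrozenSet[str]   # codons of the AUN box read by this tRNA


# Entries transcribed from the text; labels are illustrative assumptions.
AUN_DECODERS = [
    Decoder("bacteria", "k2C", "Ile", frozenset({"AUA"})),
    Decoder("bacteria", "ac4C", "Met", frozenset({"AUG"})),
    Decoder("eukarya", "Cm", "Met", frozenset({"AUG"})),
    Decoder("archaea", "Cm", "Met", frozenset({"AUG"})),
    Decoder("mitochondria", "f5C", "Met", frozenset({"AUA", "AUG"})),
]


def read_codon(system: str, codon: str) -> List[Tuple[str, str]]:
    """Return (amino acid, wobble nucleoside) pairs listed for this codon."""
    return sorted(
        (d.amino_acid, d.wobble)
        for d in AUN_DECODERS
        if d.system == system and codon in d.codons
    )


if __name__ == "__main__":
    for system in ("bacteria", "mitochondria"):
        for codon in ("AUA", "AUG"):
            print(system, codon, read_codon(system, codon))
```

In this representation the 3/1 versus 2/2 partition is visible directly: AUA3 is assigned to an Ile decoder in the bacterial entries and to the Met decoder in the mitochondrial entry.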

4.2 Cys/Stop/Trp (UGN3) decoding box

Like Met-codon AUG3, Trp-codon UGG3 has to be translated accurately by one single tRNATrp harboring the cognate anticodon C34CA. As for Ile/Met, discrimination from the closely related codon UGA3 can be problematic. However, in this case, UGA3 is not used as a sense codon but as a termination signal for polypeptide synthesis. All living organisms using UGA as a stop codon lack the tRNA harboring anticodon U34CA (Fig. 3B, except in special UGA-suppressor strains). In its place, a termination (or release) factor recognizing UGA3 is used (called RF2 in eubacteria, eRF1 in eukaryotes and aRF1 in archaea, reviewed in [72, 73]). To allow faithful translation of the single Trp-codon UGG3 and to limit accidental readthrough of the near-cognate stop codon UGA3 (leakiness or special recoding events), the wobble C34 of tRNATrp is usually modified to Cm34. In eubacteria, the 2′-O-methylation of ribose in C34 is catalyzed by a ‘SPOUT’-fold methyltransferase (Fig. 4B, YibK in E. coli and MCAP 0364 in M. capricolum [9]) belonging to COG0219. According to the recently proposed uniform nomenclature for all RNA modification enzymes (see http://www.modomics.genesilico.pl), this YibK enzyme is designated TrMet(Cm34). In eukaryotes, formation of the same Cm34 in tRNATrp is catalyzed by Trm7p [66], while in archaea it is catalyzed by fibrillarin (aFib, within the box C/D snRNP complex [74], Fig. 4B), two ‘Rossmann’-fold-type methyltransferases already mentioned above. These represent further cases of functional convergent evolution of enzymes in relation to the decoding of the single Trp-codon UGG3 in organisms of the three domains of life.

Exceptions to the single-codon Trp-decoding rule are found in most Mollicutes and some ciliate protozoans of the clades Euplotes, Colpoda and Heterotrichea. Here the UGA3 stop codon has been reassigned to Trp and is now read by a cognate tRNATrp harboring an anticodon U*34CA (where U* denotes a U34 doubly modified on the base and on the ribose, see below). The usual tRNATrp (Cm34CA) is still used for decoding Trp-UGG3 (Figs. 3 and 4B). This reassignment of codon UGA from Stop to Trp involves two important adjustments of the translation machinery. First, in Mollicutes, the RF2 termination factor has been lost ([75], reviewed in [76]), whereas in Euplotes and Blepharisma, specific mutations have occurred in the N-terminal domain 1 of eRF1 (indicated by eRF1 in Fig. 4B), so that the UGA3 codon is no longer recognized while recognition of the other stop codons UAA/UAG is preserved [77-80]. Second, a duplication of the gene coding for tRNATrp (C34CA), followed by mutations (including a C34-to-U34 change) and subsequent modification of the newly generated wobble U34 in the pre-tRNATrp, led to the gain of UGA decoding. In M. capricolum tRNATrp, U34 is doubly modified, on C5 of the uracil ring and on the 2′-hydroxyl group of the ribose (formation of hypermodified cmnm5Um34 [81]). This is the same modification found in tRNALeu (anticodon U*34AA) of the mixed Phe/Leu decoding box in M. capricolum. This type of hypermodified U34 has been demonstrated to restrict decoding by U*34-containing tRNAs to purine-ending codons only, thus avoiding misreading of near-cognate pyrimidine-ending codons [34, 35].
A similar situation exists in mitochondria of vertebrates and insects, where both codons UGA and UGG also code for Trp. Here the U34 of the unique mitochondrial tRNATrp is modified into the hypermodified base xm5U (reviewed in [82]), which plays the same restrictive decoding role towards the pyrimidine-ending Cys codons as cmnm5Um in Mollicutes. The absence in mitochondria of an enzyme catalyzing 2′-O-methylation of U34, which in Mycoplasma tRNATrp acts in addition to the enzymes modifying the C5 atom of U34, may correspond to the lack of requirement for an extra tRNATrp (anticodon Cm34CA) reading exclusively codon UGG. Indeed, the presence of this methyl group on the 2′-hydroxyl of the ribose, in addition to the modification at the C5 atom of U34, induces a conformational rigidity of the nucleotide that was demonstrated to be incompatible with wobbling during translation [69, 83].

Finally, in the ciliates Euplotes octacarinatus and E. crassus, UGA3 codes for Cys in addition to the ubiquitous pyrimidine-ending codons UGC3/UGU3 (dashed arrow in the Cys/Trp UGN box in Fig. 4B, [84]). No cytosolic tRNACys harboring an anticodon starting with U34 seems to be present in these organisms, only a tRNACys with a wobble G34 [85], indicating that in Euplotes a wobble G34:A3 base pairing is used to read the rare sense codon UGA3 during mRNA translation. This became possible only because mutations occurred in domain I of termination factor eRF1 (indicated by eRF1), altering its capacity to interact with UGA3, now a sense codon for Cys [78, 86].

Altogether, these examples illustrate again the subtle interplay between the set of tRNAs a given cell is using to decode mRNAs, the battery of tRNA modification enzymes that modify the wobble base-34 of their tRNAs, and the requirement for specific stop codon termination (release) factors.

4.3 Tyr/Stop-or-Gln (UAN3) decoding box

In all extant organisms, the two synonymous codons UAU3 and UAC3 are read by a unique tRNATyr. At the gene level, a G34 is always found. However, depending on the organism, this genetically encoded wobble G34 in pre-tRNATyr can be replaced posttranscriptionally by a 7-deazaguanine derivative (preQ1 in eubacteria) or queuine (Q in eukaryotes) and further hypermodified into a variety of Q34 derivatives (symbolized by Q° and Q in Fig. 4C but not detailed here; for review see [87] and Supplementary Fig. SS5). Among the different enzymes involved in this complex metabolism, the key protein catalyzing the insertion of the Q derivative is a tRNA:guanine transglycosylase (designated Tgt, Fig. 4C and Fig. SS5). This enzyme acts on all G34-containing tRNAs harboring a U35 in the middle position of the anticodon, and thus also on the tRNAs specific for His, Asn and Asp of the 2-synonymous codon family boxes. The Tgt enzyme (COG0343) is almost universally found in both Bacteria and Eukarya [88]. However, in Archaea, the homologous enzyme catalyzes a similar transglycosylation reaction but at position 15 of the D-loop of tRNA molecules [89]. Therefore, in all archaeal tRNAs specific for Tyr, His, Asn and Asp, the wobble nucleotide is probably a non-modified G34 (Fig. 4C).

One remarkable feature of mature and functional tRNATyr in eukaryotes and in archaea is the almost ubiquitous presence of pseudouridine in the middle position 35 of the anticodon (Psi35). In the yeast S. cerevisiae and the archaea P. abyssi and S. solfataricus, formation of this Psi35 (as well as Psi13) is catalyzed by the phylogenetically related multisite-specific enzymes ePus7p and aPus7p, respectively (Fig. 4C). Homologs of the pus7 gene (COG0585) exist in nearly all eukaryotes and Archaea [90, 91]. However, in some archaea (four different Sulfolobales and Aeropyrum pernix), formation of Psi35 in pre-tRNATyr appears to be also catalyzed by a different enzymatic system: the box H/ACA snRNA containing the enzyme Cbf5 (COG1258, Fig. 4C, [91]). The role of Psi35 in tRNATyr and of the Q34 derivatives in the tRNAs decoding Tyr, His, Asn and Asp is reviewed in Namy et al. [92]. In short, as far as the eukaryotic tRNATyr is concerned, the Psi35 modification allows miscoding of the UAG3 stop codon, and the Q34 modification of the same tRNATyr (as well as release factor eRF1) counteracts this property of Psi35.

While triplets UAA3 and UAG3 are generally used as stop codons in the nuclear code, they are translated as Gln in some organisms belonging to ciliate genera (protozoans) such as Tetrahymena, Paramecium and Oxytricha, and in unicellular green algae such as Acetabularia spp., leaving UGA as the only functional stop codon (see Fig. 2 in [93]). In Tetrahymena thermophila, two new species of tRNAGln, distinct from those needed to decode the canonical Gln-codons CAA3/G3, were identified: one harboring an anticodon Um34UA complementary to UAA3, the second harboring an anticodon CUA (with C34 apparently not modified) complementary to UAG [94]. Methylation of the 2′-hydroxyl group of the ribose of U34 is probably mediated by eTrm7p or a close homologue. This modification would favour decoding of UAA3 over UAG3. Moreover, secondary mutations in the primary sequence (domain 1) of termination factor eRF1 of T. thermophila (indicated by eRF∗∗1 in Fig. 4C), changing its specificity to the sole UGA stop codon, also contribute to the reassignment of UAA3/UAG3 as sense codons for Gln [78, 79, 95].
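
The practical consequence of the stop-codon reassignments discussed in Sections 4.2 and 4.3 can be illustrated with a toy translation routine. The Python sketch below is a minimal illustration under stated assumptions: the codon table is deliberately partial (only the codons needed for the example), the names `STANDARD`, `REASSIGNMENTS` and `translate` are hypothetical, and the system labels stand loosely for most Mollicutes (UGA read as Trp), Tetrahymena thermophila (UAA/UAG read as Gln) and Euplotes octacarinatus/E. crassus (UGA read as Cys). It captures only the outcome of the reassignment, not the underlying changes in tRNAs, modification enzymes and release factors.

```python
# Partial codon table: only the codons used in the example are listed.
STANDARD = {
    "AUG": "Met", "UGG": "Trp", "UGU": "Cys", "UGC": "Cys",
    "CAA": "Gln", "CAG": "Gln",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}

# Reassigned codons per system, as described in the text (labels are illustrative).
REASSIGNMENTS = {
    "standard": {},
    "mollicutes": {"UGA": "Trp"},
    "tetrahymena": {"UAA": "Gln", "UAG": "Gln"},
    "euplotes": {"UGA": "Cys"},
}


def translate(mrna, system="standard"):
    """Translate an mRNA codon by codon until the first stop codon."""
    table = {**STANDARD, **REASSIGNMENTS[system]}
    peptide = []
    usable_length = len(mrna) - len(mrna) % 3  # ignore an incomplete last codon
    for i in range(0, usable_length, 3):
        residue = table[mrna[i:i + 3]]  # KeyError for codons outside the partial table
        if residue == "Stop":
            break
        peptide.append(residue)
    return peptide


if __name__ == "__main__":
    message = "AUGUGAUGGUAA"
    for system in REASSIGNMENTS:
        print(system, translate(message, system))
```

The same message is terminated after the first codon under the standard assignments but is read through UGA in the Mollicutes-like and Euplotes-like tables, which is why such reassignments require a matching change in release-factor specificity.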

5 Decoding synonymous codons in the three domains of life: an evolutionary convergent adaptive process

5.1 Decoding 2-fold degenerate synonymous codons

In mixed amino acid decoding boxes (white background in Figs. 1 and 3A), pyrimidine-ending codons are often read by tRNAs harboring a modified G34. In tRNAPhe, this G is usually modified to 2′-O-methyl-G34 (Gm34) by Trm7p in eukaryotes [66] and probably by a YibK homolog [9] in some eubacteria (as in B. subtilis, for example). For tRNAs specific for Tyr, His, Asn and Asp, various types of Q34 derivatives (see above) are found, depending on the position of the organism in the tree of life (Figs. 1 and 2 and Supplementary Figs. SS5–SS8). For tRNACys and tRNASer of the duet boxes, an unmodified G34 is always found. However, in these cases the anticodon is flanked by a modified pyrimidine at position 32 of the anticodon loop: a pseudouridine (Psi32, catalyzed by RluA [96]) as in E. coli tRNACys or a 2-thiolated cytidine (s2C32, catalyzed by TtcA [96]) as in E. coli tRNASer. These modifications of wobble G34, together with other modifications in the anticodon loop and stem of tRNA (such as N32 and N37–N39, not discussed here; see however [51, 56-58]), guarantee a better discrimination between cognate pyrimidine-ending codons and non-cognate purine-ending codons corresponding to another amino acid, a situation that is reminiscent of the one discussed above for the mixed codon boxes Ile/Met, Cys/Stop/Trp and Tyr/Stop-or-Gln.

The purine-ending codons of the same mixed amino acid decoding boxes are read by either a single U34-containing tRNA or by a combination of two isoacceptor tRNAs harboring U34 and C34, respectively (strategy #1 or #2 in Fig. 3A). In E. coli (probably in the majority of, if not all, bacteria) and most mitochondria, U34-containing tRNAs specific for Gln, Lys, Glu, Arg (box AGA3/G3) and Leu (box UUA3/G3) all harbor 5-aminomethyl-U34 derivatives (Xnm5U34, where X can be hydrogen = nm5U, most often a methyl group = mnm5U, or a carboxymethyl group = cmnm5U; Figs. 2 and 5A, Figs. SS6 and SS8, see also Fig. 1 in [33]). One exception is E. coli tRNAGly (anticodon UCC) of the quartet boxes, which harbors the same (c)mnm5U34 as the U34-containing tRNAs of the duet boxes (Fig. SS6). Depending on the bacterium, the insertion of these modifications requires the activity of three to five distinct enzymes and several cofactors (reviewed in [6]).

Fig. 5. (A–C) Characteristic uridine-34 modifications. Among all modified nucleosides identified so far, those present at the wobble position of tRNAs are by far the most diversified. However, the type of chemical adduct at carbon-5 of the uracil ring, the thio group replacing oxygen at position 2 of the uracil ring and the presence of methylation at the 2′-hydroxyl group of the ribose depend very much on the type of tRNA (mainly those belonging to either the 2-codon set, see Fig. 1A) and on the organism in which they were found. The large panoply of enzymes implicated in such hypermodification of U34 is not shown in the figure but can be found in the references cited in the text. Mostly lacking from the figure are the types of U34 modification in archaeal tRNAs (comments in text).

In the corresponding cytosolic tRNAs of S. cerevisiae (possibly in the majority of, if not all, eukaryotes), the wobble base U34 is modified to different derivatives (5-carbonylmethyl-U34 or Xcm5U34), where X corresponds to –OCH3 (= mcm5U) or an amine group (= ncm5U) (Figs. 2 and 5B and Fig. SS7, see also Fig. 7 in [98]). The enzymes involved in the formation of these eukaryotic types of U34 modifications are phylogenetically unrelated to the bacterial ones and catalyze different types of reactions [99]. The precise chemical structure of the C5 adducts in the homologous archaeal U34-containing tRNAs is not yet known; however, it seems to be closely related to the eukaryotic types (discussed in [23]).

The only common posttranscriptional modifications found in U34-containing tRNAs belonging to the 2-fold degenerate codon family boxes in all domains are the thio group (sometimes a seleno group, see Fig. 2, but not indicated in Fig. 5A and B) replacing the oxygen at position 2 of the uracil ring (Xnm5s2U/Xnm5se2U and Xcm5s2U/Xcm5se2U) in tRNAs specific for Gln, Lys and Glu, and a methyl group on the 2′-hydroxyl of the ribose of uridine-34 for tRNALeu (anticodon UAA, where U is cmnm5Um in E. coli and Mycoplasma capricolum or ncm5Um in S. cerevisiae, Fig. 5A and B, compare Figs. SS6–SS8). However, the corresponding modification machineries are different in Bacteria and Eukarya/Archaea and phylogenetically unrelated (reviewed in [100]). The E. coli tRNAArg of the duet box is not thiolated at U34 (it harbors a simple mnm5U34); however, it is thiolated at C32 (s2C32, as in tRNASer of the neighboring duet box, see above).

These various enzymatic modifications of nucleotides in the anticodon loop, and particularly of the wobble U34, allow the resulting mature functional tRNAs to decode only purine-ending codons, with a strong preference for A-ending codons (functional convergent type of evolution; more details are in [2, 34, 35, 83, 98]). In several bacteria, a second, C34-containing tRNA is also present (C34 usually not modified), allowing more efficient decoding of the second synonymous codon NNG3 (no wobbling).

5.2 Decoding 4-fold degenerate synonymous codons

Post-transcriptional modification of U34 is not exclusive to the tRNA decoders of the mixed decoding boxes. As stated above, (c)mnm5U34 is also present in the U34-containing tRNAGly (UCC) of E. coli (eubacteria and some organelles), while in S. cerevisiae U34 in tRNAGly is modified to mcm5U34, like U34 in S. cerevisiae tRNAArg of the neighboring duet box (Fig. SS7). However, in tRNAs specific for Leu (box CUN3), Val, Ser, Pro, Thr and Ala of the majority of Bacteria (but not in Mycoplasma), a third type of U34 derivative (Xo5U34) is present (Fig. 5C, Fig. SS6), again involving a panoply of distinct enzymes for its synthesis [33, 101]. In the cytoplasmic U34-containing tRNAs specific for the same amino acids Val, Ser, Pro, Thr and Ala in S. cerevisiae, a characteristic ncm5U34 is found instead (Fig. 5B), while U34 in tRNALeu (box CUN3) is not modified (Fig. SS7).

In Mycoplasma (as in mitochondria), U34 of tRNAs belonging to the quartet boxes is usually not modified, while the wobble uridine of all tRNAs belonging to the duet decoding boxes is invariably modified (see above, Fig. SS8 and [9]). Remarkably, modification at the C2 atom of U34 (e.g. formation of s2U34 derivatives) or at the 2′-hydroxyl of the ribose at position 34 (formation of Um34), as often found in tRNAs belonging to the duet synonymous codon family boxes (see above), is always absent in all tRNAs belonging to the 4-synonymous codon family boxes examined so far. This allows tRNAs of the latter category to be less stringent about the type of base at the third (wobble) position of codons than the former, which harbor ‘doubly’ modified U34 [102]. Also, in contrast with decoding systems that use a single non-modified U34-containing tRNA to decipher the 4-fold degenerate synonymous codons (sparing strategy #4, Fig. SS8), the use of modified U34-tRNAs (U* as summarized in Fig. 5) is always associated with other isoacceptor tRNA(s) specific for the same amino acid (sparing strategy #1 or #2): a G34-containing tRNA and possibly a third, C34-containing tRNA (Figs. SS6 and SS7). Altogether, these isoacceptor tRNAs harboring distinct anticodons allow a more efficient translation of the synonymous codons in mRNAs (discussed in [103, 104]). The other important aspect of mRNA decoding is the match between the frequency of each individual sense codon in mRNAs (especially in highly expressed mRNAs) and the relative amount, in terms of cellular concentration and accessibility to the protein synthesis machinery, of each individual isoacceptor tRNA of the whole cellular repertoire (not discussed here, for details see [105-109]).
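
The wobble rules stated in Sections 4 and 5 can be summarized as a mapping from the class of nucleoside at position 34 to the third codon bases it reads. The Python sketch below is a deliberately coarse simplification offered for illustration only: the class names in `THIRD_BASES` and the function `codons_read` are hypothetical, only Watson-Crick pairing is assumed for positions 35 and 36, the preference of modified U34 for A-ending over G-ending codons is ignored, and the less stringent modified U34 of quartet boxes (such as Xo5U34) is not modeled.

```python
# Watson-Crick partners, used for anticodon positions 35 and 36.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Third codon bases read by each wobble class, as summarized from the text.
THIRD_BASES = {
    "G": "UC",                        # G34 (modified or not): pyrimidine-ending codons
    "U_modified_restrictive": "AG",   # e.g. thiolated or 2'-O-methylated U34 derivatives
    "U_unmodified": "UCAG",           # single-tRNA sparing strategy in quartet boxes
    "C": "G",                         # C34 (Cm, ac4C or unmodified): no wobbling
    "k2C": "A",                       # lysidine in bacterial tRNAIle
}


def codons_read(wobble_class, n35, n36):
    """List the codons read by an anticodon 5'-N34 N35 N36-3'.

    Codon position 1 pairs with N36, position 2 with N35,
    and position 3 with the wobble nucleoside N34.
    """
    first = COMPLEMENT[n36]
    second = COMPLEMENT[n35]
    return [first + second + third for third in THIRD_BASES[wobble_class]]


if __name__ == "__main__":
    print(codons_read("U_modified_restrictive", "U", "U"))  # tRNALys, anticodon U*UU
    print(codons_read("G", "U", "A"))                       # tRNATyr, anticodon GUA
    print(codons_read("k2C", "A", "U"))                     # tRNAIle, anticodon k2C-A-U
    print(codons_read("U_unmodified", "C", "C"))            # tRNAGly, anticodon UCC
```

Written this way, the contrast drawn in the text is explicit: a restrictive modified U34 confines a tRNA to the two purine-ending codons of a split box, whereas an unmodified U34 lets a single tRNA cover a whole quartet box.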

In summary, accurate decoding of synonymous codons of the different amino acids is universally and strictly dependent on the existence of an impressive collection of very different tRNA modification enzymes acting at the wobble position of tRNAs (mainly U34 and G34, Fig. 2). These enzymes and the types of chemical modifications they catalyze depend on the organism considered, indicating that they were acquired progressively and independently in different phyla, possibly concomitantly with the later additions to the set of 20 amino acids used in the contemporary nuclear genetic code.

6 Conclusion

Comparing decoding strategies deduced from the analysis of the tRNA repertoire in the three domains of life unearthed universal rules of the decoding system as well as particularities that have evolved in a given kingdom, phylum or organism. Four main decoding strategies diversely used in Bacteria, Archaea and Eukarya were revealed. One prevalent trend is the dominant usage of tRNAs bearing the wobble base U34 for decoding synonymous codons of the unsplit quartet codon boxes. Similarly, the tRNA couples bearing the wobble bases U34 or G34, respectively, are universally required to decode the full set of purine-ending and pyrimidine-ending codons of the split (2/2) codon boxes in the three domains.

Posttranscriptional modifications of tRNAs, particularly at the wobble position N34 of the anticodon, play a central role in reading the universal 20 amino acid code. Both decoding accuracy (limiting ‘cross-box’ misreading) and efficacy (increasing translation rates and avoiding frameshifts) depend heavily on specific tRNA modification enzymes that transform the genetically encoded canonical A, C, G, U nucleotides of the precursor tRNA transcripts into a large variety of chemically altered derivatives with innovative structural and decoding potentialities (Figs. 1 and 2; for reviews see [3, 6, 57, 92, 110]). In contrast with the wobble base U34 of tRNAs decoding synonymous codons of the unsplit 4-codon boxes, which can be unmodified (sparing strategy #4), the same U34 base of tRNAs decoding the split (2/2) codon boxes (for Gln, Lys, Glu, Leu-UUR3 and Arg-AGR3, where R stands for purine) is, without exception, always chemically modified (Fig. 5). This trend applies also to the wobble base G34 of tRNAs decoding pyrimidine-ending codons of the same split (2/2) codon boxes (for Tyr, His, Asn, Asp and Phe) and to C34 of the split (3/1) codon boxes (for Met and Trp, Fig. 4). Remarkably, many of these modifications are present only in specific domains, or follow an even narrower taxonomic distribution (Fig. 1C, [5, 7, 15, 17, 19]).

Because the decoding capacity and the presence of specific modifications at wobble N34 are tightly interdependent, the amazing diversity observed in modifications of extant tRNAs of Bacteria, Archaea and Eukarya suggests that at an early stage in evolution, fewer than 20 amino acids were used and that their codon assignments were not frozen [111]. Instead, the primordial code encompassing four canonical nucleotides (thus 64 codons) was probably read inefficiently by a minimalist tRNA repertoire with relaxed and possibly ambiguous decoding properties that were nevertheless compatible with cell viability and propagation [111]. The set of core amino acids with aliphatic, hydroxyl and negatively charged side chains, such as Gly, Ala, Leu, Val, Pro, Ser, Thr, Asp and Glu, that are widely accepted as primordial, possibly naturally occurring amino acids (see for example [113-116] and references therein), were probably all assigned within a generalized fourfold degenerate code (sparing strategy #3, Fig. 3A). This simpler translational decoding system, which depended less on N34 modification enzymes, could easily have existed before the split into the three extant domains of life (see for example [117-119]). Only later were some of the primordial four-codon boxes split into (2/2) and (3/1) alternatives (discussed in [110]) to introduce new catalytically versatile amino acids into the cellular proteome (like His, Lys, Met, Cys, or Trp). The splitting of decoding boxes led to inherent accuracy and specificity problems. These were solved using two major strategies. First, by the duplication of genes coding for additional isoacceptor tRNAs (expansion of the tRNA repertoire from decoding strategy #3 to strategy #2 and finally strategy #1). Second, by the simultaneous introduction of new posttranscriptional chemistries and the invention of ad hoc RNA modification enzymes.
Even if cases of orthologous gene displacement cannot be ruled out, our analysis suggests that many N34 modification enzymes evolved after the divergence of the three domains. Indeed, phylogenetically unrelated enzymes are used in different organisms to catalyze formation of a given modification at wobble U34 or G34 of tRNAs belonging to the split (2/2 and 3/1) codon boxes. In addition, cases of functional convergent evolution are plentiful, with identical N34 chemical modifications introduced in a given isoacceptor tRNA by different enzymes in different organisms (Figs. 4 and 5).

We believe that the driving force and selective pressure that led to identical codons being assigned to the same amino acids independently after the domain split was the necessity to “communicate”, i.e. to use the same proteome and exchange genetic information based on the same code [120-122]. Organisms with a non-standard or exchangeable genetic code were probably progressively eliminated. Hence it is only by a subtle and stepwise co-adaptation of the different elements of the translation machinery (genetic code, amino acids, tRNAs, aminoacyl-tRNA synthetases, t + r + mRNA modification enzymes and termination-release factors, to cite only the major elements) and the requirement for genetic exchange within a community of interdependent cells, that a complex, standardized and quasi-universal genetic code as we know it today was able to evolve.

Because of the multiple biochemical and physical constraints on accurate decoding of mRNAs in modern cells, any codon reassignment, or introduction of new amino acids (such as selenocysteine or pyrrolysine, not discussed here; for reviews see [123-125]), is trickier. While the genetic code was probably never frozen, in practice it appears nowadays very resistant to change. Limited deviations from the universal code are nevertheless found in Candida (Leu > Ser), certain Euplotes (Stop > Gln, Stop > Trp or Cys), Mollicutes (Stop > Trp) and mitochondria (many different types of changes, for reviews see [25-27]). These reassignments required the gain or loss of modification enzymes (like TilS or Pus1p, Fig. 4A), with the concomitant reorganization of the anticodon stem-loop (such as the use of G32, G33 or C33 instead of C32/U32 or U33 in the anticodon loop, or s2C32 in tRNASer/Arg) and/or a change of codon specificity of a release factor (RF2 in bacteria, eRF1 in eukaryotes, Fig. 4B and C). These codon reassignments could have been driven by specific physiological constraints or particular ecological niches allowing the cells to become less dependent on a collective metapopulation of interdependent, possibly multi-cellular living cells. It could also have been a way for a lineage to create a biological barrier, thus limiting the possibility of genetic exchange, with the risk of becoming ‘isolated’ and eventually disappearing. The use of a slightly variant code may also act as a powerful antiviral strategy [126]. The case of mitochondria is interesting: here the repertoire of tRNAs can be minimalist and far less modified than the cytoplasmic tRNA counterparts, and changes in codon assignments appear more frequently than for the nuclear code (Ref. [126]). This simplified translation system, characteristic of many mitochondria and also found in some Mollicutes [9], is probably not as efficient or precise as the ‘standard’ nuclear translation system (discussed in [127]).
This situation could be reminiscent of the primordial genetic code and may constitute an example of a non-collective “retrograde” evolution.

Finally, our model of progressive expansion and standardization of the genetic code is not in opposition to, but complementary to, previous models of the very early emergence and progressive elaboration of a genetic code based on amino acid physicochemical properties, amino acid biosynthetic pathways, G + C pressure, transcriptional constraints at the DNA level and minimization of translational errors (for details see for example [128-134]).

In conclusion, deciphering the genetic code is a highly evolutionary adaptive process, and the introduction of additional amino acids into the code, especially those of the mixed amino acids decoding boxes, has co-evolved with the progressive acquisition of genes coding for tRNA modification enzymes acting mainly at the wobble nucleotide N34 of anticodon. The near universal codon assignment of the present-day genetic code could be in part a result of the need cells had (and still have) to communicate using a common genetic language.

Acknowledgements

This work was supported by the (MCB-05169448) and by the National Institutes of Health (R01 GM70641-01) to V.dC. H.G. holds a position of Emeritus Scientist at the University of Paris-XI, in the laboratory of Prof. Jean-Pierre Rousset and Dr. Olivier Namy, with whom we regularly discuss the genetic code and the translation process. This review article is dedicated to our colleague and friend Nicolas Glansdorff, who recently died accidentally [135].

    Appendix A

    Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.febslet.2009.11.052.