The dairy industry relies heavily on the use of microbial starter culture systems. Among domesticated bacteria widely used in industrial applications, Streptococcus thermophilus is a key species involved in the acidification of milk and the development of texture in various fermented dairy products (3). Recent advances in genomics have provided novel insights into the many critical physiological functions carried out by S. thermophilus in fermentation processes (12). Specifically, unraveling the full genome sequences of three different strains has led to a better understanding of genes involved in acidification, texture development, and flavor enhancement (3, 12, 20). Also, comparative genomic analyses have identified specific loci involved in particular traits attributed to selected strains and pointed out differential content, notably with regard to exopolysaccharide synthesis and phage resistance (3, 12). Recent studies have also established a correlation between distinctive genomic content such as clustered regularly interspaced short palindromic repeats (CRISPR) and resistance to phages (2, 4).
CRISPRs are a peculiar family of DNA repeats widely distributed in Bacteria and Archaea (8, 9, 11, 13, 18). CRISPR loci usually consist of short and highly conserved DNA repeats, typically 21 to 48 bp, repeated up to 250 times (9). The repeated sequences, typically specific to a given CRISPR locus, are interspaced by variable sequences of constant and similar length, called spacers, usually 20 to 58 bp depending on the species or the CRISPR locus. Several distinct CRISPR loci can be located on a particular prokaryotic genome (18); for example, in Methanocaldococcus jannaschii, 18 distinct CRISPR loci have been identified on the chromosome, totaling almost 1% of the genome (5). In addition, cas (CRISPR-associated) genes are often present in the direct vicinity of CRISPR loci (8, 11, 14). Based on similarities between CRISPR spacers and phage or plasmid sequences (4, 21, 23), it was proposed in the literature that CRISPR and cas genes might be involved in conferring immunity to the host cell against foreign DNA (19, 21). In the S. thermophilus chromosome, two distinct CRISPR loci have been identified, namely, CRISPR1 and CRISPR2 (3, 4). Comparative analysis of CRISPR1 sequence between various S. thermophilus strains has revealed polymorphisms (4). In addition, it was recently reported that CRISPR provides acquired resistance against viruses in prokaryotes, notably in S. thermophilus (2). This is consistent with the putative CRISPR-cas immunity system based on RNA interference (RNAi) proposed by Makarova et al. (19), although the mechanism of action remains uncharacterized.
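The repeat-spacer architecture described above can be illustrated with a minimal Python sketch. It assumes that every repeat in the locus is an exact copy of a single known repeat sequence, which does not hold for loci with degenerate repeats. The function name `split_spacers` and the toy sequences are hypothetical and are not taken from any S. thermophilus genome.

```python
def split_spacers(locus: str, repeat: str) -> list[str]:
    """Return the sequences lying between consecutive exact copies of `repeat`.

    Assumption: repeats are perfectly conserved; degenerate repeats
    would not be recognized by this simple string split.
    """
    parts = locus.split(repeat)
    # parts[0] is the flank before the first repeat and parts[-1] the flank
    # after the last repeat; only the pieces in between are spacers.
    return parts[1:-1]


# Toy locus: flank + (repeat + spacer) x 2 + terminal repeat + flank.
toy_repeat = "GTTTTAGA"  # hypothetical repeat, far shorter than real ones
toy_locus = "AAAA" + toy_repeat + "ACGTACGT" + toy_repeat + "TTGGCCAA" + toy_repeat + "CCCC"

spacers = split_spacers(toy_locus, toy_repeat)
spacer_lengths = [len(s) for s in spacers]
```

A real analysis would need to tolerate mismatches in the repeat, for example by approximate matching, before spacer lengths can be compared across loci.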
The correlation between CRISPR spacer content and phage susceptibility suggests that spacer content might provide a historical perspective of phage exposure. Since the CRISPR system is reactive to the environment, it might play a critical role in the adaptation of the host to its surroundings and explain the persistence of particular bacterial strains in ecosystems where phages are present. CRISPRs may also provide insight into the codirected genomic evolution of the phage and its host. Here, we provide a comparative analysis of CRISPR activity and diversity across a collection of S. thermophilus strains. We specifically analyzed the relationship between spacer hypervariability and CRISPR locus activity across three distinct CRISPR loci. In addition, we provide insight into the functional coupling between a given CRISPR repeat and the accompanying cas gene set. Finally, we investigated the various origins of CRISPR spacers and discuss the evolutionary modifications of CRISPR loci in direct response to genetic elements present in the environment, notably bacteriophages and plasmids.
DISCUSSION
In addition to the two CRISPR loci previously described in S. thermophilus (3), we report here the identification of CRISPR3 in the LMD-9 genome (20). Interestingly, this particular CRISPR locus is not ubiquitous in S. thermophilus genomes. Among the three CRISPR loci present in S. thermophilus genomes, diversity is observed at many levels, including (i) the typical CRISPR repeat sequence; (ii) the cas gene content, organization, and sequence; (iii) locus architecture and content; and (iv) spacer content, arrangement, and sequence. Diversity was observed across the three CRISPR loci among 124 different S. thermophilus strains. Specifically, CRISPR1 was ubiquitous, whereas CRISPR2 was present in 59 of 65 strains, and CRISPR3 was present in 53 of 66 strains. A total of 49 strains (39.5%) carried all three loci.
Comparative genome analysis of CRISPR content in streptococci and various bacterial genera and species indicates that the three S. thermophilus CRISPR loci are distributed differently. Notably, CRISPR1 is present in only a few streptococci, whereas CRISPR3 can be found in most Streptococcus species. The distribution of these three CRISPR loci suggests that CRISPR1 may have recently become more specific to a few streptococcal species, whereas CRISPR3 is more widespread across streptococci, and CRISPR2 may be a vestige of a gram-positive ancestor. This is consistent with the absence of CRISPR2 and/or CRISPR3 in various S. thermophilus strains. In fact, detailed sequence analysis of distinct CRISPR3 locus architectures in various S. thermophilus strains suggests that deletions may have occurred via homologous recombination events involving CRISPR3 repeats, likely including the degenerate repeat in the vicinity of serB (Fig. 1).
When equivalent CRISPR loci between strains are compared, a high degree of polymorphism is observed for spacer content and sequences. Specifically, 105 of 124, 7 of 59, and 20 of 53 unique spacer arrangements were observed for CRISPR1, CRISPR2, and CRISPR3, respectively. This indicates that the overall CRISPR content was unique in most strains. Perhaps the polymorphisms observed in the spacer contents of the three CRISPR loci across different S. thermophilus strains are an indicator of the activity of the locus, whereby spacer hypervariability is directly correlated with historical phage exposure. Arguably, the degree of spacer polymorphism, in terms of both total number of unique spacers and total number of unique spacer arrangements, for a given CRISPR locus, could be directly correlated with its activity. Consequently, we propose that in S. thermophilus CRISPR1 is the most active locus, followed by CRISPR3. This is supported by several observations: (i) repeat degeneracy seems to correlate with relative activity, whereby the most degenerate repeats are found in the least active locus, namely, CRISPR2; (ii) spacer size is more highly conserved in the most active loci, namely, CRISPR1 and CRISPR3, and least conserved in the least active locus, namely, CRISPR2; (iii) the average and maximum numbers of spacers are highest for CRISPR1 and lowest for CRISPR2; and (iv) the number of CRISPR BIMs obtained is higher for CRISPR1 than CRISPR3.
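The two polymorphism measures invoked above, the number of unique spacers and the number of unique spacer arrangements for a given locus, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure used in this study: each strain is represented by an ordered tuple of spacer identifiers, and the strain names, spacer identifiers, and function names are hypothetical.

```python
def count_unique_arrangements(spacers_by_strain: dict[str, tuple[str, ...]]) -> int:
    """Number of distinct ordered spacer arrays observed across strains."""
    return len(set(spacers_by_strain.values()))


def count_unique_spacers(spacers_by_strain: dict[str, tuple[str, ...]]) -> int:
    """Number of distinct spacers observed across all strains."""
    return len({s for arrangement in spacers_by_strain.values() for s in arrangement})


# Toy data for one locus: ordered spacer identifiers per strain (hypothetical).
toy_locus = {
    "strainA": ("s1", "s2", "s3"),
    "strainB": ("s1", "s2", "s3"),  # same arrangement as strainA
    "strainC": ("s4", "s2", "s3"),
    "strainD": ("s5",),
}

n_arrangements = count_unique_arrangements(toy_locus)
n_spacers = count_unique_spacers(toy_locus)
```

Because arrangements are compared as ordered tuples, two strains carrying the same spacers in a different order count as distinct arrangements.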
Previous data have suggested that the enzymatic machinery of a specific locus cannot be effective in conjunction with the CRISPR genetic content of another (2). Specifically, when cas genes are inactivated in a particular CRISPR locus, the ability of this locus to provide resistance and integrate novel spacers is lost, despite the concurrent presence of other CRISPR loci and cas genes elsewhere in the chromosome (2). Here, we provide data indicating that each CAS system may be directly linked to a particular CRISPR repeat sequence, which is consistent with the observed comparable clustering of CRISPR repeats and Cas sequences (Fig. 5), as previously suggested by Kunin et al. (16). Further studies investigating the mechanism of action of CRISPRs are currently under way and might provide insights into the roles of the various cas genes and the functional link between specific Cas proteins and a particular CRISPR repeat. Among Cas proteins, some are likely involved in the addition of novel repeat-spacer units, via a molecular interaction with CRISPR repeats. Other Cas proteins are likely involved in the spacer-encoded resistance, which may be mediated via an RNAi-like mechanism (19). These Cas proteins probably include at least one nuclease which might recognize and digest a specific target sequence. This is supported by the recent discovery of a highly conserved motif, which we propose to name CRISPR motif, immediately downstream of the proto-spacers found in phage sequences (7). For CRISPR1, the AGAAW CRISPR motif located two nucleotides downstream of the proto-spacer might serve as a recognition site for a CRISPR1-specific Cas nuclease (Fig. 7). A different CRISPR motif, GGNG, located one nucleotide downstream of the proto-spacer, was also identified for CRISPR3 (Fig. 7), which suggests again that each CRISPR locus has a unique CRISPR motif which may serve as a sequence recognition pattern, specific to a particular Cas enzymatic machinery. Further, CRISPR motifs may serve as additional elements to define a particular CRISPR/Cas system.
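The positional definition of the CRISPR motifs (AGAAW two nucleotides downstream of the proto-spacer for CRISPR1, GGNG one nucleotide downstream for CRISPR3) can be expressed as a small Python check. The sketch assumes that the target sequence is given on the same strand as the proto-spacer, that only the first occurrence of the proto-spacer is examined, and that the standard IUPAC codes apply (W = A or T, N = any base). The function names and toy sequences are hypothetical.

```python
import re

# Subset of IUPAC nucleotide codes needed for the two motifs discussed here.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "N": "[ACGT]"}


def motif_regex(motif: str) -> str:
    """Translate an IUPAC motif such as 'AGAAW' into a regular expression."""
    return "".join(IUPAC[base] for base in motif)


def has_motif_downstream(target: str, proto_spacer: str, motif: str, gap: int) -> bool:
    """True if `motif` starts exactly `gap` nucleotides after the proto-spacer."""
    start = target.find(proto_spacer)  # first occurrence only
    if start < 0:
        return False
    motif_start = start + len(proto_spacer) + gap
    window = target[motif_start:motif_start + len(motif)]
    # A window truncated by the end of the target cannot fully match.
    return re.fullmatch(motif_regex(motif), window) is not None


# Toy targets (hypothetical sequences, not phage data).
proto = "ACGTACGTAC"
crispr1_like = "CC" + proto + "TG" + "AGAAT" + "GG"  # 2-nt gap, then AGAAW
crispr3_like = "CC" + proto + "T" + "GGCG" + "AA"    # 1-nt gap, then GGNG

crispr1_ok = has_motif_downstream(crispr1_like, proto, "AGAAW", gap=2)
crispr3_ok = has_motif_downstream(crispr3_like, proto, "GGNG", gap=1)
```

Keeping the motif and the gap as parameters reflects the observation that each locus appears to have its own motif at its own offset.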
We have shown that two distinct CRISPR loci, namely, CRISPR1 and CRISPR3, have the ability to evolve directly in response to phages by the polarized addition of new spacers derived from viral genomic sequences. Accordingly, CRISPR spacers provide a historical perspective of phage exposure, whereby spacers present in the vicinity of the leader were relatively recently added, whereas distal spacers likely originated from previous events.
In addition to CRISPR variability due to the acquisition of novel spacers in response to phages, primarily at the leader end, we noticed that modifications can occur throughout the CRISPR locus, as seen in DGCC7710 Φ2972 +S15 (7), where a deletion occurred concomitantly with the insertion of a new spacer at the leader end (Fig. 2). Specifically, most of the variability observed at the trailer end of the locus seems to occur via deletion (Fig. 2), arguably resulting in the preferential deletion of older spacers, which are likely less valuable for the bacterium in its current environment. This phenomenon is probably due to homologous recombination events occurring between CRISPR direct repeats. On the other hand, spacers recently acquired may be more valuable and thus more likely to be retained in the current environment. In some instances, peculiar spacers seem to be retained between seemingly distant strains, perhaps indicating that they provide a critical function (Fig. 2), such as targeting a conserved phage sequence. Altogether, CRISPR loci seem to evolve both through additions and deletions of repeat-spacer units.
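The two modes of locus evolution described above, polarized addition at the leader end and internal deletion of repeat-spacer units, can be distinguished by comparing the spacer array of a parental strain with that of a derivative. The following Python sketch assumes that both arrays are listed from the leader end to the trailer end and that spacers are represented by identifiers. The function name and the toy arrays are hypothetical.

```python
def compare_arrays(parent: list[str], derivative: list[str]) -> tuple[list[str], list[str]]:
    """Return (leader-end additions, deleted spacers) for a derivative strain.

    Both lists are assumed to be ordered from the leader end to the trailer end.
    """
    parent_set = set(parent)
    derivative_set = set(derivative)

    # Leading run of derivative spacers absent from the parent:
    # interpreted as polarized additions at the leader end.
    added = []
    for spacer in derivative:
        if spacer in parent_set:
            break
        added.append(spacer)

    # Parental spacers missing from the derivative: interpreted as deletions.
    deleted = [spacer for spacer in parent if spacer not in derivative_set]
    return added, deleted


# Toy example: one new spacer at the leader end and one internal deletion.
parent_array = ["s1", "s2", "s3", "s4"]
derivative_array = ["new1", "s1", "s2", "s4"]

added, deleted = compare_arrays(parent_array, derivative_array)
```

In this toy case a single comparison reports both events, which mirrors the situation in which a deletion occurs concomitantly with the insertion of a new spacer.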
Similarities between CRISPR spacers and phage or plasmid sequences have been documented previously (2, 4, 21, 23). Although the majority of CRISPR spacers show homology to phage (77%) and plasmid (16%) sequences, we identified four CRISPR spacers that are 100% identical to S. thermophilus chromosomal gene sequences, including dtpT and rexA. This might indicate that the CRISPR/Cas system, in addition to providing resistance against foreign genetic elements such as plasmids and phages, may also serve as a microbial regulatory system involved in the control of mRNA transcript levels for genes encoded on the chromosome, perhaps using a system based on RNAi, as previously suggested (19).
Overall, the dynamic nature of CRISPR loci is potentially valuable for typing and comparative analyses of strains and microbial populations. Given that some loci are relatively active while others bear lower levels of polymorphism, the potential of a given CRISPR locus for typing and epidemiological studies has to be assessed on a case-by-case basis. Since CRISPRs are widely distributed in Bacteria and Archaea and actively involved in an adaptive immune system against foreign genetic elements, as well as intrinsic chromosomal elements, they provide critical insights into the relationships between prokaryotes and their environments, notably the coevolution of host and viral genomes.