Streptococcus pyogenes
In 1927, long before lysogeny was described, it was demonstrated that a filterable agent from scarlet fever isolates could convert nonscarlatinal
S. pyogenes to toxigenic strains (
22). We know now that this conversion is mediated by bacteriophages and that 90% of
S. pyogenes isolates are lysogenic (Fig.
1A). For several reasons,
S. pyogenes provides a neat test case for the role of prophages. Transformation and conjugation appear to play no or only minor roles in lateral DNA transfer in this species, giving phages a special role in this process (
68,
189).
S. pyogenes belongs to the lactic acid bacteria (LAB) branch of low-G+C gram-positive bacteria. Its phages are good examples of phages from LAB of medical and economic importance and are therefore discussed in more detail. Phylogenetic relatives are used in the dairy industry as starter organisms, and their viruses have become a focus for comparative phage genomics studies (
28). The only known habitat of
S. pyogenes is the human host, where it is normally found on the skin and in the oral cavity.
S. pyogenes comes in many M serotypes and causes an astonishing range of diseases including pharyngitis, scarlet fever, pyodermitis, fasciitis, rheumatic fever, and toxic shock syndrome. With respect to the three sequenced strains, M1 strains were associated with wound infections, M3 strains were identified in patients with severe invasive infections, and M18 strains caused rheumatic fever outbreaks (
8). Despite the protean character of this pathogen, the sequenced
S. pyogenes isolates are genetically closely related. For example, the M18 strain shared 1,532 of 1,696 open reading frames (ORFs) identified in the M1 strain, and sequence identity ranged from 83 to 100% at the base pair level. In fact, a dot plot analysis revealed essentially a straight line between the two serotypes, with 1.7 of the 1.9 Mb of chromosomal DNA shared. There were only five larger regions of difference; all were prophage sequences (
189). This observation leads to an interesting question: is the specific pathogenic potential of a given
S. pyogenes strain influenced by its prophage content? Microarray analysis of 36 M18 strains demonstrated that prophages are not only a significant source of genomic divergence between strains of M1, M3, and M18 serotypes but also the predominant source of difference between M18 strains. The M18 strains differed by a maximum of 3% of the genes, and prophages were responsible for virtually all the variation in gene content. Variation in the prophages ranged from entire absence of the prophages from the test strain to small differences in the gene content of the prophages (
189).
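The proportions quoted above can be checked with a few lines of arithmetic. The following minimal sketch uses only the figures stated in the text (1,532 of 1,696 ORFs and 1.7 of 1.9 Mb); the function and variable names are illustrative and not taken from the cited studies.

```python
def shared_percentage(shared, total):
    """Return the shared portion as a percentage of the total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * shared / total

# Figures quoted in the text for the M18 versus M1 comparison.
orf_pct = shared_percentage(1532, 1696)  # ORFs shared between M18 and M1
dna_pct = shared_percentage(1.7, 1.9)    # Mb of chromosomal DNA shared

print(f"ORFs shared: {orf_pct:.1f}%")
print(f"Chromosomal DNA shared: {dna_pct:.1f}%")
```

Both ratios come out at roughly 90%, which agrees with the statement that the differences between the two serotypes are concentrated in a small number of regions.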
When the DNA sequences from the 15 available
S. pyogenes prophages were compared against each other in a dot plot matrix, five clusters of phages consisting of two to four members could be distinguished, while two phages had only limited DNA sequence similarity to the rest of the phages (Fig.
2, green boxes). Next to
E. coli prophages, this is the largest set of prophage sequences from a single species for comparative analysis. Recently, a proposal was made to base the taxonomy of
Siphoviridae on the genetic organization of the structural gene cluster (excluding the tail fiber genes) (
171). All currently known LAB prophages showed a conserved overall gene order: left attachment site (
attL)-lysogeny-DNA replication-transcriptional regulation-DNA packaging-head-joining-tail-tail fiber-lysis modules-right attachment site (
attR) (Fig.
3, 4, and 5). Particularly well conserved is the order of the structural genes, allowing the distinction of three major forms of head gene clusters exemplified by the
cos-site
Streptococcus thermophilus phage Sfi21, the
pac-site
S. thermophilus phage Sfi11, and the
cos-site
Lactococcus lactis phage r1t as prototypes. The corresponding phages were also observed in
S. pyogenes.
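To illustrate the dot plot comparison used above, the following is a minimal sketch of an exact k-mer dot plot. It is not the software used for the figures; the function names, the word size k, and the two toy sequences are assumptions chosen for illustration. Each dot marks a position pair where a k-mer of one sequence occurs in the other, so related sequences produce runs of dots along a diagonal.

```python
from collections import Counter

def kmer_dot_plot(seq_a, seq_b, k=4):
    """Return (i, j) pairs where the k-mer at seq_a[i] equals the k-mer at seq_b[j]."""
    # Index every k-mer of seq_b by its start positions.
    index = {}
    for j in range(len(seq_b) - k + 1):
        index.setdefault(seq_b[j:j + k], []).append(j)
    # Look up every k-mer of seq_a in that index.
    dots = []
    for i in range(len(seq_a) - k + 1):
        for j in index.get(seq_a[i:i + k], []):
            dots.append((i, j))
    return dots

def dominant_offset(dots):
    """Return (offset, count) for the diagonal j - i that holds the most dots."""
    if not dots:
        return None
    return Counter(j - i for i, j in dots).most_common(1)[0]

# Hypothetical toy sequences: phage_b is phage_a shifted by two bases.
phage_a = "ATGCGTACGTTAGC"
phage_b = "GG" + phage_a

dots = kmer_dot_plot(phage_a, phage_b)
print(dominant_offset(dots))
```

In a matrix of many genomes, clusters such as those described in the text would appear as groups of sequence pairs with long diagonals, whereas unrelated pairs would give only scattered dots.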
The first group of
S. pyogenes prophages are the r1t-like
Siphoviridae (Fig.
2). The second group resembles Sfi11-like
Siphoviridae. Sequence alignments allowed the distinction of two Sfi11 subgroups with
S. pneumoniae phage MM1 and
S. thermophilus phage O1205, respectively, as reference strains (Fig.
2). The third group of prophages are members of the Sfi21-like
Siphoviridae. Database matches again permitted the differentiation of two subgroups: one had sequence similarity to
Lactobacillus casei phage A2 (
171), and the other had sequence similarity to staphylococcal phages (Fig.
2). Except for the tail fiber and lysis genes and part of the lysogeny genes,
S. pyogenes prophage 315.5 lacks DNA sequence similarity to the other
S. pyogenes prophages, while it has some protein sequence similarity to a
Bacillus halodurans prophage (see below). Phage tail fibers and lysins have to interact with the cell surface and the cell wall of their bacterial host cell and should therefore be subjected to a strong adaptive selection pressure. Not surprisingly, all but one of the sequenced
S. pyogenes prophages had a highly related tail fiber module (Fig.
2, red box). Central to this module is the phage hyaluronidase, an enzyme that splits the hyaluronic acid-containing capsule surrounding the bacterial cell. This lytic enzyme allows the phage to reach the cell surface, where it injects its DNA into the bacterial cell. Only prophage 315.6 lacks this tail fiber module typical for
S. pyogenes prophages. Over these genes, prophage 315.6 had about 40% sequence identity at the amino acid level to
S. agalactiae prophage SA1 (see below), suggesting a possible cross-species infection. In fact, these two phages had some DNA sequence similarity across their entire genomes.
One would expect competition and exclusion between prophages during the establishment of a polylysogenic cell. Exclusion could be mediated by three proteins encoded in the lysogeny module, namely, the phage integrase, the superinfection exclusion gene
sie (
140), and the phage repressor (immunity function). It is therefore not surprising that a substantial diversification was observed over this region between the
S. pyogenes prophages (Fig.
2, yellow box; the largest group of phages sharing early genes is marked by blue circles). Eleven distinct prophage integration sites were identified in the three sequenced
S. pyogenes strains. Three sites were occupied in more than one strain; in all cases, the corresponding phage integrases had at least 92% sequence identity at the amino acid level. Seven prophages used unique integration sites; however, not all possessed unique integrases. Prophages 8232.1 and 315.1 showed distinct chromosomal integration sites while sharing identical integrases. Seven- and 14-bp core sequences were deduced for the two integration sites, showing a 6-bp overlap. A similar observation was made for prophages 8232.5 and 315.5. Such low specificity of the phage integrase with respect to conserved nucleotides in the core sequence was also observed in other phages from LAB (
S. thermophilus phage Sfi21 [
33] and
Lactobacillus phage mv4 [
5]). It is therefore unlikely that the competition for integration sites is a limiting factor in the establishment of polylysogenic cells.
For the DNA-packaging, head, and tail genes,
S. pyogenes prophage 370.3 had sequence similarity to
L. lactis phage r1t (Fig.
3A). Differences between r1t-like
S. pyogenes prophages were seen in a group of three nonstructural genes preceding the terminase gene (Fig.
3B, box C). An endonuclease gene and a point mutation interrupt and truncate the tail tape gene from prophage 8232.4 (boxes A and B, respectively). Prophage 315.3 showed distinct lysis and lysogenic conversion genes (box E). All three prophages demonstrated variations in the early genes (boxes D) and a disrupted replisome organizer gene (e.g., orf7a and orf7b in 370.3 [Fig.
3A]).
S. pyogenes prophage 370.2 could be aligned with
S. thermophilus phage O1205 in the DNA-packaging, head, and tail genes (Fig.
4A). Differences between the two O1205-like
S. pyogenes prophages 370.2 and 8232.3 were seen in the early-gene cluster and the lysis/lysogenic conversion genes. Prophage 8232.3 showed ORF-disrupting point mutations in the tail tape and a tail fiber gene, and a lysis gene is interrupted by an endonuclease, while prophage 370.2 contained a stop codon within the portal gene (Fig.
4A).
The four MM1-like
S. pyogenes prophages had extensive DNA sequence similarity throughout the entire structural genes. Except for the tail fiber and two head genes, extensive protein sequence similarity to
S. pneumoniae prophage MM1 was also detected (Fig.
4B). In the structural genes, differences between the MM1-like
S. pyogenes prophages amounted to a few gene replacements (Fig.
4B and C, boxes B and C), the transfer of a holin gene to the opposite strand (box H), and a point mutation leading to an inactivating frameshift in a tail fiber gene (box A). In contrast, quite extensive differences were detected over the early and the lysogenic conversion genes.
The A2-related subgroup of Sfi21-like
S. pyogenes prophages differed over the nonstructural early genes and the putative lysogenic conversion genes but showed closely related structural genes (Fig.
5B); those of the
Staphylococcus-related subgroup differed only for the genes near the
attR site (Fig.
5C).
A clear trend for prophage DNA loss was seen in the M1
S. pyogenes strain (Fig.
1A). A 13-kb prophage remnant, 370.4, encoded only lysogeny and DNA replication genes. A closely related prophage remnant was identified at a corresponding position in the Manfredo strain. The prophages shared both flanking
att sites but differed by internal insertions and deletions and gene replacements (
37). In addition, three prophage remnants of only 2 kb were identified; they consisted of the phage integrase accompanied by the phage repressor and a potential lysogenic conversion gene in the R-1092 remnant (
37).
All prophages but one (315.1) encoded potential virulence factors between the lysin gene and the
attR site (mitogenic factors, toxins/superantigens, enzymes) (
8,
53,
68) (Fig.
3 to
5). The lysogenic conversion genes in the three prophages of the M1 strain differ in their G+C content from the surrounding prophage and bacterial DNA (
68), suggesting a faulty phage excision process in an unusual bacterial host with a lower G+C content as the origin of this DNA (
230). The horizontal spread of these genes is also suggested by the presence of sequence-identical genes in the horse pathogen
Streptococcus equi (
37). Notably, there is a short stretch of sequence conservation adjacent to the right attachment site between different
S. pyogenes prophages (Fig.
3B and
4C, arrows). This conserved segment and the highly conserved region around the hyaluronidase gene in the tail fiber might allow an exchange of lysogenic conversion genes between different
S. pyogenes prophages by homologous recombination.
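The G+C argument above rests on comparing the base composition of a region with that of its surroundings. The following minimal sketch shows one simple way to flag such regions with a sliding window. The window size, step, threshold, and the toy sequence are arbitrary assumptions for illustration and do not come from the cited analyses.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def atypical_windows(genome, window, step, threshold):
    """List (start, gc) for windows whose G+C fraction differs from the
    whole-sequence mean by more than threshold."""
    mean = gc_content(genome)
    flagged = []
    for start in range(0, len(genome) - window + 1, step):
        gc = gc_content(genome[start:start + window])
        if abs(gc - mean) > threshold:
            flagged.append((start, gc))
    return flagged

# Hypothetical toy sequence: a low-G+C insert inside G+C-rich flanking DNA.
toy = "GC" * 20 + "AT" * 10 + "GC" * 20

flagged = atypical_windows(toy, window=20, step=10, threshold=0.35)
print(flagged)
```

A deviation of this kind is only suggestive of foreign origin; it does not by itself identify the donor organism or the transfer mechanism.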
The prophage-encoded hyaluronidase and DNase have been suspected of promoting bacterial spread through host tissue by their ability to hydrolyze glucosaminic bonds in hyaluronic acid, a major component of the extracellular matrix in the connective tissue, and the liquefaction of pus when degrading the DNA from decaying lymphocytes, respectively. Notably, antibodies against both phage enzymes are found in some post-streptococcal diseases (
82). The virulence properties of the DNases are not entirely clear since they were also described in the literature as mitogenic factors (or streptodornases) (
191). The prophage 315.4-encoded Sla protein showed phospholipase A2 activity. Sla has sequence homology to a potent snake venom toxin and might contribute to the inflammation and coagulopathy seen in streptococcal toxic shock syndrome (
13).
Many
S. pyogenes prophages encoded streptococcal pyrogenic exotoxins (Spe) in the lysogenic conversion region (
8). However, the specific combination of toxins differed between the sequenced
S. pyogenes strains: the M1 strain showed
speC,
speH, and
speI genes; the M3 strain demonstrated
ssa,
speK, and
speA3 genes; and the sequenced M18 strain contained the
speA1,
speC,
speL, and
speM genes (
8). These are all distinct members of a large family of superantigens, and they include the scarlet fever toxin. These proteins bind the T-cell receptor and the major histocompatibility complex protein outside of the usual peptide binding site and lead to a pathological activation of the immune system, possibly allowing the escape of
S. pyogenes from immune surveillance (
170). It is conceivable that the variable combination of superantigens and mitogenic factors (
sda,
sdn,
mf2,
mf3, and
mf4) provided by multiple prophages influences the pathogenic potential of the polylysogenic host. This is a theoretically interesting possibility to explain the strikingly distinct symptoms associated with pathogens whose bacterial genome sequences are so similar.
For example, prophage NIH1.1 was identified in an M3
S. pyogenes strain from a toxic shock syndrome patient. It resembled prophage SF370.1 over the entire structural gene cluster but encoded a distinct superantigen (SpeL instead of SpeC) (
96) (Fig.
4B). Notably, possession of prophage NIH1.1 was a genetic marker for newly emerging M3
S. pyogenes strains in Japan (
97), which had replaced an otherwise genetically identical strain that just lacked the prophage NIH1.1 (
96). Prophage acquisition might thus be a major mechanism of short-term evolution in this epidemiologically highly dynamic and clinically variable bacterial species (
8,
13). In an appealing model, the emergence of new, unusually virulent subclones of M3 strains is explained by the sequential acquisition of prophages 315.5, 315.2, and 315.4 in approximately 1920, 1940, and 1985 (
13), suggesting bacterial pathogenicity evolution by prophage-mediated lateral gene transfer in the fast lane.
The possession of phage-located toxin genes does not automatically lead to the expression of these genes. Clinical isolates containing toxin genes showed a variable pattern of toxin expression when grown in broth culture (
105). However, growth of these strains in mice or coculture with human pharyngeal cells led to the production of the toxins (
30,
105). A small heat-stable factor released from the pharyngeal cells was identified as an inducer of the prophages (
26). This is a fascinating observation, since it means that streptococcal prophages respond via bacterial regulation systems to signals emitted from the eukaryotic host. Mobile DNA and prophages were also the most prominent group of genes that showed expression changes when mRNA from
S. pyogenes cells grown at 29 or 37°C was assayed on an M1 strain-based microarray (
190).
Streptococcus thermophilus
S. thermophilus is naturally found in raw milk and represents a major starter bacterium in the dairy industry. Lysogeny is not widespread in this species (
29). According to the mode of DNA packaging, two groups of temperate
S. thermophilus phages have been characterized (
122), represented by the
pac-site Siphovirus O1205 (
192) (Fig.
4A) and the
cos-site Siphovirus Sfi21 (
129) (Fig.
5A). Temperate and virulent
S. thermophilus phages showed a peculiarly close genetic relationship. Virulent phages, which are the predominant ecological isolates from both the factory and raw milk (
31), are essentially the result of deletion, gene replacement, and rearrangement events in the lysogeny module of temperate phages (
127). In silico analysis demonstrated that prophages related to the two basic types of
S. thermophilus prophages are found in many low-G+C gram-positive bacteria (Fig.
4A and
5A). Comparative genomics revealed distant relationships to lambdoid phages from gram-negative bacteria and even prophages from
Archaea (
51) (Fig.
6). In fact, over the structural gene cluster, the Sfi21-like phages shared a gene map with
E. coli phage HK097. Sfi11 phage even showed protein sequence similarity to phage lambda, suggesting distant phylogenetic relationships between these phages (
28). No protein sequence similarity linked HK097 and Sfi21 prophages. However, some features characteristic of this group of phages were identified in both: the major head protein was proteolytically cleaved at amino acids 104 and 105, respectively, releasing an N-terminal protein fragment with strong coiled-coil structure (
51). In both prophages, a protease gene precedes the major head gene. In phage Sfi21, this protease belongs to the ClpP protein family, as in many other Sfi21-like phages from LAB and even prophages from γ-proteobacteria (
54). This specific gene constellation is a diagnostic criterion for Sfi21-like phages. Sfi11-like phages showed a distinct head gene constellation consisting of three phage head genes and one scaffold gene.
Sfi21 belongs to the few temperate phages from LAB for which a transcription map was established both in the lytic mode of infection (
210) and in the prophage state (
209) (Fig.
5A). In the lytic mode, essentially the entire Sfi21 genome was transcribed, allowing a distinction of early (transcription regulation module), middle (DNA replication), and late (structural and lysis genes) transcripts. In the lysogenic state, only two Sfi21 genome regions were transcribed from the otherwise transcriptionally silent prophage (
209). One transcript comprised the DNA segment from the
cI-like repressor (
34) to the superinfection exclusion (
sie) genes located directly upstream of the phage integrase (
32). The cloned
cI repressor protected a cell against superinfection with temperate phages (
34), while the cloned
sie gene conferred protection against many virulent phages (
32). Another transcript covered four genes located between the lysin gene and the
attR site (
209). These genes lacked database matches, preventing speculations about their possible functions.
S. thermophilus prophage O1205 carries a different set of genes near
attR, and they also belong to the few genes transcribed from the prophage (
209) (Fig.
4A). A lysogenic conversion phenotype was observed for a
S. thermophilus strain lysogenic with the prophage TP-J34: it showed distinct growth properties (planktonic versus aggregated growth) when lysogenic or when prophage cured (H. Neve, personal communication). TP-J34 displayed a distinct set of genes between the lysin gene and the
attP site (
158). A database search revealed that many temperate phages from low-G+C gram-positive bacteria showed extra genes between the phage lysin and
attR (
209). With the exception of a
Bacillus halodurans prophage (see below), these prophage genes from free-living bacteria showed no informative database match, precluding any speculation with respect to their function (
209). In accordance with theoretical predictions, a prophage remnant consisting of the phage integrase and a few transcribed phage genes was described for
S. thermophilus (
209).
Lactococcus lactis
L. lactis is the closest phylogenetic relative of the genus
Streptococcus and is the major starter used in the cheese industry. Due to the economic impact of phage infections, lactococci and their phages became a focus of research in dairy microbiology. The completely sequenced
L. lactis strain IL1403 contained six prophage elements (
42). Two inducible and one noninducible prophages showed the genome organization of
cos-site temperate
Siphoviridae closely related to
S. thermophilus phage Sfi21. Three 15-kb prophage remnants had maintained only lysogeny genes (integrase and repressor) and, in variable amounts, DNA replication and a few structural genes (
42). In contrast to our interpretation, these authors viewed them as P4-like satellite phages.
The Sfi21-like lactococcal prophages are represented by prophage BK5-T (Fig.
5A). Over the structural gene cluster, BK5-T showed an interesting gradient of sequence similarity covering high and low DNA identity to prophages bIL286 (
Lactococcus) and Sfi21 (
Streptococcus) or moderate or low protein sequence identity to phages adh (
Lactobacillus) and PVL (
Staphylococcus). Since this gradient of prophage relatedness reflects the phylogenetic relationship of their host bacteria, a coevolution of prophages with their bacterial hosts was initially discussed (
52). However, further analysis also demonstrated substantial sequence diversification within prophages from a single bacterial species (
L. lactis), including DNA sequence (bIL286), protein sequence (bIL309), or only genome organization similarity to BK5-T (bIL285), creating a problem for phage taxonomy and models of phage evolution (
171). Transcription in the BK5-T prophage was limited to two regions: three transcripts covered the phage integrase, the
sie homologue and the
cI repressor, and another transcript was derived from an anonymous large gene located between the phage lysin and the
attR site (
21) (Fig.
5A).
The best-characterized Sfi11-like prophage in
L. lactis is prophage TP901-1 (
24,
25) (Fig.
6). The structural proteins from TP901-1 have been characterized by protein sequencing (
100), immunoelectron microscopy (
100), and mutational analysis (
165). The results confirmed that the prediction of gene functions by comparative genomics, and specifically the alignment of the structural gene map with phage lambda, is quite reliable (
28). For example, as in lambda, the length of the tail structure is determined in TP901-1 by the length of the tail tape measure protein (
165). Also, the prediction of a transcriptional regulation module between the DNA replication module and the structural gene module in prophages from LAB was confirmed experimentally (
24). In fact, many of the in silico predictions of gene assignments in phages from LAB were confirmed by experiments with one or the other phage from LAB, instilling some confidence in the power of comparative phage genomics. Indeed, in some cases the experiments were actually guided by comparative genomics. The genome analysis of
S. thermophilus phages differing in host range provided keys to the location of the phage antireceptor on the genome map and suggested a mechanism of diversification by the exchange of highly variable gene segments flanked by conserved gene segments encoding collagen-like peptides (
127,
200). The model was subsequently confirmed by the construction of chimeric phages with
S. thermophilus phage DT1 (
61). However, two functions occupied different genome positions in dairy phages than in phage lambda. First, in prophages from low-G+C gram-positive bacteria, the lysis cassette is invariably located downstream of the tail fiber genes, in contrast to lambda, where it is found upstream of the DNA packaging genes. Second, the excisionase from TP901-1 was identified (
23) within the early genes downstream of the
cro-like repressor gene (
131). In lambda, the
xis gene is found directly upstream of the phage integrase gene. This position is occupied in lactococcal and streptococcal prophages by the
sie gene (
140).
The third class of lactococcal prophages is represented by phage r1t (
206). Not only did its genetic switch region show a comparable structure to that of phage lambda (
155), but also molecular modeling of its repressor on the basis of the lambda
cI-repressor allowed the design of a thermolabile repressor mutant as a genetic tool (
154). The functions of two predicted DNA replication genes were confirmed by biochemical experiments, including a replisome organizer (
235) and a RusA protein. The latter is an endonuclease that resolves Holliday junction intermediates formed during DNA replication, recombination, and repair (
182). Interestingly, the RusA protein of
E. coli is also encoded by the defective prophage DLP12 and
rusA-like sequences are associated with prophage sequences in several bacteria. Since phages from LAB dedicate more genes from their genome to DNA replication functions than the similarly organized phage lambda does, one might ask to what extent some of these genes are of potential use to the bacterial host. Such dual functions could also explain why prophage remnants in LAB demonstrated a trend to maintain genes from the lysogeny and DNA replication modules. As in
S. thermophilus phages, closely related virulent derivatives of temperate r1t-like phages were described. Their lysogeny module consisted only of the genetic switch structure, while the phage integrase has been eliminated (
133).
Many lactococcal phages, including r1t, contain introns at various genome positions, demonstrating that selfish DNA elements such as prophages can also become the target for parasitic DNA elements. Intron homing is the process by which introns spread through a population of intronless alleles and is initiated by intron-encoded endonucleases. In dairy phages, these endonucleases are found relatively frequently (
47,
71). The process of intron homing can be very efficient: an ecological survey in
S. thermophilus phages revealed that all phage lysin genes possessing a 14-bp consensus sequence contained an intron. As with the prophage DNA, one would expect a selection pressure to remove the intron or to prevent its further spread. Indeed, large deletions within the homing endonuclease were detected in
S. thermophilus phages (
71).
Lactobacillus
The use of
Lactobacillus, another LAB genus, in various food fermentation processes and as probiotics (health-promoting bacteria) has motivated research into its phages and prophages.
L. delbrueckii prophage mv4, for which closely related virulent phages were also described (
142,
208), became the focus for research into the site specificity of the integration system (
4,
5).
Lactobacillus casei phage A2, in comparison, is the best-characterized LAB phage with respect to its DNA-packaging mechanism (
75). Also, the genetic switch structure of A2 was studied in more detail than in any other phage from LAB: three operators located between divergently transcribed repressor genes were bound with different affinities by the two repressors, resulting in a repositioning of the RNA polymerase (
76,
110,
111).
When corresponding genome segments were studied in different phages from LAB, substantial biological variability was frequently observed. The genetic switch region can serve as an example.
Lactobacillus casei phage A2 still follows the phage lambda paradigm relatively closely. Substantial deviations from this theme were found in other phages from LAB; e.g.,
Lactobacillus plantarum phage phig1e showed seven 15-bp operators with dyad symmetry in this region, which were bound differentially by the repressors encoded by the flanking genes (
102); in the
Lactococcus phage BK5-T, the divergently transcribed repressor genes are separated by one ORF which is normally found further downstream of the early lytic transcript (
134); and in the lactococcal phage TP901-1 and the streptococcal phage Sfi21, the lytic (Cro) repressor lacked binding activity for the DNA of the genetic switch region and inhibited the lysogeny (
cI) repressor binding to the genetic switch region possibly by protein-protein interaction between the two repressors (
34,
132).
The holin-lysin system provides another example. The similarity with the phage lambda holin S and lysin R gene constellation was demonstrated in experiments where
Lactobacillus phage holin (
162) and lysin (
19) could complement lambda prophages containing mutations in both genes. However,
S. thermophilus phages showed two holin genes with distinct biological properties, suggesting a holin-antiholin system in the control of the lytic process (
184). Apparently, there are many different solutions to a given problem for phages with a common overall genome organization. This is not a peculiar situation in phages from LAB; similar observations were made with lambdoid phages (
218).
Most sequenced
Lactobacillus species contain prophage sequences. The 2-Mb chromosome of the gut commensal
Lactobacillus johnsonii NCC 533 (Nestlé) contained two prophages showing the genome organization of Sfi11-like
pac-site
Siphoviridae (
54). The lysogeny module of these prophages contained more genes than are commonly found in temperate phages of LAB (
128). In one prophage, two of these extra genes showed links to a genomic island from
S. aureus. Northern blot analysis revealed that these genes are transcribed in the lysogen. Microarray analysis demonstrated that the two prophages Lj928 and Lj965 represented quantitatively the majority of the strain-specific DNA of the sequenced
L. johnsonii strain. Another
L. johnsonii prophage, Lj771, had extensive DNA sequence identity to a prophage in the sequenced
L. gasseri strain (Joint Genome Institute). Differences over the late genes were limited to a few genome regions (lysin and anti-receptor) but were more extensive over the early genes.
The sequenced
L. plantarum strain WCFS1 (
106) contained two closely related Sfi11-like prophages that had a nearly identical structural gene cluster. One prophage contained a disruptive mutation in the terminase gene. Candidate lysogenic conversion genes were identified by database searches and transcription analysis near both the
attL and
attR sites. The extra genes shared similarity to a mitogenic factor encoded by an
S. pyogenes prophage. This observation is notable since the sequenced
L. plantarum strain was isolated from the oral cavity of a human, which is also the habitat of
S. pyogenes. A prophage remnant consisted of truncated lysogeny, DNA replication, and a few structural genes typical for an Sfi21-like phage. It directly abutted one of the Sfi11-like prophages.
Listeria
Although most if not all
Listeria strains carry functional or cryptic prophages, the potential influence of lysogeny on the host phenotype is unknown. Only one
Listeria prophage has been investigated in some molecular detail: A118 belongs to Sfi11-like
Siphoviridae, but lacks a
pac-site (
125). The prophage integrates into
comK, a putative transcriptional activator for various factors involved in competence for DNA uptake. However,
Listeria is not easily transformable, and so a negative lysogenic conversion phenotype is not immediately obvious. A closely related prophage, EGDe, was identified in the sequenced
Listeria monocytogenes strain (
79). Over the structural gene cluster, differences from A118 were limited to the major head gene. In view of the intricate protein-protein interactions which occur during phage morphogenesis, it is surprising that a single protein can be exchanged without upsetting the other phage proteins participating in the head-building process. More substantial differences were detected over the nonstructural genes including the lysogeny module, which might explain why A118 can be propagated on a strain containing the EGDe prophage. From the
Listeria strain ScottA, isolated during a large listeriosis epidemic in the United States, an Sfi21-like prophage, PSA, was induced and sequenced (accession number AJ312240). Like all sequenced
Listeria prophages, PSA contained a cluster of genes without database matches near the
attR site. Parts of these genes were shared between different
Listeria prophages.
Listeria is ubiquitous in nature; it can be found in soil and the gut, and it represents an opportunistic pathogen in animals and to a lesser extent in humans.
L. monocytogenes, the etiological agent of listeriosis, a severe food-borne disease, and the nonpathogenic species
L. innocua shared a closely related genome and an unexpected synteny with
B. subtilis and
S. aureus (
79). Remarkably, all major gaps in the alignment of the two bacterial genomes were represented by the prophages integrated into
L. innocua. Except for prophage genes, less than 10 and 5% of the genes were
L. monocytogenes and
L. innocua specific, respectively.
L. innocua contains five prophages; only A118-like prophage 1 is shared with
L. monocytogenes, but the two prophages are integrated into two different chromosome locations. Over the structural genes, prophages 2, 3, and 5 resembled
B. subtilis prophage PBSX,
Xylella prophage XfP3, and
Lactococcus prophage bIL285 (
171), respectively. The closest relative of prophage 4 was the
L. monocytogenes prophage EGD, with which it had low to moderate sequence similarity in a patchwise fashion.
Staphylococcus aureus
Staphylococcal enterotoxins cause an acute food-poisoning syndrome that is the second most frequent food-borne disease in the United States. Like botulism, the illness results from ingesting preformed bacterial toxins. The gene for enterotoxin A is carried by several staphylococcal prophages near their attachment sites (
14). In addition,
S. aureus causes a range of diseases from skin infections to life-threatening conditions such as sepsis. The organism produces many toxins and is highly efficient at overcoming antibiotics. A number of prophages have been found in clinical isolates. Their sequencing revealed the carriage of several toxin genes. Prophage PVL, a typical Sfi21-like siphovirus (Fig.
5A), encoded the clinically important bicomponent cytotoxin leukocidin S and F between the phage lysin and the
attR site (
103). Leukocidin is an established staphylococcal virulence factor that causes leukocytolysis and tissue necrosis. The same toxin was found on prophage SLT, which has a distinct morphology (an elongated head instead of the icosahedral head of PVL), suggesting horizontal transmission of toxin genes between temperate phages (
153) (Fig.
7). Despite its distinct head morphology, SLT also showed the genome organization of an Sfi21-like siphovirus, with the characteristic gene constellation portal protein-ClpP protease-major head gene (identical to the prophage phi12 head protein, see below). The noninducible prophage PV83 shared with PVL the entire structural gene cluster (>86% amino acid identity) but showed a variant leukocidin (LukM/LukF pore-forming complex) next to a distinct lysis cassette. The defective nature of this prophage might be linked to the incorporation of a transposase-containing insertion sequence into a head-to-tail joining gene of PV83. A second insertion element is found near the
attR site of PV83 (
234).
Exfoliative toxin is one of the extracellular staphylococcal proteins causing blistering skin disease. The exfoliative toxin A is encoded downstream of the lysin gene in prophage ETA (
225) (Fig.
7). This prophage showed the genome structure of an Sfi11-like siphovirus, with many protein sequence links to the phages described in the preceding sections. Comparison with prophage SLT identified a possibly inserted group of genes between the tail fiber and lysis genes (Fig.
7). These genes encode a cell hydrolase and a protein related to a collagen-like surface protein, a virulence factor in
S. pyogenes, thus representing further candidate lysogenic conversion genes.
The genome sequences of the methicillin-resistant strain N315 and the vancomycin-resistant strain Mu50, isolated 15 years apart from Japanese patients, were closely related (99% identity at the nucleotide level); most of the differences were due to the insertion of Mu50-specific DNA elements (
109). Both strains had related prophages phiN315 and phiMu50A integrated at the same chromosomal locus (beta-hemolysin) next to the pathogenicity island SaPIn1 (Fig.
1B). The two prophages belonged to the Sfi21-like
Siphoviridae and had sequence similarity to prophage PVL over large parts of the genome (Fig.
7). However, the DNA-packaging, head, and tail genes belonged to different modules, showing some sequence similarity to
Listeria prophage PSA. Differences between phiN315 and phiMu50A included several gene replacements in the lysogeny module (e.g., a sugar transferase), a larger replacement in the putative transcription regulation module, and a single-gene indel near the
attR site, providing an additional truncated lysin gene in phiN315. Both prophages contained several candidate lysogenic conversion genes (encoding enterotoxin P, staphylokinase, and the M-like protein fragment) near but not in the direct vicinity of the
attR site. The vancomycin-resistant strain contains an additional prophage phiMu50B that shares with prophage ETA the integrase and integration site and sequence similarity over part of the early, tail fiber, and lysis genes (Fig.
7). PhiMu50B is a close relative of prophage phi11 from
S. aureus strain 8325, with which it could be aligned at the DNA level over nearly the entire genome length including the head gene cluster, defining a second allele of structural genes in Sfi11-like
S. aureus prophages (Fig.
8). PhiMu50B contained candidate lysogenic conversion genes in the vicinity of the genetic switch region (two genes with links to the pathogenicity island SaPIn1) and genes downstream of the phage lysin (one showed a match with a
S. pyogenes prophage gene located next to a superantigen or toxin) (Fig.
7).
In contrast to N315, which was isolated from a hospital infection, MW2 is a community-acquired methicillin-resistant
S. aureus strain which is otherwise susceptible to many antibiotic classes. This strain had 95% identity to N315 and Mu50 at the nucleotide level. MW2 contains two prophages: phiSa3 and phiSa2. The first is found at a position occupied by prophages in four of the five currently sequenced
S. aureus strains (Fig.
1B). Comparison of the corresponding prophage maps revealed patchwise relatedness. This mosaic structure was interpreted as evidence for multiple crossovers between the phages (
7). PhiSa3 encodes two new enterotoxins in the vicinity of
attL (enterotoxin G and K homologues, nearly identical to the corresponding genes in the SaPIn3 pathogenicity island [
226]) and the
sea toxin gene located between the tail fiber and lysis module. PhiSa3 differed from the prophage PVL essentially only in the associated virulence factors (Fig.
8). PhiSa2 has DNA sequence similarity to prophage phi12 essentially over the entire genome (Fig.
8). Differences included a few indels and some gene replacements. Notable was the possession of the
lukF and
lukS genes between the lysin gene and
attR in phiSa2, where phi12 lacked ORFs.
Strain 8325 was used for the construction of the first physical maps of
S. aureus. It harbors three prophages, phi11, phi12, and phi13 (
95) (Fig.
1B). phi11 and phi13 have been studied in some detail. phi11 DNA is 5% terminally redundant and 40% circularly permutated (
126). phi11 is one of the few prophages from low-G+C gram-positive bacteria that showed the
attP-int-xis gene constellation familiar from phage lambda (
227,
228). This is, however, not the common situation even in staphylococcal phages (
38). phi13 was the first staphylococcal phage associated with positive (staphylokinase) and negative (beta-toxin) phage conversion (
222). The negative phage conversion occurred because phi13 integrated into the beta-toxin gene, inactivating it (
46). This is not an isolated case.
S. aureus phage L54a integration confers a lipase-negative phenotype due to insertional inactivation of a lipase gene (
119). The positive phage conversion is conferred by the staphylokinase gene located between the phi13 lysin gene and
attR.
A dot plot matrix of the available
S. aureus prophages demonstrated five distinct groups of structural modules (Fig.
8). Three distinct groups of Sfi21-like
cos-site
Siphoviridae were identified: PVL-PV83-phi13-phiSa3 comprise the first group, the second is represented by SLT-phiSa2-phi12, while the third group is provided by phiMu50A-phiN315. In addition, two different Sfi11-like
pac-site
Siphoviridae were revealed by the dot plot: prophages phiMu50B-phi11 on one side and phiETA on the other side. With respect to the early genes, the distinction of DNA homology groups was less obvious. Two loosely defined groups could be distinguished (Mu50B-PVL-Sa3 vs all the others), but an extensive mosaicism prevented a sharper distinction of modules (Fig.
8).
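The dot plot comparison used above can be illustrated with a minimal sketch. It assumes a simple exact k-mer matching scheme; the function names, the word size `k`, and the toy sequences are hypothetical and are not the parameters or software behind Fig. 8. A dot is recorded wherever a word of length `k` in one sequence also occurs in the other, so that shared modules appear as diagonal runs and replaced modules as gaps.

```python
from collections import defaultdict


def dot_plot(seq_a, seq_b, k=4):
    """Return (i, j) pairs where the k-mer starting at seq_a[i]
    is identical to the k-mer starting at seq_b[j]."""
    # Index every k-mer of seq_b by its start position (hypothetical helper).
    index = defaultdict(list)
    for j in range(len(seq_b) - k + 1):
        index[seq_b[j:j + k]].append(j)
    # Look up each k-mer of seq_a in that index.
    hits = []
    for i in range(len(seq_a) - k + 1):
        for j in index.get(seq_a[i:i + k], ()):
            hits.append((i, j))
    return hits


def render(hits, len_a, len_b, k):
    """Draw the hits as a text grid: rows follow seq_a, columns follow seq_b."""
    rows = len_a - k + 1
    cols = len_b - k + 1
    grid = [["." for _ in range(cols)] for _ in range(rows)]
    for i, j in hits:
        grid[i][j] = "*"
    return "\n".join("".join(row) for row in grid)


if __name__ == "__main__":
    # Toy sequences, not real prophage DNA.
    a = "ACGTACGT"
    b = "ACGTACGT"
    print(render(dot_plot(a, b, k=4), len(a), len(b), 4))
```

Real genome comparisons typically tolerate mismatches and use longer windows; exact matching is chosen here only to keep the sketch short.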
In contrast to the prophage-containing S. aureus strains, the sequenced Staphylococcus epidermidis strain ATCC 12228 (accession number AE015929) lacked prophage sequences.
Bacillus
Comprehensive research was conducted on two virulent phages of the soil bacterium
Bacillus subtilis: phi29, a podovirus, and SPP1, which despite its virulent life-style is a typical Sfi11-like siphovirus (
54) and by far the best-characterized phage of this proposed phage genus (see references
6 and
130 for recent examples and references therein). Much less is known about temperate
B. subtilis phages, which have been classified into five groups (
231). Only three groups are represented by sequenced prophages. The group I phage phi105 shows the genome organization of a typical Sfi21-like siphovirus (
52) and was investigated mainly for repressor binding to the genetic switch region (
205). The group III phage SPBc2 represents a 134-kb siphovirus consisting of 187 predicted ORFs, 70% of which lacked matches to the database (
118). A mere 14 ORFs shared links with other phages. In contrast, about 30 ORFs had links with bacterial genes, mostly from
B. subtilis. According to the orientation of the ORFs, three clusters could be distinguished. Cluster I contains the integrase/recombinase. Cluster II starts with the lysis cassette and continues with the structural gene module. The tail fiber genes had up to 50% amino acid sequence similarity to proteins from defective
B. subtilis prophages. Cluster III contained genes involved in transcriptional regulation, DNA replication, and nucleotide metabolism.
Group V prophages are represented in the sequenced
B. subtilis strain by the defective prophages PBSX and skin. Upon UV or mitomycin C induction, the cell releases phage-like particles consisting of small heads and large, complex tails that adsorb to and kill related bacilli, thus acting like bacteriocins. The head contains randomly selected 13-kb fragments of the bacterial chromosome. In that respect, PBSX resembles a small bacteriophage-like particle discovered in the purple nonsulfur bacterium
Rhodobacter capsulatus, which transfers random 4.5-kb segments of the genome of the producing cell to recipient cells, where allelic replacements occur. This particle, called a gene transfer agent, mediates a genetic exchange process controlled by the bacterial cell (
113). However, the DNA packaged into the PBSX head is not injected into the cell (
223). The widespread occurrence of the PBSX-like defective phages throughout the
Bacillus species and the failure to isolate strains cured of PBSX nevertheless suggested that their continued maintenance is advantageous, if not essential, for the host strain (
223). The 28-kb PBSX prophage remnant consists of a shortened lysogeny and DNA replication module and a structural gene cluster whose organization resembles that of the Sfi11-like
pac-site
Siphoviridae (Fig.
9). In comparison with the standard genome map of Sfi11-like phages, PBSX lacks a large head protein gene normally located between the portal gene and the scaffold gene. In addition, there are fewer head-to-tail joining genes than usual, possibly explaining the small head morphology. The siphovirus-like tail fiber genes are followed by sequence links to putative tail genes from the myovirus prophage SPBc2 and end in a lysis cassette consisting of a holin and an amidase-type lysin gene (Fig.
9). During sporulation, the ca. 50-kb skin prophage element is excised from the
B. subtilis chromosome by a DNA rearrangement event (
197). The prophage remnant contains a seemingly complete set of structural genes characteristic of Sfi11-like
pac-site
Siphoviridae. Over these genes, it had sequence similarity to many structural genes from the PBSX prophage and
Listeria innocua prophage 2 (
79). The structural gene cluster is preceded by a DNA replication module. The lysogeny region is reduced to the genetic switch structure, while an integrase was not detected. An integrase was found downstream of the prophage lysis cassette, separated by a group of bacterial genes including an arsenic resistance operon. In contrast to PBSX, the skin element contains no genes essential for
B. subtilis viability.
The sequencing of
B. halodurans, an industrial source of enzymes used under alkaline pH, revealed 112 ORFs encoding transposases or recombinases, suggesting an important role of these enzymes in horizontal gene transfer in this species; however, no prophage was reported (
196). A reanalysis of the sequence revealed a complete prophage, showing the typical genome organization of an Sfi11-like siphovirus with sequence matches over the head and tail genes to
S. pyogenes prophage 315.5. As in a number of other prophages, an isolated adenine methyltransferase gene was detected between the DNA replication module and the DNA-packaging module. More interesting, however, was the presence of a type II restriction endonuclease and an associated cytosine-specific methyltransferase located between the phage lysin and the
attR site. Possession of the prophage thus confers a potentially new restriction modification system to the lysogenic cell.
Clostridium
The spore-forming clostridia are widely disseminated in soil and lakes but are also found in the intestinal flora.
Clostridium botulinum is defined as any clostridial isolate that produces botulinum toxin, which causes an often fatal form of food poisoning. Biological experiments conducted 30 years ago established that lysogenization by some bacteriophages with contractile tails converted nontoxigenic into toxigenic isolates. Curing of the prophage leads to concomitant loss of the toxigenicity (
66,
67). However, the first temperate
Clostridium phage was sequenced only recently (
233). Despite its relatively small genome size of 33.5 kb, the
Clostridium perfringens phage phi3626 showed the typical genome organization of Sfi21-like
Siphoviridae (Fig.
9). The presence of two genes related to sporulation-dependent transcription factors in the early-gene cluster suggests that this prophage also has a potential involvement with sporulation.
C. perfringens occurs in different pathotypes; strains producing the enterotoxin CPE are an important cause of food poisoning and, more recently, of antibiotic-associated diarrhea, while the histotoxic clostridia produce exotoxins that are implicated in gas gangrene. The histotoxic
C. perfringens strain (
185) contains a prophage whose proteins share 25 to 80% sequence identity with those of phi3626 over essentially the entire structural gene cluster. The structural module was flanked on one side by a lysis cassette. On the other side, the similarity to the genome organization of typical prophages from low-G+C gram-positive bacteria was less obvious: only a XerD/C-like recombinase and three potential transcriptional regulators could be identified. No phage-encoded candidate virulence factors could be identified by in silico analysis. The sequenced
Clostridium acetobutylicum strain used in industrial fermentation (
159) contained two prophages. Prophage 1 showed the structural gene cluster of an Sfi21-like phage. Sequence similarity to phi3626 was weak and was limited to five structural genes, while similarity to
S. aureus prophages was more prominent. The structural gene cluster was preceded by 50 mostly very small ORFs that lacked links to phage genes except for three DNA replication and two repressor genes. This region ends with two closely spaced resolvase genes. The diagnosis of prophage 2 is less well backed by database matches. In fact, it showed an ORF organization reminiscent of the structural gene cluster from temperate dairy phages, but sequence similarities were limited to the tail tape measure protein and three ORFs sharing weak amino acid identity to
B. subtilis phage SPBc2. Upstream of these genes, 60 ORFs were localized which lacked database matches except for two DNA helicases, a phage lysin, and, again, three weak links to SPBc2.
Clostridium tetani strain E88 contains three prophages: phiCT3 is a hybrid combining PBSX-like tail and tail fiber genes with phi3626-like DNA-packaging and Sfi21-like head genes (Fig.
9). The closest relatives of the DNA replication genes were found in
Listeria phage A118.
PhiCT1 showed a gene map typical of Sfi11-like
Siphoviridae, and some database matches backed this attribution. The structural genes were flanked by genes whose closest relatives were found in insect viruses (up to 39% amino acid identity with
Chilo iridescent virus). Similar close relationships between phage and insect virus proteins were also reported for several dairy phages (
128). SPBc2-like and A118-like integrase genes were detected upstream of the structural genes from phiCT2, suggesting that the putative early genes containing a ferritin-like gene resulted from a recombination event. Also, phiCT2 demonstrated a relatively typical structural gene cluster backed by numerous but diverse phage links. The structural module is preceded by gene fragments of a resolvase and a transposase, followed further upstream by another integrase gene. An intervening gene showed sequence relatedness to toxin A from
Clostridium difficile.