Our perspective on microbial diversity has improved enormously over the past few decades. In large part this has been due to molecular phylogenetic studies that objectively relate organisms. Phylogenetic trees based on gene sequences are maps with which to articulate the elusive concept of biodiversity. Thus, comparative analyses of small-subunit rRNA (16S or 18S rRNA) and other gene sequences show that life falls into three primary domains, Bacteria, Eucarya, and Archaea (51, 52). Based on rRNA trees, the main extent of Earth’s biodiversity is microbial. Our knowledge of the extent and character of microbial diversity has been limited, however, by reliance on the study of cultivated microorganisms. It is estimated that >99% of microorganisms observable in nature typically are not cultivated by using standard techniques (1).
Recombinant DNA and molecular phylogenetic methods have recently provided means for identifying the types of organisms that occur in microbial communities without the need for cultivation (see references
1,
20, and
35 for reviews). Results from application of these methods to a number of diverse environments confirm that our view of microbial diversity was limited and point to a wealth of novel and environmentally important diversity yet to be studied (
34). It is the aim of this review to collate, compare, and incorporate the results of the environmental sequence-based studies into the context of known bacterial diversity. We discuss the sequence data at the taxonomic level of the phylogenetic division because divisions constitute first-order clades for describing the breadth of bacterial diversity. Although we have yet to determine even the outlines of the bacterial tree, common threads are beginning to emerge that revise our current views of bacterial diversity and distribution in the environment.
PHYLOGENETIC DIVERSITY IN THE BACTERIAL DOMAIN
In 1987, Woese described the bacterial domain as composed of about 12 natural relatedness groups, based mainly on analyses of familiar cultivated organisms such as cyanobacteria, spirochetes, and gram-positive bacteria (all of which, based on rRNA sequence divergence, display greater evolutionary depth than plants, animals, and fungi) (51). These relatedness groups have variously been called “kingdoms,” “phyla,” and “divisions”; we use the last term. For the purposes of this review we define a bacterial division purely on phylogenetic grounds as a lineage consisting of two or more 16S rRNA sequences that are reproducibly monophyletic and unaffiliated with all other division-level relatedness groups that constitute the bacterial domain. We judge reproducibility by using multiple tree-building algorithms, bootstrap analysis, and data sets of varying composition and size in the phylogenetic analyses. The typical interdivisional rRNA sequence difference is 20 to 25%. For comparison, the 16S rRNAs of
Escherichia coli and
Pseudomonas aeruginosa, both representatives of the γ group of
Proteobacteria, differ overall by about 15%; the 16S rRNAs of
E. coli and
Bacillus subtilis (“low-G+C gram-positive bacterial” division) differ by about 23%.
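The percent differences quoted above are pairwise comparisons of aligned 16S rRNA sequences. As a minimal sketch of how such a figure can be computed, the following assumes that the two sequences are already aligned to equal length and that alignment columns containing a gap or an ambiguous base are excluded; the function name, the gap characters, and the toy sequences are illustrative assumptions and are not taken from the studies reviewed here.

```python
def percent_difference(seq_a: str, seq_b: str, gap_chars: str = "-.") -> float:
    """Percent of differing positions between two pre-aligned sequences.

    Assumption: columns in which either sequence has a gap character or an
    ambiguous base (N) are skipped, so that only homologous nucleotides
    are compared.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars or a == "N" or b == "N":
            continue  # not a comparable homologous position
        compared += 1
        if a != b:
            mismatches += 1

    if compared == 0:
        raise ValueError("no comparable positions in the alignment")
    return 100.0 * mismatches / compared


# Toy aligned fragments (made up for illustration, not real 16S rRNA data).
print(percent_difference("ACGUACGU", "ACGUACGA"))
print(percent_difference("AC-GU", "ACAGA"))
```

Under this convention, a pair of sequences from different divisions would typically score in the 20 to 25% range described in the text, although the exact value depends on the alignment and on which positions are masked.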
At the current stage in the phylogenetic classification of
Bacteria, divisions are not consistently named or taxonomically ranked. rRNA-defined divisions are identified by classes (e.g.,
Proteobacteria [
41] and
Actinobacteria [
42]), orders (e.g.,
Thermotogales and
Aquificales), families (e.g.,
Chlorobiaceae), generic names such as the
Nitrospira group (
11), or common names such as the green nonsulfur (GNS) bacteria and low-G+C gram-positive bacteria (
51). Division-level nomenclature has not even been consistent between studies, so some divisions are identified by more than one name. For instance, green sulfur bacteria is synonymous with
Chlorobiaceae; high-G+C gram-positive bacteria is synonymous with
Actinobacteria and
Actinomycetales. Indeed, it probably is premature to standardize taxonomic rankings for bacterial divisions at this point when our picture of microbial diversity is likely still incomplete and the topology of the bacterial tree is still unresolved.
In the past decade the number of identifiable bacterial divisions has more than tripled to about 40 due in significant part to culture-independent phylogenetic surveys of environmental microbial communities (
21,
34). These analyses rely on sequences of rRNA genes obtained by cloning directly from environmental DNA or, as in the majority of studies, after amplification by the PCR (
1,
20,
35). Figure
1 represents the division-level diversity of the bacterial domain as inferred from representatives of the approximately 8,000 bacterial 16S rRNA gene sequences currently available. Although 36 divisions are shown in Fig.
1, several other division-level lineages are indicated by single environmental sequences (
9,
21,
37), suggesting that the number of bacterial divisions may be well over 40. Several of the described divisions are well represented by cultivated strains and were the first to be characterized phylogenetically (
51). The majority of the bacterial divisions, however, are poorly represented by cultured organisms. Indeed, 13 of the 36 divisions shown in Fig.
1 are characterized only by environmental sequences (shown outlined) and so are termed “candidate divisions” to indicate their unsubstantiated status as new bacterial divisions (
21). One of these candidate divisions, OP11, is now sufficiently well represented by environmental sequences to conclude that it constitutes a major bacterial group (see below). Phylogenetic studies so far have not resolved branching orders of the divisions; bacterial diversity is seen as a fan-like radiation of division-level groups (Fig.
1). The exception to this, however, is the
Aquificales division, which branches most deeply in the bacterial tree in most analyses.
BACTERIAL DIVERSITY AND DISTRIBUTION IN THE ENVIRONMENT
Culture-dependent studies indicate that representatives of some bacterial divisions are cosmopolitan in the environment, whereas others appear restricted to certain habitats (
39). Culture-independent studies conducted so far reflect and expand this view. Table
1 summarizes the environmental distribution of sequences by habitat type, compiled from most of the available 16S rRNA-based clonal analyses: 86 studies contributing nearly 3,000 sequences. An expanded version of this table that details division-level representation in the individual studies is available at
http://crab2.berkeley.edu/pacelab/176.htm. Table
1 includes only divisions for which representatives have been detected in at least two independent studies and for which at least one near-complete 16S rRNA gene sequence is known. Table
1 is, therefore, not an exhaustive listing of potential division-level diversity for all studies.
Sequence representatives of several bacterial divisions have been identified in a wide range of habitats, suggesting the cosmopolitan or ubiquitous distribution of the corresponding organisms in the environment and, potentially, their broad metabolic capabilities. Some of these cosmopolitan divisions are well known from cultivation studies; however, others are little known or have not yet been detected by cultivation. Figure 2 summarizes the representation of selected cosmopolitan divisions by sequences of cultivated and uncultivated organisms. The
Proteobacteria (purple photosynthetic bacteria and relatives),
Cytophagales (
Bacteroides-Cytophaga-Flexibacter group), and the two gram-positive divisions,
Actinobacteria and low-G+C gram-positive bacteria, are well represented by cultivated organisms and therefore are familiar to us in principle. These four divisions account for 90% of all cultivated bacteria characterized by 16S rRNA sequences and approximately 70% of the environmental sequences collated in Table
1. By contrast, other cosmopolitan divisions revealed by clonal analyses, such as
Acidobacterium,
Verrucomicrobia, GNS bacteria, and OP11, are poorly represented by sequences from cultivated organisms (Fig.
2) and consequently are little known with regard to their general properties. Although many of the bacterial divisions occur widely, others seem to occupy a more limited range of habitats (Table
1). All cultivated representatives of
Aquificales, for instance, are thermophilic hydrogen metabolizers, and all environmental sequences of
Aquificales have been obtained only from high-temperature environments. This suggests a specialized habitat niche for this group. Alternatively, the apparently limited environmental distribution may simply reflect a sampling or methodological artifact and representatives of such divisions may be present in a wider range of habitats, but not yet detected.
The database of environmental rRNA sequences is compromised in resolving some phylogenetic issues by a large number of relatively short sequences. More than half of the sequences collated in Table
1 are less than 500 nucleotides (nt) long, which represents only one-third of the total length of 16S rRNA. This is due to an unfortunate trend in many environmental studies of sequencing only a portion of the gene in the belief that a few hundred bases of sequence data is sufficient for phylogenetic purposes. Indeed, 500 nt is sufficient for placement if some longer sequence is closely related (>90% identity in homologous nucleotides) to the query sequence. In the case of novel sequences, <85% identical to known sequences, however, <500 nt is usually insufficient comparative information to place the sequence accurately in a phylogenetic tree and can even be misleading.
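The rule of thumb in the preceding paragraph can be written out as a small decision function. This is only a sketch of the thresholds stated in the text (>90% identity to a longer sequence, <85% identity for novel sequences, 500 nt); the function names, the returned labels, and the round figure of 1,500 nt for a full-length 16S rRNA are assumptions made for illustration. Cases that the text does not address are labeled "uncertain" rather than guessed at.

```python
FULL_LENGTH_16S_NT = 1500  # assumed round figure, consistent with 500 nt being one-third


def fraction_of_gene(length_nt: int, full_length_nt: int = FULL_LENGTH_16S_NT) -> float:
    """Fraction of the full-length gene covered by a partial sequence."""
    return length_nt / full_length_nt


def placement_outlook(length_nt: int, best_identity_pct: float) -> str:
    """Rough outlook for placing a partial 16S rRNA sequence in a tree.

    best_identity_pct is the identity (in homologous nucleotides) to the
    closest longer reference sequence. Simplification: the >90% branch
    does not check length, because the text states it only for fragments
    of about 500 nt.
    """
    if best_identity_pct > 90.0:
        return "placeable"   # a close, longer relative anchors the fragment
    if best_identity_pct < 85.0 and length_nt < 500:
        return "unreliable"  # novel and short: placement can be misleading
    return "uncertain"       # combination not covered by the stated rule


print(fraction_of_gene(500))
print(placement_outlook(450, 95.0))
print(placement_outlook(450, 80.0))
```

Keeping an explicit "uncertain" outcome avoids implying more precision than the rule of thumb supports; in practice such sequences would need longer reads before phylogenetic placement.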
Since all but 4 (
40,
46,
49,
50) of the 86 studies collated in Table
1 were conducted using PCR to amplify rDNA from extracted environmental DNA, the question arises as to whether molecular analyses accurately reflect the division-level diversity that occurs in the environment. It is well established that PCR-associated artifacts such as differential amplification of different rDNA templates (
36,
44), sensitivity to rRNA gene copy number (
12), PCR primer specificity (
48), sensitivity to template concentration (
6), amplification of contaminant rDNA (
45), and formation of chimeric sequences (
23) may skew our assessment of microbial diversity. Most of the studies collated in Table
1, however, analyzed tens to hundreds of clones, so it seems likely that these studies have sampled the main types of sequences in the communities examined. We believe, acknowledging the caveats of the methodology, that the clonal analyses collated in Table
1 probably include the most abundant (metabolically active) bacterial sequence types in the samples analyzed, likely representing the members of the communities that are involved in the principal metabolic activities, such as carbon cycling.
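The reasoning that libraries of tens to hundreds of clones capture the main sequence types can be illustrated with a simple sampling calculation. The sketch below assumes that each clone is an independent, unbiased draw from the community, so that a sequence type with relative abundance p is seen at least once among n clones with probability 1 - (1 - p)^n. That assumption is an idealization: the PCR-associated artifacts listed above would distort the effective abundances. The function name and the example values are illustrative only.

```python
def detection_probability(relative_abundance: float, clones_sampled: int) -> float:
    """Probability that a sequence type appears at least once in a clone library.

    Assumes independent, unbiased sampling of clones (no PCR or cloning bias).
    """
    if not 0.0 <= relative_abundance <= 1.0:
        raise ValueError("relative_abundance must be between 0 and 1")
    if clones_sampled < 0:
        raise ValueError("clones_sampled must be non-negative")
    return 1.0 - (1.0 - relative_abundance) ** clones_sampled


# A type making up 5% of the community versus one making up 0.1%,
# for libraries of 20, 100, and 500 clones.
for n in (20, 100, 500):
    print(n, round(detection_probability(0.05, n), 3),
          round(detection_probability(0.001, n), 3))
```

Under these idealized conditions, abundant types are very likely to be recovered from a library of about 100 clones, whereas rare types are likely to be missed, which is consistent with the cautious interpretation given in the text.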
CONCLUSION
Phylogenetic trees based on rRNA sequences show that bacterial diversity is represented by natural relatedness groups, the phylogenetic divisions (
51). About 36 such divisions are currently identifiable. The final extent of division-level diversity in the bacterial domain is still unknown but clearly will be more than 40 divisions. Culture-independent studies have resulted in multiple hits on the majority of described divisions in different habitat types (Table
1), suggesting that the final number of divisions will be within the same order of magnitude as the present estimate.
The molecular analyses of environmental DNA have revealed substantial phylogenetic diversity with little or no representation among organisms previously studied. Because of their abundance and wide distribution, some of the organisms represented by the sequences likely contribute significantly to the global chemical cycles. Descriptions of newly identified, but apparently important, bacterial divisions such as the
Acidobacterium and
Verrucomicrobia are presently confounded by too few cultivated representatives and only rudimentary descriptions of the strains. Cultivation efforts need to be directed at new representatives of the diverse groups for further study. Continued work to sequence the 16S rDNAs of all deposited type cultures (<50% sequenced to date [
14]) may also result in detection of additional cultivated representatives of newly described divisions. It is a challenge to microbial biologists to determine the physiological diversity and environmental roles of these recently articulated divisions of
Bacteria.
The phylogenetic differences between the bacterial divisions probably are reflected in substantial physiological differences. Some properties, the general properties of
Bacteria, are expected to be distributed among all the divisions. Division-specific novelties are known as well, for instance, endospore formation by the low-G+C gram-positive bacteria or axial filaments (endoflagella) in the spirochetes. Some biochemical properties evidently have transferred laterally among the divisions. For example, the two types of photosynthetic complexes, photosystem I (PSI) and PSII, are each distributed sporadically among the divisions, consistent with lateral transfer (
3). Lateral transfer may also have resulted in combinatorial novelty among the divisions; PSI and PSII, for instance, apparently came together in the cyanobacteria to create oxygenic photosynthesis, with profound consequences for the biosphere (
3). Many more such division-specific qualities and cooperations should become evident at the molecular level as comparative genomics gives us a sharper phylogenetic picture of bacterial diversity.