The First 50 Plant Genomes
Fifty-five plant genomes have been published to date, representing 49 different species (Table 1 includes PubMed IDs for complete reference). What have we learned from the first wave of plant genomes? It has been said that plant genome papers (and genome papers in general) are dry and lack “biology,” and that the days of high impact plant genome papers are drawing to a close unless they explore significant biology. However, with each new genome, earlier observations are refined, and plant genome papers continue to reveal novel aspects of genome biology. For example, the tomato and banana genome papers refined current thinking on the whole genome duplications (WGDs) that shaped dicot and monocot genome evolution (D'Hont et al., 2012; Tomato Genome Consortium, 2012). These observations were enabled not only by high quality genome assemblies but also by the greater number of genomes available for comparison. In addition, the initial round of plant genomes enabled the first generation of functional genomics, which helped to define the roles of hundreds of genes, provided unprecedented access to sequence-based markers for breeding, and offered glimpses into plant evolutionary history. More genomes, representing the diverse array of species in Viridiplantae, are still required to gain a full understanding of plant genome structure, evolution, and complexity.
Table 1. Published plant genomes (55 publications representing 49 species).
# | Scientific name | Common name | Year | Type | Division or monocot/dicot | Chr (#) | Size (Mb) | Assembled (Mb) | Assembled (%) | Genes (#) | Repeat (%) | Scaffold N50 (kb) | Contig N50 (kb) | Sequencer types† | Journal | PMID |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Arabidopsis thaliana | arabidopsis | 2000 | model | dicot | 5 | 125 | 115 | 92 | 25,498 | 14 | NA | NA | Sa | Nature | 11130711 |
2 | Oryza sativa | rice | 2002 | crop | monocot | 12 | 430 | 362 | 84 | 59,855 | 26 | 12 | 7 | Sa | Science | 11935017 |
3 | Oryza sativa | rice | 2002 | crop | monocot | 12 | 420 | 389 | 93 | 61,668 | NA | NA | NA | Sa | Science | 11935018 |
4 | Oryza sativa | rice | 2005 | crop | monocot | 12 | 389 | 371 | 95 | 37,544 | 26 | NA | NA | Sa | Nature | 16100779 |
5 | Populus trichocarpa | black cottonwood | 2006 | crop | dicot | 19 | 485 | 410 | 84 | 45,555 | NA | 3100 | 126 | Sa | Science | 16973872 |
6 | Vitis vinifera | grape | 2007 | crop | dicot | 19 | 475 | 487 | 103 | 30,434 | 41 | 2065 | 66 | Sa | Nature | 17721507 |
7 | Physcomitrella patens | moss | 2008 | model | bryophyta | 27 | 510 | 480 | 94 | 35,938 | 16 | 1320 | 292 | Sa | Science | 18079367 |
8 | Vitis vinifera | grape | 2007 | crop | dicot | 19 | 505 | 477 | 95 | 29,585 | 27 | 1330 | 18 | Sa,4 | PLoS ONE | 18094749 |
9 | Carica papaya | papaya | 2008 | crop | dicot | 9 | 372 | 370 | 99 | 28,629 | 43 | 1000 | 11 | Sa | Nature | 18432245 |
10 | Lotus japonicus | lotus | 2008 | model | dicot | 6 | 472 | 315 | 67 | 30,799 | 56 | NA | NA | Sa | DNA Research | 18511435 |
11 | Sorghum bicolor | sorghum | 2009 | crop | monocot | 10 | 818 | 739 | 90 | 34,496 | 62 | 62,400 | 195 | Sa | Nature | 19189423 |
12 | Cucumis sativus | cucumber | 2009 | crop | dicot | 7 | 367 | 244 | 66 | 26,682 | 24 | 1140 | 20 | Sa,I | Nature Genetics | 19881527 |
13 | Zea mays | maize | 2009 | crop | monocot | 10 | 2300 | 2048 | 89 | 32,540 | 85 | 76 | 40 | Sa | Science | 19965430 |
14 | Glycine max | soybean | 2010 | crop | dicot | 20 | 1115 | 973 | 87 | 46,430 | 57 | 47,800 | 189 | Sa | Nature | 20075913 |
15 | Brachypodium distachyon | brachypodium | 2010 | model | monocot | 5 | 272 | 272 | 100 | 25,532 | 21 | 59,300 | 348 | Sa | Nature | 20148030 |
16 | Ricinus communis | castor bean | 2010 | crop | dicot | 10 | 320 | 326 | 102 | 31,237 | 50 | 561 | 21 | Sa | Nature Biotechnology | 20729833 |
17 | Malus x domestica | apple | 2010 | crop | dicot | 17 | 742 | 604 | 81 | 57,386 | 67 | 1542 | 13 | Sa,4 | Nature Genetics | 20802477 |
18 | Jatropha curcas | jatropha | 2010 | crop | dicot | NA | 380 | 286 | 75 | 40,929 | 37 | NA | 4 | Sa, | DNA Research | 21149391 |
19 | Theobroma cacao | cocoa | 2011 | crop | dicot | 10 | 430 | 327 | 76 | 28,798 | 24 | 473 | 20 | Sa,4,I | Nature Genetics | 21186351 |
20 | Fragaria vesca | strawberry | 2011 | crop | dicot | 7 | 240 | 210 | 87 | 34,809 | 23 | 1361 | NA | 4,S,I | Nature Genetics | 21186353 |
21 | Arabidopsis lyrata | lyrata | 2011 | model | dicot | 8 | 207 | 207 | 100 | 32,670 | 30 | 24,500 | 227 | Sa | Nature Genetics | 21478890 |
22 | Selaginella moellendorffii | spikemoss | 2011 | non-model | lycopod | NA | 110 | 213 | 193 | 22,285 | 38 | 1700 | 120 | Sa | Science | 21551031 |
23 | Phoenix dactylifera | date palm | 2011 | crop | monocot | 18 | 658 | 381 | 58 | 28,890 | 40 | 30 | 6 | I | Nature Biotechnology | 21623354 |
24 | Solanum tuberosum | potato | 2011 | crop | dicot | 12 | 844 | 727 | 86 | 39,031 | 62 | 1318 | 31 | Sa,4,I | Nature | 21743474 |
25 | Thellungiella parvula | thellungiella | 2011 | model | dicot | 7 | 140 | 137 | 98 | 30,419 | 8 | 5290 | NA | 4,I | Nature Genetics | 21822265 |
26 | Cucumis sativus | cucumber | 2011 | crop | dicot | 7 | 367 | 323 | 88 | 26,587 | NA | 319 | 323 | Sa,4 | PLoS ONE | 21829493 |
27 | Brassica rapa | Chinese cabbage | 2011 | crop | dicot | 10 | 485 | 284 | 59 | 41,174 | 40 | 1971 | 27 | I | Nature Genetics | 21873998 |
28 | Cannabis sativa | hemp | 2011 | crop | dicot | ? | 820 | 787 | 96 | 30,074 | NA | 16 | 2 | 4,I | Genome Biology | 22014239 |
29 | Cajanus cajan | pigeon pea | 2011 | crop | dicot | 11 | 833 | 605 | 72 | 48,680 | 52 | 516 | 22 | Sa,I | Nature Biotechnology | 22057054 |
30 | Medicago truncatula | medicago | 2011 | model | dicot | 8 | 454 | 262 | 58 | 62,388 | 31 | 1270 | NA | Sa,4,I | Nature | 22089132 |
31 | Setaria italica | setaria | 2012 | model | monocot | 9 | 490 | 423 | 86 | 38,801 | 46 | 1007 | 25 | I | Nature Biotechnology | 22580950 |
32 | Setaria italica | setaria | 2012 | model | monocot | 9 | 510 | 397 | 80 | 35,471 | 40 | 47,300 | 126 | Sa | Nature Biotechnology | 22580951 |
33 | Solanum lycopersicum | tomato | 2012 | crop | dicot | 12 | 900 | 760 | 84 | 34,727 | 63 | 16,467 | 87 | Sa,4,S,I | Nature | 22660326 |
34 | Cucumis melo | melon | 2012 | crop | dicot | 12 | 450 | 375 | 83 | 27,427 | NA | 4680 | 18 | Sa,4,I | PNAS | 22753475 |
35 | Linum usitatissimum | flax | 2012 | crop | dicot | 15 | 373 | 318 | 85 | 43,484 | 24 | 132 | 20 | I | Plant Journal | 22757964 |
36 | Musa acuminata malaccensis | banana | 2012 | crop | monocot | 11 | 523 | 472 | 90 | 36,542 | 44 | 1311 | 43 | Sa,4,I | Nature | 22801500 |
37 | Gossypium raimondii | cotton D | 2012 | crop | dicot | 13 | 880 | 775 | 88 | 40,976 | 60 | 2284 | 45 | I | Nature Genetics | 22922876 |
38 | Azadirachta indica | neem | 2012 | crop | dicot | NA | 364 | NA | NA | 20,169 | 13 | 452 | 1 | 4,I | BMC Genomics | 22958331 |
39 | Hordeum vulgare | barley | 2012 | crop | monocot | 7 | 5100 | 4980 | 98 | 30,400 | 84 | NA | NA | NA | Nature | 23075845 |
40 | Pyrus bretschneideri | pear | 2013 | crop | dicot | 17 | 527 | 512 | 97 | 42,812 | 53 | 541 | 36 | I | Genome Research | 23149293 |
41 | Citrullus lanatus | watermelon | 2012 | crop | dicot | 11 | 425 | 354 | 83 | 23,440 | 45 | 2380 | 26 | I | Nature Genetics | 23179023 |
42 | Triticum aestivum | wheat | 2012 | crop | monocot | 21 | 17,000 | 3800 | 22 | 94,000 | 80 | NA | 1 | 4 | Nature | 23192148 |
43 | Gossypium raimondii | cotton D | 2012 | crop | dicot | 13 | 880 | 738 | 84 | 37,505 | 61 | 18,800 | 136 | Sa,4,I | Nature | 23257886 |
44 | Prunus mume | Chinese plum | 2012 | crop | dicot | 8 | 280 | 237 | 85 | 31,390 | 45 | 578 | 32 | I | Nature Communications | 23271652 |
45 | Cicer arietinum | chickpea | 2013 | crop | dicot | 8 | 738 | 532 | 72 | 28,269 | 49 | 39,990 | 24 | Sa,I | Nature Biotechnology | 23354103 |
46 | Hevea brasiliensis | rubber tree | 2013 | crop | dicot | 18 | 2150 | 1119 | 52 | 68,955 | 72 | 3 | NA | 4,S,I | BMC Genomics | 23375136 |
47 | Phyllostachys heterocycla | moso bamboo | 2013 | non-model | monocot | 24 | 2075 | 2051 | 99 | 31,987 | 59 | 329 | 12 | I | Nature Genetics | 23435089 |
48 | Oryza brachyantha | rice relative | 2013 | non-model | monocot | 12 | 300 | 263 | 88 | 32,038 | 29 | 1013 | 20 | I | Nature Communications | 23481403 |
49 | Prunus persica | peach | 2013 | crop | dicot | 8 | 265 | 227 | 86 | 27,852 | 37 | 27,400 | 214 | Sa | Nature Genetics | 23525075 |
50 | Aegilops tauschii | wheat DD | 2013 | crop | monocot | 7 | 4360 | 4244 | 97 | 43,150 | 66 | 58 | 5 | 4,I | Nature | 23535592 |
51 | Triticum urartu | wheat AA | 2013 | crop | monocot | 7 | 4940 | 4660 | 94 | 34,879 | 67 | 64 | 3 | I | Nature | 23535596 |
52 | Nelumbo nucifera | ancient lotus | 2013 | non-model | dicot | 8 | 929 | 804 | 87 | 26,685 | 57 | 3400 | 39 | I | Genome Biology | 23663246 |
53 | Utricularia gibba | bladderwort | 2013 | non-model | dicot | 16 | 77 | 82 | 106 | 28,500 | 3 | 95 | 26 | 4,I | Nature | 23665961 |
54 | Picea abies | Norway spruce | 2013 | crop | gymnosperm | 12 | 19,600 | 12,000 | 61 | 28,354 | NA | NA | NA | Nature | 23698360 | |
55 | Capsella rubella | capsella | 2013 | non-model | dicot | 8 | 219 | 135 | 62 | 26,521 | NA | 15,100 | 134 | Sa | Nature Genetics | 23749190 |
- † Abbreviations: Sa, Sanger; 4, Roche/454; S, SOLiD; I, Illumina; NA, not reported in primary publication; kb, kilobases; Mb, megabases; Chr, chromosome; PMID, PubMed ID
It All Started with a Wild Mustard Plant
Since the publication of the model Arabidopsis thaliana genome in Nature in 2000, the number of genomes has steadily increased, peaking in 2012 with 13 publications (Fig. 1A). At the current trajectory, there should be hundreds of plant genome publications over the next several years. Genome papers have been quite formulaic, with a description of the assembly, gene numbers, repeats, WGDs, over- and under-represented gene families, and finally some aspect of novel biology, usually with a focus on transcription factors. Genomes have been published in 12 different journals, with 38 of the 55 (69%) published genomes appearing in Nature journals (Nature, Nature Genetics, Nature Biotechnology, and Nature Communications); Science is second with six published genomes. The most recent publication, the Capsella rubella genome paper, illustrates how the genome paper is shifting from a formulaic approach to a focus on how the genome elucidates novel biology, such as the evolution of selfing from an outcrossing mating system (Slotte et al., 2013). The trend toward biology is quite positive and is driven by the demands of publishing in high impact journals. However, the plant community is just at the beginning of exploring the diversity of plant genomes, and the rigor of the genome paper model, with its in-depth exploration of genome features, provides an essential foundation for the plant research community.
One of the forces driving the rapid increase in fully sequenced plant genomes is the exponential decrease in the cost, and increase in the speed, of genome sequencing fueled by high throughput DNA sequencing (Schatz et al., 2012). More than half of the published genomes have been sequenced entirely or partly using Sanger technology (Table 1), which provides long, high quality reads of ∼1000 base pairs (bp). Sanger sequencing requires a cloning step and is time consuming and expensive, although the final result is usually of high quality, depending on the genome. When 454 came onto the scene in the mid-2000s, the cost of sequencing dropped by an order of magnitude (US$200K vs. US$2 M), encouraging the emergence of consortia and funding for the sequencing of new genomes. Grape, published in 2007, was the first genome sequenced using a combination of 454 and Sanger, and now at least 18 genomes have used varying amounts of 454 sequence. Illumina and SOLiD sequencing changed the paradigm yet again, providing very short reads (35–150 bp) at an order of magnitude lower cost than 454. Only three genome projects have used SOLiD (strawberry, tomato, and rubber tree); however, Illumina has been used exclusively for 12 genomes and in combination with other technologies for another 17 genomes. Third generation sequencing technologies such as Pacific Biosciences (PacBio) promise long (>5 kb) single molecule reads that would greatly improve assembly of repeat-rich plant genomes. PacBio long reads show great promise in resolving regions that other sequencing technologies have problems with (skewed GC content, homopolymers), but throughput and accuracy are two issues that still require attention. However, new sequencing technologies are only part of the future of plant genomes, since tried and true methods such as BACs (bacterial artificial chromosomes) are finding a place in hybrid sequencing approaches, as in the highly heterozygous pear genome (Wu et al., 2013).
Most of the plants sequenced to date fit specific criteria, such as a large research community, status as a model organism or economic importance, small genome size, diploidy, availability of inbred lines (low heterozygosity), and access to genetic and physical maps, expressed sequence tags (ESTs)/transcriptomes, and other genomic tools. Seventy-three percent (40) of the plant genome publications have been on crop species, and some of these crop species double as model systems, while several species were sequenced purely for research, such as Arabidopsis thaliana, Arabidopsis lyrata, Brachypodium distachyon, Physcomitrella patens (moss), and Selaginella moellendorffii (spikemoss). Most (94%) genomes sequenced to date are angiosperms, of which 36 are dicots and 16 are monocots, while only one gymnosperm (spruce), one bryophyte (moss), and one lycophyte (spikemoss) have been sequenced (Table 1). Many of the early decisions about which genomes to sequence were driven by the Department of Energy Joint Genome Institute (JGI), which resulted in the publication and public availability (through Phytozome) of eleven of the highest quality plant genomes. The Beijing Genomics Institute (BGI) has contributed consistently over the years, starting with the rice genome and then ten additional genomes primarily based on Illumina technology, and has now announced a large-scale plant genome sequencing project. However, a “1000 plant genome project” analogous to those in other communities has yet to emerge.
Plant Genomes Both Large and Small
Plant genome sizes span more than three orders of magnitude, from the carnivorous corkscrew plant (Genlisea aurea) at 63 megabases (Mb) to the rare Japanese species Paris japonica at 148,000 Mb (Bennett and Leitch, 2011). The smallest published genome is that of the carnivorous bladderwort (Utricularia gibba), with an estimated size of 77 Mb (82 Mb assembled), while the largest, Norway spruce (Picea abies), is 19,600 Mb, followed by bread wheat (Triticum aestivum) at 17,000 Mb; the overall median is 480 Mb (Table 1, Fig. 1B). Access to high quality reference genomes confirmed that long terminal repeat (LTR) retrotransposons are a primary driver of the dramatic size range in plants (El Baidouri and Panaud, 2013). For the large barley genome (5100 Mb), where retrotransposons are abundant and more recently active, a powerful genomics resource was generated through an alternative “gene-ome” approach: a high quality genespace assembly was anchored on a deep physical map merged with high-density genetic maps (International Barley Genome Sequencing Consortium, 2012). In contrast, large gymnosperm genomes contain highly diverged ancient repeats, which could make assembling these genomes tractable with current sequencing and assembly technologies (Kovach et al., 2010). However, the smallest reported conifer genome is about the same size as the maize genome, and the median conifer genome size is 9700 Mb, which is why a large push to sequence gymnosperms may have to wait for the next wave of sequencing technologies with longer reads and lower prices. As the community moves forward to choose the next round of genomes to sequence, the Kew Genome Size database will continue to provide a rich resource of non-model/non-crop species to investigate (Bennett and Leitch, 2011).
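The size range quoted at the start of this paragraph can be checked with one line of arithmetic: the number of orders of magnitude separating two genome sizes is the base-10 logarithm of their ratio. The short Python sketch below uses only the two sizes given in the text (63 Mb for Genlisea aurea and 148,000 Mb for Paris japonica); the function name is illustrative and not taken from any published tool.

```python
import math


def orders_of_magnitude(small_mb: float, large_mb: float) -> float:
    """Return how many orders of magnitude separate two genome sizes (log10 of their ratio)."""
    if small_mb <= 0 or large_mb <= 0:
        raise ValueError("genome sizes must be positive")
    return math.log10(large_mb / small_mb)


# Sizes quoted in the text (Bennett and Leitch, 2011), in megabases.
GENLISEA_AUREA_MB = 63
PARIS_JAPONICA_MB = 148_000

fold = PARIS_JAPONICA_MB / GENLISEA_AUREA_MB
span = orders_of_magnitude(GENLISEA_AUREA_MB, PARIS_JAPONICA_MB)
print(f"{fold:,.0f}-fold range, {span:.2f} orders of magnitude")
# prints: 2,349-fold range, 3.37 orders of magnitude
```

Applied to the medians quoted above, the same function gives about 1.3 orders of magnitude (roughly 20-fold) between the median conifer genome (9700 Mb) and the median published plant genome (480 Mb).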
One measure of genome assembly quality is contiguity, commonly summarized as the N50: the length such that contigs (or scaffolds) of that length or longer contain at least 50% of the assembled sequence. Sorghum, Brachypodium distachyon, soybean, and foxtail millet have the top four scaffold N50 values, at 62.4, 59.3, 47.8, and 47.3 Mb, respectively, and all four were sequenced using Sanger as part of the JGI pipeline (Table 1). By contrast, the genome with the ninth largest scaffold N50, tomato at 16 Mb, was predominantly assembled using 454. Scaffolds are built from ordered and oriented contigs separated by gaps, and contig length generally drives the completeness and quality of the gene predictions. Not surprisingly, the 11 JGI Sanger-based assemblies have the highest contig N50 values, ranging from 119 to 347 kilobases (kb), while the median contig N50 for all assemblies is 25.6 kb. Illumina-based assemblies, primarily from BGI, have a similar median contig N50 (25.9 kb), which reflects their comprehensive strategy of using sequencing libraries with different insert sizes. Another measure of a genome assembly is the amount of the genome captured in the assembly. Of the published genomes, the median assembly captured 85% of the predicted genome size, which is usually estimated by flow cytometry or, more recently, by k-mer depth analysis. The unassembled remainder generally represents the highly repetitive portion of the genome, such as high copy number ribosomal repeats, centromeres, telomeres, and transposable elements. A typical plant genome assembly therefore captures 85% of the genome space in thousands of contigs with an N50 of about 20 kb, ordered into scaffolds with an N50 of about 1 Mb.
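The N50 definition and the two genome-size measures in this paragraph can be made concrete with a short Python sketch. It computes N50 (and L50, the number of sequences needed to reach half of the assembly) from a list of contig or scaffold lengths, the fraction of an estimated genome size captured by an assembly, and a deliberately crude k-mer genome-size estimate (total k-mer occurrences divided by the depth of the main peak of the k-mer depth histogram). The function names, toy inputs, and the `min_depth` cutoff for discarding error k-mers are illustrative assumptions; real projects use dedicated tools that also model heterozygosity and repeats.

```python
from typing import Dict, Sequence, Tuple


def n50(lengths: Sequence[int]) -> Tuple[int, int]:
    """Return (N50, L50) for a set of contig or scaffold lengths.

    N50 is the length of the shortest sequence among the longest sequences
    that together hold at least half of the assembled bases; L50 is how many
    sequences that takes.
    """
    if not lengths:
        raise ValueError("no sequences given")
    half = sum(lengths) / 2
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= half:
            return length, count
    raise AssertionError("unreachable: running total always reaches the full sum")


def fraction_captured(assembled_mb: float, estimated_mb: float) -> float:
    """Share of the estimated genome size (e.g., from flow cytometry) present in the assembly."""
    return assembled_mb / estimated_mb


def kmer_genome_size(histogram: Dict[int, int], min_depth: int = 2) -> float:
    """Crude genome-size estimate from a k-mer depth histogram.

    `histogram` maps depth -> number of distinct k-mers observed at that depth.
    Low-depth k-mers (mostly sequencing errors) are discarded, then the total
    number of k-mer occurrences is divided by the depth of the tallest remaining
    peak, taken here as the coverage of single-copy sequence (in heterozygous
    genomes the tallest peak may instead be the heterozygous one).
    """
    kept = {depth: n for depth, n in histogram.items() if depth >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    total_occurrences = sum(depth * n for depth, n in kept.items())
    peak_depth = max(kept, key=kept.get)
    return total_occurrences / peak_depth


# Toy contig lengths (kb): 300 kb total, so half is 150 kb.
print(n50([100, 80, 60, 40, 20]))            # (80, 2)
# Arabidopsis row of Table 1: 115 Mb assembled of an estimated 125 Mb.
print(f"{fraction_captured(115, 125):.0%}")  # 92%
# Toy histogram: depth 1 holds error k-mers; the main peak sits at depth 20.
print(kmer_genome_size({1: 5000, 10: 100, 20: 300, 30: 80}))  # 470.0
```

Dividing by the estimated genome size instead of the assembly size when computing the half-way point gives the related NG50 statistic, which penalizes assemblies that capture less of the genome.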
Annotation of any genome, and of plant genomes in particular, is difficult, especially as the definition of what constitutes a gene continues to evolve. Many parts of the genome are ‘expressed’ in that they are transcribed into RNA, but they do not correspond to traditional genes because they are not translated into protein. Most annotated plant genomes have between 20,000 and 94,000 genes, with a median predicted gene count of 32,605 (Table 1, Fig. 1C). Differences between genomes most likely lie in the tools used for annotation and in how relaxed the annotators were in calling genes, as well as in lineage-specific genes and gene family expansions. Genomes produced by next generation sequencing typically have smaller contig and scaffold sizes that complicate annotation, because a single gene may be broken across several contigs and annotated as more than one gene, thus inflating the number of annotated genes (e.g., pigeon pea; Varshney et al., 2012). Further complicating annotation, many expressed non-coding RNAs are functionally important (Eddy, 2001) but are not considered genes in the traditional sense. Small RNA precursors are often not included in a genome annotation, but they are important for plant development and for silencing of transposable elements (TEs) (Arikit et al., 2013). Small RNAs and other non-coding RNAs are often annotated and curated separately from genome annotations in small, boutique databases. In the long term, one goal should be to combine these various sources of information into a single database/annotation, making it easier for biologists to pull together the information needed to form hypotheses.
Plant genomes are packed, and often obese, with transposable elements (TEs) (Bennetzen, 2000), which contain protein-coding sequences that are often annotated as genes. In rice, for instance, it was estimated that only 40,000 of the more than 55,000 annotated genes are real genes and that 10,000 to 15,000 of the rest are TEs, usually low copy TEs, as high copy elements are relatively easy to recognize (Bennetzen et al., 2004). TEs include various families that move via copy-and-paste (class I, retrotransposons) and cut-and-paste (class II, DNA transposons) mechanisms. Copy-and-paste TEs can dramatically increase the size of a genome, as occurred in a wild relative of rice whose genome is nearly twofold larger than that of rice (Piegu et al., 2006). Transposon biology is an intriguing area of research and relies on relatively complete genomes in which TEs are captured in sequence contigs and can be accurately annotated. Schemes for classification of TEs have been agreed on (Wicker et al., 2007), but annotation of non-LTR TEs is complicated by the lack of structural clues that would allow routine ab initio prediction (El Baidouri and Panaud, 2013). Another complication is that in genomes produced by short-read DNA sequencing technology, TEs are often missed in the assembly because of their repetitive nature. Genomes sequenced to date range from 3 to 85% repetitive sequence (Table 1; median 43%), with TEs, specifically copy-and-paste LTR retrotransposons, comprising the majority of that sequence. Capturing and annotating these genomic components is important, as it is becoming increasingly clear that TEs can be domesticated to function in gene regulation and as structural components of the genome.
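To illustrate the class I/class II distinction and how a repeat percentage like those in Table 1 is derived, the Python sketch below tallies annotated TE bases by class. The order-to-class mapping follows the top levels of the Wicker et al. (2007) scheme; the input format, (order, start, end) tuples with half-open, non-overlapping intervals, is a hypothetical simplification of a real repeat annotation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Top-level orders from the Wicker et al. (2007) classification.
# Class I elements (retrotransposons) move via an RNA intermediate ("copy-and-paste");
# class II elements (DNA transposons) do not ("cut-and-paste" for TIR elements).
WICKER_CLASS = {
    "LTR": "I", "DIRS": "I", "PLE": "I", "LINE": "I", "SINE": "I",
    "TIR": "II", "Crypton": "II", "Helitron": "II", "Maverick": "II",
}


def percent_by_class(annotations: Iterable[Tuple[str, int, int]],
                     genome_length: int) -> Dict[str, float]:
    """Percent of the genome covered by each TE class.

    `annotations` holds (order, start, end) tuples with half-open, non-overlapping
    intervals; orders not in WICKER_CLASS are counted as "unclassified".
    """
    bases = defaultdict(int)
    for order, start, end in annotations:
        if end <= start:
            raise ValueError(f"empty or inverted interval for {order}: {start}-{end}")
        bases[WICKER_CLASS.get(order, "unclassified")] += end - start
    return {cls: 100 * n / genome_length for cls, n in sorted(bases.items())}


toy = [("LTR", 0, 300), ("LTR", 400, 500), ("TIR", 600, 650)]
print(percent_by_class(toy, genome_length=1000))  # {'I': 40.0, 'II': 5.0}
```

A real annotation would first merge overlapping hits (for example, nested LTR retrotransposons) so that no base is counted twice.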
Making Genomes “Functional”
One of the key take-home messages from the first 49 sequenced plant species is that we still have a lot to learn about the organization of genomes, the function of genes, and how to characterize the non-coding space. Each new genome uncovers novel species-specific genes and a vast amount of non-coding space that requires methods for ab initio and functional annotation. One specific challenge is how to leverage the growing number of high throughput technologies, otherwise referred to as “omics” approaches, to functionally annotate features of the plant genome. In this special issue of The Plant Genome we highlight several omics studies that have used high throughput approaches such as gen-omics (SNP detection), epigen-omics (methylation), metagen-omics (plant–fungal interactions), and ion-omics (element profiling) to refine our functional understanding of several key crop genomes (Eichten et al., 2013; Roorkiwal et al., 2013; Ruzicka et al., 2013; Ziegler et al., 2013). As we have seen through the model organism and human ENCODE projects, layering omics data greatly increases the value of a reference genome (Celniker et al., 2009; ENCODE Project Consortium, 2012).
While a reference genome provides a starting point, or platform for discovery, in a specific species, it captures only a brief moment in the history of that species’ diversity and lacks the information content that would enable activities such as molecular breeding and phylogenetic analyses. Roorkiwal et al. (2013, this issue) describe the development of an Illumina BeadXpress SNP genotyping platform for two important crops in the developing world, pigeon pea and chickpea (Roorkiwal et al., 2013). Both pigeon pea and chickpea have lagged behind other crops in genetic improvement because of a lack of genome and breeding resources that would enable applications such as marker assisted selection (MAS) and phylogenetic screens to identify genetic novelty in wild species. The BeadXpress SNP genotyping platform provides the opportunity to assess larger populations of plants with an adequate density of markers, which is ideal for breeding applications such as MAS and diversity scans for disease and abiotic stress traits.
A prominent feature of plant genomes is their epigenetic landscape. The epigenome encompasses DNA methylation, histone modifications, and other modifications not directly encoded in the genome sequence. In general, DNA methylation is thought to mark permanent changes in the genome that must persist over the developmental lifetime of the plant, such as silencing transposable elements in embryonic tissue to protect the fidelity of the genome from transposition. Eichten et al. (2013, this issue) address the question of whether DNA methylation also specifies tissue types in maize. Using genome-wide array and sequencing technologies to assess DNA methylation and gene expression in two maize inbreds, B73 and Mo17, across four tissue types (leaf, immature tassel, embryo, and endosperm), the authors find more differentially methylated regions (DMRs) between the maize inbreds than between the tissues they sampled (Eichten et al., 2013). The DMRs identified between tissue types did not correlate with expression changes, suggesting that these DMRs do not function in specifying tissue type. Although other plants, such as tomato, display tissue- and developmentally regulated DMRs (Zhong et al., 2013), this may not be a general phenomenon, as the maize results suggest, which highlights the need to functionally define genomic elements in specific species.
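The comparison at the heart of this study, counting DMRs between genotypes versus between tissues, can be sketched in a few lines of Python. This is a hypothetical, threshold-only illustration and not the method of Eichten et al. (2013): it assumes methylation has already been summarized as a fraction (0 to 1) per genomic window and flags windows whose levels differ by at least a chosen cutoff. Real DMR callers add statistical tests, read-depth filters, and handling of biological replicates.

```python
from typing import List, Mapping


def call_dmrs(sample_a: Mapping[str, float], sample_b: Mapping[str, float],
              min_diff: float = 0.5) -> List[str]:
    """Return IDs of windows present in both samples whose methylation
    fractions differ by at least `min_diff` (a hypothetical cutoff)."""
    shared = sorted(sample_a.keys() & sample_b.keys())
    return [w for w in shared if abs(sample_a[w] - sample_b[w]) >= min_diff]


# Toy methylation fractions per window (made-up values for illustration).
b73_leaf = {"w1": 0.90, "w2": 0.10, "w3": 0.80}
mo17_leaf = {"w1": 0.10, "w2": 0.10, "w3": 0.85}
b73_embryo = {"w1": 0.85, "w2": 0.20, "w3": 0.80}

print("between inbreds:", call_dmrs(b73_leaf, mo17_leaf))   # ['w1']
print("between tissues:", call_dmrs(b73_leaf, b73_embryo))  # []
```

A follow-up step in the spirit of the study would ask whether windows flagged between tissues coincide with expression changes in nearby genes.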
Genetic screens are still the primary tool for functionally defining features of genomes. Mutant screens have been central in elucidating pathways, uncovering novel functions of known genes, and discovering novel non-coding features such as epigenetic regulation and small RNAs. Ziegler et al. (2013, this issue) describe a powerful high throughput mutant screen for elemental differences between field-grown soybean plants, which could be applied to any plant species with modestly sized seeds like soybean (Ziegler et al., 2013). High throughput elemental profiling, or ionomics, is an emerging omics platform that provides a glimpse of the plant–soil environment and how the plant is accessing that environment. Ionomics screens have been powerful at detecting genetic factors controlling ion uptake and have also started to shed light on root architecture and morphology. Therefore, this high throughput screen, which is agnostic to plant species, has the potential to functionally characterize a plant organ, the root, that has traditionally been difficult to define genetically and molecularly in a field environment.
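A high throughput ionomic screen ultimately asks which lines fall outside the normal range of element concentrations. The Python sketch below illustrates that idea for one element, assuming one measurement per line and using the modified z-score (based on the median and the median absolute deviation, MAD) with the commonly used 3.5 cutoff. This is an illustrative choice, not necessarily the statistic used by Ziegler et al. (2013), and the sample names and values are invented.

```python
from statistics import median
from typing import Dict, List


def modified_z_outliers(values: Dict[str, float], cutoff: float = 3.5) -> List[str]:
    """Return sample IDs whose modified z-score exceeds `cutoff` in absolute value.

    The modified z-score, 0.6745 * (x - median) / MAD, uses the median absolute
    deviation (MAD) so that a few extreme mutants do not inflate the spread.
    """
    center = median(values.values())
    mad = median(abs(x - center) for x in values.values())
    if mad == 0:
        raise ValueError("MAD is zero; modified z-scores are undefined")
    return [sid for sid, x in values.items()
            if abs(0.6745 * (x - center) / mad) > cutoff]


# Hypothetical seed concentrations of one element (arbitrary units) for field plots.
seed_element = {"wt1": 10.0, "wt2": 10.2, "wt3": 9.8, "wt4": 10.1,
                "wt5": 9.9, "wt6": 10.0, "mut7": 15.0}
print(modified_z_outliers(seed_element))  # ['mut7']
```

Median-based statistics are used here because, in a mutant screen, the lines of interest are by definition extreme and would otherwise distort a mean and standard deviation.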
An almost uncharacterized area of plant biology is the complement of organisms that live in association with plants, or the metagenome. In many plants, the acquisition of inorganic minerals is facilitated by an active network of mycorrhizal associations between soil fungal species and plant roots. However, assessing how these fungal and plant species interact has been hampered by the fact that many fungal species cannot be cultured. The advent of high throughput sequencing has created an unprecedented opportunity to identify the genomic changes induced through these communal relationships. Ruzicka et al. (2013, this issue) use high throughput sequencing to characterize the transcriptomes of both tomato and its arbuscular mycorrhizal fungal symbiont in the field (Ruzicka et al., 2013). Instead of culturing the symbiont, a metagenomic sequencing strategy was employed in which RNA from a wild-type tomato plant and a mutant with reduced mycorrhizal colonization was sequenced and the plant and fungal sequences were separated bioinformatically. This metagenomic analysis revealed a suite of genes for transport and cell wall remodeling required for the symbiotic relationship. Metagenomic sequencing will open up the opportunity to explore additional symbiotic relationships and further functionally characterize aspects of the genome that are not innate to the genome sequence.
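The step of separating plant and fungal reads from one mixed RNA sample can be illustrated with a toy k-mer classifier. The Python sketch below is a hypothetical simplification and not the pipeline used by Ruzicka et al. (2013): it assigns each read to whichever reference shares more of its k-mers, ignores reverse complements and sequencing errors, and uses made-up sequences. Real analyses typically align reads against both references instead.

```python
from typing import Dict, Set


def kmers(seq: str, k: int) -> Set[str]:
    """All overlapping k-mers of a sequence (empty if the sequence is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def partition_reads(reads: Dict[str, str], plant_ref: str, fungal_ref: str,
                    k: int = 5) -> Dict[str, str]:
    """Label each read 'plant', 'fungus', or 'ambiguous' by counting the k-mers
    it shares with each reference. Reverse complements are ignored for brevity."""
    plant_kmers = kmers(plant_ref, k)
    fungal_kmers = kmers(fungal_ref, k)
    labels = {}
    for read_id, seq in reads.items():
        read_kmers = kmers(seq, k)
        plant_hits = len(read_kmers & plant_kmers)
        fungal_hits = len(read_kmers & fungal_kmers)
        if plant_hits > fungal_hits:
            labels[read_id] = "plant"
        elif fungal_hits > plant_hits:
            labels[read_id] = "fungus"
        else:
            labels[read_id] = "ambiguous"
    return labels


# Made-up reference fragments and reads, for illustration only.
plant_ref = "ATGCGTACGTTAGCATGCA"
fungal_ref = "GGGCCCTTTAAAGGGCCCA"
reads = {"r1": "GCGTACGTTA", "r2": "CCTTTAAAGG", "r3": "NNNNNNNNNN"}
print(partition_reads(reads, plant_ref, fungal_ref))
# {'r1': 'plant', 'r2': 'fungus', 'r3': 'ambiguous'}
```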
Future Plant Genomes
The first ∼50 plant genomes have provided a glimpse of gene number, the types and numbers of repeats, and how genomes grow and contract. However, we are just at the beginning of defining the functional aspects of plant genomes. To reach the goal of breeding better plants for future food, clothing, and energy, we will need to expand the range of species sequenced, the number of species re-sequenced, and the types of omics data layered on genomes. Currently, only one gymnosperm has been sequenced, and no plants using CAM (crassulacean acid metabolism) photosynthesis have been sequenced. While we have come a long way in the 13 yr since the publication of the Arabidopsis genome, we still have a long way to go before we will be able to engineer the plant of the future.
Current Access to Plant Genomes
General Plant Genome Resources (accessed 24 July 2013)
http://bioinformatics.psb.ugent.be/plaza/
http://mips.helmholtz-muenchen.de/plant/genomes.jsp
http://www.ncbi.nlm.nih.gov/genomes/PLANTS/PlantList.html
http://genomevolution.org/wiki/index.php/Sequenced_plant_genomes
Plant Specific (accessed 24 July 2013)
Supplemental Information Available
Table 1 is also available for download with the online version of this article.