How many genes are there in plants (… and why are they there)?
Introduction
How to count the number of genes in eukaryotic genomes, much less the number of proteins that they encode, is not self-evident [1, 2, 3, 4]. Nevertheless, in the past few years, much has been learned from major genome annotation projects, and estimates of gene number are generally becoming more realistic [5, 6]. What has become clear from recent plant genome-sequencing projects is that plants seem to have many genes: studies often report more than 40 000 [7, 8••, 9•, 10••]. There might be several reasons for this. A first explanation that has been put forward many times has to do with the typical lifestyle of a plant. Plants are sessile and cannot escape enemies or uncomfortable conditions. They are stuck in place and have therefore developed many strategies that improve their chances of survival when faced with grazing herbivores (including insects and snails), pathogens (viral, bacterial and fungal), varying climates, competing neighbour species, and other forms of stress. In addition, because they do not move, many plants have evolved either efficient reproductive strategies that rely on external factors, such as wind and water, or ways to build colourful and scented flowers that attract pollen- and nectar-collecting animals to effect efficient mating and seed dispersal. In other words, plants must make tens of thousands of chemical compounds, which they use to ward off competition from other plants, to fight infections, and to respond generally to the environment.
A second reason why plants have so many genes might be gene duplication, or more precisely gene retention following gene duplication. Gene duplication and retention in plants have been extensive, and gene families are generally larger in plants than in animals. Furthermore, most (if not all) plant species have experienced at least one (and probably more than one) whole-genome duplication in their evolutionary past [11, 12•]. Many of the genes created through these major events have been retained in extant plant genomes [13••]. Here, we briefly discuss what is known about the number of genes in those plant species whose whole-genome sequences have been determined, and comment on possible reasons for the large number of genes in these genomes. When discussing gene numbers, we consider protein-coding gene loci rather than the number of transcripts a gene potentially encodes. Non-protein-coding genes are not discussed here, although it has been shown that many regions of the genome that were previously considered inactive or featureless might actually contain many sites of RNA activity [14, 15].
Section snippets
How many genes are there in plants?
The caveats in gene prediction have been extensively discussed elsewhere and are not the subject of this paper. Suffice it to say that, although great progress has been made in the development of sophisticated gene finders and gene-prediction platforms (e.g. [16, 17, 18, 19, 20]), gene prediction and genome annotation are notoriously difficult [3, 4, 5]. Because the annotation community is well aware of this, gene models are continuously being re-evaluated on the basis of novel data and
Why are there so many genes in plants?
One of the most striking features of angiosperms is that many have experienced one or more episodes of polyploidy in their ancestry [12•, 31]. Apart from species that are currently polyploid, which include most crops, others are considered to have paleopolyploid genomes. When sequencing of the genome of the flowering plant Arabidopsis started, this model plant, with its small genome, was not expected to be an ancient polyploid. Five years after the release of its genome sequence [24], however,
Conclusions
When we discuss genomes with fellow scientists, their first question is usually, ‘How many genes?’ The abstracts of papers that publish the first drafts of genome sequences also often mention the estimated number of genes. Our interest in the number of genes in a genome is probably a relic from the days when we were convinced that this number was correlated with the complexity of its host. In the meantime, we have learned better. The fact that man has only about twice the number of genes of the
References and recommended reading
Papers of particular interest, published within the period of review, have been highlighted as:
• of special interest
•• of outstanding interest
Acknowledgements
We would like to thank all members of the Bioinformatics and Evolutionary Genomics group for stimulating discussions. KV is a postdoctoral fellow of the Fund for Scientific Research, Flanders.
References (42)
Annotating the genome of Medicago truncatula. Curr Opin Plant Biol (2006)
Identification of transcribed sequences in Arabidopsis thaliana by using high-resolution genome tiling arrays. Proc Natl Acad Sci USA (2006)
A recent polyploidy superimposed on older large-scale duplications in the Arabidopsis genome. Genome Res (2003)
Evidence that rice, and other cereals, are ancient aneuploids. Plant Cell (2003)
Count me out. Genome Biol (2000)
Computational prediction of eukaryotic protein-coding genes. Nat Rev Genet (2002)
Current methods of gene prediction, their strengths and weaknesses. Nucleic Acids Res (2002)
EGASP: the human ENCODE genome annotation assessment project. Genome Biol (2006)
Consistent over-estimation of gene number in complex plant genomes. Curr Opin Plant Biol (2004)
The genomes of Oryza sativa: a history of duplications. PLoS Biol (2005)
Differential methylation of genes and repeats in land plants. Genome Res
Legume genome evolution viewed through the Medicago truncatula and Lotus japonicus genomes. Proc Natl Acad Sci USA
The genome of black cottonwood, Populus trichocarpa (Torr. & Gray ex Brayshaw). Science
Genome duplication and the origin of angiosperms. Trends Ecol Evol
Widespread genome duplications throughout the history of flowering plants. Genome Res
Modeling gene and genome duplications in eukaryotes. Proc Natl Acad Sci USA
Elucidation of the small RNA component of the transcriptome. Science
Eugene: a eukaryotic gene finder that combines several sources of evidence. Lect Notes Comput Sci
Comparative gene prediction in human and mouse. Genome Res
TigrScan and GlimmerHMM: two open-source ab initio eukaryotic gene-finders. Bioinformatics
JIGSAW: integration of multiple sources of evidence for gene prediction. Bioinformatics