How many genes are there in plants (… and why are they there)?

https://doi.org/10.1016/j.pbi.2007.01.004

Annotation of the first few complete plant genomes has revealed that plants have many genes. For Arabidopsis, over 26 500 gene loci have been predicted, whereas for rice, the number adds up to 41 000. Recent analysis of the poplar genome suggests more than 45 000 genes, and partial sequence data from Medicago and Lotus also suggest that these plants contain more than 40 000 genes. Nevertheless, estimations suggest that ancestral angiosperms had no more than 12 000–14 000 genes. One explanation for the large increase in gene number during angiosperm evolution is gene duplication. It has been shown previously that the retention of duplicates following small- and large-scale duplication events in plants is substantial. Taking into account the function of genes that have been duplicated, we are now beginning to understand why many plant genes might have been retained, and how their retention might be linked to the typical lifestyle of plants.

Introduction

How to count the number of genes in eukaryotic genomes, much less the number of proteins that they encode, is not self-evident [1, 2, 3, 4]. Nevertheless, in the past few years, much has been learned from major genome annotation projects, and estimates of gene number are generally becoming more realistic [5, 6]. What has become clear from recent plant genome-sequencing projects is that plants seem to have lots of genes: studies often report more than 40 000 [7, 8••, 9•, 10••]. There might be several reasons for this. A first explanation that has been put forward many times has to do with the typical lifestyle of a plant. Plants are sessile and cannot escape enemies or uncomfortable conditions. They are stuck in place and have therefore developed many strategies that improve their chances of survival when faced with grazing herbivores (including insects and snails), pathogens (viral, bacterial and fungal), varying climates, competing neighbour species, and other forms of stress. In addition, because they do not move, many plants have invented either efficient reproductive strategies that rely on external factors, such as wind and water, or ways to build colourful and scented flowers that attract pollen- and nectar-collecting animals to effect efficient mating and seed dispersal. In other words, plants must make tens of thousands of chemical compounds, which they use to ward off competition from other plants, to fight infections, and to respond generally to the environment.

A second reason why plants have so many genes might be gene duplication, or more precisely gene retention following gene duplication. Gene duplication and retention in plants have been extensive, and gene families are generally larger in plants than in animals. Furthermore, most (if not all) plant species have experienced at least one (and probably more) whole-genome duplications in their evolutionary past [11, 12•]. Many of the genes created through these major events have been retained in extant plant genomes [13••]. Here, we briefly discuss what is known about the number of genes in those plant species whose whole-genome sequences have been determined, and comment on possible reasons for the large number of genes in these genomes. When discussing gene numbers, we consider protein-coding gene loci rather than the number of transcripts a gene potentially encodes. Non-protein-coding genes are not discussed here, although it has been shown that many regions of the genome that were previously considered inactive or featureless might actually contain many sites of RNA activity [14, 15].


How many genes are there in plants?

The caveats in gene prediction have been extensively discussed elsewhere and are not the subject of this paper. Suffice it to say that, although great progress has been made in the development of sophisticated gene finders and gene-prediction platforms (e.g. [16, 17, 18, 19, 20]), gene prediction and genome annotation are notoriously difficult [3, 4, 5]. Because the annotation community is well aware of this, gene models are continuously being re-evaluated on the basis of novel data and

Why are there so many genes in plants?

One of the most striking features of angiosperms is that many have experienced one or more episodes of polyploidy in their ancestry [12•, 31]. Apart from species that are currently polyploid, which include most crops, others are considered to have paleopolyploid genomes. When the sequencing of the flowering plant Arabidopsis genome started, this model plant, with its small genome, was not expected to be an ancient polyploid. Five years after the release of its genome sequence [24], however,

Conclusions

When we discuss genomes with fellow scientists, their first question is usually ‘How many genes?’ The abstracts of papers that report the first drafts of genome sequences also often mention the estimated number of genes. Our interest in the number of genes in a genome is probably a relic from the days when we were convinced that this number was correlated with the complexity of the organism. Since then, we have learned better. The fact that man has only about twice the number of genes of the

References and recommended reading

Papers of particular interest, published within the period of review, have been highlighted as:

  • • of special interest

  • •• of outstanding interest

Acknowledgements

We would like to thank all members of the Bioinformatics and Evolutionary Genomics group for stimulating discussions. KV is a postdoctoral fellow of the Fund for Scientific Research, Flanders.

References (42)

  • P.D. Rabinowicz et al.

    Differential methylation of genes and repeats in land plants

    Genome Res

    (2005)
  • S.B. Cannon et al.

    Legume genome evolution viewed through the Medicago truncatula and Lotus japonicus genomes

    Proc Natl Acad Sci USA

    (2006)
  • G. Tuskan et al.

    The genome of black cottonwood, Populus trichocarpa (Torr. & Gray ex Brayshaw)

    Science

    (2006)
  • S. De Bodt et al.

    Genome duplication and the origin of angiosperms

    Trends Ecol Evol

    (2005)
  • L. Cui et al.

    Widespread genome duplications throughout the history of flowering plants

    Genome Res

    (2006)
  • S. Maere et al.

    Modeling gene and genome duplications in eukaryotes

    Proc Natl Acad Sci USA

    (2005)
  • C. Lu et al.

    Elucidation of the small RNA component of the transcriptome

    Science

    (2005)
  • T. Schiex et al.

    Eugene: a eukaryotic gene finder that combines several sources of evidence

    Lect Notes Comput Sci

    (2001)
  • G. Parra et al.

    Comparative gene prediction in human and mouse

    Genome Res

    (2003)
  • W.H. Majoros et al.

    TigrScan and GlimmerHMM: two open-source ab initio eukaryotic gene-finders

    Bioinformatics

    (2004)
  • J.E. Allen et al.

    JIGSAW: integration of multiple sources of evidence for gene prediction

    Bioinformatics

    (2005)