Volume 165, Issue 3 p. 658-661

Ancient duplication of cereal genomes

Andrew H. Paterson

Plant Genome Mapping Laboratory, University of Georgia, Athens, GA 30602, USA;

(Author for correspondence: tel +1 706 583 0162; fax +1 706 583 0160; email [email protected])

John E. Bowers

Plant Genome Mapping Laboratory, University of Georgia, Athens, GA 30602, USA;

Yves Van de Peer

Department of Plant Systems Biology, Ghent University, B-9052 Ghent, Belgium.

Klaas Vandepoele

Department of Plant Systems Biology, Ghent University, B-9052 Ghent, Belgium.

First published: 03 February 2005

The discovery of multiple ancient polyploidization events in Arabidopsis (Vision et al., 2000; Simillion et al., 2002; Bowers et al., 2003) foreshadowed the finding that Oryza (rice), too, had undergone extensive ancient duplication of its chromatin. Although the possibility of duplication in the rice genome had been suggested long ago, early studies of the sequence raised questions about whether rice was an ‘ancient aneuploid’ (Vandepoele et al., 2003) or paleo-polyploid across its entire genome (Paterson et al., 2003). In this issue, Wang et al. (pp. 937–946) contribute to a resolution of this question by using an independent assembly of a divergent rice subspecies, generally supporting the occurrence of a whole-genome duplication, although some questions remain unanswered. Using independent dating approaches, Wang et al. also support prior estimates (Paterson et al., 2004) that this event occurred about 70 million yr ago, suggesting that it has affected the genome organization of virtually all of the world's cereal crops.

‘Many more, if not all, higher plant species, considered as diploids because of their genetic and cytogenetic behaviour, are actually ancient polyploids’

Polyploidy and the angiosperms

Polyploidy, the merger of multiple chromosome sets in a common nucleus, ranks among the most important of evolutionary mechanisms affecting angiosperm genomes. It has long been suspected that many angiosperms were ancient polyploids (Stebbins, 1966). By contrast, the relative scarcity of polyploidy in dioecious organisms (such as most animals, but few plants) is thought to be related to a need for balanced gene dosage between autosomal loci and the nondegenerated members of heteromorphic sex chromosome sets (Orr, 1990). The discovery that one polyploidization event predates the monocot–eudicot divergence arguably suggests that all angiosperms may be ancient polyploids (Bowers et al., 2003). The discovery of several additional events in the same lineage (Bowers et al., 2003) raises the as yet unanswered question of whether polyploidy might truly be cyclical, with distinct advantages that are gradually eroded by ‘diploidization’ and divergence of suites of duplicated genes.

The controversy about rice, and why it is important

Rice (Oryza) was the first representative of the Poaceae (cereals) to be sequenced. Because this plant family provides the majority of calories consumed by humans, together with a growing share of our fuel and many other ‘ecosystem services’ such as erosion control, duplication analysis of the rice sequence was of special importance. It had long been known that rice chromosomes occasionally paired with seemingly incorrect partners (Lawrence, 1931), and restriction fragment length polymorphism (RFLP) mapping had shown that the rice chromosome pairs 1–5 (Kishimoto et al., 1994) and 11–12 (Nagamura et al., 1995) each contained duplicated gene pairs in what appeared to be collinear orders. Initial analysis of genomic shotgun sequence suggested a widespread propensity for gene duplication that was consistent with a large-scale event perhaps 40–50 million yr ago (Goff et al., 2002).

In view of this background, it was no surprise that two early investigations of partial assemblies for Oryza sativa (L.) ssp. japonica each suggested ancient duplication of rice chromosomes. However, the findings of the two groups differed in key ways, with one reporting duplication over only about 15% of the genome (‘ancient aneuploidy’; Vandepoele et al., 2003), and the other suggesting a probable whole-genome event based on duplication over about 62% of the genome (Paterson et al., 2003). The importance of resolving this difference was highlighted by the finding that this event predated the divergence of the major cereals from one another (Paterson et al., 2004), and thus it is a common factor affecting the genome structure of many of the world's leading crops.

Perspective from a second subspecies

In this issue, Wang et al. describe analysis of an independent and advanced sequence assembly from O. sativa ssp. indica, a close relative of ssp. japonica that has been the target of a whole-genome shotgun effort (Yu et al., 2002). Across 370 Mb assembled into 12 chromosomes, Wang et al. find 10 duplicated blocks that contain 47% of the predicted transcriptome. While the largest of these, between chromosomes 2 and 4, was found in both earlier studies (Paterson et al., 2003; Vandepoele et al., 2003), smaller ones, such as that between chromosomes 1 and 5, escaped detection by Vandepoele et al. (2003). Wang et al. corroborated the estimate of 70 million yr ago for the antiquity of the rice event (Vandepoele et al., 2003; Paterson et al., 2004) based on analysis of rice/maize homologs, and suggested that the extent of gene loss has been somewhat less (32–65%) than found in the earlier studies (∼80%). Finally, Wang et al. tentatively assigned a date of about 5 million yr ago to a duplication of chromosomes 11 and 12. These chromosomes had not yet been adequately sequenced for Vandepoele et al. (2003) to address, and Paterson et al. (2003, 2004) identified them as more recently duplicated than the remainder of the genome but did not estimate a date.

Admirably, while Wang et al. was in review, concerns about the diversity of findings had motivated reanalysis of more advanced rice assemblies (TIGR v1.0) using less stringent thresholds for inferring significance. These analyses made it clear that the fraction of the rice genome found in duplicated blocks is indeed appreciably larger than the 15% reported in Vandepoele et al. (2003) and agrees more closely with Wang et al. and Paterson et al. (2004). Yet another independent analysis of the japonica sequence arrived at a similar conclusion (Guyot & Keller, 2004). These reanalyses also support the finding that, apart from a continuous mode of (tandem) duplication, both a recent small-scale (i.e. chromosome 11–12) and an older large-scale duplication event shaped the rice genome. The observation that approximately 7% of the rice genome is located in overlapping block duplications suggests that older, perhaps cryptic cycles of polyploidy (such as the γ event thought to be shared by all angiosperms; Bowers et al., 2003) may also have shaped the genome.

Looking back at ancient duplications from the future

While the evidence for a large-scale, if not genome-wide, duplication event in a common ancestor of the cereals is growing stronger, many questions remain. Although the large-scale duplication event (i.e. duplicated blocks with 0.95 > Ks > 0.78) accounts for the majority of all duplicated blocks (∼62% of all anchor points) in the rice genome, these blocks cover less than half of the physical rice genome. Consequently, it appears that these blocks may not actually be distributed ‘uniformly over 10 of all 12 chromosomes’ as suggested (Wang et al.). It will be of much interest to learn whether ancient gene orders are differentially preserved in different regions of the genome, and what factors might contribute to such differences.
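
The bookkeeping behind figures of this kind can be illustrated with a minimal sketch. It is not the method of any of the studies cited here. It assumes a hypothetical list of duplicated blocks, each with a median Ks and a count of anchor points (the block names and values below are invented for illustration), selects the blocks falling in the 0.78 < Ks < 0.95 window quoted above, and reports the fraction of anchor points they hold. The conversion of Ks to age uses T = Ks / (2r), where the synonymous substitution rate r = 6.5e-9 per site per year is an assumption that is not stated in this article.

```python
from dataclasses import dataclass

# Assumed synonymous substitution rate (per site per year); not from this article.
ASSUMED_RATE = 6.5e-9


@dataclass
class Block:
    """A hypothetical duplicated block with a median Ks and an anchor-point count."""
    name: str
    median_ks: float
    anchors: int


def in_window(block, low=0.78, high=0.95):
    """True if the block's median Ks lies strictly inside the Ks window."""
    return low < block.median_ks < high


def anchor_fraction(blocks, low=0.78, high=0.95):
    """Fraction of all anchor points held by blocks inside the Ks window."""
    total = sum(b.anchors for b in blocks)
    if total == 0:
        return 0.0
    selected = sum(b.anchors for b in blocks if in_window(b, low, high))
    return selected / total


def ks_to_age_myr(ks, rate=ASSUMED_RATE):
    """Approximate age in million years from T = Ks / (2 * rate)."""
    return ks / (2.0 * rate) / 1e6


# Invented illustrative data, not results from any study.
blocks = [
    Block("block_a", 0.85, 300),
    Block("block_b", 0.90, 200),
    Block("block_c", 0.10, 150),
    Block("block_d", 1.40, 150),
]

print(f"fraction of anchors in window: {anchor_fraction(blocks):.3f}")
print(f"window age range: {ks_to_age_myr(0.78):.0f}-{ks_to_age_myr(0.95):.0f} Myr")
```

Under the assumed rate, the Ks window corresponds to roughly 60 to 73 million yr; a different rate would shift this range proportionally. Note that a fraction of anchor points says nothing by itself about physical coverage of the genome, which is the distinction drawn in the paragraph above.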

Furthermore, diploidization processes appear to vary widely in different taxa. By many measures, rice and Arabidopsis are thought to have experienced genome duplications at similar times (although even this remains controversial, with different authors supporting estimates ranging from 30 to 100 million yr). However, age distributions of duplicated genes are considerably different in the two taxa (Fig. 5 in Vandepoele et al., 2003). In fact, one recent study of age distributions fails to detect evidence of duplication in rice (Blanc & Wolfe, 2004). One possible explanation may be that the rate of gene loss is much higher in rice than in Arabidopsis (but the rates of gene loss and the fraction of genes in tandem duplications in Arabidopsis and rice do not seem to be significantly different at first sight; Simillion et al., 2004; Wang et al.). Finally, Wang et al. are among the first to try to mitigate the effect of the negative correlation of Ks with guanine+cytosine (GC) content, an issue that is of special importance in high-GC lineages such as rice. In any case, these additional incongruities need to be resolved before we can conclude with certainty that rice underwent a truly whole-genome duplication event.
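
For readers unfamiliar with the term, an ‘age distribution’ of duplicated genes is in essence a histogram of Ks values for duplicated gene pairs, in which a large-scale duplication is expected to appear as a secondary peak above the background of recent duplicates. The following minimal sketch shows the idea. It is not the procedure used by Blanc & Wolfe (2004) or Vandepoele et al. (2003); the Ks values, bin width, and `min_count` threshold are all invented assumptions for illustration.

```python
def ks_histogram(ks_values, bin_width=0.1, max_ks=2.0):
    """Count duplicated gene pairs per Ks bin; values outside [0, max_ks) are ignored."""
    n_bins = int(round(max_ks / bin_width))
    counts = [0] * n_bins
    for ks in ks_values:
        if 0.0 <= ks < max_ks:
            idx = min(int(ks / bin_width), n_bins - 1)
            counts[idx] += 1
    return counts


def secondary_peaks(counts, min_count=3):
    """Indices of interior bins that exceed both neighbours and hold at least min_count pairs."""
    return [
        i
        for i in range(1, len(counts) - 1)
        if counts[i] >= min_count
        and counts[i] > counts[i - 1]
        and counts[i] > counts[i + 1]
    ]


# Invented illustrative Ks values, not data from any study.
ks_values = [
    0.02, 0.03, 0.05, 0.07,        # recent (e.g. tandem) duplicates
    0.12, 0.15, 0.25, 0.45,        # background
    0.81, 0.83, 0.85, 0.86, 0.88,  # cluster of similar age
    0.95, 1.35,
]

counts = ks_histogram(ks_values)
for i in secondary_peaks(counts):
    print(f"secondary peak in Ks bin starting at {i * 0.1:.1f} with {counts[i]} pairs")
```

Whether such a peak is detectable depends on how many duplicated pairs have survived gene loss and on how the bins are chosen, which is one reason why different analyses of the same genome can reach different conclusions.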

The archaeology of plant genome duplication is only just emerging from infancy. While there naturally remains room for improvement of methodology associated with detecting paleopolyploidy, a more seminal need is better understanding of the fates of individual genes and interacting sets of genes following polyploidy. While many new data from microbes such as yeast are shedding valuable light on the roles and consequences of gene duplication, population genetic theory predicts that these consequences should be very different in organisms with larger body size and associated smaller effective population sizes (Lynch & Conery, 2003). The propensity of the angiosperms for polyploidy, together with rapidly growing genomic data and tools, makes them an especially attractive system in which to explore consequences of polyploidy that may be more likely to extend to most crown eukaryotes.

Acknowledgements

AHP and JEB thank the US National Science Foundation and USDA National Research Initiative for financial support. We all thank the many scientists who deposit and curate data in public databases, especially GenBank, the International Rice Genome Sequencing Program and TIGR.