Volume 61, Issue 4 p. 591-599

Sequence-level comparative analysis of the Brassica napus genome around two stearoyl-ACP desaturase loci

Kwangsoo Cho

Department of Crop Genetics, John Innes Centre, Norwich Research Park, Colney, Norwich NR4 7UH, UK

Present address: Highland Agriculture Research Centre, Rural Development Administration, Pyeongchang 232-955, Korea.

Carmel M. O’Neill

Department of Crop Genetics, John Innes Centre, Norwich Research Park, Colney, Norwich NR4 7UH, UK

Soo-Jin Kwon

National Academy of Agricultural Science, Rural Development Administration, Suwon 441-707, Korea

Tae-Jin Yang

Department of Plant Science, Plant Genomics and Breeding Institute, and Research Institute for Agriculture and Life Sciences, College of Agriculture and Life Sciences, Seoul National University, Seoul 151-921, Korea

Andrew M. Smooker

Department of Crop Genetics, John Innes Centre, Norwich Research Park, Colney, Norwich NR4 7UH, UK

Fiona Fraser

Department of Crop Genetics, John Innes Centre, Norwich Research Park, Colney, Norwich NR4 7UH, UK

Ian Bancroft

Corresponding Author

Department of Crop Genetics, John Innes Centre, Norwich Research Park, Colney, Norwich NR4 7UH, UK

(fax +44 1603 450045; e-mail [email protected]).
First published: 05 February 2010

Summary

We conducted a sequence-level comparative analysis, at the scale of complete bacterial artificial chromosome (BAC) clones, between the genome of the most economically important Brassica species, Brassica napus (oilseed rape), and those of Brassica rapa, the genome of which is currently being sequenced, and Arabidopsis thaliana. We constructed a new B. napus BAC library and identified and sequenced clones that contain homoeologous regions of the genome including stearoyl-ACP desaturase-encoding genes. We sequenced the orthologous region of the genome of B. rapa and conducted comparative analyses between the Brassica sequences and those of the orthologous region of the genome of A. thaliana. The proportion of genes conserved (∼56%) is lower than has been reported previously between A. thaliana and Brassica (∼66%). The gene models for sets of conserved genes were used to determine the extent of nucleotide conservation of coding regions. This was found to be 84.2 ± 3.9% and 85.8 ± 3.7% between the B. napus A and C genomes, respectively, and that of A. thaliana, which is consistent with previous results for other Brassica species, and 97.5 ± 3.1% between the B. napus A genome and B. rapa, and 93.1 ± 4.9% between the B. napus C genome and B. rapa. The divergence of the B. napus genes from the A genome and the B. rapa genes was greater than anticipated and indicates that the A genome ancestor of the B. napus cultivar studied was relatively distantly related to the cultivar of B. rapa selected for genome sequencing.

Introduction

The Brassicaceae consist of about 340 genera and over 3350 species (Johnston et al., 2005). They are important not only as crops but also as a resource for studying the impacts of polyploidy in plants (O’Neill and Bancroft, 2000; Axelsson et al., 2001; Geddy and Brown, 2007; Rana et al., 2004; Parkin et al., 2005; Lysak et al., 2005). Polyploidy is a prevalent evolutionary mechanism within angiosperms and it has been estimated that 30–70% of modern plant species have evolved through a polyploid ancestor (Leitch and Leitch, 2008). Brassica napus is an allotetraploid (2n = 4x = 38) that arose, probably within the last 10 000 years, by hybridization between unknown lines of Brassica rapa (which contains the Brassica A genome) and Brassica oleracea (which contains the Brassica C genome). The parental genomes contribute sets of homoeologous loci. Studies have shown that the constituent chromosomes show no evidence of rearrangement, with linkage maps of the B. napus A genome and the B. rapa genome being collinear, and those of the B. napus C genome and the B. oleracea genome being collinear (Parkin et al., 1995; Parkin and Lydiate, 1997). Brassica rapa and B. oleracea show extensive genome triplication, so are themselves derived from a polyploid ancestor. In this case, chromosomal rearrangements have occurred (Lysak et al., 2007; Schranz et al., 2006). Evidence of these rearrangements can be readily identified in the genome of B. napus. For example, Parkin et al. (2005) reported that 21 syntenic blocks (with an average size of about 4.8 Mb in Arabidopsis thaliana) have been maintained since the divergence of the Arabidopsis and Brassica lineages, which occurred around 20 million years ago (Yang et al., 1999). Recent studies conducted at the sequence level have greatly increased our understanding of the details of genome evolution in B. rapa and B. oleracea since divergence from the Arabidopsis lineage (Yang et al., 2006; Town et al., 2006), but sequence-based studies on the scale of complete bacterial artificial chromosomes (BACs) have not been reported for B. napus.

In addition to polyploidization, plant genome expansion can also occur by the amplification of transposable elements and by segmental duplication events (including tandem duplication of genes). The accumulation of retroelements and other transposable elements has played a major role in genome expansion during genome evolution (Fedoroff, 2000). Retroelements have a transcribed RNA intermediate that is converted into DNA by reverse transcriptase and integrated into the host genome. This creates new polymorphisms, increasing retrotransposon numbers and potentially causing genomic rearrangements, such as chromosome breaks and translocations (Kalendar et al., 2000; Fedoroff, 1989). Retroelements are extremely abundant in some species; for example, they occupy about 80% of the maize genome. However, they are much less abundant in others, such as A. thaliana, in which they occupy <10% of the genome (Arabidopsis Genome Initiative, 2000).

Fatty acids are important both as storage materials and in signalling pathways in plants (Kachroo et al., 2003; Chaturvedi et al., 2008). Although B. napus includes some vegetable crops, the most important B. napus crop is oilseed rape. This is the principal oil crop in temperate regions, with diverse applications including both food and industrial applications (including bio-energy). The fatty acid composition of plant oils determines their physical properties and the end-use of the oil. Control of the extent of desaturation of constituent fatty acids (i.e. the number of carbon-to-carbon double bonds present) is particularly important. For example, the omega 3 fatty acid linolenic acid (ALA) is an essential nutrient in the human diet and is advantageous to health. However, it is thermally unstable (and forms trans-fatty acids that are thought to be detrimental to health if present in oils that are hydrogenated to form components of spreads; Oomen et al., 2001) so for most purposes the aim is to minimize its content.

Stearoyl-ACP desaturase (SAD) catalyses the first step in the desaturation reaction pathway (Knutzon et al., 1992). In Arabidopsis, the SAD gene family consists of seven conserved members (At1g43800, At2g43710, a tandem triplet At3g02610, At3g02620, At3g02630 and a tandem pair At5g16230, At5g16240), which exhibit tissue-specific differences in expression (Kachroo et al., 2007). Slocombe et al. (1994) reported that B. napus has four SAD genes, two of which originated from the B. rapa ancestor and the other two from the B. oleracea ancestor, based upon hybridization with a probe with homology to At2g43710. In this study, we report the sequence-level comparative analysis between BAC clones containing the B. napus gene family orthologous to the SAD genes on Arabidopsis chromosome 3, which we refer to as SAD3, and the corresponding genomic regions in B. rapa and A. thaliana. This family of SAD genes is of particular interest because it is present in A. thaliana as a tandem triplet and thus represents three of the seven members of the overall SAD gene family in A. thaliana. Our study aimed to advance understanding of the basis of the apparent reduction in SAD gene copy number in B. napus compared with A. thaliana and to investigate the extent of divergence, at the sequence level, that is likely to be encountered when exploiting the B. rapa genome sequence for applications in oilseed rape.

Results

BAC library construction and analysis

In order to conduct the comparative sequence analysis, we constructed a genomic BAC library from the doubled haploid B. napus (oilseed rape) variety Tapidor using the binary vector pYLTAC7 (Liu et al., 1999). The library was constructed using Sau3AI-digested DNA from purified nuclei, as described previously (O’Neill and Bancroft, 2000). It was designated JBnY and 73 728 clones were arrayed into 192 × 384-well microtitre plates. In addition, we used a pre-existing genomic BAC library, JBnB, from the same variety (Rana et al., 2004), which had also been constructed using Sau3AI-digested DNA from purified nuclei.

To evaluate the average insert size over the entire JBnY library, one well was randomly chosen from each of the microtitre plates in which the libraries were arrayed. DNA was isolated from the corresponding clones, digested with I-SceI to release the clone inserts, and resolved by pulsed field gel electrophoresis (PFGE) to estimate insert sizes. Of the 192 clones analysed, 34 clones (18%) apparently did not have an insert (or had one smaller than 25 kb). Of those with measurable insert sizes, the mean insert was 90 kb. Therefore we estimate that the library should provide about 4.5-fold redundant representation of the approximately 1200-Mb genome of B. napus.
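The coverage estimate above can be reproduced with a short calculation. The following is a minimal sketch, assuming that clones without a measurable insert contribute no genomic sequence and that the 90-kb mean applies to all insert-bearing clones; the function name and this reading of how the figure was derived are illustrative assumptions, not taken from the original methods.

```python
def library_coverage(total_clones, sampled, empty, mean_insert_kb, genome_mb):
    """Return (insert-bearing clones, fold coverage) for a BAC library.

    Assumption: the fraction of empty clones seen in the PFGE sample
    is representative of the whole library.
    """
    insert_fraction = (sampled - empty) / sampled
    clones_with_insert = total_clones * insert_fraction
    cloned_mb = clones_with_insert * mean_insert_kb / 1000.0
    return clones_with_insert, cloned_mb / genome_mb


total_clones = 192 * 384  # 192 plates of 384 wells = 73 728 clones
clones, coverage = library_coverage(
    total_clones, sampled=192, empty=34, mean_insert_kb=90, genome_mb=1200
)
print(f"{clones:.0f} insert-bearing clones, {coverage:.2f}-fold coverage")
```

Under these assumptions the result is close to the approximately 4.5-fold redundancy quoted in the text.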

Identification and sequencing of B. napus and B. rapa BAC clones containing stearoyl-ACP desaturase-encoding genes

A probe with similarity across the family of stearoyl-ACP desaturase-encoding genes (derived from At2g43710) was designed using A. thaliana genomic sequences, amplified by PCR from A. thaliana genomic DNA and used to screen, by colony hybridization, the JBnY and JBnB BAC libraries using methodology reported previously (O’Neill and Bancroft, 2000). DNA was prepared from 96 clones identified by colony hybridization, cleaved with HindIII and hybridized on Southern blots using a probe specific to the stearoyl-ACP desaturase-encoding tandem triplet of genes At3g02610/At3g02620/At3g02630. Eighteen clones hybridized to this probe and could be classified into three groups. Group 1 (JBnY117L03, JBnB039G20) and group 2 (JBnY049H10, JBnY092H05, JBnB004E10, JBnB056I05, JBnB073E11) showed hybridization of a distinct size of band across multiple clones (a different size for each group). Group 3 (JBnY003H12, JBnY004D16, JBnY053C15, JBnY065P11, JBnY078M13, JBnY148G02, JBnY167K05, JBnY171H08, JBnY186O19, JBnB088N23, JBnB157A20) showed hybridization of bands corresponding to molecules of high molecular weight, but all of slightly different mobility. The remaining clones showed no hybridization.

To confirm the results of the Southern blot analysis, all of the hybridizing BACs from the JBnY library were end-sequenced. The sequence data are shown in Table S1 in Supporting Information. We had previously end-sequenced many of the BACs from the JBnB library. The BAC end-sequences of all clones were assessed for alignment with the genome of A. thaliana using BLAST. The results are summarized in Table S2. One clone in group 1, JBnY117L03, showed alignment between both of its end sequences and the A. thaliana genome (to gene models At3g02520 and At3g03090), confirming that it contained the targeted region of the B. napus genome. The other BAC in group 1, JBnB039G20, showed similarity of one end sequence to the same region of the A. thaliana genome. Three BACs from group 2 showed similarity of one end sequence with the same region of the A. thaliana genome, confirming that this group also represents the targeted region of the B. napus genome. However, none of the clones from group 3 showed such similarity, so this heterogeneous group does not represent the targeted region of the genome. Nine of the BACs for which colony hybridization was observed using the probe with similarity to all SAD genes of A. thaliana, but which did not hybridize with the probe specific to the SAD genes on A. thaliana chromosome 3, had end-sequence matches to regions of A. thaliana chromosome 2 (JBnB021A07, JBnB044L12, JBnB069N03, JBnB112F03, JBnB139A22, JBnB140H19) or chromosome 5 (JBnB004A17, JBnB019O01, JBnB161O04) that contain SAD genes (At2g43710 and At5g16230/40, respectively), so contain the B. napus members of these other SAD gene families. One BAC from group 1 (JBnY117L03) and one from group 2 (JBnY092H05) were sequenced to finished standard (apart from one region in JBnY117L03 and two regions in JBnY092H05 that could not be sequenced) by the Beijing Genomics Institute and GATC Biotech, respectively, as contracted services. 
Brassica rapa BAC clone KBrB003I01, which encompasses the equivalent genomic region from B. rapa ssp. pekinensis cultivar Chiifu, had already been sequenced as part of the international Brassica rapa Genome Sequencing Project (http://brassica.bbsrc.ac.uk/; http://brassica-rapa.org). The sequences of these three BACs are available from the public databases as accessions AC238677 (KBrB003I01), FP583353 (JBnY117L03) and FP579003 (JBnY092H05).

Sequence annotation

Gene annotation in the B. napus sequences was conducted primarily using the gene prediction program FGENESH. Simple sequence repeats and transposons were identified by RepeatMasker (http://www.repeatmasker.org/) followed by manual inspection. Twenty-six gene models were constructed for the 82.9-kb sequence of JBnY117L03. Two of these represented transposable elements and the remaining 24 showed sequence similarity to gene models At3g02520 to At3g03090 in a 174-kb region of A. thaliana chromosome 3. Annotation of the corresponding B. rapa sequences, as represented in BAC KBrB003I01, produced 25 gene models in a 79.8-kb region, 24 of which showed sequence similarity to gene models in the same region of A. thaliana chromosome 3 and one being a novel sequence (KBrB003I01_24). Annotation of the sequence of JBnY092H05 produced 19 gene models, 17 of which showed similarity to A. thaliana genes in the region At3g02380 to At3g02750, with the remaining two showing similarity to transposable elements. The inferred coding sequences of all gene models are shown in Table S3.

Mapping JBnY117L03 and JBnY092H05 in the B. napus genome

The end sequences of the BAC clone JBnY117L03 were used to design primers for the amplification, by PCR, of the corresponding regions from the genomes of B. napus lines that were the parents of available mapping populations. Amplicons were separated by polyacrylamide gel electrophoresis (PAGE) with silver staining. A polymorphism was detected in one of the amplicons between the parental lines of a mapping population, designated ‘QDH’, for which a linkage map has been developed and aligned with the conventional linkage group nomenclature for Brassica species. Population QDH was derived from a cross between cultivar Tapidor and a resynthesized B. napus that was established by hybridization of B. rapa ‘Rapa 29’ and B. oleracea ‘Atlantica’ (unpublished). This polymorphic marker was mapped in the QDH population to linkage group A3, as shown in Figure 1.

Figure 1. Linkage mapping of JBnY117L03.
The oval symbol denotes the marker on Linkage Group A3 that was derived from a bacterial artificial chromosome (BAC) end sequence from JBnY117L03.

As the A and C genomes of B. napus are very closely related, resulting in the amplification of many marker amplicons from both genomes, we sought to confirm that sequences within JBnY117L03 originated from the A genome and assess the expectation that the sequences in JBnY092H05 originated from the homoeologous region of the C genome. To do this, we selected the sequences of gene models in B. rapa BAC KBrB003I01 and public B. oleracea genomic survey sequence (GSS) data with similarity to A. thaliana genes At3g02540, At3g02555, At3g02560, At3g02600 and At3g02650, and quantified the similarity of these to sequences in each of JBnY117L03 and JBnY092H05. The results are shown in Table 1. They show that the sequences in JBnY117L03 are more similar to the sequences of the B. rapa genes, confirming that this BAC represents a region of the Brassica A genome, whereas the sequences in JBnY092H05 are more similar to the B. oleracea GSS, confirming that this BAC represents a region of the Brassica C genome. As the linkage group C3 of B. napus is homoeologous to linkage group A3, we can infer that JBnY092H05 maps to linkage group C3.

Table 1. Determination of the genome of origin for sequences in the bacterial artificial chromosome (BAC) clones JBnY117L03 and JBnY092H05

| Arabidopsis thaliana gene model | Brassica oleracea GSS | % identity to JBnY117L03 | % identity to JBnY092H05 | Brassica rapa gene model | % identity to JBnY117L03 | % identity to JBnY092H05 |
|---|---|---|---|---|---|---|
| At3g02540 | BH495733 | 95.1 | 98.4 | KBr_02 | 97.8 | 96.0 |
| At3g02555 | BH935696 | 92.1 | 99.0 | KBr_03 | 100.0 | 92.5 |
| At3g02560 | BH735152 | 91.3 | 96.0 | KBr_04 | 99.7 | 95.8 |
| At3g02600 | BH422105 | 91.9 | 99.4 | KBr_08 | 99.5 | 98.0 |
| At3g02650 | BH968242 | 94.7 | 99.7 | KBr_10 | 99.0 | 92.5 |
| Mean ± SD | | 93.0 ± 1.7 | 98.5 ± 1.5 | Mean ± SD | 99.2 ± 0.9 | 95.0 ± 2.4 |

GSS, genomic survey sequence.
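The genome-of-origin assignment from Table 1 amounts to asking which reference each BAC resembles more closely on average. The sketch below illustrates that comparison using the identity values transcribed from Table 1; the dictionary layout, the function name `assign_genome` and the simple "higher mean identity wins" rule are illustrative assumptions rather than the authors' stated procedure.

```python
from statistics import mean, stdev

# Percentage identities transcribed from Table 1, in gene order
# At3g02540, At3g02555, At3g02560, At3g02600, At3g02650.
identity = {
    "JBnY117L03": {
        "B. oleracea GSS": [95.1, 92.1, 91.3, 91.9, 94.7],
        "B. rapa": [97.8, 100.0, 99.7, 99.5, 99.0],
    },
    "JBnY092H05": {
        "B. oleracea GSS": [98.4, 99.0, 96.0, 99.4, 99.7],
        "B. rapa": [96.0, 92.5, 95.8, 98.0, 92.5],
    },
}

# B. rapa carries the A genome; B. oleracea carries the C genome.
GENOME = {"B. rapa": "A", "B. oleracea GSS": "C"}


def assign_genome(per_reference):
    """Pick the genome whose reference has the higher mean identity."""
    means = {ref: mean(vals) for ref, vals in per_reference.items()}
    best = max(means, key=means.get)
    return GENOME[best], means


for bac, refs in identity.items():
    genome, means = assign_genome(refs)
    summary = ", ".join(
        f"{ref}: {means[ref]:.1f} ± {stdev(vals):.1f}" for ref, vals in refs.items()
    )
    print(f"{bac}: {summary} -> {genome} genome")
```

With only five genes per comparison this is a coarse test, which is in keeping with the caution expressed in the Discussion about assigning short sequences to the A or C genome by similarity alone.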

Comparative analysis of genome sequences in B. napus, B. rapa and A. thaliana

Dot plot analysis of the B. napus sequences showed extensive sequence conservation, although this is disrupted by numerous insertion–deletion events, as shown in Figure 2. The conserved regions of the genome sequences of B. rapa (KBrB003I01), B. napus (JBnY117L03 for the A genome and JBnY092H05 for the C genome) and A. thaliana are illustrated in Figure 3. Of the 25 gene models in this region of the A. thaliana genome represented in all three Brassica BAC clones (i.e. from gene models At3g02520 to At3g02750), 14 (56%) have conserved gene models in the Brassica genomes. The stearoyl-ACP desaturase, SAD3, which is present as a tandem array of three genes in A. thaliana (At3g02610, At3g02620, At3g02630), was found to be present as a single copy in the Brassica genomes (JBnY117L03_11 originating from the B. napus A genome, JBnY092_5 originating from the B. napus C genome, and KBrB003I01_09 in B. rapa). Conversely, a C-5 sterol desaturase (At3g02590), which is present as a single copy in A. thaliana, was found to be present as a tandem pair of genes in both B. napus genomes and in B. rapa. As shown in Figure 4, this tandem pair of genes is uninterrupted in both B. rapa (KBrB003I01_6 and KBrB003I01_7) and the B. napus C genome (JBnY092H05_07 and JBnY092H05_08), but only one copy is intact in the B. napus A genome (JBnY117L03_6), the other having been split in two (JBnY117L03_7 and JBnY117L03_9) by the insertion of a long terminal repeat (LTR) retrotransposon (JBnY117L03_8). This retrotransposon is a Ty1/copia-like element with identical 361-bp LTRs and a single exon encoding 1471 amino acids with reverse transcriptase and integrase domains. The Brassica orthologues of A. thaliana gene models At3g02640 and At3g02650 are consistently fused into single models. Thus, for the overlap region, the net effect is that the set of 14 A. thaliana gene models corresponds to a total of 12 gene models in each of the Brassica genomes, as shown in Figure 3.

Figure 2. Dot-plot analysis of the sequences of the Brassica napus SAD3-containing bacterial artificial chromosomes (BACs) JBnY117L03 and JBnY092H05.

Figure 3. Annotated genes in orthologous genomic regions of Brassica napus, Brassica rapa and Arabidopsis thaliana.
Conserved genes are shown in black and connected; gene models specific to B. napus, B. rapa and A. thaliana are shown in blue, red and green, respectively. The gene models identified as transposable elements are indicated with ‘TE’.

Figure 4. Multiple dot-plot analysis of Arabidopsis thaliana delta 7-sterol-C5-desaturase (At3g02590) against Brassica rapa and Brassica napus sequences.
The arrow indicates the hypothetical direction of transcription of the genes and the retrotransposon.

The genomic region in the B. napus A genome aligning to the A. thaliana region containing gene models At3g02520 to At3g03090, at 82.9 kb, is slightly expanded relative to the corresponding region in B. rapa, which is 79.8 kb. In addition to the retrotransposon JBnY117L03_8, the B. napus sequences contain a DNA transposon, JBnY117L03_24. A further expansion comes from simple sequence repeat (SSR) motifs, six of which are present and total 472 bp in length in B. napus, whereas only five are present, totalling 130 bp in length, in B. rapa.

Divergence of the B. napus A and C genomes from that of B. rapa

In order to estimate the time of divergence between the studied genome segments in B. napus and those of B. rapa, the Brassica species for which genome sequencing is underway, we analysed the sequence divergence of the subset of 10 genes that were conserved between both B. napus genomes, B. rapa and A. thaliana, and appeared to represent complete proteins (albeit transposon-interrupted in the case of one orthologue of At3g02590 in BAC JBnY117L03). We calculated the proportions of conserved nucleotide sequences in the coding regions, as shown in Table 2, and the Ks values (number of synonymous nucleotide substitutions per synonymous site), as shown in Table 3. The nucleotide identity between coding sequences in B. napus genes and their A. thaliana orthologues was found to be 84.2 ± 3.9% for JBnY117L03 and 85.8 ± 3.7% for JBnY092H05, which is consistent with similar findings for other Brassica species. Similarly, the Ks values of 0.520 ± 0.148 and 0.486 ± 0.096 for JBnY117L03 and JBnY092H05, respectively, indicate times since divergence of the lineages of 17.3 ± 4.9 and 16.2 ± 3.2 Myr, respectively, using a mutational rate of 1.5 × 10⁻⁸ synonymous substitutions per site per year (Koch et al., 2000), which is consistent with estimates derived using other Brassica species. The nucleotide identity between coding sequences in B. napus genes and their B. rapa orthologues was found to be 97.5 ± 3.1% and 93.1 ± 4.9% for JBnY117L03 and JBnY092H05, respectively. The corresponding Ks values of 0.056 ± 0.071 and 0.175 ± 0.076 for JBnY117L03 and JBnY092H05, respectively, indicate the length of time since divergence of the B. napus A genome from that of B. rapa as 1.9 ± 2.4 Myr, and the length of time since divergence of the B. napus C genome from that of B. rapa as 5.8 ± 2.5 Myr.

Table 2. Relatedness of 10 genes conserved between Brassica napus, Brassica rapa and Arabidopsis thaliana: coding sequence (CDS) identity (%)

| A. thaliana orthologue | JBnY117L03 vs B. rapa | JBnY117L03 vs A. thaliana | JBnY092H05 vs B. rapa | JBnY092H05 vs A. thaliana |
|---|---|---|---|---|
| At3g02520 | 89.93 | 84.38 | 97.23 | 89.27 |
| At3g02550 | 100 | 76.41 | 87.2 | 78.99 |
| At3g02560 | 99.65 | 88.66 | 96.34 | 88.31 |
| At3g02580 | 98.23 | 88.02 | 97.16 | 87.19 |
| At3g02590-1 | 99.15 | 85.16 | 82.38 | 82.72 |
| At3g02590-2 | 94.17 | 81.07 | 97.04 | 86.11 |
| At3g02630 | 96.8 | 84.36 | 90.99 | 90.49 |
| At3g02650 | 99.14 | 80.59 | 93.18 | 81.5 |
| At3g02660 | 98.4 | 86.4 | 94.56 | 85.62 |
| At3g02750 | 99.18 | 87.36 | 95.2 | 87.72 |
| Mean ± SD | 97.5 ± 3.1 | 84.2 ± 3.9 | 93.1 ± 4.9 | 85.8 ± 3.7 |

Table 3. Relatedness of 10 genes conserved between Brassica napus, Brassica rapa and Arabidopsis thaliana: synonymous base substitution rate (Ks value)

| A. thaliana orthologue | JBnY117L03 vs B. rapa | JBnY117L03 vs A. thaliana | JBnY092H05 vs B. rapa | JBnY092H05 vs A. thaliana |
|---|---|---|---|---|
| At3g02520 | 0.246 | 0.7289 | 0.1053 | 0.3931 |
| At3g02550 | 0 | 0.681 | 0.2331 | 0.5052 |
| At3g02560 | 0.0218 | 0.4228 | 0.1862 | 0.44 |
| At3g02580 | 0.0421 | 0.4333 | 0.0726 | 0.5109 |
| At3g02590-1 | 0.0222 | 0.3422 | 0.3186 | 0.4887 |
| At3g02590-2 | 0.0609 | 0.4156 | 0.1691 | 0.5453 |
| At3g02630 | 0.0787 | 0.5944 | 0.247 | 0.4063 |
| At3g02650 | 0.0132 | 0.6977 | 0.1608 | 0.6401 |
| At3g02660 | 0.0579 | 0.5378 | 0.1619 | 0.5997 |
| At3g02750 | 0.0187 | 0.3471 | 0.0915 | 0.3263 |
| Mean ± SD | 0.056 ± 0.071 | 0.520 ± 0.148 | 0.175 ± 0.076 | 0.486 ± 0.096 |
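The divergence times quoted in the text follow from the Ks values in Table 3 if time is taken as T = Ks / (2r), with r the synonymous substitution rate of 1.5 × 10⁻⁸ per site per year (Koch et al., 2000). The sketch below applies that relation to the Ks values transcribed from Table 3. The formula is inferred from the reported numbers rather than stated explicitly in the text, and the names `divergence_myr` and `ks_table` are illustrative.

```python
from statistics import mean, stdev

RATE = 1.5e-8  # synonymous substitutions per site per year (Koch et al., 2000)


def divergence_myr(ks, rate=RATE):
    """Time since divergence in Myr, assuming T = Ks / (2 * rate)."""
    return ks / (2 * rate) / 1e6


# Ks values transcribed from Table 3, in the gene order of the table.
ks_table = {
    "JBnY117L03 vs B. rapa": [0.246, 0.0, 0.0218, 0.0421, 0.0222,
                              0.0609, 0.0787, 0.0132, 0.0579, 0.0187],
    "JBnY117L03 vs A. thaliana": [0.7289, 0.681, 0.4228, 0.4333, 0.3422,
                                  0.4156, 0.5944, 0.6977, 0.5378, 0.3471],
    "JBnY092H05 vs B. rapa": [0.1053, 0.2331, 0.1862, 0.0726, 0.3186,
                              0.1691, 0.247, 0.1608, 0.1619, 0.0915],
    "JBnY092H05 vs A. thaliana": [0.3931, 0.5052, 0.44, 0.5109, 0.4887,
                                  0.5453, 0.4063, 0.6401, 0.5997, 0.3263],
}

for label, values in ks_table.items():
    ks_mean, ks_sd = mean(values), stdev(values)  # sample SD, as in Table 3
    print(
        f"{label}: Ks = {ks_mean:.3f} ± {ks_sd:.3f}, "
        f"T = {divergence_myr(ks_mean):.1f} ± {divergence_myr(ks_sd):.1f} Myr"
    )
```

Because the conversion is linear, the standard deviation of Ks scales by the same factor as the mean, so an SD of 0.096 corresponds to about 3.2 Myr.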

Discussion

BAC library construction using a transformation-competent vector

A new BAC library was constructed using genomic DNA from the oilseed rape cultivar Tapidor in order to identify B. napus orthologues of genes and to underpin comparative genomics studies. In contrast to previous Brassica BAC libraries, the vector pYLTAC7 was used (Liu et al., 1999). This transformation-competent binary vector enables subsequent transformation into plants via Agrobacterium tumefaciens (Liu et al., 2002). To aid the efficiency of any subsequent transformation experiments, and in order to reduce the number of genes transferred simultaneously, we restricted the average insert size of the library to 90 kb. Nevertheless, the 73 728-clone library, which we named the JBnY library, still provides ∼4.5-fold redundant coverage of the ∼1200-Mb genome of B. napus and enabled the identification of our targeted region.

Comparative analysis of the stearoyl-ACP desaturase locus

The seven SAD genes in A. thaliana include one tandem pair and one tandem triplet. The genomes of B. rapa and B. oleracea are generally expected to contain multiple copies of each gene present in the A. thaliana genome as a consequence of extensive whole genome triplication (Town et al., 2006; Yang et al., 2006), with B. napus expected to contain the sum of genes present in B. rapa and B. oleracea, which represent its progenitor genomes (U, 1935). However, initial analyses of copy number of SAD genes in B. napus identified only four loci (Slocombe et al., 1994). We targeted our investigation to the B. napus orthologue(s) of the tandem triplication of SAD genes on chromosome 3 of A. thaliana (At3g02610, At3g02620, At3g02630), which we denote as the SAD3 gene family. Although the proteins that the SAD3 family encode are similar, nucleotide similarity between At3g02610 and At3g02620 is much higher than between either of these and At3g02630. The pair of SAD genes on A. thaliana chromosome 5 (At5g16230 and At5g16240) are also relatively diverged at the nucleotide level, with At5g16240 being more similar to At3g02630 and At5g16230 being more similar to At3g02610 and At3g02620. This suggests that an initial tandem duplication of SAD genes occurred prior to the last whole genome duplication in the ancestry of the Brassicaceae (Bowers et al., 2003), with a further duplication of one of the genes, forming the pair At3g02610 and At3g02620, much more recently. We identified only two SAD3-containing regions of the B. napus genome, representing homoeologous segments of the Brassica A and C genomes. By sequencing complete BACs representing these regions, we confirmed that the correct regions of the B. napus genome had been identified, and found the B. napus SAD3 genes to be present in a single copy at both loci. These copies were closely related to the A. thaliana SAD3 gene At3g02630, with genes corresponding to the additional two SAD genes observed in A. thaliana (At3g02610 and At3g02620) having been lost. Tandem arrays of genes are a common feature of the genome of A. thaliana (AGI, 2000) and differences between such arrays in A. thaliana and B. oleracea have been noted previously (Town et al., 2006). The lower copy number of the SAD3 gene family in B. napus compared with A. thaliana accounts for part of the difference in the overall copy number of the SAD gene family. However, the selective advantage of increased numbers of SAD genes in A. thaliana is unclear.

Comparative analysis of genome segments

Previous studies conducted in B. oleracea and B. rapa have indicated that the extent of gene loss compared with orthologous regions of the A. thaliana genome is approximately one in three (Town et al., 2006; Yang et al., 2006). However, for the region we have studied in B. napus, the extent of gene loss is greater than this, at 44% for the genomic region sequenced from both B. napus genomes. Our comparative analysis in B. rapa showed the same extensive gene loss as observed in B. napus. Thus our results show that the Brassica genomes have a broader range of regional gene content variation relative to A. thaliana than the initial reports indicated. Although the previous studies have provided evidence for gene loss from the Brassica genome segments, it is also possible that the A. thaliana genome has accumulated genes in this region or that the identification of some of the gene models in this region of the A. thaliana genome sequence has been erroneous.

The B. napus sequences were assigned, on the basis of linkage mapping and sequence similarity, to homoeologous regions of the B. napus genome, corresponding to their position within linkage groups A3 and C3. Estimated across a set of 10 gene models conserved in both B. napus genomes, the nucleotide conservation across coding sequences between B. napus and B. rapa ranged from 89.9 to 100%, with a mean of 97.5%. This is lower than might be expected as B. rapa is considered to be the progenitor of the B. napus A genome, probably within the last 10 000 years. However, we do not know which extant B. rapa type most closely represents the actual ancestor of B. napus, so we interpret the result as indicating that the A genome ancestor of B. napus was relatively distantly related to the B. rapa cultivar Chiifu. Indeed, it is possible that different parts of the B. napus genome, as represented by European winter oilseed rape such as the cultivar Tapidor, may be derived from a range of different B. rapa types. This could result, for example, from intercrossing of B. napus lines of multiple independent origins or from occasional out-crossing with B. rapa.

There is evidence of some expansion of the Brassica genomes. The orthologues of the single-copy A. thaliana gene model At3g02590 are represented as tandem pairs in both B. napus genomes. However, one of the pair in the B. napus A genome has been disrupted by the insertion of a Ty1/copia-like type retroelement. This insertion is likely to have occurred very recently as the element has perfectly conserved 361-bp LTRs, and encodes putatively intact reverse transcriptase and integrase. This insertion is not present in the corresponding gene in either the B. napus C genome or the B. rapa genome.

Overall, the sequences of the studied regions of the B. napus genome show high degrees of conservation with those of the corresponding region of the B. rapa cultivar Chiifu genome, but not the near-identical structure that might have been observed for the B. napus A genome if the B. rapa cultivar selected for genome sequencing had been very closely related to the A genome progenitor of B. napus as represented by European oilseed rape. A consequence of this is that the assignment of B. napus sequences to the A or C genomes based on sequence similarity of relatively short contiguous sequences may not be reliable. The divergence of even the B. napus A genome from the emerging B. rapa genome sequence also means that, although the collinearity of genes is likely to be highly conserved, the content of functional genes may vary considerably. It is likely, therefore, that extensive sequence resources will be required from B. napus, in addition to B. rapa, to underpin the improvement of oilseed rape.

Experimental procedures

Screening and sequencing of BAC clones

The B. napus BAC library, JBnY, was screened by colony hybridization and Southern hybridization using probes based on stearoyl-ACP desaturase coding sequences in A. thaliana and B. napus, respectively. For the colony hybridization probe, which was designed to hybridize to the sequences of all SAD gene families, we designed a primer pair (forward primer 5′-CAGGGAAGTGCATGTTCAAG-3′; reverse primer 5′-TCGATCTGCCTCATGTCAAC-3′) based on the exon 2 region of the FAB2 gene (At2g43710) in A. thaliana and amplified the probe from A. thaliana genomic DNA. For the Southern hybridization probe, we designed a primer pair (forward primer 5′-AAGAGAAGAGGGCGTGGAG-3′; reverse primer 5′-ACTGAAGCTGGTTTGGTTGC-3′) based on the exon 2 region of one of the SAD3 family genes (At3g02630) in A. thaliana and amplified the probe from B. napus cv. Tapidor genomic DNA. Gel preparation and Southern blotting methods were as described by O’Neill and Bancroft (2000). Similarities of BAC end sequences of the selected clone, JBnY117L03, with the A. thaliana genome were identified by BLAST alignment (http://www.arabidopsis.org/). Complete sequencing of JBnY117L03 was carried out by the Beijing Genomics Institute, and that of JBnY092H05 by GATC Biotech. Both sequencing and gene annotation of the BAC clone (KBrB003I01) containing the stearoyl-ACP desaturase (orthologue of At3g02630) derived from B. rapa were provided by the Korea Brassica Genomic Team. The orthologous region (around 174 kb) between At3g02520 and At3g03090 of the A. thaliana genome sequence and its gene annotation were obtained from ATiDB (http://atidb.org/).
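As a quick sanity check on the primer sequences listed above, basic properties such as length and GC content can be computed directly. The sketch below is illustrative only: the helper names are hypothetical, and the Wallace rule (Tm ≈ 2 × (A + T) + 4 × (G + C) °C) is a rough rule of thumb for short oligonucleotides that is swapped in here for illustration, not a method reported in this study.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G and T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Primer sequences as given in the text (5' to 3'); the dictionary keys
# are descriptive labels chosen for this example.
primers = {
    "colony probe forward": "CAGGGAAGTGCATGTTCAAG",
    "colony probe reverse": "TCGATCTGCCTCATGTCAAC",
    "Southern probe forward": "AAGAGAAGAGGGCGTGGAG",
    "Southern probe reverse": "ACTGAAGCTGGTTTGGTTGC",
}

for name, seq in primers.items():
    print(
        f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}, "
        f"Wallace Tm ~{wallace_tm(seq)} C, revcomp {reverse_complement(seq)}"
    )
```

Dedicated primer-design software uses more detailed thermodynamic models, so these values should be read as approximate.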

Linkage mapping of BAC JBnY117L03

The primer pair (forward primer 5′-TCATCGTTCCCTCTGCTTTC-3′; reverse primer 5′-TCGATTGGTTGGTATTGACG-3′) was designed with Primer3 using the end sequences of BAC JBnY117L03 and was screened against the Q population for mapping. In the Q population, polymorphisms detected between Rapa 29 (B. rapa) and Atlantica (B. oleracea) were used in the mapping.

Sequence analysis

Pairwise sequence comparisons were conducted using PipMaker (Schwartz et al., 2000) and BLAST2 (http://www.ncbi.nlm.nih.gov/). Gene annotation was obtained via the John Innes Centre BAC annotation pipeline, which includes the gene prediction programs FGENESH and GLIMMER and alignment with Arabidopsis genes (http://brassica.bbsrc.ac.uk/). Simple sequence repeats and transposons were identified by RepeatMasker (http://www.repeatmasker.org/) followed by manual inspection.

Acknowledgements

We would like to thank Beom-Seok Park for access to the B. rapa BAC sequences (BioGreen Program, 20050301034438) prior to public release. This work was supported by the UK Biotechnology and Biological Sciences Research Council grant BB/E017363/1 and by the Korean Government Long Term Fellowship for the Overseas Study (2006-E-0136) awarded to Kwangsoo Cho. The BAC libraries and clones are available from Genome Enterprise Ltd. (http://orders2.genome-enterprise.com/libraries.html).
