Volume 209, Issue 2 p. 855-870
Full paper
Free Access

Plastid genomes reveal support for deep phylogenetic relationships and extensive rate variation among palms and other commelinid monocots

Craig F. Barrett

Corresponding Author

Craig F. Barrett

Department of Biological Sciences, California State University, Los Angeles, CA, 90032 USA

Division of Plant and Soil Sciences, West Virginia University, Morgantown, WV, 26506 USA

*Author for correspondence:

Craig F. Barrett

Tel: +1 323 343 2049

Email: [email protected]

Search for more papers by this author
William J. Baker

William J. Baker

Royal Botanic Gardens, Kew, Richmond, TW9 3AB UK

Search for more papers by this author
Jason R. Comer

Jason R. Comer

Department of Plant Biology, University of Georgia, Athens, GA, 30602 USA

Search for more papers by this author
John G. Conran

John G. Conran

Department of Genetics and Evolution, School of Biological Sciences, University of Adelaide, Adelaide, 5005 Australia

Search for more papers by this author
Sean C. Lahmeyer

Sean C. Lahmeyer

Herbarium, The Huntington Library, Art Collection, and Botanical Gardens, San Marino, CA, 91108 USA

Search for more papers by this author
James H. Leebens-Mack

James H. Leebens-Mack

Department of Plant Biology, University of Georgia, Athens, GA, 30602 USA

Search for more papers by this author
Jeff Li

Jeff Li

Graduate Program in Genetics, Genomics, and Bioinformatics, University of California, Riverside, CA, 92521 USA

Search for more papers by this author
Gwynne S. Lim

Gwynne S. Lim

L. H. Bailey Hortorium and Plant Biology Section, Cornell University, Ithaca, NY, 14853 USA

Search for more papers by this author
Dustin R. Mayfield-Jones

Dustin R. Mayfield-Jones

Donald Danforth Plant Science Center, St Louis, MO, 63132 USA

Division of Biological Sciences, Bond Life Sciences Center, University of Missouri, Columbia, MO, 65211 USA

Search for more papers by this author
Leticia Perez

Leticia Perez

Department of Biological Sciences, California State University, Los Angeles, CA, 90032 USA

Search for more papers by this author
Jesus Medina

Jesus Medina

Department of Biological Sciences, California State University, Los Angeles, CA, 90032 USA

Search for more papers by this author
J. Chris Pires

J. Chris Pires

Division of Biological Sciences, Bond Life Sciences Center, University of Missouri, Columbia, MO, 65211 USA

Search for more papers by this author
Cristian Santos

Cristian Santos

Department of Biological Sciences, California State University, Los Angeles, CA, 90032 USA

Search for more papers by this author
Dennis Wm. Stevenson

Dennis Wm. Stevenson

Pfizer Laboratory of Molecular Systematics, New York Botanical Garden, Bronx, NY, 10458 USA

Search for more papers by this author
Wendy B. Zomlefer

Wendy B. Zomlefer

Herbarium, The Huntington Library, Art Collection, and Botanical Gardens, San Marino, CA, 91108 USA

Search for more papers by this author
Jerrold I. Davis

Jerrold I. Davis

L. H. Bailey Hortorium and Plant Biology Section, Cornell University, Ithaca, NY, 14853 USA

Search for more papers by this author
First published: 09 September 2015
Citations: 153

Summary

  • Despite progress based on multilocus, phylogenetic studies of the palms (order Arecales, family Arecaceae), uncertainty remains in resolution/support among major clades and for the placement of the palms among the commelinid monocots. Palms and related commelinids represent a classic case of substitution rate heterogeneity that has not been investigated in the genomic era.
  • To address questions of relationships, support and rate variation among palms and commelinid relatives, 39 plastomes representing the palms and related family Dasypogonaceae were generated via genome skimming and integrated within a monocot-wide matrix for phylogenetic and molecular evolutionary analyses.
  • Support was strong for ‘deep’ relationships among the commelinid orders, among the five palm subfamilies, and among tribes of the subfamily Coryphoideae. Additionally, there was extreme heterogeneity in the plastid substitution rates across the commelinid orders indicated by model based analyses, with c. 22 rate shifts, and significant departure from a global clock.
  • To date, this study represents the most comprehensively sampled matrix of plastomes assembled for monocot angiosperms, providing genome-scale support for phylogenetic relationships of monocot angiosperms, and lays the phylogenetic groundwork for comparative analyses of the drivers and correlates of such drastic differences in substitution rates across a diverse and significant clade.

Introduction

The use of genomic data in plant phylogenetics is expanding, due to the availability of next-generation sequencing (NGS) technology, and new methods of sampling genomes (e.g. genome skimming, transcriptomes, hybrid capture). Genome skimming is an efficient way to generate data rapidly and inexpensively from all three genomes in plants, aided by multiplexing to recover the high copy fraction of total genomic DNA, which includes organellar genomes, ribosomal DNA and other multicopy elements (Steele et al., 2012; Straub et al., 2012). The plastid genome (plastome), due to its typical mode of uniparental inheritance, is a single, highly informative locus, and several genes of the plastome have been used extensively in plant molecular systematics. Complete plastomes have the potential to resolve and provide support for the most recalcitrant nodes in the plant tree of life.

The deepest divergences of the ‘commelinid’ monocots are among the most recalcitrant relationships in plants (e.g. Chase et al., 2006). The commelinids comprise a diverse clade including numerous ecologically and economically important species, including four orders: Arecales (palms, one family, Arecaceae), Zingiberales (eight families including bananas, gingers and relatives), Commelinales (five families including dayflowers and relatives), Poales (16 families including bromeliads, sedges, grasses, etc.) and Dasypogonaceae (an Australian family of four genera, unplaced to order). Despite the importance of the commelinids, questions remain regarding the relationships among their orders, as well as the extent to which substitution rates vary among them.

Among the monocots, the palm and grass families represent a classic case of extreme variation in evolutionary rates, with palms (Arecales, Arecaceae) having among the slowest rates of molecular change and grasses (Poales, Poaceae) among the most rapid (Bousquet et al., 1992; Gaut et al., 1992). Slow substitution rates in the palms and paucity of informative molecular characters in multigene studies, combined with morphological homoplasy, have been problematic for palm systematics. Because earlier studies drew on very limited genetic data, the molecular evolutionary contrasts within this group need re-evaluation with genomic data. Complete plastid genomes can provide sufficient character information to resolve relationships within the palms, and among the palms and their commelinid relatives. Complete plastomes also present opportunities to study patterns of molecular evolution, by providing unprecedented numbers of characters for comparative research.

The Arecales are of worldwide economic and horticultural importance, and are ecologically significant, emblematic members of tropical–subtropical ecosystems (Dransfield et al., 2008; Couvreur & Baker, 2013), encompassing a range of growth forms (e.g. trees, rhizomatous shrubs, lianas; Moore & Uhl, 1982; Tomlinson, 2006; Rudall et al., 2011). Earlier monocot studies based on a few loci have recovered conflicting commelinid relationships, with respect to the placement of the palms, with low branch support (Davis et al., 2004; Chase et al., 2006; Graham et al., 2006). More recent studies based on plastomes suggest palms being sister to Dasypogonaceae, with the two collectively being sister to the remaining commelinids, but few whole plastomes were sampled and support was not strong (Givnish et al., 2010; Barrett et al., 2013; Davis et al., 2013; Ruhfel et al., 2014). Relationships among groupings within the palms have received substantial attention in molecular studies over the past three decades (Uhl et al., 1995; Baker et al., 1999, 2009, 2011; Asmussen et al., 2000, 2006; Asmussen & Chase, 2001; Lewis & Doyle, 2001; Hahn, 2002), resulting in a phylogenetically informed classification recognizing five subfamilies: Calamoideae, Nypoideae (monotypic), Coryphoideae, Ceroxyloideae and Arecoideae (Dransfield et al., 2005; Asmussen et al., 2006). Multilocus studies based on plastid DNA tend to converge on a topology of (Calamoideae (Nypoideae; (Coryphoideae (Ceroxyloideae, Arecoideae)))), but with varying support for these groupings (Asmussen et al., 2006; Baker et al., 2009). Contrasting topology and support have been observed within the second largest palm subfamily, Coryphoideae (i.e. the fan palms). Based on four plastid loci, Asmussen et al. (2006) sampled genera representing all eight Coryphoid tribes (sensu Dransfield et al., 2005) as part of larger, family-wide analyses, generally demonstrating support for all tribes as monophyletic, as well as for some of the relationships between them. However, these analyses left some of the deepest nodes in the subfamily unresolved or weakly supported.

The current classification of palms is based on four plastid genes, and the great power of NGS technology provides the opportunity critically to re-examine the phylogenetic hypothesis upon which it is based. Here we assess relationships, support and evidence for heterogeneity in substitution rates among the commelinid monocots (focusing on the palms) using whole plastomes, and representing the largest dataset of nearly complete plastomes to date for the monocots. Specifically, we address two principal questions: what are the ‘deep’ relationships among the commelinid orders and the palms? What are the patterns of variation in plastid substitution rate across the commelinids?

Materials and Methods

Taxon sampling, plant material, and deposition of vouchers

Leaf material from various botanical gardens (see Table 1) was silica gel-dried and stored at −20°C. Representatives from all five subfamilies had at least one species from each of the 15 non-Arecoid tribes and three of the 14 Arecoid tribes, following the subfamilial and tribal classifications of Asmussen et al. (2006) and Dransfield et al. (2005), respectively. Six species with previously sequenced plastomes or plastid gene sets were chosen to represent the subfamily Arecoideae, based on Baker et al. (2009, 2011): Areca vestiaria and Veitchia spiralis (tribe Areceae); Cocos nucifera, Elaeis guineensis and E. oleifera (tribe Cocoseae); and Chamaedorea seifrizii (tribe Chamaedoreeae). The focus of sampling was not within the Arecoideae, as relationships in this subfamily have been treated by Baker et al. (2011), and are the subject of a recent plastome-based investigation (Comer et al., 2015). Here, arecoid taxa were represented by ‘placeholders’, in order to focus sampling more intensively on relationships within and among the remaining subfamilies. All four genera of Dasypogonaceae are represented; Baxteria was newly sequenced for this study, while sequences from Dasypogon, Calectasia and Kingia were generated previously (Givnish et al., 2010; Barrett et al., 2013). Nearly complete sets of coding genes from previous plastome-based studies were included in monocot-wide, protein-coding matrices (see later; Givnish et al., 2010; Steele et al., 2012; Barrett et al., 2013, 2014a,b; GenBank numbers listed in respective papers).

Table 1. Accessions sequenced for this study with taxonomic, voucher, Illumina read and coverage depth information
Species Subfamily Tribe Voucher GenBank no. No. reads (trimmed) Plastid x-cov
Acoelorrhaphe wrightii H. Wendl. & Becc. Coryphoideae Trachycarpeae Baker 1360 (K) KT312941 6501 171pe 92.0
Areca vestiaria Giseke Arecoideae Areceae Zomlefer 2310 (FTG, NY) KT312940 6032 216 22.6
Arenga caudata (Lour.) H. E. Moore Coryphoideae Caryoteae Baker 1986-3157 (K) KT312939 5377 322pe 68.9
Borassodendron machadonis (Ridl.) Becc. Coryphoideae Borasseae Baker 1989-3394 (K) KT312937 3519253pe 32.8
Brahea brandegeei (Purpus) H. E. Moore Coryphoideae Trachycarpeae 95 967 (HNT) KT312936 9873 046pe 213.4
Caryota mitis Lour. Coryphoideae Caryoteae Zona, Lewis & Roncal 920 (FTG) KT312915 4384 198 28.3
Chamaerops humilis L. Coryphoideae Trachycarpeae Baker 1363 (K) KT312935 35 863 369pe 289.1
Chuniophoenix nana Burret Coryphoideae Chuniophoeniceae 931 085-C (MBC)* KT312934 3672 565 71.6
Colpothrinax cookii Read Coryphoideae Trachycarpeae 20 040 881X (MBC)* KT261428 9251278pe 167.2
Corypha lecomtei Becc. ex Lecomte Coryphoideae Corypheae Comer 194 211 (BKF) KT312933 10 331 636pe 208.8
Eremospatha macrocarpa (G. Mann & H. Wendl.) H. Wendl. Calamoideae Lepidocaryeae II Baker 1358 (K) KT312932 55 220 626pe 1161.1
Eugeissona tristis Griff. Calamoideae Eugeissoneae Kamaruddin FRI 43 669 (KEP) KT312931 7870 726pe 55.1
Leucothrinax morrisii (H. Wendl.) C. Lewis & Zona Coryphoideae Cryosophileae Baker 1361 (K) KT312929 17 376 708pe 49.1
Licuala paludosa Griff. Coryphoideae Trachycarpeae Baker 1359 (K) KT312928 43 782 846pe 608.3
Lodoicea maldivica (Pers. ex H. Wendl.) Pers. ex H. Wendl. Coryphoideae Borasseae Baker 1994-3231 (K) KT312927 6651 326pe 47.5
Mauritia flexuosa L. f. Calamoideae Lepidocaryeae I Zomlefer 2333 (FTG, NY). KT312914 1294 602 19.5
Metroxylon warburgii Becc. Calamoideae Calameae KT312926 6595 474pe 81.3
Nypa fruticans Wurmb Nypoideae na Cuenca NAC34 (FTG) KT312925 469 0123 88.1
Phytelephas aequatorialis Spruce Ceroxyloideae Phytelepheae Aarhus 87BI00261 (BH) KT312924 9157 678pe 221.4
Pigafetta elata (Mart.) H. Wendl. Calamoideae Calameae Baker 508 (K) KT312923 7804 356pe 65.6
Pritchardia thurstonii F. Muell. & Drude Coryphoideae Trachycarpeae Baker 1362 (K) KT312922 23 556 696pe 583.3
Salacca ramosiana Mogea Calamoideae Calameae Baker 1979-4409 (K) KT312921 115 114 590pe 1918.1
Serenoa repens (W. Bartram) Small Coryphoideae Trachycarpeae Zomlefer 2334 (FTG, NY) KT312920 4397 733 41.5
Tahina spectabilis J. Dransf. & Rakotoarinivo Coryphoideae Chuniophoeniceae Baker 1356 (K) KT312919 5111 570pe 87.1
Trithrinax brasiliensis Mart. Coryphoideae Cryosophileae Noblick 5282 (FTG) KT312918 5839 884 54.9
Veitchia spiralis Becc. Arecoideae Areceae Zona 724 (FTG) KT312917 7115 607 42.3
Wallichia densiflora Mart. Coryphoideae Caryoteae Baker RBG Kew 1973-12609 (K) KT312916 9653 920pe 68.2
Washingtonia robusta H. Wendl. Coryphoideae Trachycarpeae Barrett 310 CA (CSLA) KT312942 37 162 404pe 1263.9
Nonpalms (Dasypogonaceae; Commelinales, Hanguanaceae)
Baxteria australis R. Br. ex Hook. na na Conran 3054a-1 (AD) KT312938 13 837 678 700.9
Hanguana malayana (Jack) Merr. na na Stevenson 0039-0 (NY) KT312930 11 682 426pe 1003.0
  • Subfamilial and tribal designations follow Asmussen et al. (2006) and Dransfield et al. (2005), respectively; Lepidocaryeae (Calamoideae) is split into two groups as in Baker et al. (2009). No. reads (trimmed), total number of reads remaining after quality trimming; plastid x-cov, mean coverage depth of the plastome for each accession, after reads were mapped to each annotated plastome. pe, paired end, 100 bp reads. Otherwise, reads were single-end, either 96 or 71 bp before trimming. Plant tissues were collected at Fairchild Tropical Botanical Garden/Montgomery Botanical Center (USA), Nong Nooch Tropical Botanical Garden (Thailand), Forest Research Institute (Malaysia), Royal Botanic Gardens Kew (UK), Huntington Botanical Garden (USA), and the City of Long Beach, CA (USA). Herbarium codes for vouchered specimens: AD, State Herbarium of South Australia; BH, L. H. Bailey Hortorium, Cornell University (USA); BKF, Bangkok Forest Herbarium (Thailand); CSLA, California State University, Los Angeles (USA); FTG, Fairchild Tropical Botanical Garden (USA); HNT, Huntington Botanical Gardens (USA); K, Royal Botanic Gardens Kew (UK); KEP, Forest Research Institute (Malaysia); MBC, Montgomery Botanical Center (USA, live specimens indicated by *); NY, New York Botanical Garden (USA). na, not applicable.

DNA isolation, sequencing and read trimming

Silica-dried leaf material was ground in liquid nitrogen, and total DNAs were isolated using a modified CTAB protocol (Doyle & Doyle, 1987). The DNA concentration was quantified using a NanoDrop spectrophotometer (Thermo Scientific, Carlsbad, CA, USA). Isolations with concentrations > 30 ng μl−1 were chosen for Illumina sequencing. The total gDNAs were shipped to Cold Spring Harbor Laboratory (Cold Spring Harbor, Woodbury, NY, USA) or to the University of Missouri-Columbia for library preparation and Illumina sequencing. The samples were sequenced over a 4-yr period (2009–2013), including single end reads of 71 or 96 bp, and paired end reads of 100 bp (Table 1). Sequences were trimmed and adaptors removed with Trimmomatic v.0.33 (Bolger et al., 2014). Trimming was completed from the 5′ and 3′ ends of each read, removing bases below PHRED 30 within a sliding window of four bases, keeping only reads/read-pairs of 51 bases or longer.

Plastome assembly and annotation

A multistep approach was taken to build plastomes, including reference guided assembly, de novo assembly and remapping of reads, to ensure high-quality assemblies. Reference-guided assemblies were conducted using Yasra (Ratan, 2009; http://www.bx.psu.edu/miller_lab), based initially on plastomes of Phoenix dactylifera (Arecaceae, Genbank GU811709) and Typha latifolia (Typhaceae; GenBank GU195652). The resulting plastomes were used as subsequent references. De novo assemblies were constructed in Velvet (Zerbino & Birney, 2008), using kmer lengths of 51–81 bp. Contigs from each assembly method were merged in Sequencher v.5.2 (GeneCodes, Ann Arbor, MI, USA). Reads were then remapped to references for each taxon to check for misassemblies or rearrangements in Geneious v.6 (http://www.geneious.com; Kearse et al., 2012) under stringent parameters (3% mismatch to reference; maximum gap size 1000 bp), and reads matching the draft reference were assembled de novo, also in Geneious under suggested parameters. Inverted repeat boundaries were determined in Sequencher, and verified by remapping reads in Geneious.

The UNIX function ‘grep’ was used to close contig gaps by searching among read files. Finished plastid genomes were annotated via BLASTX in Dogma (Wyman et al., 2004). Smaller exons (< 30 bp) were manually annotated. Lastly, primers were developed with Primer3 (Untergrasser et al., 2012) to close low-coverage gaps between contigs (for a few single end datasets, Supporting Information Fig. S1). Amplifications and Sanger sequencing were conducted as in Davis et al. (2004).

Plastid gene/genome alignment and data matrix construction

Monocot-wide matrices (protein coding loci)

GenBank files were generated in Sequin for all newly assembled plastomes, and other monocot plastome data were downloaded from GenBank. Annotated plastomes were imported into Geneious, and all protein-coding sequences were exported using the annotation table feature, including gene name, organism name and sequence. UNIX grep commands were used to capture sequences of each gene and to place them in separate files, executed as a UNIX script. Matrices were constructed using Geneious and SequenceMatrix (Vaidya et al., 2011), and Fasta files for each locus were exported for codon-based alignment in MACSE (Ranwez et al., 2011) using default gap penalties. Alignments were imported into Geneious, visualized and concatenated in SequenceMatrix. Acorus was chosen as the outgroup based on previous studies (Duvall et al., 1993; Chase et al., 2006; Barrett et al., 2013). The resulting matrix comprised 75 protein coding genes, 132 taxa, 83 928 aligned bp (28 661 parsimony informative) and 17.4% gaps/missing data (mainly from partially complete plastomes of Poales).

Three separate protein-coding matrices were analysed: (1) ‘mono132,’ the full set of protein coding loci including all taxa; (2) ‘comm58,’ a reduced taxon set with nearly all representative commelinid families and Doryanthes (Asparagales) as the outgroup; and (3) ‘comm58-3gene,’ a matrix of the 58 taxa from (2) given earlier, but including only atpA, matK and rbcL. Comm58 had 77 645 aligned characters (18 966 informative) with 28.4% gaps/missing data (again, mainly from Poales), and the comm58-3gene dataset had 5094 aligned characters (atpB, matK, rbcL; 1490 informative) and 8.1% gaps/missing. The latter two ‘reduced’ matrices were analysed due to computational considerations, with respect to models requiring long Markov chains. These three loci were chosen due to their proven utility (Chase et al., 2006) and small amounts of missing data. All matrices are available online (Figs S2–S5).

Palms + Dasypogonaceae only (whole plastomes)

Complete plastomes for Arecales and Dasypogonaceae were aligned (excluding one copy of the Inverted Repeat) using the Mauve (Darling et al., 2010) plugin in Geneious to identify potentially undetected rearrangements, and then in Mafft (Katoh & Standley, 2013; ‘auto’ algorithm, gap penalty = 3, offset = 0.1). All palms (except Tahina) and Dasypogonaceae sequenced to date have collinear plastomes (Barrett et al., 2013). Regions of ‘Ns’ in spacers and introns were removed before alignment. Since spacers can be difficult to align at higher taxonomic levels (e.g. across a family), and in place of extensive/unrepeatable manual editing, an analysis was conducted to explore the effect of character inclusion/exclusion based on explicit conservation criteria using Gblocks (Talavera & Castresana, 2007). Parameters ranged from ‘relaxed’ to ‘strict’ based on conserved block size, flanking regions, percentage of sequences with gaps at a position and so on. Before phylogenetic analyses, a partial fragment of ycf1 in the IR was removed, rearranged fragments were reversed based on breakpoints, and the second copy of the IR was removed.

Phylogenetic analyses

Of the ‘coding’ matrices, mono132 and comm58 were analysed with unpartitioned, codon-partitioned and gene-partitioned GTR + Γ models, using 25 rate categories. PartitionFinder v.1.1.1 (Lanfear et al., 2012, 2014) was used for the mono132 matrix, to find an improved partition modelling strategy among 75 genes and three codon positions, using GTR models of RAxML, linked branch lengths, the Bayesian information criterion (BIC), and the ‘hcluster’ algorithm due to the large size of the dataset. Whole plastome alignments were analysed in RAxML (Stamatakis, 2006), under an unpartitioned GTR + Γ model. Analyses were conducted with 3–5 replicate maximum likelihood (ML) searches, and branch support was assessed with 1000 standard bootstrap pseudoreplicates. Partition models were compared using the corrected Akaike information criterion (AICc, Burnham & Anderson, 2002), which identifies the best fit model while penalizing overparameterization. The number of free parameters for each model (K) = P(2N-3) + (P × S), where N is the number of taxa, S is the number of GTR model parameters and P is the number of partitions. The AICc was calculated as −2logeL + 2K(n/(nK−1)), where logeL is the likelihood, K is the number of free parameters, and n is the number of characters.

Additional ML analyses were conducted separately based on codon positions 1 + 2 and position 3 (unpartitioned GTR + Γ, as described earlier), respectively, of the mono132 matrix to assess differences in tree topology and lineage specific distances resulting from these two partitions. Furthermore, an analysis was run using ‘RY’ coding (with the unpartitioned ‘BINGAMMA’ binary model but otherwise as above), whereby pyrimidines and purines are coded as such, to explore the potential effect of systematic biases (e.g. GC content or transitional biases among clades).

Parsimony searches were conducted in TNT (Goloboff et al., 2008) with 100 random addition sequences, holding up to 10 000 trees, with tree bisection–reconnection branch swapping. Support was estimated via 1000 Jackknife pseudoreplicates, with 37% deletion probability using the same search parameters as described earlier. Branch support in all analyses is considered strong/robust if > 90, and weak if < 80.

Plastid substitution rate analyses, inference of rate changes

To assess variation in substitution rates among commelinid monocot orders, patristic distances from the common ancestral node of each individual order to each tip were calculated for the ‘mono132’ dataset, using the ‘ape’ and ‘geiger’ packages in R (Paradis et al., 2004; Harmon et al., 2008) and plotted in Past v.3.1 (Hammer et al., 2001). Statistical tests for differences in branch lengths among clades were not attempted due to issues of nonindependence of the data, (i.e. shared ancestral branches). Nearly all commelinid monocot families were represented; noncommelinid orders were not included in comparisons due to some limited sampling (e.g. for Alismatales, Dioscoreales, Pandanales, Liliales).

To test the null hypothesis that monocots evolve via a ‘Global Clock,’ analyses were conducted in the ‘baseml’ module of Paml v.4.8 (Yang, 1997, 2007). Specifically, different ‘branch models’ were tested, allowing rates to vary in prespecified regions of the tree corresponding to monocot orders, as opposed to a ‘background’ rate. Model M0 specified a global clock for all monocots; model M1 allowed Arecales to evolve via a local clock; M2 allowed Arecales and Poales to have independent local clocks; M3 allowed local clocks for all commelinid orders. Nested models were tested via likelihood ratio tests, where Λ = 2(logeL1−logeL2), which is chi-square distributed, and also using the corrected Akaike Information Criterion (Burnham & Anderson, 2002), where AICc = −2logeL + 2K(n/(nK−1)), in which n is the number of sites in the alignment, and K is the number of free parameters.

To investigate more explicitly the variation in substitution rates across the commelinid monocots in a phylogenetic context, a Bayesian approach was taken using the random local clock model in Beast v.1.81 (Drummond and Rambaut, 2007; Drummond & Suchard, 2010; Drummond et al., 2012). The RLC model has the advantage of requiring no a priori designation of the location of rate shifts, by assigning a prior on the number of rate changes throughout the tree, ranging from a global clock to a model in which all branches have independent rates. The objective was not to use fossil-calibration points for divergence time estimation, but to explore the probable number and locations of major shifts in relative substitution rate. Two matrices were analysed: the ‘comm58’ matrix with all coding loci, and a reduced matrix (comm58-3gene) that was much more computationally tractable than the former. Analyses were run using an unpartitioned GTR + Γ model (four rate categories, Yule speciation prior, Poisson rate change prior, and clock rate using CTMC rate reference (Ferreira & Suchard, 2008) with initial value of 1.0).

For the comm58matrix, three sets of 30 million generations from random seeds were run, and for the comm58-3gene matrix, seven sets of 100 million generations were run, with burn-in values from 20 to 85 million generations, respectively. Major ‘deep’ nodes were constrained based on relationships and support from previous analyses (Givnish et al., 2010; Barrett et al., 2013; this work) including: all commelinids, Zingiberales (Z), Commelinales (C), Zingiberales + Commelinales (ZC), Poales (P), ZC + P, Arecales (A), Dasypogonaceae (D), and A + D. Stationarity within and convergence between runs was verified using Tracer v.1.6 (Rambaut & Drummond, 2009), with effective sample sizes > 200 for parameters. Runs were combined with LogCombiner, trees modified in TreeAnnotator, and then visualized in FigTree (Rambaut, 2009), which allowed rate changes to be indicated among branches.

Mesquite v.3.03 (Maddison & Maddison, 2015) was used to assess the influence of third codon positions on topology and lineage specific substitution rates. The maximum likelihood tree based on codon positions 1 + 2 was compared with that from position 3 in a patristic distance correlation between the two trees. Analyses were run on a 12-core workstation with 64 Gigabytes of RAM (Linux).

Results

Read coverage, plastome structure in the palms and alignments

The mean coverage depth of plastomes ranged from 19.5× (Mauritia) to 1918.1× (Salacca); these two accessions also had the lowest and highest total numbers of reads (1294 602 and 115 114 590, respectively; Table 1). Plastome structure is conserved across palms and Dasypogonaceae, with two exceptions. Eugeissona tristis (Calamoideae) has lost ndhF, based on 55.1× mean coverage of the plastome (excluding ndhF) vs 2.5× mean coverage of ndhF, and on contiguity of paired reads across the deleted region. Tahina spectabilis (Coryphoideae: Chuniophoeniceae) has lost nearly all of Inverted Repeat B (IRB), except for a 159 bp fragment of ndhB (Fig. 1). Remapping reads to the annotated plastome of Tahina did not suggest the typical ‘doubling’ of read depth over the region corresponding to the IR observed in other plastomes (with IRA removed), suggesting major excisions of the IR.

Details are in the caption following the image
Structure of the plastome of Tahina spectabilis (Arecaceae: Chuniophoeniceae), showing loss of one copy of the Inverted Repeat, except for a 159 bp remnant of the ndhB gene, and a 1944 bp inversion (relative to Chuniophenix) of the region containing trnQuug, psbK, psbI and trnSgcu.

In addition, the plastome of Tahina has a 1944 bp inversion in the region corresponding to the Large Single Copy in other palms, with breakpoints between the 5′ exon of rps16/trnQ-UUG and trnS-GCU/5′ exon of trnG-UUG. Boundaries were verified by aligning the plastomes of Tahina and the closely related Chuniophoenix nana, and by mapping paired reads back to Tahina and visually examining the regions corresponding to the breakpoints of the inversion to check that each was spanned by several pairs of reads.

Phylogenetic placement of the palms among the commelinid monocots

Branch support based on the mono132 dataset is generally strong (Fig. 2), with 100 ML Bootstrap and 100 maximum parsimony (MP) Jackknife values (Fig. S6) for the monophyly of each order of monocots (except Pandanales, with one representative plastome). As well as for nearly all of the ‘deep’ branches relating orders of the monocots. Within the commelinids, there is moderate to strong support for (Arecales, Dasypogonaceae) (AD clade), and strong support for (Zingiberales, Commelinales, Poales; ZCP clade). Support is > 80 and 90 under all GTR partition models for AD and ZCP, respectively, but there is variation in support for these two clades (Fig. 2). Model partitioning affects support for these two nodes, particularly for AD; ML bootstrap support ranges from 91 (unpartitioned model) to 81 (codon partitioned) for this node, indicating that selection of model partition is significant. Comparison of the four ML model partitions used for the mono132 matrix indicates the model with the highest number of free parameters (gene partition) has the lowest AICc score, followed by the model identified by PartitionFinder (Table 2). Analysis of the RY-coded, mono132 matrix yielded identical major relationships among commelinid orders, with similar support values compared with those from standard GTR analyses based on nucleotide sequences (83 for AD and 86 for ZCP; Fig. S7c). The few weak to moderately supported nodes near the base of Poales are mostly associated with the placement of Typhaceae (Typha, Sparganium), Bromeliaceae (Brocchinia, Neoregelia, Puya), Mayacaceae (Mayaca) and Eriocaulaceae (Syngonanthus); under an unpartitioned model the latter two families form a clade, while they do not form a clade under other partitioning schemes. In all cases support is weak (< 60). Results in this region differed slightly for MP analyses, also with weak support (Fig. S6).

Details are in the caption following the image
(a) Phylogram based on maximum likelihood analysis of the ‘mono132’ matrix (GTR + Γ, model partitioned by codon position) of 75 protein coding genes of the plastome, showing branch lengths (minus duplicate genes from the Inverted Repeat). Bar, 0.04 substitutions per site. ‘Dios./Pand.’, a clade representing orders Dioscoreales and Pandanales. (b) Cladogram of ‘mono132’ matrix based on maximum likelihood analysis (GTR + Γ) of the same data. Numbers adjacent to branches are standard bootstrap percentages based on 1000 pseudoreplicates under GTR models partitioned by: codon, gene, unpartitioned, and by the results of PartitionFinder, respectively. Lack of bootstrap values indicates 100% for all model partitions at a particular node; *, Bootstrap value of 100 when among lower support values for that branch; –, alternative topology for that particular relationship (and thus no bootstrap value for that particular dataset). *, **, *** and **** next to terminal taxon names indicate four taxa with alternative relationships among model partitions (*, Veratrum, Liliales; **, Musa, Orchidantha, Ravenala, Zingiberales; ***, Eugeissona, Arecales; ****, Mayaca, Syngonanthus, Poales).
Table 2. Comparison of GTR partition models from maximum likelihood analysis of the mono132 dataset
GTR model partition No. partitions logeL Total no. free parameters AICc
Unpartitioned 1 −963525.5316 270 1927592.81
Codon partitioned 3 −957452.7 949 288 1915483.58
PartitionFinder 74 −956462.2005 927 1914799.13
Gene partitioned 75 −951607.2776 936 1905107.69
  • ‘No. partitions’, the number of data partitions for a particular model; −logeL, the likelihood; AICc, corrected Akaike Information Criterion where AICc = −2(logeL) + 2K(n/(nK−1)) and ‘n’ is the number of sites in the alignment, and ‘K’ is the number of free model parameters.

Phylogenetic relationships among the palms based on protein coding genes and complete plastomes

There is 100% support for the monophyly of each palm subfamily (excluding the monotypic Nypoideae) based on coding loci, as well as for the ‘deep’ branching pattern among subfamilies (Fig. 3). In particular, there is strong support for the position of Nypa. Support is generally strong for all relationships in Calamoideae, but the placement of Eugeissona is problematic. Depending on the analysis, Eugeissona is either weakly supported as sister to the rest of the calamoids (codon, gene partitioned models, and PartitionFinder), or (Eremospatha, Mauritia), then Eugeissona are successively sisters to the rest of the calamoids (tribe Calameae; unpartitioned model). Subfamily Coryphoideae comprises two strongly supported principal clades, each with four tribes. In one clade, Chuniophoeniceae (including Tahina) is sister to (Caryoteae (Corypheae, Borasseae)), all with 100% support in all analyses and for all clades. In the other clade, ((Sabaleae, Cryosophileae), (Phoeniceae, Trachycarpeae)), all have at least 99% support.

Details are in the caption following the image
A closer view of maximum likelihood-based plastid relationships among the palms (order Arecales) and Dasypogonaceae (from Fig. 2), showing tribal representation among subfamilies Calamoideae and Coryphoideae. As in Fig. 2, numbers adjacent to branches indicate bootstrap support values for codon, gene, unpartitioned and PartitionFinder GTR models, respectively. Support values are 100 unless otherwise indicated; *, among support values means 100% support; –, alternative topology exists for that model partition and thus no support; *, among taxon names indicates an alternative placement among searches using different model partitions (i.e. only here in the case of Eugeissona).

Overall, topology based on whole aligned plastomes (including spacers, introns, tRNAs and rRNAs; Fig. 4) is identical to that based only on protein coding genes, except for the placement of (Eremospatha, Mauritia) as sister to Calamoideae vs Eugeissona. In general, relationships and average support did not change drastically based on the different percentages of the full plastome alignment (moving from the full dataset to that containing only the most conserved sites as determined by Gblocks; Table 3). Average bootstrap support in the ML analysis was highest for the ‘Relaxed 1’ dataset (93% of total 159 074 aligned positions), with the lowest SD among mean values; on the other hand, ‘Strict 1’ (77% of full dataset) had the lowest mean support with highest SD (Table 3; Fig. 4). Interestingly, moving from more to fewer characters, overall support seemed to increase, then drop, followed by another increase, but overall gains/losses in support were small, ranging only 2.79%.

Details are in the caption following the image
Cladogram of Arecales and Dasypogonaceae based on maximum likelihood analysis of whole plastomes minus once copy of the Inverted Repeat (unpartitioned, GTR + Γ model). The six Bootstrap percentages correspond to various configurations of the full plastid dataset as output from Gblocks: Full dataset; Relaxed 2; Relaxed 1; Strict 1; Default; and Strict 2 (see Table 2 for further explanation). –, *, ** and *** among taxon names indicate an alternative placement among searches using different datasets.
Table 3. Characteristics of matrices based on aligned whole plastomes of the palms and Dasypogonaceae, and Gblocks character exclusion analyses
Full Relaxed 2 Relaxed 1 Strict 1 Default Strict 2
1. Minimum no. sequences for a conserved position 0 22 22 22 22 42
2. Minimum no. sequences (flank position) 0 22 22 22 35 42
3. Max no. contiguous nonconserved positions All 50 30 15 8 5
4. Minimum conserved block length 0 5 10 20 10 50
5. Minimum % sequences at a gap position 100% 100% 50% 50% 0% 0%
No. aligned characters 159 074 148 462 129 269 123 088 74 877 57 818
% Gaps or missing data 17.3 12.3 2.3 2.1 0 0
No. parsimony informative characters 19 124 17 421 15 562 13 640 7158 4086
% Total aligned characters 100% 93% 81% 77% 47% 36%
Average maximum likelihood (ML) bootstrap support 97.51 98.56 98.74 95.95 97.49 98.26
SD (ML) 10.71 6.19 5.65 12.67 10.67 8.19
Average MP bootstrap support 96.50 98.20 98.50 97.10 95.50 94.90
  • Numbers 1–5 (left column) refer to parameters adjusted in Gblocks. % Total aligned characters, relative size of the dataset compared with the ‘Full’ dataset. See Supporting Information Fig. S6 for parsimony trees and jackknife support values. MP, maximum parsimony.

Notable differences in a few places, based on different datasets, include the position of the clade (Eremospatha, Mauritia) as sister to all other calamoids in all datasets except ‘Strict 2,’ which contains the fewest but most conserved characters; there was low support for relationships at the base of subfamily Calamoideae for all datasets. Within Coryphoideae, Colpothrinax is sister to Pritchardia (Trachycarpeae), collectively sister to a clade of (Washingtonia, ((Brahea, Chamaerops), (Licuala, (Acoelorrhaphe, Serenoa)))) for all datasets but ‘Strict 2’ with 100% support. For the ‘Strict2’ dataset, Pritchardia is sister to the latter clade, with Colpothrinax sister to all of these collectively. Within Trachycarpeae, Licuala is sister to (Acoelorrhaphe, Serenoa) in all but the ‘Full’ dataset; support for this relationship generally increases as data are removed.

Molecular evolutionary rate variation among clades of the monocots

Figures 2(a) and 5 show the striking differences in branch lengths estimated under the GTR + Γ model in RAxML across orders of the commelinid monocots. Among the commelinids, Arecales have relatively short branch lengths, while Poales have the longest, but with extensive variation. Within Poales, earlier diverging families have shorter branches (namely, Bromeliaceae and Typhaceae; Fig. 2a), with a trend of increasing branch length moving from the root of the order. Nucleotide composition also varies among plastomes, with higher mean GC content in Commelinales and Poales, but with extensive variation among plastomes within these two orders (Fig. 5a); Arecales and Dasypogonaceae have similar, relatively low GC profiles, but ML analysis of the RY-encoded mono132 matrix does not reveal major differences in topology or support compared with that of the standard nucleotide matrix (Fig. S7c). Patristic distances between ML trees based on codon positions 1 + 2 and position 3 had a correlation of 1.0, and relationships among major clades of the commelinids were identical between trees based on these two partitions (Fig. S7a,b).

Details are in the caption following the image
(a) Comparison of GC nucleotide composition for complete plastid gene sets of the commelinid monocots included in the mono132 matrix. Error bars indicate ranges; boxes indicate the first and third quartiles, and horizontal bars represent the median values for each clade. (b) Comparison of intra-ordinal plastid branch lengths among the commelinid monocot orders, as assessed by root-to-tip patristic distances, from the common ancestor of each respective clade to each sampled tip (unpartitioned GTR-distances from RAxML analysis of the ‘mono132’ matrix); error bars and boxes are as above. Numbers next to the name of each order represent the number of plastid gene sets sampled for that order.

Branch models in baseml/PAML (comm58; Table 4) indicate a significant departure from a ‘global clock’ (all rates equal among clades). A model allowing a local clock for Arecales (M1) has a significantly better fit than the global clock (M0); as does a model in which Arecales and Poales both have separate local clocks (M2). The best fitting model is one in which all commelinid orders evolve via separate local clocks (M3), as indicated by AICc comparison and Bonferroni-corrected likelihood ratio tests (Table 3).

Table 4. Results of model comparisons of global vs local clocks using the baseml module of PAML for the ‘comm58’ dataset
Model M0 M1 M2 M3
Description Global Clock Arecales (local) vs M0 Arecales & Poales (each local) vs M1 All commelinid orders (each local) vs M2
No. Parameters, no. branch parameters 62, 0 63, 1 64, 2 66, 4
AICc 1072451.9 1070903.7 1067871.8 1066028.2
−logeL 536163.9 535388.8 533871.9 532948.1
Λ na 1550.1 3033.9 1847.6
df na 1 1 2
P-value/Bonferroni corrected na ***/*** ***/*** ***/***
Clade rate parameters (relative to background rate of 1) na A = 0.38194

A = 0.51486

P = 1.76292

A = 1.12280

P = 3.96456

D = 1.59113

ZC = 2.80485

  • No. parameters, the total number of free parameters for a particular model; no. branch parameters, number of clades defined to evolve by a local clock; −logeL, the likelihood of the data, given the model; AICc, corrected Akaike Information Criterion where AICc = −2(logeL) + 2K(n/(nK−1)) and logeL is the likelihood function, n is the number of sites in the alignment, and K is the number of free model parameters; P value, significance of log likelihood ratio test comparing fit of two models (including after Bonferroni correction where = α/m, and m is the number of branch parameters); ***, P < 0.001; Λ, the chi-square distributed, log-likelihood ratio test statistic used to evaluate significant differences in model fit, equal to two times the difference in log likelihood between the two models being compared with df. A, P, ZC, and D correspond to orders Arecales, Poales, Zingiberales/Commelinales and family Dasypogonaceae, respectively. na, not applicable.

Bayesian analyses under the random local clock (RLC) model in Beast of the comm58 dataset seemed to reach stationarity based on ESS values > 200 for parameters of interest, but did not converge between runs for several parameters (including the number of rate changes, the parameter of most interest), and also required months of computation time. When data were reduced to atpB, matK and rbcL (‘comm58-3gene’ dataset), four of seven runs converged on a median value of 22 substitution rate changes with 95% Highest Posterior Density (HPD) of 20–24 (Figs 6, 7). Rate accelerations occur in Poales, Commelinales and Zingiberales, and a few relatively minor rate changes occur in Dasypogonaceae and Arecales, with some rate decelerations in Arecales (Fig. 6). Comparison of posterior densities of the four converged runs vs runs without data (priors only) suggests that the 95% HPD of 20–24 rate changes is significantly different from the 95% HPD of 0–2 for prior only (Fig. 7).

Details are in the caption following the image
Relative plastid substitution rates among clades/branches of the commelinid monocots, based on the ‘comm58-3gene’ dataset, resulting from one of four converged Random Local Clock analyses in Beast v.1.81 (108 generations, with 2.0 × 107 burn-in steps). Numbers above branches indicate relative, noncalibrated, median rates, scaled by dividing all rates by those of the basal-most branches; thus, the numbers do not have units. Thickness of branches corresponds to relative rate.
Details are in the caption following the image
Density plots for random local clock MCMC analyses in Beast v.1.81 showing the probability density of the number of rate changes, based on the ‘comm58-3gene’ dataset. Right, result of one of four converged runs; left, result of an identical analysis using priors only (no data included). Median estimates of the number of rate changes among 58 taxa and internal branches are given, along with their 95% highest posterior density interval (95% highest posterior density, HPD). A lack of overlap among 95% HPDs of analyses run with and without data suggests a departure from a global clock, and is strong evidence of rate shifts.

Discussion

Structure of the plastome in Arecales

The loss of ndhF from the plastome in Eugeissona tristis is surprising, but has been documented in other clades (Wicke et al., 2011; Barrett et al., 2014a). The ndh complex is encoded by nuclear and plastid genes, and functions in electron cycling, especially in environments of variable or high light intensity, ameliorating the effects of oxidative stress (Peng et al., 2011). Losses or pseudogenes have been observed in both photosynthetic and heterotrophic plants (Braukmann et al., 2009; Blazier et al., 2011; Wicke et al., 2013; Barrett et al., 2014a), in which selection on photosynthetic function may have been altered. Reads mapped to the ndhF region of the closely related Calamus, suggesting a copy of ndhF in either the nuclear or mitochondrial genome. Coverage of 2.5× for ndhF (based on Eugeissona reads mapped to Calamus) is more similar to that of the mitochondrial genome than to that expected for the nuclear genome (< 1×), suggesting a functional copy in the mitochondrial genome.

Additional sequencing or long-range PCR may allow a more definitive answer as to the location of ndhF. However, a mitochondrial-encoded ndhF would not likely be adaptive (assuming the ndh complex is functional in Eugeissona); the ndh complex would thus require protein products from all three genomes. If ndhF is encoded in the mitochondrial genome and still functional, it would possibly serve an alternative function or may be pleiotropic, functioning in both the mitochondrion and chloroplast. Fragmented mitochondrial copies of ndh genes have been observed in the orchid Erycina pusilla but these are not likely expressed (Lin et al., 2015).

The lack of an IR in Tahina spectabilis (except a 159 bp, duplicate fragment of ndhB) is intriguing, since the ‘canonical’ structure of nonpoalean and nonheterotrophic monocots is generally conserved. Loss of the IR has been observed, for example, in papilionoid legumes (Palmer et al., 1987; Lavin et al., 1990; Liston, 1995) and some gymnosperms (Strauss et al., 1988; Wu et al., 2011; Wu & Chaw, 2014), which is hypothesized to have led to a destabilization of the gene order in the plastome, evidenced by rearrangements. The simplest mechanistic explanation for IR loss in Tahina is that two massive boundary shifts of the single copy/IR junctions occurred, diminishing the extent of the IR. According to the IR stabilization hypothesis, a complete or nearly complete loss of the IR would be followed by one or more gene order rearrangements. Tahina has a 1994 bp inversion, which is in line with that expectation, assuming the inversion occurred after IR reduction. The fact that changes in plastome structure were detected among 38 representative taxa suggests that deviations from canonical structure may be more prevalent than previously hypothesized, given the high degree of palm plastome conservation. It also suggests that structural changes do not occur deep in the palm plastid phylogeny as synapomorphies but unite more exclusive clades or may be present idiosyncratically in species or populations.

Phylogenetic placement of the palms among the commelinid monocots

The placement of the palms as sister to Dasypogonaceae, with moderate to strong support, is in agreement with recent plastid-based studies (e.g. ML analysis of Givnish et al., 2010; Barrett et al., 2013; but see Davis et al., 2013), as well as the placement of this clade as sister to a strongly supported clade of (Poales, (Zingiberales, Commelinales)). The increased taxon sampling in the current study based on nearly complete plastid gene sets has resulted in slightly stronger support for AD and ZCP (e.g. comparing support values for Fig. 2b and those in the ML analyses of Barrett et al., 2013), some of the most recalcitrant nodes in the monocot Tree of Life. Dense taxon sampling can ameliorate problems caused by the ‘node density artifact’ (Fitch & Bruschi, 1987; Venditti et al., 2006), specifically, that a lack of dense taxon sampling may result in extremely short internodes and a resulting lack of support. Since relationships are identical in this and earlier plastome-based studies among the major clades of the commelinids, there was therefore not a strong node-density artifact in earlier studies with sparser sampling – or at least the effect was not strong enough to affect topology. However, the observed slight increase in support in the current study may be due to the effects of denser taxon sampling for the two relatively short branches leading to AD and ZCP (Fig. 2).

Model based partitioning schemes affect support values for AD and ZCP; for the former, support values vary substantially, by 10. For both clades, codon partitioned models give lower support than gene or unpartitioned models, while those based on PartitionFinder give similar values to codon based models (Fig. 2b). Thus, methods that improve fit to the data based on gene × codon partitioning (e.g. PartitionFinder) are crucial to conclusions of support, especially for complex, heterogeneous datasets such as mono132.

Phylogenetic relationships among the palms based on protein coding genes and whole plastomes

There is complete resolution and strong support for relationships among the five palm subfamilies, reflecting relationships in previous multigene analyses (Figs 3, 4; Asmussen et al., 2006; Baker et al., 2009). Of particular interest is support for the placement of Nypa fruticans, which comprises the subfamily Nypoideae. Whereas earlier molecular studies were inconclusive on the positions of Nypa and Calamoideae relative to other palms (Uhl et al., 1995; Lewis & Doyle, 2001), later studies based on a few plastid loci but more complete taxon sampling suggested that Calamoideae are sister to Nypa + remaining Arecales. Here, the position of Nypa is supported as sister to (Coryphoideae, (Ceroxyloideae, Arecoideae)), and the calamoids sister to this entire clade, with 100% support in all analyses (Figs 3, 4). Asmussen et al. (2006) recovered strong support for this topology based on four plastid loci and nearly complete generic level sampling, while Baker et al. (2009) found strong support using all available data and generic level sampling in supertree analyses (e.g. > 90% of input trees supported Nypa in this position).

Asmussen et al. (2006) and Baker et al. (2000a,b, 2009; supertree analysis, but see supermatrix), recovered Eugeissona as sister to the rest of the subfamily Calamoideae. Here, Eugeissona is either sister to the remaining calamoids, or a clade of (Eremospatha, Mauritia) and then Eugeissona are successively sister to the remaining calamoids (Figs 3, 4; depending on the analysis). Findings of a strongly supported sister relationship between Eremospatha and Mauritia have implications for the monophyly of the tribe Lepidocaryeae, which was paraphyletic in the supermatrix analysis of Baker et al. (2009; but see supertree analysis). Sampling of additional calamoid genera, including Raphia, would likely yield stronger support in this part of the tree.

Support is strong for relationships among the tribes of Coryphoideae (Figs 3, 4). The current findings (Figs 3, 4) closely resemble the relationships of Baker et al. (2009) based on a supertree analysis using irreversible weights matrix representation with parsimony. Baker et al. (2009) recovered varying coryphoid relationships among several supertree and supermatrix analyses. Both their supertree and the trees recovered here contained two principal clades within Coryphoideae, each represented by four tribes; the only difference is the relative placement of Caryoteae and Chuniophoeniceae within the ‘syncaropus’ clade, which also contains Borasseae and Corypheae (Figs 3, 4; Dransfield et al., 2008; Baker et al., 2009). However, relationships among coryphoid tribes generally did not have strong support in the supertree analysis of Baker et al. (2009), whereas here they received strong support. Asmussen et al. (2006) recovered a weakly supported clade of (Trachycarpeae, Phoeniceae) as sister to (Chuniophoeniceae, (Caryoteae, (Corypheae, Borasseae))). Plastid genomes appear to resolve tribal relationships among the coryphoids with strong support.

The diverse and widespread tribe Trachycarpeae has been problematic from a phylogenetic perspective. Based on a multilocus approach with dense taxon sampling at the generic/species level, Bacon et al. (2012) were able to resolve many relationships among the genera of this tribe; our findings are similar, but the relative placement of Washingtonia, Pritchardia and Colpothrinax differs. Here (Figs 3, 4), the clade of (Pritchardia, Colpothrinax) is sister to (Washingtonia, remaining Trachycarpeae) with strong support, while Bacon et al. (2012) recovered (Washingtonia, Pritchardia) as sister to (Colpothrinax, remaining Trachycarpeae), but with weak support (< 75 in ML Bootstrap analysis). Support for a sister relationship between (Pritchardia, Colpothrinax) is moderate based on coding loci (Fig. 3), but strongly supported based on whole aligned plastomes (Fig. 4).

Except for the placement of Eugeissona among the calamoids, palm relationships are identical between coding and whole plastid matrices. This suggests that at least for slowly evolving groups, alignment of whole plastomes does not introduce enough homoplasy (i.e. among highly variable spacers and introns) as to drastically change relationships, and in many places increases support relative to more reduced matrices. The effect of data removal (relative to the ‘full’ plastome matrix) generally is a slight decrease in support (Fig. 4), with the exception of the placement of Licuala as sister to (Serenoa, Acoelorrhaphe).

Variation in plastid substitution rate among clades of the monocots

It has been hypothesized that monocots display extreme substitution rate heterogeneity, based on earlier plastid and nuclear studies (Bousquet et al., 1992; Gaut et al., 1992, 1996; Morton & Clegg, 1995), but these studies mainly included representatives of the palm and grass families and were based on few loci. Here we included sampling across all monocot orders, with representative plastomes of nearly all families of the commelinids. This study represents the largest genome-scale phylogeny of commelinids (e.g. compared with the 44 commelinids in Givnish et al., 2010; and 27 in Barrett et al., 2013) and reevaluation of a classic case of rate variation in the genomic era. Our findings (Figs 5-7) provide a genome-wide perspective on extreme plastid rate variation, support previous hypotheses, and further indicate that earlier findings are not an artifact of locus choice (e.g. rbcL, ADH), but that extreme substitutional rate variation is pervasive based on nearly complete coding complements of the plastome. Moreover, patristic distances are highly correlated among ML trees (coefficient = 1.0) based on codon positions 1 + 2 and position 3, suggesting that substitutional rate variation among clades is not driven exclusively by rate changes at codon position 3, but by changes across all codon positions. Based on Bayesian RLC analysis, we find strong evidence for c. 22 rate shifts among the commelinids (Fig. 7); additional sampling of taxa and loci may increase this estimate (but see earlier regarding issues with the convergence of the RLC model with all 75 genes). Accelerations mostly occur in the ZCP clade, and are prevalent among Commelinales, Zingiberales and Poales, while decelerations occur in Arecales.

These findings have phylogenetic implications, as extreme variation in substitution rates may cause issues in phylogenetic analysis (discussed in Rodriguez-Ezpeleta et al., 2007; Philippe & Roure, 2011), which is exacerbated by sparse taxon sampling. Here, we have sampled representatives of nearly all families of the commelinids, and observed highly similar large scale relationships (A, D), (ZC, P) relative to previous studies (Givnish et al., 2010; Barrett et al., 2013), with strong to moderate support, suggesting that the sparse sampling and rate heterogeneity in earlier studies did not severely obscure plastid relationships (Givnish et al., 2010; Barrett et al., 2013).

What are the potential correlates/drivers of such rate variation among commelinid clades? Some researchers have identified correlates across plants such as species diversity, generation time, growth form or environmental factors (Barraclough & Savolainen, 2001; Davies et al., 2004; Smith & Donoghue, 2008; reviewed in Burleigh, 2012; Lanfear et al., 2013). Lanfear et al. (2013) accounted for both phylogeny and a number of the aforementioned factors, and found that a strong relationship remained between plant height and substitution rate across angiosperms: taller angiosperms having slower substitution rates. This relationship was observed for both nuclear and plastid DNA. The authors explain their findings in terms of the ‘rate of mitosis’ (ROM) hypothesis. In plants, different tissues derived from the apical meristem may develop into reproductive tissues (accruing heritable, somatic mutations), unlike in animals, in which the germline is sequestered early in development. Furthermore, some plants may experience a deceleration of the rate of mitosis in the apical meristem as they near their maximum height (Lanfear et al., 2013), but whether this occurs in palms is unclear due to the need for replacement of ephemeral but essential organs such as leaves. Palms generally are the tallest of the monocots, and palm shoot tissues often derive from a single apical meristem (although many palms are sympodial, with clustered or branching stems). The ROM hypothesis is a good candidate for future investigation as a potential explanatory factor for diminished plastid substitution rates in the palms and rate accelerations in some lineages of Poales, Commelinales and Zingiberales (though some bamboos are > 10 m).

Different factors may affect substitution rates in different ways across the commelinids or the importance of multiple factors may also vary among clades. For example, accelerated substitution rates among clades within Poales (e.g. Poaceae) may be a combined product of high ROM (and short stature) and other life history traits such as C4 photosynthesis, potentially leading to high rates of speciation and widespread ecological dominance, but these remain as hypotheses to be tested. The potential correlates/drivers of rate variation among monocot orders warrant more explicit investigation, incorporating trait, environmental, nuclear and mitochondrial data, along with increased taxon sampling, and the use of multivariate, phylogenetic comparative methods (Lanfear et al., 2010).

Conclusion

Plastid gene sets generated by Illumina NGS, and representing nearly all families of the commelinid monocots, provide robust branch support and evidence for extreme heterogeneity of plastid substitution rates. Analysis of whole plastomes in the palms further resolves and provides strong support for relationships among the five subfamilies, and for relationships among tribes of the diverse subfamily Coryphoideae. Model based analyses suggest that commelinid orders deviate significantly from a global clock in terms of plastid substitution rates, and also that c. 22 rate shifts have occurred across the commelinids. These findings have implications for the systematics of the commelinid orders, which represent immense species diversity and ecological/economic importance, and lay the groundwork for future comparative tests of the drivers and correlates of substitution rates.

Acknowledgements

We thank the staff at Fairchild Tropical Botanical Garden/Montgomery Botanical Center, Nong Nooch Botanical Garden, Forest Research Institute Malaysia, Royal Botanic Gardens Kew, Huntington Botanical Gardens, and the Western Australian Department of Environment and Conservation for assistance collecting materials; Eric Antonieu, Raj Ayyampalayam, Connie Asmussen Lange, Thomas Couvreur, Ryan Ellingson, Elena Ghiban, Patrick Edger, John Kerry, Chang Liu, Michael McKain, Aakrosh Ratan, Cold Spring Harbor Laboratories, University of Missouri DNA Core Facility, Cornell Life Sciences Sequencing Center for technical assistance. Work was funded by National Science Foundation Grants DEB-0830020 (J.I.D.), DEB-0830009 (J.L-M. and W.B.Z.), DEB-0829849 (D.W.S.), and DEB-0829762 (J.C.P.).