Volume 214, Issue 2 p. 607-618
Full paper

Can functional traits account for phylogenetic signal in community composition?

Daijiang Li (corresponding author), Department of Botany, University of Wisconsin, 430 Lincoln Drive, Madison, WI, 53706 USA. Tel: +1 608 265 2191; Email: [email protected]

Anthony R. Ives, Department of Zoology, University of Wisconsin, 430 Lincoln Drive, Madison, WI, 53706 USA

Donald M. Waller, Department of Botany, University of Wisconsin, 430 Lincoln Drive, Madison, WI, 53706 USA

First published: 03 January 2017

Summary

  • Phylogenetic and functional trait-based analyses inform our understanding of community composition, yet methods for quantifying the overlap in information derived from functional traits and phylogenies remain underdeveloped. Does adding traits to analyses of community composition reduce the phylogenetic signal in the residual variation? If not, then measured functional traits alone may be insufficient to explain community assembly.
  • We propose a general statistical framework to quantify the proportion of phylogenetic pattern in community composition that remains after including measured functional traits. We then illustrate the framework with applications to two empirical data sets.
  • Both data sets showed strong phylogenetic attraction, with related species likely to co-occur in the same communities. In one data set, including traits eliminated all phylogenetic signal in the residual variation of both abundance and presence/absence patterns. In the second data set, including traits reduced phylogenetic signal in the residuals by 25% for abundance data and by 98% for presence/absence data.
  • Our framework provides an explicit way to estimate how much phylogenetic community pattern remains in the residual variation after including measured functional traits. Knowing that functional traits account for most of the phylogenetic pattern should provide confidence that important traits for phylogenetic community structure have been identified. Conversely, knowing that there is unexplained residual phylogenetic information should spur the search for additional functional traits or other processes underlying community assembly.

Introduction

Functional traits, arising as innovations through evolution, capture essential aspects of species' morphology, ecophysiology, and life-history strategy (McGill et al., 2006; Violle et al., 2007). Although closely related species can differ greatly in some functional traits as a result of rapid evolution or ecological convergence (Losos, 2008, 2011), many functional traits are conserved enough to show strong phylogenetic signal (Freckleton et al., 2002; Webb et al., 2002; Moles et al., 2005; Donoghue, 2008). Functional traits, with or without phylogenetic signal, affect how environmental conditions filter species into communities and how species compete, mechanistically linking fundamental ecological processes to community structure (McGill et al., 2006; Violle et al., 2007; Adler et al., 2013). Functional traits also provide a common currency that facilitates comparisons among species and across regions, allowing us to assess the generality of patterns and predictions in community ecology (McGill et al., 2006). This has led to a proliferation of studies using functional traits to understand community composition. Functional trait-based approaches, however, are limited by the fact that it is impossible to measure all potentially important functional traits that affect the distribution of species.

Phylogenies play an important role in community ecology by providing information about evolutionary relationships among species (Graves & Gotelli, 1993; Losos, 1996; Cavender-Bares et al., 2006; Baum & Smith, 2012). Because phylogenetically related species often have similar functional trait values, we expect phylogenetically related species to co-occur more often in the same communities, reflecting their shared environmental tolerances. Conversely, if the similar traits that phylogenetically related species have cause them to compete strongly, then closely related species may be less likely to co-occur. These and other processes relating functional traits to community composition often lead to phylogenetic signatures in how species are distributed among communities (Webb et al., 2002).

We expect the information provided by functional traits and the information provided by phylogenetic analyses to overlap (Vane-Wright et al., 1991; Cavender-Bares et al., 2004; Cadotte et al., 2009; Ives & Helmus, 2011). We can test this expectation by incorporating functional trait information into analyses of community composition and asking whether there are residual phylogenetic patterns that remain unaccounted for by the functional traits. If so, this would suggest one of two things. First, there may be additional functional traits that have not been measured and incorporated into the analysis. Second, there may be phylogenetically correlated factors aside from functional traits that affect community structure; for example, phylogenetic patterns could be generated if there is immigration from another region that contains phylogenetically related species, or if stochastic processes leave a phylogenetic signature (Ricklefs & Schluter, 1993; Hubbell, 2001; Leibold et al., 2010). Thus, when residual phylogenetic patterns remain in analyses that incorporate functional trait information, we have evidence that there are additional factors affecting community assembly.

Here, we present a general three-step statistical framework to test for residual phylogenetic signal in community composition that remains unaccounted for after the inclusion of measured functional traits (Fig. 1). The first step is to test for phylogenetic pattern, either ‘phylogenetic attraction’ (phylogenetically related species more likely to occur in the same communities) or ‘phylogenetic repulsion’, using a generalized mixed model (Bolker et al., 2009) that incorporates phylogenetic correlations in the distributions of species (Ives & Helmus, 2011). If there is phylogenetic pattern, then it could be produced by measured functional traits that themselves have phylogenetic signal (Fig. 1a, arrows 2, 4, and 7), unmeasured functional traits with phylogenetic signal (Fig. 1a, arrows 2, 5, and 8), or phylogenetic processes unrelated to functional traits (Fig. 1a, arrow 6). The second step first adds measured functional traits to the model, testing to see if they help to explain the distribution of species among communities. After incorporating these explanatory traits, we then ask whether phylogenetic signal remains in the residual variation in community composition (Fig. 1b). If there is residual phylogenetic pattern, we continue to add traits to see if these account for this signal. Thus, the second step investigates the extent to which we can account for the phylogenetic pattern in community composition using the measured functional traits. If phylogenetic patterns remain even after incorporating all selected measured traits, we can then apply a third step. This uses environmental data to test whether phylogenetically related species respond similarly to environmental gradients across communities. Such parallel species' responses to environmental gradients allow us to indirectly identify possible unmeasured functional traits that could play a role in community assembly (Fig. 1a, arrow 8, unmeasured traits with phylogenetic signal). 
In cases where phylogenetically related species respond similarly to an environmental gradient, species presumably share traits that confer similar tolerances to, or preferences for, specific environmental conditions. Although these three steps employ existing statistical methodology, the specific models have not previously been incorporated into a general framework to investigate the overlap between information provided by functional traits and phylogenies. To illustrate this framework and explore its utility, we analyse two community data sets rich in information on traits, phylogenetic relationships, and environmental variables.

Fig. 1. Diagram of the conceptual framework of the study. (a) Some traits have phylogenetic signal that reflects phylogenetic history (arrows 2, 4 and 5) while other traits do not (arrows 1 and 3), possibly because these traits evolve rapidly or experience convergent evolution. Phylogenetic patterns in community composition can be generated by measured and unmeasured traits with phylogenetic signal (arrows 7 and 8), and by other phylogenetic processes such as biogeographical patterns in species distributions (arrow 6). The question we address is how much of the phylogenetic signal in community composition can be explained by measured functional traits, and whether there is residual phylogenetic signal that could have been generated by unmeasured traits or other phylogenetic processes. (b) Traits and phylogeny contain overlapping information about how communities are assembled. We estimate the proportion of this overlapping information by calculating the change in the strength of community phylogenetic pattern ($\sigma_c^2$) between models without and with measured functional traits.

Other studies have simultaneously investigated both trait-based and phylogenetic patterns in community composition (e.g. Cavender-Bares et al., 2006; Swenson & Enquist, 2009). As many functional traits can affect community composition, it is important to analyse several traits when testing whether residual phylogenetic patterns exist after incorporating traits. Previous studies investigating multiple functional traits generally use one metric to measure multidimensional differences between pairs of species in functional trait-space and a second metric to measure phylogenetic differences between pairs of species. They then compare the explanatory power of these two metrics. Recently, Cadotte et al. (2013) created a synthetic metric that combines trait and phylogenetic distances by varying a weighting parameter. This allowed them to analyse differences between communities using models that varied this metric to span the gradient from traits alone to phylogeny alone. Our approach differs from these metric-based approaches in several ways. First, rather than reducing information on traits to a one-dimensional gradient of differences among species, our modelling approach retains specific values for all traits for each species. This is consistent with typical phylogenetic analyses (Felsenstein, 1985; Harvey & Pagel, 1991; Garland et al., 2005) that treat species' traits as independent variables (the ‘mean’ part of the model), while phylogenetic patterns are assessed as covariances in the residuals (the ‘variance’ part of the model). Second, because our model-based approach retains more information than metric-based approaches, it is expected to have greater statistical power (Ives & Helmus, 2011). Third, although metric-based approaches can effectively identify trait-based and phylogenetic patterns in specific communities, they do not provide any overall test for overlap between information derived from measured functional traits and phylogenies. 
Providing such a test is the focus of our analyses. If measured traits together reduce all phylogenetic signal in the residual variation in species distribution patterns, we can conclude that these traits (or other traits that are strongly correlated) account for phylogenetic patterns in community composition. Conversely, if measured traits do not reduce most of the residual phylogenetic patterns, then we can infer that either additional traits or additional biogeographical processes need to be investigated.

Materials and Methods

We apply our three-step statistical framework to both species abundance and presence/absence (incidence) data drawn from two illustrative empirical studies. In the main text we present results for abundance data because abundance data typically provide more information about community assembly (Freilich & Connolly, 2015). In the Supporting Information we present parallel results for the presence/absence data.

Phylogenetic community composition

Our first step tests for phylogenetic community structure in species abundances without including environmental or functional trait information by applying a phylogenetic linear mixed model (PLMM). To build the PLMM, let n be the number of species distributed among m sites. Letting Y be the mn × 1 vector containing the abundance of species j (j  = 1, …, n) at site s (s  = 1, …, m), the PLMM is:
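$$
\begin{aligned}
\log Y_i &= \alpha + a_{\mathrm{spp}[i]} + b_{\mathrm{spp}[i]} + c_i + d_{\mathrm{site}[i]} + e_i \\
a &\sim \mathrm{Gaussian}(0,\ \sigma_a^2 \mathbf{I}_n) \\
b &\sim \mathrm{Gaussian}(0,\ \sigma_b^2 \boldsymbol{\Sigma}_{\mathrm{spp}}) \\
c &\sim \mathrm{Gaussian}(0,\ \mathrm{kron}(\mathbf{I}_m,\ \sigma_c^2 \boldsymbol{\Sigma}_{\mathrm{nested}})) \\
d &\sim \mathrm{Gaussian}(0,\ \sigma_d^2 \mathbf{I}_m) \\
e &\sim \mathrm{Gaussian}(0,\ \sigma_e^2 \mathbf{I}_{mn})
\end{aligned}
\qquad \text{(Eqn 1)}
$$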

Here we use the convention of multilevel models, with fixed and random effects given by Greek and Latin letters, respectively (Gelman & Hill, 2007). The function spp[i] maps observation i in vector Y to the identity of the species, with i taking values from 1 to mn (Gelman & Hill, 2007, pp. 251–252). The intercept $\alpha$ estimates the overall average log abundance of species across all sites. The three random variables $a_{\mathrm{spp}[i]}$, $b_{\mathrm{spp}[i]}$ and $c_i$ incorporate variation in abundance among plant species. Specifically, $a_{\mathrm{spp}[i]}$ gives differences among species in mean log abundance across all sites, with differences among the n species assumed to be drawn independently from a Gaussian distribution with mean 0 and variance $\sigma_a^2$. $b_{\mathrm{spp}[i]}$ also gives differences in mean log abundance among species across sites, but differences among species are assumed to be drawn from a multivariate Gaussian distribution with covariance matrix $\sigma_b^2 \boldsymbol{\Sigma}_{\mathrm{spp}}$, where the n × n matrix $\boldsymbol{\Sigma}_{\mathrm{spp}}$ is derived from the phylogeny, and the scalar $\sigma_b^2$ dictates the overall strength of the phylogenetic signal (see next paragraph). Thus, $a_{\mathrm{spp}[i]}$ and $b_{\mathrm{spp}[i]}$ together capture variation in mean species log abundances that is either unrelated to phylogeny or has phylogenetic signal. The random variable $c_i$ accounts for covariance in the log abundances of plant species nested within sites (using the Kronecker product, kron). Specifically, $c_i$ assesses whether phylogenetically related plant species are more or less likely to co-occur at the same sites. Hence, $c_i$ measures the overall strength of phylogenetic attraction or repulsion, and it is the key term we are interested in. Random effect $d_{\mathrm{site}[i]}$ accounts for site-specific variation in the mean log abundances of all species among the m sites, with these m values assumed to follow a Gaussian distribution with variance $\sigma_d^2$. Finally, $e_i$ captures the residual variance $\sigma_e^2$.
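The nested covariance for the term c can be made concrete with a small sketch. The three-species covariance matrix, the number of sites, and the value of the variance scalar below are hypothetical illustration values, and the code assumes that observations are ordered site by site with species in the same order within each site. It is not the authors' implementation.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of lists."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


def identity(k):
    """k x k identity matrix."""
    return [[1.0 if r == c else 0.0 for c in range(k)] for r in range(k)]


# Hypothetical phylogenetic covariance: species 1 and 2 are close relatives,
# species 3 shares no history with them.
sigma_spp = [
    [1.0, 0.6, 0.0],
    [0.6, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
m_sites = 2      # hypothetical number of sites
sigma2_c = 0.5   # hypothetical strength of phylogenetic attraction

# Scale the phylogenetic matrix, then repeat it once per site on the diagonal.
scaled = [[sigma2_c * v for v in row] for row in sigma_spp]
cov_c = kron(identity(m_sites), scaled)

for row in cov_c:
    print(row)
```

The result is block diagonal: related species covary within a site, while observations from different sites are independent in this term.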

We derive the phylogenetic covariance matrix $\boldsymbol{\Sigma}_{\mathrm{spp}}$ from the assumption of Brownian motion evolution. If a continuous-valued trait evolves up a phylogenetic tree with a constant probability of slight increases or decreases, the covariance in trait values between two species will be proportional to the length of shared evolution given by the distance on the phylogenetic tree between the root and the species' most recent common ancestor (Martins & Hansen, 1997). This gives a direct way to convert the phylogeny into a hypothesis about the covariance matrix. For the mean abundances of species among sites given by $a_{\mathrm{spp}[i]} + b_{\mathrm{spp}[i]}$, the variance among species given by $\sigma_a^2 \mathbf{I} + \sigma_b^2 \boldsymbol{\Sigma}_{\mathrm{spp}}$ is equivalent to the model of evolution proposed by Pagel (1999). For the assessment of phylogenetic attraction within sites, $c_i$, we use $\boldsymbol{\Sigma}_{\mathrm{nested}} = \boldsymbol{\Sigma}_{\mathrm{spp}}$. For phylogenetic repulsion, we use the matrix inverse of $\boldsymbol{\Sigma}_{\mathrm{spp}}$, $\boldsymbol{\Sigma}_{\mathrm{nested}} = (\boldsymbol{\Sigma}_{\mathrm{spp}})^{-1}$. Theoretical justification for $\boldsymbol{\Sigma}_{\mathrm{nested}} = (\boldsymbol{\Sigma}_{\mathrm{spp}})^{-1}$ comes from a model of competition among community members (Appendix A of Ives & Helmus, 2011). Briefly, if the strength of competition between species is given by $\boldsymbol{\Sigma}_{\mathrm{spp}}$, as expected if closely related species exploit similar resources, then the relative abundances of species will have covariance matrix $(\boldsymbol{\Sigma}_{\mathrm{spp}})^{-1}$.
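As a minimal sketch of this conversion, the code below builds a Brownian motion covariance matrix from shared root-to-ancestor branch lengths and then inverts it to obtain the repulsion matrix. The three-tip tree, the edge names, and the helper functions `bm_covariance` and `invert` are hypothetical and written only for illustration; dedicated phylogenetics software would normally be used for real trees.

```python
# Each tip is described by the edges on its root-to-tip path: {edge name: length}.
# Hypothetical tree: sp1 and sp2 share an ancestor A; sp3 branches off at the root.
paths = {
    "sp1": {"root-A": 2.0, "A-sp1": 1.0},
    "sp2": {"root-A": 2.0, "A-sp2": 1.0},
    "sp3": {"root-sp3": 3.0},
}


def bm_covariance(paths):
    """Covariance between tips = total length of the edges their paths share."""
    tips = sorted(paths)
    cov = []
    for ti in tips:
        row = []
        for tj in tips:
            shared = set(paths[ti]) & set(paths[tj])
            row.append(sum(paths[ti][edge] for edge in shared))
        cov.append(row)
    return tips, cov


def invert(mat):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(mat)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(mat)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


tips, sigma_spp = bm_covariance(paths)   # matrix used for phylogenetic attraction
sigma_repulsion = invert(sigma_spp)      # matrix used for phylogenetic repulsion

print(tips)
print(sigma_spp)
print(sigma_repulsion)
```

In this toy tree the two sister species have positive covariance in the attraction matrix and negative covariance in its inverse, which is the sense in which the inverse encodes repulsion between close relatives.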

Eqn 1 is the same as model I in Ives & Helmus (2011) except that model I includes variation among species in mean log abundance across sites as fixed effects rather than as two random effects, $a_{\mathrm{spp}[i]}$ and $b_{\mathrm{spp}[i]}$. This change allows us to align Eqn 1 with Eqn 2 (later), which includes variation in the relationship between trait values and log abundance within sites as random effects. In our analyses, treating variation among species in mean log abundance as fixed effects led to almost identical estimates of phylogenetic signal (estimates of $\sigma_c^2$), and so our treatment of $a_{\mathrm{spp}[i]}$ and $b_{\mathrm{spp}[i]}$ as random effects does not change our conclusions.

Statistical significance of the variance terms $\sigma^2$ can be determined using a likelihood ratio test. Because the null hypothesis $\sigma^2 = 0$ is on the boundary of the parameter space ($\sigma^2$ cannot be negative), we use the $0.5\chi_0^2 + 0.5\chi_1^2$ mixture distribution for significance tests (Self & Liang, 1987). $\chi_0^2$ represents a distribution with a point mass at 0, so the P-values given by $0.5\chi_0^2 + 0.5\chi_1^2$ are one-half the values that would be calculated from a standard likelihood ratio test using $\chi_1^2$. Simulations suggest that P-values calculated in this way are more conservative (give higher P-values) than those derived from a parametric bootstrap (Supporting Information Notes S1).
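The boundary-corrected test described above can be sketched in a few lines. The function names and the log-likelihood arguments are hypothetical; the only statistical ingredient is that the upper tail of a chi-square distribution with one degree of freedom can be written with the complementary error function.

```python
import math


def chi2_1_sf(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


def boundary_lrt_pvalue(loglik_full, loglik_reduced):
    """P-value for H0: variance = 0 under the 0.5*chi2_0 + 0.5*chi2_1 mixture.

    loglik_full is the log-likelihood of the model that includes the variance
    term; loglik_reduced is the log-likelihood of the model without it.
    """
    # Guard against tiny negative statistics caused by numerical error.
    stat = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    # The point mass at 0 contributes nothing to the upper tail, so the
    # mixture P-value is half of the standard chi-square(1) P-value.
    return 0.5 * chi2_1_sf(stat)


# Hypothetical log-likelihoods for illustration only.
print(boundary_lrt_pvalue(-100.0, -103.0))
print(boundary_lrt_pvalue(-100.0, -100.0))
```

When the estimated variance is zero the two log-likelihoods coincide, the test statistic is 0, and this calculation returns 0.5.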

Are there residual phylogenetic patterns after incorporating functional traits?

After the detection of phylogenetic patterns, our second step asks how much of the phylogenetic patterns identified in step 1 remains after including functional traits in the model. We selected measured functional traits to be incorporated into the model with two aims. First, we wanted to include functional traits that are important in explaining species composition of communities, fulfilling one of the most common goals of community ecology. Second, we wanted to include additional functional traits that can further reduce phylogenetic patterns in the residual variation, fulfilling the goal of this study.

We incorporated functional traits into a statistical model of community composition using the model:
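$$
\begin{aligned}
\log Y_i &= \alpha + a_{\mathrm{spp}[i]} + b_{\mathrm{spp}[i]} + c_i + d_{\mathrm{site}[i]} + (\beta + f_{\mathrm{site}[i]})\, t_{\mathrm{spp}[i]} + e_i \\
f &\sim \mathrm{Gaussian}(0,\ \sigma_f^2 \mathbf{I}_m)
\end{aligned}
\qquad \text{(Eqn 2)}
$$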

This model is the same as Eqn 1, except that it includes the values of a single functional trait, $t_{\mathrm{spp}[i]}$. The fixed and random effects terms $(\beta + f_{\mathrm{site}[i]})\, t_{\mathrm{spp}[i]}$ allow the effect of trait $t_{\mathrm{spp}[i]}$ to vary among sites, with the fixed term $\beta$ giving the mean slope of the relationship between log abundance and $t_{\mathrm{spp}[i]}$ across all sites, and the random term $f_{\mathrm{site}[i]}$ giving variation in this slope among sites. The proportion of phylogenetic signal in species composition (estimated by $\sigma_c^2$) reduced by the incorporation of trait $t_{\mathrm{spp}[i]}$ is assessed by comparing $\sigma_c^2$ between models with and without this trait in the product $f_{\mathrm{site}[i]}\, t_{\mathrm{spp}[i]}$. To assess more than one functional trait, we built a multivariate version of Eqn 2; each trait is incorporated in the same way by replicating the terms $(\beta + f_{\mathrm{site}[i]})\, t_{\mathrm{spp}[i]}$.
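The comparison of the nested phylogenetic variance between the two models amounts to a simple proportion. The sketch below assumes the reduction is expressed relative to the estimate from the model without traits; the function name and the two variance values are hypothetical and are not estimates from either data set.

```python
def proportion_signal_explained(sigma2_c_without, sigma2_c_with):
    """Proportional reduction in the nested phylogenetic variance.

    sigma2_c_without: estimate from the model without traits (Eqn 1).
    sigma2_c_with: estimate from the model that includes traits (Eqn 2).
    """
    if sigma2_c_without <= 0:
        raise ValueError("no phylogenetic pattern to explain in the trait-free model")
    return 1.0 - sigma2_c_with / sigma2_c_without


# Hypothetical estimates for illustration only.
reduction = proportion_signal_explained(0.010, 0.0025)
print(f"{100 * reduction:.0f}% of the phylogenetic signal is accounted for by traits")
```

A value near 1 indicates that the measured traits account for nearly all of the phylogenetic pattern, while a value near 0 indicates that most of it remains in the residual variation.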

To identify functional traits that are important in explaining species composition of communities, we performed model selection using the Akaike information criterion (AIC). Because of the large number of possible models, we used forward selection as proposed by Jamil et al. (2012). We did not use backward selection, which is often unfeasible because of mathematical convergence problems when the full model has many variables. To select traits that explain patterns of community composition, we started with the model containing only species and sites as random terms and excluding phylogenetic variances (Eqn 1 without terms $b_{\mathrm{spp}[i]}$ and $c_i$). We then selected traits as fixed terms $\beta\, t_{\mathrm{spp}[i]}$ (Eqn 2 without terms $b_{\mathrm{spp}[i]}$ and $c_i$). These fixed terms represent the average responses across all sites; thus, when a fixed term is included, it implies that, on average across all sites, species may have high or low abundance depending on their trait value. We then selected traits as random terms including also their fixed terms, $(\beta + f_{\mathrm{site}[i]})\, t_{\mathrm{spp}[i]}$. The random terms represent differences among sites in how species traits affect species abundances. Thus, if $f_{\mathrm{site}[i]}$ is included, then some sites might have high abundances of species with a given trait value, while other sites would have low abundances of species with the same trait value. We always included the fixed term for a trait when including it as a random term, because we did not want to force the random effects to have a mean of zero. After all traits that improved the fit of the model (either as fixed terms or as fixed and random terms) were selected, we added the phylogenetic terms $b_{\mathrm{spp}[i]}$ and $c_i$ (Eqn 2). This process should give the lowest AIC phylogenetic model, which we checked by removing each of the random and fixed effects while retaining the phylogenetic terms $b_{\mathrm{spp}[i]}$ and $c_i$ in the model. To ask whether the addition of more traits could reduce the residual phylogenetic pattern ($\sigma_c^2$) any further, regardless of whether this decreased the fit of the model (increased AIC), we then added each remaining trait as fixed and random terms. We tried all traits regardless of whether they were correlated with each other, because our goal was to test the overall phylogenetic pattern in the residual variation after including functional traits rather than to separate the contribution of each functional trait. For this purpose, multicollinearity does not interfere with the assessment of phylogenetic signal in the residual variation.
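The greedy core of forward selection by AIC can be sketched as follows. This is a generic illustration, not the procedure of Jamil et al. or the authors' code: it shows a single selection pass, whereas the text describes separate passes for fixed terms and for fixed plus random terms. The callable `aic_of` stands in for fitting a mixed model, and the toy AIC function and its per-trait gains are invented for the example.

```python
def forward_select(candidates, aic_of):
    """Add one trait at a time, keeping the addition that lowers AIC the most.

    candidates: iterable of trait names.
    aic_of: callable taking a list of trait names and returning the AIC of the
            model fitted with those traits (hypothetical stand-in for a model fit).
    """
    selected = []
    best_aic = aic_of(selected)
    remaining = list(candidates)
    while remaining:
        # Score every model that adds exactly one more trait.
        scored = [(aic_of(selected + [trait]), trait) for trait in remaining]
        trial_aic, trial_trait = min(scored)
        if trial_aic >= best_aic:
            break  # no remaining trait improves the fit
        selected.append(trial_trait)
        remaining.remove(trial_trait)
        best_aic = trial_aic
    return selected, best_aic


# Toy AIC: each trait improves the deviance by a fixed, made-up amount and
# costs a penalty of 2 for its extra parameter.
TOY_GAIN = {"SLA": 12.0, "height": 6.0, "seed_mass": 1.0}


def toy_aic(traits):
    return 100.0 - sum(TOY_GAIN[t] for t in traits) + 2.0 * len(traits)


chosen, final_aic = forward_select(TOY_GAIN, toy_aic)
print(chosen, final_aic)
```

In the toy example the third trait is rejected because its gain is smaller than the penalty for the extra parameter.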

The process outlined in the above paragraph will identify functional traits that are important both in explaining species composition and in reducing phylogenetic signal in the residual variation. To provide additional evidence that the traits identified reduce phylogenetic signal in the residual variation, we analysed each trait separately to test for two anticipated properties. The first property is that the functional trait should itself show phylogenetic signal among species (i.e. related species have similar functional trait values); otherwise, a trait could not produce (and hence reduce in the model) phylogenetic signal in species' abundances. Therefore, we tested each continuous trait for phylogenetic signal among species using Pagel's λ (Pagel, 1999), and for binary traits we used phylogenetic logistic regression (Ives & Garland, 2010). We also tested for phylogenetic signal in continuous and binary functional traits using Blomberg's K (Blomberg et al., 2003). We used Blomberg's K in addition to model-based methods because K makes no specific assumption about the model form of phylogenetic signal (e.g. whether phylogenetic signal varies as a result of stabilizing selection vs accelerating rates of evolution). As shown in the Results section, K identified some traits as having phylogenetic signal that were not identified by Pagel's λ or phylogenetic logistic regression. The second property is that there should be variation among sites in the relationship between species trait values and abundances. If a trait has phylogenetic signal but there is no variation in relationships between plant functional trait values and abundances among sites, then the trait will contribute to the overall phylogenetic signal of species abundance as captured by $b_{\mathrm{spp}[i]}$ in Eqn 1, but it probably will not affect phylogenetic patterns nested within sites captured by $c_i$. Therefore, we tested for variation among sites in the relationship between trait values and log abundances using the linear mixed model (LMM):
$$
\begin{aligned}
\log Y_i &= \alpha + a_{\mathrm{spp}[i]} + d_{\mathrm{site}[i]} + (\beta + f_{\mathrm{site}[i]})\, t_{\mathrm{spp}[i]} + e_i \\
f &\sim \mathrm{Gaussian}(0,\ \sigma_f^2 \mathbf{I}_m)
\end{aligned}
\qquad \text{(Eqn 3)}
$$
where $t_{\mathrm{spp}[i]}$ is the focal functional trait value of the species corresponding to observation i, and $\sigma_f^2$ gives the variation among sites in the relationship between trait values and log abundances. This formulation is closely related to the model used by Pollock et al. (2012), although we included a main effect for traits. If $\sigma_f^2 > 0$, we concluded that different sites select species differently based on the focal trait. We used a significance threshold of P < 0.1 here to lower the risk of excluding potentially important functional traits.

Which environmental variables drive phylogenetic pattern?

Sometimes phylogenetic patterns in community composition are observed, and there is still strong phylogenetic signal in the residual variation after including measured functional traits. In such cases, how can we identify additional functional traits that might further reduce residual phylogenetic pattern? Phylogenetically related species are usually assumed to be ecologically similar as a consequence of niche conservatism (Wiens et al., 2010). Therefore, related species will tend to have similar responses to environmental variables. If these environmental variables are strong enough to drive phylogenetic patterns in community composition, then functional traits associated with tolerance or sensitivity to these environmental variables will probably affect community composition. We can therefore investigate phylogenetic patterns in how species respond to environmental variables in order to infer possible additional, unmeasured functional traits that might help to explain patterns in community composition.

We tested for phylogenetic patterns in the responses of species to environmental variables using the PLMM:
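$$
\begin{aligned}
\log Y_i &= \alpha + a_{\mathrm{spp}[i]} + b_{\mathrm{spp}[i]} + (\beta + g_{\mathrm{spp}[i]} + h_{\mathrm{spp}[i]})\, x_{\mathrm{site}[i]} + e_i \\
g &\sim \mathrm{Gaussian}(0,\ \sigma_g^2 \mathbf{I}_n) \\
h &\sim \mathrm{Gaussian}(0,\ \sigma_h^2 \boldsymbol{\Sigma}_{\mathrm{spp}})
\end{aligned}
\qquad \text{(Eqn 4)}
$$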

Here, $g_{\mathrm{spp}[i]}$ and $h_{\mathrm{spp}[i]}$ represent nonphylogenetic and phylogenetic variation among species in their response to environmental variable x, respectively (see Model II in Ives & Helmus, 2011). The key parameter of interest is $\sigma_h^2$, which we tested using a likelihood ratio test. If $\sigma_h^2 > 0$, phylogenetically related species respond to environmental variable x in similar ways, suggesting the existence of an unmeasured phylogenetically inherited trait associated with species tolerances or sensitivities to x. Similar to Eqn 2, multiple environmental variables can be included by replicating the term $(\beta + g_{\mathrm{spp}[i]} + h_{\mathrm{spp}[i]})\, x_{\mathrm{site}[i]}$ for each additional variable x.

Empirical examples

We applied our analytical framework to two empirical data sets including trait, community, and environmental data. The first is the Dune Meadows data from Jongman et al. (1995) and the second is from the Wisconsin Pine Barrens (Li & Waller, 2015). The Dune Meadows data set consists of 30 plant species in 20 sites. Five environmental variables were measured for each site: thickness of the soil A1 horizon (A1), moisture content of the soil (moisture), agriculture land use (use), amount of manure applied (manure), and grassland management type (management). Five functional traits for higher plants were obtained by Jamil et al. (2013), including specific leaf area (SLA; m² kg⁻¹), plant height (cm), leaf dry matter content (LDMC; %), seed mass (g per seed), and life history (annual or perennial). We removed the two moss species as they lack functional trait data, resulting in 28 species across the 20 sites.

The Pine Barrens data consist of 152 species distributed among 30 Pine Barrens forest sites in the central Wisconsin sand plains (Li & Waller, 2015). We measured 20 environmental variables at each site, including soil properties (17 variables), climatic conditions (minimum temperature and precipitation), and canopy cover. The vegetation data set and canopy cover have been published in Li & Waller (2015). For the 55 focal species that occurred in three or more of the 30 communities, we measured 11 continuous and four categorical functional traits on at least 12 individuals (four from each of at least three populations) using standard protocols (Perez-Harguindeguy et al., 2013). Continuous traits include seed mass (g per seed), plant height (cm), SLA (m² kg⁻¹), LDMC (%), leaf circularity (dimensionless), leaf length (cm), leaf width (cm), leaf thickness (mm), leaf carbon concentration (%), leaf nitrogen concentration (%), and stem dry matter content (SDMC; %). We aggregated each categorical trait into two levels: growth form (woody vs nonwoody), life cycle (annual vs nonannual), and pollination mode (biotic vs abiotic). We divided seed dispersal mode into three binary variables (wind dispersed vs not, animal dispersed vs not, and unassisted vs assisted dispersal).

For both data sets, the available functional traits cover the leaf–height–seed (LHS) plant ecology strategy (Westoby, 1998) and represent multidimensional functions of plants associated with resource use, competitive ability, dispersal ability, and so on. For analyses, we log-transformed highly skewed traits and then Z-transformed all numerical trait values to have means of 0 and standard deviations of 1, allowing coefficients in the mixed models to be interpreted as effect sizes.
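The trait preparation described above can be sketched as a small helper. The function name, the example seed masses, and the choice of which trait to log-transform are hypothetical; the sketch also assumes natural logarithms and the sample standard deviation (n − 1 denominator), neither of which is specified in the text.

```python
import math
import statistics


def standardize(values, log_first=False):
    """Optionally log-transform a trait, then Z-transform it to mean 0 and SD 1."""
    if log_first:
        # Assumes strictly positive trait values (e.g. seed mass, height).
        values = [math.log(v) for v in values]
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (assumption)
    return [(v - mean) / sd for v in values]


# Hypothetical, strongly right-skewed seed masses (g per seed).
seed_mass = [0.001, 0.01, 0.1]
z_seed_mass = standardize(seed_mass, log_first=True)
print(z_seed_mass)
```

After this transformation a model coefficient for the trait describes the change in log abundance per standard deviation of the (log) trait, which is what allows coefficients to be compared as effect sizes.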

We obtained phylogenies for both sets of species from the super-phylogeny provided by Zanne et al. (2014) using the program Phylocom (Webb et al., 2008). This time-calibrated phylogeny was constructed from seven gene regions for 32 223 plant species using maximum-likelihood estimates in RAxML (Stamatakis, 2014). Use of the same super-phylogeny ensures that any differences between data sets in phylogenetic patterns are free from biases that might arise if we had instead used phylogenies constructed using different methods.

We fitted the PLMMs and LMMs (Eqns 1–4) with maximum likelihood using the function communityPGLMM in the R package pez (Pearse et al., 2015; R Core Team, 2015). We have provided R code for the Dune Meadows analysis as Notes S2. Phylogenetic signals of functional traits were tested using the R packages phylolm (Ho & Ané, 2014) and picante (Kembel et al., 2010).

Results

Phylogenetic community composition

Phylogenetically related species co-occurred more often than expected by chance in both Dune Meadows and Pine Barrens communities (Fig. 2). The PLMM (Eqn 1) revealed significant phylogenetic attraction for abundance data in both the Dune Meadows and the Pine Barrens communities (P = 0.005 and 0.013, respectively; Table 1). For presence/absence data, the Dune Meadows data had positive but nonsignificant phylogenetic attraction ($\sigma_c^2$ = 0.032, P = 0.147; Table S1), whereas phylogenetic attraction in the Pine Barrens was positive and significant ($\sigma_c^2$ = 0.044, P = 0.018; Table S1). We found no evidence for phylogenetic repulsion in either data set (Tables 1, S1).

Phylogeny and relative abundance of plant species found in the Dune Meadows and the Pine Barrens communities. The size of dots is proportional to abundances within each site.
Table 1. Estimated components of variance from the phylogenetic linear mixed model (PLMM) of species abundances (log-transformed) in Dune Meadows and Pine Barrens data sets

| Data set | PLMM | $\sigma^2_a$ | $\sigma^2_b$ | $\sigma^2_c$ | $\sigma^2_d$ | $\sigma^2_e$ | P ($\sigma^2_c$ = 0) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Dune Meadows | Phylogenetic attraction: $c \sim \mathrm{Gaussian}(0, \mathrm{kron}(I_m, \sigma^2_c \Sigma_{spp}))$ | 0.80 | 0.00958 | 0.00781 | 0.00265 | 0.36 | 0.005 |
| | Phylogenetic repulsion: $c \sim \mathrm{Gaussian}(0, \mathrm{kron}(I_m, \sigma^2_c \Sigma_{spp}^{-1}))$ | 0.90 | 0.00126 | 0 | 0.00291 | 0.39 | 0.500 |
| | Nonnested model: c removed | 0.90 | 0.00291 | | 0.00126 | 0.39 | |
| Pine Barrens | Phylogenetic attraction: $c \sim \mathrm{Gaussian}(0, \mathrm{kron}(I_m, \sigma^2_c \Sigma_{spp}))$ | 0.98 | 0 | 0.00940 | 0.00266 | 0.51 | 0.013 |
| | Phylogenetic repulsion: $c \sim \mathrm{Gaussian}(0, \mathrm{kron}(I_m, \sigma^2_c \Sigma_{spp}^{-1}))$ | 0.98 | 0 | 0 | 0.0228 | 0.53 | 0.500 |
| | Nonnested model: c removed | 0.98 | 0.0229 | | 0 | 0.53 | |

  • Phylogenetic attraction and repulsion are estimated in separate models by $\sigma^2_c$ (Eqn 1). $\sigma^2_a$ and $\sigma^2_b$ are the estimated variances of the overall abundance of species partitioned into nonphylogenetic ($\sigma^2_a$) and phylogenetic ($\sigma^2_b$) components; $\sigma^2_d$ is the estimated variance of overall abundance for all sites; and $\sigma^2_e$ is the residual variance. Significant results (P < 0.05) are given in bold.
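The P-values of exactly 0.500 in Table 1 coincide with variance estimates of zero. That pattern is what a likelihood-ratio test would produce if the chi-square tail probability were halved because a variance cannot be negative. This section does not state how the P-values were computed, so the sketch below is an assumption offered for orientation only; the function name and the log-likelihood values are hypothetical.

```python
import math

def boundary_lrt_p(loglik_full, loglik_reduced):
    """Halved chi-square(1) tail probability for H0: variance component = 0."""
    # Likelihood-ratio statistic; clipped at zero to guard against rounding.
    lr = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    # Chi-square survival function with 1 df: P(X > lr) = erfc(sqrt(lr / 2)).
    return 0.5 * math.erfc(math.sqrt(lr / 2.0))

# Hypothetical log-likelihoods: the nested term adds nothing to the fit.
p_boundary = boundary_lrt_p(-100.0, -100.0)

# Hypothetical log-likelihoods: the nested term improves the fit.
p_improved = boundary_lrt_p(-98.07927, -100.0)
```

Under this assumption, a model whose nested variance is estimated at zero has the same likelihood as the model without the term, so the statistic is zero and the P-value is 0.5.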

Are there residual phylogenetic patterns after incorporating functional traits?

Because species abundances in both Dune Meadows and the Pine Barrens communities showed phylogenetic attraction, we investigated whether these phylogenetic patterns remained after incorporating information about functional traits. Based on Eqn 2, we first added traits as fixed and random effects that provided information (as measured by AIC) about the abundances of species among communities. We then added any additional traits to the model that reduced the remaining phylogenetic signal in the residual variation (if any) of species abundances among sites.

In the Dune Meadows communities, SLA and life history (annual or perennial) were selected as important traits in the model of species abundances as measured by the AIC (Eqn 2 without terms $b_{spp[i]}$ and $c_i$). Including these two traits as fixed terms and SLA as a random term reduced the phylogenetic variation ($\sigma^2_c$, Eqn 2) by 20% (Table 2a). Including the additional trait seed mass as both fixed and random terms reduced $\sigma^2_c$ by a further 5 percentage points, to 25% in total (Table 2b). SLA, life history, and seed mass thus removed only a small fraction (25%) of the total phylogenetic signal in community composition (abundance patterns of species across sites), with strong phylogenetic attraction remaining after including the traits (Table 2b). This finding is consistent with additional evidence provided by analysing all five traits separately (Tables 3, S3). Although one or two (based on Pagel's λ or Blomberg's K, respectively) of the five traits showed phylogenetic signal among species, no trait explained any of the variance among species (regardless of phylogeny) in the log abundance of species among sites (in Eqn 3 the estimated among-site variance in each trait's effect was zero). By contrast, the parallel analyses of species presence/absence in the Dune Meadows data show that including four of five traits reduced phylogenetic signal to almost zero (although the initial value of $\sigma^2_c$ = 0.0398 was not significant; Table S2).
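The percentage reductions reported in Table 2 can be reproduced from the estimates of the nested phylogenetic variance with and without traits. The sketch assumes that the reduction is the relative drop in that variance; the function name is hypothetical, and the input values are the rounded estimates from Table 2.

```python
def percent_reduction(var_without_traits, var_with_traits):
    """Relative drop (%) in phylogenetic variance after adding traits."""
    if var_without_traits <= 0:
        raise ValueError("no phylogenetic variance to reduce")
    return 100.0 * (var_without_traits - var_with_traits) / var_without_traits

# Rounded estimates from Table 2 (without traits, with traits).
dune_a = percent_reduction(0.0082, 0.0066)  # Dune Meadows, Table 2a
dune_b = percent_reduction(0.0083, 0.0062)  # Dune Meadows, Table 2b
pine = percent_reduction(0.0094, 0.0)       # Pine Barrens, Table 2a

print(round(dune_a), round(dune_b), round(pine))
```

With these rounded inputs the three values round to 20, 25, and 100, matching the "Decrease in signal" column.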

Table 2. Proportion of phylogenetic signal of species abundances in the Dune Meadows and Pine Barrens communities reduced after including measured functional traits (Eqn 2). (a) Models for both data sets with measured functional traits that are important in explaining species composition. (b) Models with additional measured functional traits that can further reduce phylogenetic variation in the residual variation. Because there is no residual phylogenetic variance ($\sigma^2_c$ = 0) for Pine Barrens after the first selection, additional functional traits were not added

(a) Dune Meadows

| Terms | | With traits | Without traits | Decrease in signal |
| --- | --- | --- | --- | --- |
| Random terms | $\sigma^2_a$ | 0.0344 | 0.0316 | |
| | $\sigma^2_b$ | 0.0004 | 0.0005 | |
| | $\sigma^2_c$ | 0.0066* | 0.0082* | 20% |
| | Variance of random trait term | 0.0030 | | |
| | $\sigma^2_d$ | 0.0144 | 0.0026 | |
| | $\sigma^2_e$ | 0.3467 | 0.3554 | |
| Fixed terms | Intercept | 0.510*** | 0.510*** | |
| | SLA | 0.172** | 0.170*** | |
| | Annual | −0.387** | −0.387** | |

(a) Pine Barrens

| Terms | | With traits | Without traits | Decrease in signal |
| --- | --- | --- | --- | --- |
| Random terms | $\sigma^2_a$ | 0.8780 | 0.8771 | |
| | $\sigma^2_b$ | 0 | 0 | |
| | $\sigma^2_c$ | 0.0000 | 0.0094* | 100% |
| | Variance of random trait term | 0.0000 | | |
| | Variance of random trait term | 0.0984 | | |
| | $\sigma^2_d$ | 0.0058 | 0.0027 | |
| | $\sigma^2_e$ | 0.5120 | 0.5137 | |
| Fixed terms | Intercept | 1.612** | 1.612** | |
| | Seed.mass | 0.082* | 0.083* | |
| | Polli.mode | −0.844 | −0.844 | |
| | Circ | 0.249 | 0.249 | |
| | L.width | −0.029 | −0.029 | |

(b) Dune Meadows

| Terms | | With traits | Without traits | Decrease in signal |
| --- | --- | --- | --- | --- |
| Random terms | $\sigma^2_a$ | 0.0364 | 0.0312 | |
| | $\sigma^2_b$ | 0.0004 | 0.0004 | |
| | $\sigma^2_c$ | 0.0062* | 0.0083* | 25% |
| | Variance of random trait term | 0.0155 | | |
| | Variance of random trait term | 0.0134 | | |
| | $\sigma^2_d$ | 0.0036 | 0.0027 | |
| | $\sigma^2_e$ | 0.3332 | 0.3551 | |
| Fixed terms | Intercept | 0.456*** | 0.455*** | |
| | SLA | 0.172** | 0.169*** | |
| | Annual | −0.138** | −0.140** | |
| | Seed.mass | −0.029 | −0.028 | |

  • $\sigma^2_a$ and $\sigma^2_b$ are the estimated variances of the overall abundance of species partitioned into nonphylogenetic ($\sigma^2_a$) and phylogenetic ($\sigma^2_b$) components. SLA, specific leaf area; Seed.mass, seed mass; Polli.mode, pollination mode; Circ, leaf circularity; L.width, leaf width. *, P < 0.05; **, P < 0.01; ***, P < 0.001.
Table 3. Phylogenetic signal present in the measured functional traits of Dune Meadows and Pine Barrens

| Data set | Trait | Pagel's λ | Blomberg's K | P (no among-site variation in trait effect; Eqn 3) |
| --- | --- | --- | --- | --- |
| Dune Meadows | Specific leaf area (SLA; m² kg⁻¹) | 0.63 | 0.09 | 0.500 |
| | Leaf dry mass content (LDMC; %) | 0.76* | 0.11* | 0.500 |
| | Plant height (cm) | 0.00 | 0.12* | 0.500 |
| | Seed mass (g per seed) | 0.00 | 0.07 | 0.428 |
| | Life cycle (annual or nonannual) | 0.00 | 0.06 | 0.500 |
| Pine Barrens | Specific leaf area (SLA; m² kg⁻¹) | 0.57** | 0.32** | 0.002 |
| | Leaf circularity (dimensionless) | 1.00*** | 0.95*** | 0.001 |
| | Leaf thickness (mm) | 0.75*** | 0.78*** | 0.001 |
| | Leaf width (cm) | 1.00*** | 0.61*** | 0.008 |
| | Animal dispersal (yes or no) | 0.83*** | 0.52*** | 0.054 |
| | Life cycle (annual or nonannual) | 0.00 | 0.28 | 0.479 |
| | Growth habit (woody or nonwoody) | 1.37*** | 0.48** | 0.500 |
| | Pollination mode (biotic or abiotic) | 0.08 | 0.20 | 0.500 |
| | Seed mass (g per seed) | 0.66 | 0.36* | 0.373 |
| | Leaf dry mass content (LDMC; %) | 0.54* | 0.29** | 0.500 |
| | Stem dry mass content (SDMC; %) | 0.48 | 0.25* | 0.500 |
| | Plant height (cm) | 0.76** | 0.31** | 0.500 |
| | Leaf length (cm) | 0.77*** | 0.39** | 0.500 |
| | Leaf carbon content (%) | 0.68** | 0.16 | 0.500 |
| | Leaf nitrogen content (%) | 0.00 | 0.15 | 0.334 |
| | Wind dispersal (yes or no) | 1.17*** | 0.61*** | 0.265 |
| | Unassisted dispersal (yes or no) | 0.00 | 0.15 | 0.500 |

  • We expected functional traits that play roles in phylogenetic patterns to show both phylogenetic signal and differences among sites in how they affect the abundance of species (tested against the null hypothesis of zero among-site variance in the trait effect; Eqn 3; P-values in last column). •, P < 0.1; *, P < 0.05; **, P < 0.01; ***, P < 0.001. Significant results (P < 0.10) are given in bold.

In the Pine Barrens communities, leaf circularity, leaf width, seed mass, and pollination mode were identified as important in explaining species abundances among sites as measured by the reduction in AIC of the nonphylogenetic model of species abundances (Eqn 2 without terms $b_{spp[i]}$ and $c_i$; Table 2a). Including these four traits in the phylogenetic model (all of them as fixed terms; leaf circularity and leaf width as random terms) eliminated phylogenetic variation in the residuals ($\sigma^2_c$ went to 0 in Eqn 2). Thus, these functional traits suffice to account for all the phylogenetic signal in the composition of the Pine Barrens communities. In analyses of each trait, most traits showed strong phylogenetic signal (Table 3). Five traits – leaf circularity, leaf width, leaf thickness, SLA, and animal dispersal (marginally) – also affected plant species' abundances among sites (nonzero among-site variance in the trait effect, Eqn 3; Table 3), indicating that different sites favour particular species based on these traits. Each of the three traits not retained in the final model covaries with either leaf circularity or leaf width: animal dispersal covaries with leaf circularity (r = 0.484), and SLA and leaf thickness covary with leaf width (r = 0.458 and −0.504, respectively). Therefore, even though only leaf circularity and leaf width were included via forward selection in the final model (Table 2a), these might have accounted for possible phylogenetic signal produced by the three other traits. This result provides supporting evidence that these traits are important in accounting for phylogenetic patterns in community structure. In parallel analyses of species presence/absence, $\sigma^2_c$ again declined to zero after including traits via forward selection (Table S2a).

The overall fits of the models give a detailed statistical description of the variation in abundance and presence/absence among communities. For abundance in the Pine Barrens data set (Table 2a), the model with variation in the effects of traits among sites (random effects) has fixed effects of leaf circularity, leaf width, seed mass, and pollination mode. Coefficients of seed mass and leaf circularity are positive, suggesting that species with higher values of each of these traits are more common across all communities. Leaf width has associated nonzero random effects, implying that sites differ in the relationships between leaf width and species abundance. There is large among-species variation in mean log abundance ($\sigma^2_a$ = 0.878). Interestingly, this variation lacks any phylogenetic component ($\sigma^2_b$ = 0), meaning that there is no phylogenetic signal in overall abundance. There is some site-to-site variation in mean log total abundance across all species ($\sigma^2_d$ = 0.0058), reflecting the fact that some sites have higher overall species abundance. As discussed in the preceding paragraph, there is no site-to-site variation in log abundances in which phylogenetically related species are more likely to show high abundance in the same sites ($\sigma^2_c$ = 0). Finally, the residual variance $\sigma^2_e$ = 0.512 implies substantial variation in the log abundances of individual species among sites that is not explained by any other component of the model. This example shows that the fully fitted PLMM provides a detailed accounting of the effects of traits on variation in community composition, assessing patterns of variation both among species and among sites.

Which environmental variables drive phylogenetic pattern?

If including trait information in the PLMM of community composition (Eqn 2) leaves residual phylogenetic signal in the unexplained variation, this would suggest that there are other unmeasured traits that affect community composition. To search for any such additional traits, we investigated whether there was phylogenetic signal in the response of species to different environmental variables. Such signals could help us narrow the search for additional traits to those that affect the response of species to the environmental variables.

In both data sets, species varied greatly in how their abundances responded to the measured environmental variables (Table 4). However, in the Dune Meadows data set we only found strong phylogenetic signal in the variation of species responses to manure (last column, Table 4). Thus, related species tend to respond similarly to similar applications of manure. This suggests that functional traits related to responses of plants to soil nutrients may help to further explain patterns in community composition.

Table 4. Variation in species abundances in relation to environmental variables (Eqn 4)

| Data set | Environmental variable | P-value for variation among species in response (no phylogenetic signal) | P-value for variation among species in response (phylogenetic signal) |
| --- | --- | --- | --- |
| Dune Meadows | Manure | < 0.001 | 0.025 |
| | Soil moisture | < 0.001 | 0.500 |
| | Soil A1 depth | < 0.001 | 0.500 |
| | Management | 0.221 | 0.186 |
| | Land use | 0.099 | 0.350 |
| Pine Barrens | Minimum temperature | < 0.001 | 0.500 |
| | Precipitation | < 0.001 | 0.500 |
| | Canopy shade | 0.002 | 0.500 |
| | Total exchange capacity | 0.002 | 0.500 |
| | Organic matter | 0.001 | 0.500 |
| | pH | < 0.001 | 0.500 |
| | Nitrogen (N) | < 0.001 | 0.500 |
| | Phosphorus (P) | 0.039 | 0.500 |
| | Magnesium (Mg) | 0.030 | 0.500 |
| | Potassium (K) | 0.007 | 0.500 |
| | Sodium (Na) | < 0.001 | 0.500 |
| | Manganese (Mn) | < 0.001 | 0.500 |
| | Calcium (Ca) | < 0.001 | 0.208 |
| | Clay | 0.110 | 0.500 |
| | Silt | 0.070 | 0.500 |
| | Sand | 0.117 | 0.500 |
| | Iron (Fe) | 0.500 | 0.500 |
| | Sulphur (S) | 0.458 | 0.500 |
| | Zinc (Zn) | 0.500 | 0.500 |
| | Aluminium (Al) | 0.500 | 0.500 |

  • In the Dune Meadows data set, only species' responses to manure levels showed phylogenetic signal; in the Pine Barrens data set, although 13 of the 20 environmental variables co-varied with variation in species abundances among sites, none showed phylogenetic signal in these responses. Significant results (P < 0.05) are given in bold.

In the Pine Barrens abundance data, we found no phylogenetic signal in the variation among species in their responses to any measured environmental variable. By contrast, in the presence/absence data we found strong phylogenetic signal in species' responses to minimum temperature, soil pH, and calcium (Ca) and manganese (Mn) concentrations (PGLMM, Table S4). Related species thus tend to occupy similar Pine Barrens sites as measured by these environmental variables. This pattern, again reflecting mostly soil conditions, suggests that functional traits related to nutrient acquisition and physiology are affecting species incidence across sites. This is true even though four traits – pollination mode, seed mass, leaf width, and wind dispersal – together eliminated phylogenetic signal in the residual variation (Table S2).

Discussion

Our analyses address the question: when models of community structure incorporate functional trait data, how much phylogenetic pattern remains in the unexplained variation? If no residual phylogenetic patterns in species abundance or presence/absence remain after including measured functional traits, then phylogenetic information itself may provide little additional insight into community assembly. In contrast, if significant residual phylogenetic patterns remain after including measured functional traits, then phylogenetic relationships include important information beyond what is provided by these traits. Therefore, answers to this question provide an important starting point for the search for additional traits or other factors that could underlie community assembly.

Our statistical framework addresses this question using phylogenetic mixed models (PLMMs and PGLMMs). We found that phylogenetically related species are indeed more likely to occupy the same sites and reach similar abundances within two quite distinct communities: Dune Meadows in the Netherlands and Pine Barrens forests in Wisconsin, USA. In the Pine Barrens communities with more species, sites, and functional traits, we found no phylogenetic signal in the residual variation in species abundance and presence/absence among sites after including functional traits in the phylogenetic mixed models (Eqn 2). In the simpler Dune Meadows communities, two of five traits had phylogenetic signal, and these traits removed all phylogenetic pattern in the residual variation in species presence/absence among sites. By contrast, these traits only reduced phylogenetic patterns in species abundance by 25% in the PLMM.

Incorporating functional traits reduced the phylogenetic component of residual variation in species composition for both data sets, but what could explain the remaining phylogenetic component of the variation in abundance of species among sites in the Dune Meadows communities (Fig. 1b, $\sigma^2_c$)? Some unknown historical biogeographical process or meta-community dynamics could account for this. However, sites in the Dune Meadows data set all occur within the same 86-km² island, making it unlikely that historical biogeographical processes played a key role in the assembly of these communities. It thus seems more likely that unmeasured functional traits played a role. Species abundances among Dune Meadows sites varied in apparent response to several measures of soil conditions, with phylogenetically related species responding similarly to nutrient concentrations (manure in Table 4). This response suggests that unmeasured traits (with a phylogenetic component) related to nutrient uptake and physiological processes affect the abundance of species across these Dune Meadows communities. The scarcity of data on traits related to belowground nutrient acquisition processes (root structure, mycorrhizal associations, etc.) may thus limit our ability to account for variation in species' abundances.

Despite the fact that measured traits appear unrelated to species' responses to the soil and climate gradients used in Table 4, measured traits still reduced much of the residual phylogenetic pattern in community composition, especially in the Pine Barrens communities. A possible explanation for this is that traits affecting how species respond to soil and climate variables covary with some of the measured traits. If traits covary, phylogenetic patterns driven by one trait could be statistically absorbed by another trait (similar to the effects of collinearity of independent variables). Conversely, soil and climate variables could covary with other environmental variables that affect plant species through the measured traits. It is difficult to infer just what mechanisms may be acting here without additional (ideally experimental) data. Although we have shown that the measured functional traits reduce most of the residual phylogenetic structure in the Pine Barrens communities, these associations could still mask important unmeasured traits.

We found that functional traits reduce a greater portion of the phylogenetic signal in species presence/absence than in species abundances in the Dune Meadows communities. This suggests that, although these functional traits strongly affect the suitability of sites for species, they have less effect on the ability of these species to attain large population sizes. Abundance may instead reflect local colonization dynamics or interactions with herbivores and pathogens influenced by traits that we did not measure. Thus, abundance data may provide more information about community assembly than just presence/absence data when analysing the phylogenetic components of community assembly (Freilich & Connolly, 2015).

Implications

It is often assumed that phylogenetic relationships among species contain additional, and possibly much more, ecological information relevant for predicting community assembly than what we find in commonly measured functional traits. This has led some community ecologists to argue that studies analysing community composition should incorporate information from both phylogenies and functional traits (e.g. Cadotte et al., 2013). With the methods we have presented, this is a testable hypothesis, and the Dune Meadows and Pine Barrens data sets give mixed results. On the one hand, the Dune Meadows example showed that phylogenies can indeed provide ecological information in addition to that contained in a small set of functional traits for predicting abundances of species among sites (Vane-Wright et al., 1991; Cadotte et al., 2009). Although functional traits are necessary to accurately infer the processes driving phylogenetic patterns (Kraft et al., 2007; Cavender-Bares et al., 2009), measured functional traits alone appear to provide an incomplete picture of abundances in the Dune Meadows. A recent study suggested that measured functional traits may contain less information about responses to environmental factors than the identity of species (Clark, 2016), implying limitations of using functional traits alone in explaining community assembly. On the other hand, our results from the Pine Barrens suggest that, in some communities, measured functional traits can reduce most of the phylogenetic pattern of community composition.

Our analyses are based on statistical models rather than metrics applied to data and tested using randomization methods. The current popularity of model-based methods in ecology reflects the fact that they are more interpretable, flexible, and statistically powerful than either null models or conventional algorithmic multivariate analyses (Warton et al., 2014). Here, we showed the power of phylogenetic mixed models (PLMMs and PGLMMs) for detecting and investigating phylogenetic patterns in community composition. This ability to combine phylogenies and functional traits into the same statistical model provides an integrated, quantitative framework for analysing ecological communities and predicting the incidence and abundance of one taxon from others. Using our methods to examine the overlaps of information about community composition derived from functional traits and phylogeny can provide useful insights and directions for subsequent analyses.

Statistical models also show the distinction between functional traits as explanations of community structure and phylogenies as patterns in the residual variation not accounted for by traits. Statistical models of community structure treat phylogenies as hypotheses for the pattern of covariances in the residual, unexplained variation. This implies that phylogenetic information does not explain anything in the same way that functional traits explain patterns as independent variables. Although researchers commonly talk about community patterns being ‘explained by phylogenies’, it is more accurate to say that covariances in the unexplained variation in community structure are consistent with the patterns anticipated to emerge from phylogenetic relationships.

We can use phylogenetic analyses to infer other unmeasured functional traits that may underlie patterns in community composition. Species often differ in how they respond to gradients in environmental conditions, but related species often respond similarly. In such cases, we expect some functional trait or traits to underlie these responses. The phylogenetic patterns we found here show that plant species commonly respond to edaphic conditions such as soil nutrient and chemistry. This highlights our frequent lack of data on functional traits related to roots and water/nutrient uptake. Thus, the integrated PLMM models provide valuable tools in cases where measured traits cannot fully account for phylogenetic patterns in ecological communities by suggesting which additional traits might be most informative for improving our ability to account for ecological patterns.

The assembly of plant and animal communities is clearly a complex phenomenon involving many processes. Some of these reflect differences among species assessed using the traits that can be measured. Some of these trait differences, in turn, reflect shared ancestry and conservative patterns of trait change among evolving lineages. Therefore, measured functional traits are expected to overlap with phylogenies in their information about species composition. The proportion of this overlap, however, varies from community to community, as found here between the two communities we studied. The tools presented here allowed us to explore these differences. We might also envision developing analogous methods to partition and assess the effects of other processes affecting community assembly (e.g. meta-community or neutral dynamics). This would require us to explicitly model the effects of these processes on variation in species incidence and abundance, and to be able to distinguish the patterns that emerge from those of alternative models.

Acknowledgements

This project was supported by the US-NSF Dimensions of Biodiversity programme under grants DEB-1046355 and DEB-1240804. DMW thanks the LabEX programme and ISEM group at the University of Montpellier for hosting his sabbatical stay. Comments from Editor David Ackerly and two anonymous reviewers greatly improved the manuscript.

Author contributions

D.L. and A.R.I. designed the study and performed the analyses. D.M.W. and D.L. designed the Pine Barrens field study and collected the functional trait data. D.L. collected the Pine Barrens vegetation and environmental data. D.L. wrote the initial manuscript and all authors collaborated in revising the paper.