Volume 37, Issue 5 p. 586-595
exaArticle
Free Access

DUALCOR: a phylogenetic comparative method to evaluate phylogenetic correlation between a character and a non-evolving external variable

Norberto P. Giannini

Corresponding Author

Norberto P. Giannini

Unidad Ejecutora Lillo, CONICET / Fundación Miguel Lillo, Miguel Lillo 251, San Miguel de Tucumán, Tucumán, 4000 Argentina

Facultad de Ciencias Naturales e Instituto Miguel Lillo, Universidad Nacional de Tucumán, Miguel Lillo 205, San Miguel de Tucumán, Tucumán, 4000 Argentina

Department of Mammalogy, Division of Vertebrate Zoology, American Museum of Natural History, Central Park West at 79th Street, New York, NY, 10024 USA

Corresponding author:

Search for more papers by this author
Pablo A. Goloboff

Pablo A. Goloboff

Unidad Ejecutora Lillo, CONICET / Fundación Miguel Lillo, Miguel Lillo 251, San Miguel de Tucumán, Tucumán, 4000 Argentina

Department of Entomology, Division of Invertebrate Zoology, American Museum of Natural History, Central Park West at 79th Street, New York, NY, 10024 USA

Search for more papers by this author
First published: 29 January 2021

Abstract

A new phylogenetic comparative method called DUALCOR is presented to evaluate the evolutionary response of a character to non-evolving external factors, such as environmental variables. The method treats the character as a typical evolving feature of an organism that is reconstructed on a given tree, whereas the external factor is treated as unrelated to the phylogeny. DUALCOR first calculates the correlation/regression between the observed values of the character and the external factor; then it maps the character onto the phylogeny, shuffles the changes among branches, and re-evolves the character to yield new terminal values uncorrelated with the observed values of the external factor, allowing users to examine whether the observed degree of correlation can be attained at random. This is repeated n (say, 999) times, thereby using the dual nature of characters to construct a permutation test that is shown to satisfy requirements of Generalized Monte Carlo procedures. In addition, we provide an empirical example with a reverse test where the external variable (features determined largely by non-heritable factors) is the dependent variable.

Introduction

Phylogenetic comparative methods (PCMs) constitute an ever growing collection of statistical techniques with an explicit evolutionary framework, each intended for the quantitative analysis of biological data under a specific situation. The need for special techniques of analysis stems from the statistical dependence due to the phylogenetic relatedness of the sampling units—taxa, usually species. The field has been reviewed in numerous contributions along the past 30 years (e.g., Harvey and Pagel, 1991; Brooks and McLennan, 1991; Martins and Hansen, 1996, 1997; Diniz-Filho, 2000; Martins, 2000; Blomberg and Garland, 2002; Diniz-Filho and Bini, 2008; Freckleton, 2009; O'Meara, 2012; Paradis, 2012; Pennell and Harmon, 2013; Hansen, 2014; Cooper et al., 2016; Cornwell and Nakagawa, 2017; Uyeda et al., 2018). While some methodologies analyze diversification, i.e. speciation across lineages through time (e.g., Rabosky, 2006, Rabosky et al., 2014; but see Louca and Pennell, 2020), other techniques focus on character evolution, i.e., on the question of how and when characters evolve (e.g., Harmon et al., 2007), including correlations among characters based on comparisons transversal to the phylogeny (i.e., between sister nodes, as in Phylogenetic Independent Contrasts, PIC; Felsenstein, 1985) or along the phylogeny (i.e., in the ancestor-descendant line, as in Delayed Correlations, DELCOR; Giannini and Goloboff, 2010). Many of these techniques use specific models of lineage or character evolution, and take into consideration one or all sources of uncertainty in a Bayesian framework, namely tree, trait, and model uncertainty (Revell, 2012; Cornwell and Nakagawa, 2017). Other methods aim to discover the patterns of correlated character evolution on a given tree (e.g., Revell and Collar, 2009; Giannini and Goloboff, 2010).

In a previous contribution, we explored aspects of character correlation and regression in an explicitly phylogenetic framework (Giannini and Goloboff, 2010). We noted that evolutionary lags (i.e. the dependent character responding to changes in the independent character, but with some evolutionary time delay) may easily produce false negatives in the estimation of character correlation, unless the delays in response are taken into account. We showed that changes for each character should be effectively tracked along the tree and matched with those of the other character such that responses in the dependent character, if delayed, are allowed to form appropriate data pairs (with some penalty, as compared with immediate responses). Within that framework, we realized that the standard procedure of permuting states of the dependent character within terminals followed by ancestral reconstruction is inadequate because it wrongly destroys any congruence between the dependent character and the phylogeny—and what is under test is the dependence from the other character, through the phylogeny. Therefore, the appropriate statistical test would involve randomly reassigning reconstructed changes in ancestral states to other branches of the tree, so the connection between character evolution in the dependent character and the phylogeny is preserved. This process is repeated a number of times (e.g. 999), every time re-matching the changes of characters and calculating the now random correlation between them, to generate a null distribution for the test statistic—the slope or the correlation coefficient. Three major advantages emerged from this randomization test. First, the randomization of ancestral state changes effectively breaks down the dependence from the other character. Second, steps (and hence the amount of reconstructed change) remain constant across all permutations, thereby assuring a strict comparability of the amount of evolution for each and all permutations (as opposed to randomizing the terminals, which leads to different amounts of evolution in each replication; e.g., in Laurin, 2004). Third, this scheme makes it possible for the original arrangement of changes to be produced as one of the randomizations (one of the key requirements for generalized Monte Carlo permutation tests to be valid; see Manly, 1997). We implemented this method in the script DELCOR (Giannini and Goloboff, 2010).

The correlation problem outlined above is one between two evolving characters. Sometimes, the need arises to test whether changes in one character (e.g., body size) may depend on external factors such as latitudinal range, island size, environmental energy, which may change through time and may well influence evolution of taxa, but have no biological evolution themselves and are not transmitted through any genetic mechanism to the next generations (as opposed to true characters; see Freudenstein et al., 2003). Methods that link environmental or other non-evolving variation to comparative variation in traits have been proposed chiefly in the framework of phylogenetic generalized least squares (PGLS) models, in which the phylogenetic structure is incorporated in the error term (e.g., Martins and Hansen, 1997); with few exceptions, PGLSs treat the entire phylogeny as one structure that is a source of covariation between taxa that must be accounted for (see Garamszegi, 2014; cf. Giannini, 2003). We propose here a new method within the same framework of DELCOR, i.e., based on the specifics of changes as reconstructed along the tree (i.e., in the ancestor–descendant time line), also making use of all major advantages of statistical testing outlined above. The method resolves the question of how to correlate the specific changes of an evolving character (as optimized on a tree of choice), with variation of that external factor that reasonably seems to be driving the observed variation in terminals. It is clear that a phylogenetically informed analysis should be used to treat the character because of the evolutionary dependence that influences its changes. It is equally clear, however, that the same does not apply to the external character, which changes independently of the biology of the organisms involved in the comparison, including its evolutionary history (cf. Graham et al., 2004; Evans et al., 2009). Some standard phylogenetic tests such as PIC (e.g., Cornwell and Nakagawa, 2017) correlate ancestral states or changes along tree branches of both variables, but the non-evolving variable clearly cannot be reconstructed on the tree. Thus, this new PCM overcomes this difficulty by applying a differential treatment to the evolving character and to the non-evolving external factor, within a robust statistical framework. We illustrate the utility of this new approach with an empirical example on bats, in which we additionally show the potential for reverse testing, i.e., loosely heritable features as dependent on typical evolving characters of organisms. Some features (e.g. population size, various demographic parameters) result from the interaction between heritable factors and the environment, and thus cannot be mapped onto trees with conventional methods for character state reconstruction. The randomization method proposed here can be used to test whether the heritable component of such features may have been influenced by changes in the standard character, but without the need to apply unjustified mapping methods to the features largely depending on external factors. For this reverse testing to be meaningful, of course, the dependent feature must have some component of heritability.

The method

Data and preliminaries

DUALCOR is devised to calculate the regression, or correlation, between character data Y, of the usual type (e.g. morphology, but see all possibilities in Freudenstein et al., 2003), and external factors X (which partially or completely depend on environmental variables). The method is intended to answer the question of whether external factor X influenced evolutionary changes in character Y (see also Cornwell and Nakagawa, 2017); hence, variation on Y is hypothesized to be dependent on X (but see the empirical example below, for the opposite situation). Data are arranged in a usual character matrix encoding continuous characters and at least one tree should be available in the same or another file. The values of the non-evolving variable are scored in the matrix as if they were characters, although the test will never reconstruct values on ancestors.

Step 0: A preparatory step

DUALCOR first generates, stores, and counts all possible, equally parsimonious reconstructions of the evolving character Y (by default, character 1) on the tree (by default, tree 0) using the command iterrecs; the number of total reconstructions is reported.

Step 1: Calculate the observed correlation

The observed terminal values of Y (by default, character 1) and X (by default, character 0) are used to calculate a regression slope β and a correlation coefficient r (Fig. 1a). If any terminal is given a range for Y, the middle value is used. The value of the correlation coefficient r is used in subsequent steps, although β and r are equivalent test statistics (see Discussion). An alternative, non-parametric correlation is also available (see below).

Details are in the caption following the image
Hypothetical example constructed to demonstrate the basic procedure implemented in DUALCOR. Top: Observed values of dependent character Y and independent external factor X. In step 1 the observed correlation between the observed values is computed. In step 2 the evolution of character Y is reconstructed on the given phylogeny; for simplicity, the example implies a single most parsimonious reconstruction, with seven signed changes with specific nodal locations as seen in the tree. The root value is fixed in this step. Bottom: reconstructed changes are randomly allocated to internal branches (step 3) and the character re-evolves by summing up the root value to all other changes with their sign that are encountered between the root and the terminal (step 4). For instance, the value of the fourth terminal from top has re-evolved to value 3 (previously 8) by summing the fixed root +2 and the changes −1, +1 and +1 encountered along the path between root and the terminal. In step 5 a new correlation is calculated between the randomly re-evolved character Y-rand and the original external factor X that is left unchanged. Notice that in both reconstructions, top (with observed values) and bottom (with changes randomly distributed in the branches), tree cost remains the same.

Step 2: Optimize character Y on tree 0; calculate changes ΔYi

Character Y is optimized on the chosen tree as a continuous character (Fig. 1a; see Goloboff et al., 2006). For simplicity (but see below) a single, randomly chosen reconstruction of Y is taken from all possible most parsimonious reconstructions, and the (signed) ancestor—descendant differences ΔYi for each branch are stored (the sum of these values is the total steps for the chosen tree). The root is assigned an initial value; for simplicity, the midpoint of the range assigned to the node in all most parsimonious reconstructions, which is subsequently used but has no effect on the results of the test. This fixed value is taken as the departure point for randomly re-evolving the character in step 4 (see below). On the basis of nodal assignments, changes between nodes, ΔYi, are calculated as the difference between the assignment of the descendant node and the immediate ancestral node. The collection of changes ΔYi is stored in an array.

Step 3: Randomize the location of changes, ΔYi

The array with the collection of changes ΔYi is internally randomized. This effectively relocates the observed nodal assignments randomly to other nodes on the tree, changing the original pattern of evolution of character Y on tree 0, thereby destroying the internal phylogenetic covariance between observed states of the terminals in character Y, and their potential dependence on the non-evolving variable. Both the total amount of evolutionary change (tree cost) and the actual magnitude of each evolutionary event (ΔYi) are maintained in the randomization.

Step 4: Re-evolve Y using randomized ΔYi

The root value fixed in step 2 is used as the departure point to recalculate new terminal values. This is done by re-evolving character Y along the tree; i.e., summing up the root value and all the random nodal assignments with their sign (increases or decreases), between the root and each terminal (Fig. 1b). This is done using the command travtree up terms and additional calculations. This procedure creates a new, randomly generated character vector YRAND stored in a new array.

Step 5: Calculate the correlation for the re-evolved terminals and the original variable X

The new character YRAND is matched with the original external variable X. The correlation of YRAND and X is calculated, saving the randomized correlation coefficient value, rRAND (Fig. 1b).

Step 6: Replicate, generate null distribution, calculate test statistic

Steps 3 to 5 are repeated n times, storing rRAND and calculating a null distribution using the n correlation coefficients; the latter represents a sample of those expected under evolution of the Y character without influence from the non-evolving variable. The original r value (calculated in step 1) is then compared with this distribution, and the number of times the r value is equaled or exceeded by rRAND values is computed. The proportion of these values in the, say, 999 (random) + 1 (observed) r values is the empirical statistical significance for this randomization test. Naturally, the sign of r is taken into account—either positive or negative correlations, and the corresponding slopes.

Further details and implementation

DUALCOR is here presented in univariate form, i.e., single Y and X, both coded as continuous characters. Changes are considered here as the difference in states between descendant and ancestor in particular reconstructions and unlike steps, which are summed up in absolute value to yield tree cost, both the magnitude and sign (increase or decrease) of changes are preserved for calculations—specifically, in re-evolving the shuffled character changes in step 4. While all reconstructions of Y are generated exhaustively, a random sample of them (not just one, as presented for simplicity in step 2 above) is used in calculations. The tree may not be fully resolved; polytomies are optimized as implemented in TNT (see Goloboff et al., 2003, 2008).

A conventional (model I) linear regression is applied in steps 1 and 5. Alternatives to this regression model can be implemented to accommodate variables or characters coded in a different measurement scale; e.g., a logistic model if the responding character (Y) is binary (see Hair et al., 1995). The Pearson’s product-moment correlation coefficient r is calculated in steps 1 and 5; r assumes a linear relationship between two paired variables that follow a bivariate normal distribution with no extreme outliers, but it is very robust to violation of these assumptions (Zar, 1996). A common alternative to r when assumptions are not met is the non-parametric Spearman´s rank-order correlation rS (Zar, 1996). An option to apply rS (instead of default r) is available to users, but note that non-parametric statistics generally exhibit less statistical power than the corresponding parametric statistics, including rS as compared to r (Zar, 1996). In addition, the simulation for the significance of observed rS (step 5) can be done in a fast way permuting ranks only (i.e., skipping step 4) or a complete way including re-evolution of the character and determination of ranks in each repetition (as in simulating r).

As DELayed CORrelations (DELCOR, Giannini and Goloboff, 2010), DUAL CORrelations are implemented in a script that uses the TNT macro language (Goloboff et al., 2003, 2008). The scripts DUALCOR.run (implementing parametric linear correlation) and SDUALCOR (implementing non-parametric rank correlation) are freely available at www.lillo.org.ar/phylogeny/tnt/scripts.

Empirical example

Olival (2012) presented a data matrix (in his table 8.2) reporting the character “aspect ratio” (AR) and the demographic parameter “fixation index” (FST) for 61 species of bats (Mammalia: Chiroptera) classified in 10 families. AR (dimensionless) is a widely used measure of aerodynamic efficiency (see Norberg and Rayner, 1987); in this sample, AR varied between 5.4 (less efficient) and 8.2 (more efficient) and it was missing in eight species of the original dataset. FST and its analogue for sequence data ΦST (calculated either on mitochondrial or nuclear sequence, FST_mt or FST_nuc, respectively) are among the most widely used measures of population genetic differentiation (Olival, 2012). FST varies between 0 (panmixis) and 1 (complete genetic subdivision among subpopulations); while perhaps having some heritable component, it must also be strongly influenced by environmental factors. In the study of Olival (2012), FST_all was reported as either FST_mt or FST_nuc, or as the average between FST_mt and FST_nuc when both were available; a significant, negative relationship between corrected FST_all (residuals after controlling for geographic distance between samples) and AR was found using phylogenetically independent contrasts (PIC). Only taxa with complete design (no missing values) were included in each analysis by Olival (2012).

We choose this example, first, as an interesting biological problem: because bats use powered flight to explore their environment, home range and commuting distances are immense—up to three orders of magnitude larger than in non-volant mammals of comparable size—and interspecific variation is also huge (Giannini, 2012). It is expected then that interspecific differences in flight efficiency (estimated by AR) may affect demographic parameters of species; this is in fact reported in Olival (2012), who found that (out of many aerodynamic and other factors), AR was best able to explain a significant fraction of interspecific variation in population fragmentation in bats. Therefore demographic structure as estimated by the fixation index depends to some extent on the flight capacity of species—less demographic structuring is expected in highly commuting species (e.g., Eidolon helvum, FST_mt = 0.153), and the opposite is expected in sedentary species with small home range (e.g., Haplonycteris fischeri, FST_mt = 0.606), particularly in species scattered in many distant small islands (e.g., Pteropus hypomelanus, FST_nuc = 0.888; see Olival, 2012). Second, we choose this example because it shows that a morphological character can modify a variable such as a demographic parameter, that is not directly evolving as other biological features; i.e., model specification here is reversed so the response variable Y is the non-evolving factor, and X is the evolving character. Third, of the available information in Olival (2012) we used raw data of FST_all (i.e., average FST not corrected by geographic distance), as the strongest test (least likely to yield a significant result) of our methodology.

Results

Applying DUALCOR to the raw dataset from Olival (2012; included here in Appendix 1), the explanatory variable AR was the evolving character and we obtained 864 reconstructions (using the high and low end point of value intervals), as optimized in the tree of Appendix 1. This tree is different from the one used by Olival (2012), which was based on a supertree from Jones et al. (2002); we used the most recent supermatrix phylogeny (Amador et al., 2018), which was obtained with a comprehensive dataset of 800+ ingroup terminals and nine genetic markers from both the mitochondrial and nuclear genomes. The observed correlation r was −0.226, and the observed slope −0.056. We ran 999 replications on this dataset and obtained 23 rRAND values that were equal or greater than the observed r; the significance of this negative regression/correlation analysis was P = 0.024. We therefore conclude that degree of population fragmentation, as measured by the FST or equivalently by ΦST, significantly and inversely depends on flight efficiency—species with high AR and concomitantly large home ranges and commuting distances tend toward panmixis across their geographic range, while species with low AR tend to have highly structured subpopulations. This is a detectable but only a modest tendency though, as typical of noisy data, because albeit statistically significant at alpha = 0.05, the interspecific FST_all variation explained by AR is quite low (r2 = 5% as compared with 8.7% in Olival’s FST_all corrected data, but much higher for FST_mt at 36%).

Discussion

Map, shuffle, re-evolve

DUALCOR provides an adequate, differential treatment to each of the variables involved in the regression/correlation between one tree-dependent, evolving character, and an external, tree-independent variable or factor. This is straightforward: the character is optimized on the tree and the changes are used in the statistical test in a randomized fashion, thereby dealing with phylogenetic relatedness, while the external factor is left unchanged. This unique treatment of the evolving character, summarized as <map, shuffle, re-evolve>, is the basis for the proper treatment of phylogenetic dependence given that tree structure, ancestral states, and total amount of evolution (steps or tree cost) are all preserved during the randomization process. This is taken from developments in DELCOR, here carrying the <re-evolve> step to produce new terminal values, and it is relevant for the full validity of the simulation test; this is key because this procedure destroys the phylogenetic dependence that otherwise impedes proper statistical testing when using related taxa as sampling units (see Giannini and Goloboff, 2010). Briefly (but see details in Manly, 1997), an effective way of generating a null distribution of the test statistic (either r or β, see below) involves the creation of alternative datasets that are likely to have occurred in the absence of any interaction with the non-evolving variable (among them, of course, the observed dataset). This is achieved by shuffling ancestral states randomly among nodes, and the re-evolving step generates new terminal values that may have resulted from the same evolutionary events (increases or decreases of the ancestral character value), just occurring in other nodes. This cannot be achieved by other procedures, for instance randomization of values among terminals, because each new character set so generated likely produces a different set of evolutionary events, and a different overall magnitude of evolution, for the same tree, as if it was a different character (e.g., in Laurin, 2004). Thus, under this randomization procedure shared with DELCOR, the requirements for generalized Monte Carlo testing are fully satisfied (see Manly, 1997), including the observed dataset being one of the many possible sets that could have been obtained by evolution on the same phylogeny (Giannini and Goloboff, 2010).

The correlation coefficient r and the slope β are distinct statistics but they are equivalent from the randomization perspective, meaning that any result (significant or not) applies equally to both; this is because the only part of the statistic formulation that is affected by permutation is the same in both r and β, specifically, the sum of the crossproducts of x and y (see Manly, 1997: 149). Therefore all randomization is done with regard to r, but the P value so obtained also applies to β (see also Giannini and Goloboff, 2010). We have argued that the acceptable type I error rate and power in DELCOR are largely a consequence of the robustness of the regression analysis per se, and of the correct randomization scheme (Giannini and Goloboff, 2010). We conjecture that this also holds for DUALCOR, as it is based on the same regression type (model I) and the same randomization procedure (map and shuffle).

We stated previously that this type of dual correlation is necessary because the types of variables involved should be distinguished and treated differently. In our empirical example (see above), DUALCOR is set to map, shuffle and re-evolve X instead of Y. DUALCOR was able to statistically demonstrate the expected inverse relationship between flight efficiency as estimated by aspect ratio, and population structuring, even in the worst-case scenario for the available demographic parameters, i.e. using raw data instead of distance-corrected FST values, and using the average FST_all rather than the more efficient population-structure sampler FST_mit (see Olival, 2012). In addition, the validity of the DUALCOR test is further supported because demographic parameters such as the widely used FST are non-evolving variables and are left unchanged in this test (cf. PIC; Felsenstein, 1985).

Comparison with other methods

Phylogenetic generalized least squares (PGLS)

The phylogenetic relatedness of species violates the explicit assumption of GLS that the residuals of model fitting are independent (see Martins and Garland, 1991; Hunt and Carrano, 2010). But PGLS can be specified in many ways (Martins and Hansen, 1997), so for the general model:
urn:x-wiley:07483007:media:cla12450:cla12450-math-0001(1)
the response-variables matrix is Y (usually encoding the character data), the vector of slopes is β, the external-variables matrix is X, and ɛ is the error term, where the phylogeny is introduced conventionally (Martins and Hansen, 1997). This can be treated separately as error due to common ancestry (ɛS), and error due to uncertainty in the reconstruction of the phylogenetic history (ɛP; additional possible sources of error, such as within-population variance ɛM, are not phylogenetic; Garamszegi, 2014). So the key aspect of PGLS is that phylogeny is encoded as a covariance matrix determined by relatedness and this specification is used in fitting the model (Hunt and Carrano, 2010). Assuming for instance raw Brownian motion as model of character evolution (but other models are used generally), that covariance matrix ɛS is proportional to the patristic distance, so “taxa that share most of their path from the root are expected to have higher covariance” for the trait and this is “used to downweight the joint influence of points” that are partially redundant due to shared history (Hunt and Carrano, 2010, pp. 258–259). Thus the phylogenetic component of the regression problem is encapsulated as a confounding factor (Garamszegi, 2014) and the phylogeny is treated as the expected structural covariance between taxa that must be corrected for.

Unlike PGLS, DUALCOR directly uses the (inferred) evolutionary changes in the statistical test. This fulfills the actual goal set when using these models, which are often applied to answer the question: “across many species, does environmental variable X influence values of trait Y?” (Cornwell and Nakagawa, 2017: R334). This is because “values of trait Y” actually result from evolutionary changes of Y on the tree; most parsimonious optimization is used in DUALCOR to detect position, magnitude, and sign of changes in the evolving character, and the randomization mechanism shuffles those changes in the tree.

However, PGLS methods are highly flexible and not all are constructed the same; for instance, canonical phylogenetic ordination (CPO) encodes the phylogeny not in the error term ɛ but in the explanatory part of the model X (Giannini, 2003). In CPO the phylogeny (treated as an unrooted network) is not encoded as covariance (α distance) but it is decomposed instead into its constituent network partitions (i.e., clades in a rooted tree); additionally, environmental variables can also be part of the explanation (i.e., are coded in X when available; Giannini, 2003). Here the chief difference with DUALCOR in terms of model specification is again the direct use of changes, but when evolutionary variation is coded in X as in our empirical example of bats (see above), CPO comes conceptually closer to DUALCOR in the sense that phylogenetic structure (coded in X) explains variation in Y (Giannini, 2003).

Thus, in terms of eq. (1), the DUALCOR specification is: Y contains the scoring of the character, X contains the external factor observed in each taxon, and the error term ɛ is empirical, i.e., not predetermined to fit any covariance structure such as the phylogeny. The reverse is also possible, as shown in the empirical example above, in which the character is coded in X and the response variable Y is a non-evolving variable. The phylogeny is incorporated in the statistical test so that specific evolutionary changes in Y (or X), and the total amount of evolution, both remain constant while generating a null distribution of r with re-evolving y.

Phylogenetically independent contrasts (PIC)

As originally proposed by Felsenstein (1985; see also Garland et al., 1992), this phylogenetically explicit method calculates a regression between two conventional characters; however, the method of PIC has also been used to estimate the regression between a character and an environmental variable (e.g., Olival, 2012; see also Cornwell and Nakagawa, 2017). For each character, a contrast is calculated between two terminals as the arithmetic difference between their character values, and the algorithm progresses down the tree calculating n—1 new contrasts between each pair of descendants, internal and terminal. For this, internal-node values are calculated as variance-corrected averages between descendants, estimated by assuming a BM model of evolution. In eq. (1), x and y are contrasts.

As applied to our problem, the contrasts of both the character and the environmental variable should be calculated (as in Olival, 2012). Here, and compared to PGLS, a more direct link between (some transformation of) changes in the character and changes in the external variable is verified node by node, but 1. the contrasts of an environmental variable, which does not evolve, are difficult to justify, just like nodal values resulting from optimization instead of PIC for this type of variable (see above); and 2. changes themselves are not calculated, nodal values are only estimated on the basis of a global standard deviation for contrasts. The latter procedure (as equivalently translated into squared-changes parsimony) has long been known for its tendency to generate changes where none is required (see Hormiga et al., 2000: 444; it can be shown that this is exacerbated in the presence of stasis in portions of the tree). By contrast, DUALCOR uses actual changes as reconstructed in the tree, and only for the evolving character.

Alternatives to optimization

DUALCOR uses optimization of continuous characters (Goloboff et al., 2006) to assign states to internal nodes, selecting a random sample from all most parsimonious reconstructions (with the iterrecs command) to calculate the null distribution of the slope. Other implementations are possible. Instead of sampling reconstructions, it is possible to sample nodal assignments from a Bayesian set with different probabilities; e.g., sampling from the distribution of states from mapping using some evolutionary model (e.g., Revell, 2012). This would deal with uncertainty in determining the ancestor-descendant change. In addition, it would be possible to deal with uncertainty in determining the phylogeny by sampling groups from the Markov chain that generated the topology; this can be combined in turn with sampling states. Once the set of terminal states set is sampled, this set should be re-evolved along the tree under some preferred model of character change (e.g., BM, OU, EB, or other; Uyeda et al., 2014). The terminal values as determined by these sampling processes should be paired with the unchanged values of the external factor, and the procedure repeated over n replicates. Therefore the concept of matching dual variables for regression analysis, i.e. evolving characters and non-evolving external factors that are treated as such, could in principle be implemented using optimization as in DUALCOR, or under model-based approaches, provided that sources of uncertainty are accounted for, and appropriate evolutionary models are used.

Conclusions

DUALCOR is presented here as a new, flexible phylogenetic comparative method to address the statistical problem of evolution of a character responding to a non-evolving external factor. Our implementation (single dependent and independent characters, the evolving character optimized on the tree, randomization of changes in this character) can be extended to other ways of evolving the character under evolutionary models, to include a host of credible trees in the test, and to consider other regression models (e.g., logistic if Y is binary). The major contribution of this approach is thus the concept of distinguishing the statistical treatment of evolving characters (mapping, shuffling, re-evolving it n times) versus the non-evolving external factors (left unchanged with respect to phylogeny). In this way all aspects of the evolutionary responses to external factors including phylogenetic dependence, or the reverse when applicable, are adequately addressed within a general Monte Carlo statistical framework.

Acknowledgments

We thank our institutions Unidad Ejecutora Lillo, CONICET-Fundación Miguel, and Facultad de Ciencias Naturales e Instituto Miguel Lillo, Universidad Nacional de Tucumán. This work was supported by Agencia Nacional de Promoción Científica y Tecnológica (PICT 2015-2389 and PICT 2016-3682 to NPG), and Consejo Nacional de Ciencia y Tecnología (PUE 0070 to PAG).

    Conflict of interest

    None declared.

    Appendix 1

    Data matrix and tree of the empirical example from Olival (2012)

    • nstates cont ;

    • xread 'AR - FST'

    • 2 61 &[cont]

    • sacc_bili 6.10 0.059

    • rhin_aura 6.29 0.375

    • tria_firc 6.12 0.089

    • tria_rufu 7.04 0

    • macr_giga 6.10 0.571

    • otom_mart 9.30 0.090

    • tada_bras 8.20 0.080

    • myst_tube 7.00 0.338

    • myzo_auri ? 0.686

    • ardo_nich 6.40 0.005

    • arti_jama 6.36 0.013

    • brac_cave 6.10 0.005

    • caro_pers 6.70 0.060

    • desm_rotu 5.71 0.468

    • glos_long 5.92 0.725

    • lept_cura 7.60 0.109

    • phyl_hast 6.30 0.031

    • urod_bilo 7.70 0.010

    • cyno_brac 7.90 0.074

    • cyno_hors ? 0.054

    • cyno_nusa ? 0.171

    • cyno_sphin 6.70 0.164

    • eido_helv 6.90 0.153

    • eony_spel 8.60 0.120

    • hapl_fisc 5.68 0.490

    • macr_mini 6.50 0.105

    • pten_jago 5.40 0.081

    • pten_mino ? 0.041

    • pter_alec 6.81 0.023

    • pter_hypo 5.52 0.882

    • pter_poli 7.41 0.014

    • pter_rodr ? 0.026

    • pter_scap 7.30 0.028

    • pter_vamp 8.40 0

    • rous_aegy 5.90 0.493

    • rous_ampl 6.30 0.100

    • thoo_nigre 6.70 0.605

    • hipp_rube 6.55 0.410

    • hipp_speo 7.40 0.210

    • hipp_turp 5.60 0.041

    • rhin_corn 5.23 0.088

    • rhin_ferr 6.10 0.117

    • rhin_mono 5.45 0.156

    • antr_pall 6.55 0.244

    • cory_town 6.61 0.331

    • epte_fusc 6.40 0.464

    • mini_schr 7.00 0.620

    • myot_bech 6.00 0.265

    • myot_capa ? 0.487

    • myot_daub 6.30 0.017

    • myot_muri 6.80 0.360

    • myot_myot 6.30 0.281

    • myot_natt 6.40 0.017

    • myot_sept 5.80 0.002

    • nyct_azor ? 0.211

    • nyct_noct 7.40 0.125

    • pipi_pipi 5.46 0.044

    • pipi_pygm 5.82 0.024

    • plec_auri 5.70 0.019

    • scot_kuhl ? 0.070

    • vesp_muri 7.00 0.071;

    • tread 'tree(s) from TNT, for data in C:\Users\user\Desktop\TNT\kevin-ok.txt'

    • ((((0 8 )(7 (13 (16 ((12 (9 (10 17 )))(11 (14 15 )))))))((5 6 )(46 ((59 ((43 (44 58 ))(45 (60 ((54 55 )(56 57 ))))))(53 (((47 49 )(51 52 ))(48 50 )))))))((4 ((1 (2 3 ))((37 (38 39 ))(41 (40 42 )))))((((19 20 (18 21 ))(26 27 ))(24 36 ))(((22 (32 ((30 (28 29 ))(31 33 ))))(23 25 ))(34 35 )))));proc-;

      The full text of this article hosted at iucr.org is unavailable due to technical difficulties.