Review Article
Simulation Tests of Methods in Evolution, Ecology, and Systematics: Pitfalls, Progress, and Principles
- Katie E. Lotterhos¹, Matthew C. Fitzpatrick², and Heath Blackmon³
- Affiliations: ¹Department of Marine and Environmental Sciences, Northeastern University, Nahant, Massachusetts, USA; email: [email protected]; ²Appalachian Lab, University of Maryland Center for Environmental Science, Frostburg, Maryland, USA; ³Department of Biology, Texas A&M University, College Station, Texas, USA
- Vol. 53:113-136 (Volume publication date November 2022) https://doi.org/10.1146/annurev-ecolsys-102320-093722
- First published as a Review in Advance on July 29, 2022
Copyright © 2022 by Annual Reviews. All rights reserved
Abstract
Complex statistical methods are continuously developed across the fields of ecology, evolution, and systematics (EES). These fields, however, lack standardized principles for evaluating methods, which has led to high variability in the rigor with which methods are tested, a lack of clarity regarding their limitations, and the potential for misapplication. In this review, we illustrate the common pitfalls of method evaluations in EES, the advantages of testing methods with simulated data, and best practices for method evaluations. We highlight the difference between method evaluation and validation and review how simulations, when appropriately designed, can refine the domain in which a method can be reliably applied. We also discuss the strengths and limitations of different evaluation metrics. The potential for misapplication of methods would be greatly reduced if funding agencies, reviewers, and journals required principled method evaluation.
Supplementary Data
Download the Supplemental Methods (PDF).
Download Supplemental Appendix 1 (R code) (ZIP).
Download Supplemental Appendix 2 (PDF).
Download Supplemental Table 1 (XLSX).