Uncovering the drivers of host-associated microbiota with joint species distribution modelling
Corresponding Author
Johannes R. Björk
Department of Biological Sciences, University of Notre Dame, Notre Dame, Indiana
Theoretical and Experimental Ecology Station, CNRS-University Paul Sabatier, Moulis, France
Correspondence
Johannes R. Björk, Department of Biological Sciences, University of Notre Dame, Notre Dame, IN.
Email: [email protected]
Francis K. C. Hui
Mathematical Sciences Institute, The Australian National University, Canberra, Australia
Robert B. O'Hara
Department of Mathematical Sciences, NTNU, Trondheim, Norway
Biodiversity and Climate Research Centre, Frankfurt, Germany
Jose M. Montoya
Theoretical and Experimental Ecology Station, CNRS-University Paul Sabatier, Moulis, France
Abstract
In addition to the processes structuring free-living communities, host-associated microbiota are directly or indirectly shaped by the host. Microbiota data therefore have a hierarchical structure in which samples are nested under one or several variables representing host-specific factors, often spanning multiple levels of biological organization. Current statistical methods do not accommodate this hierarchical data structure and therefore cannot explicitly account for the effect of the host in structuring the microbiota. We introduce a novel extension of joint species distribution models (JSDMs) that can straightforwardly accommodate, and distinguish among, effects such as host phylogeny and traits, recorded covariates such as diet and collection site, and other ecological processes. Our proposed methodology provides powerful yet familiar outputs from community ecology: (a) model-based ordination to visualize and quantify the main patterns in the data; (b) variance partitioning to assess how influential the included host-specific factors are in structuring the microbiota; and (c) co-occurrence networks to visualize microbe-to-microbe associations.
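To make the idea of variance partitioning over nested data concrete, the following is a minimal toy sketch in Python. It is not the authors' implementation. It assumes a single taxon whose linear predictor is the sum of a host-level random effect and one sample-level covariate effect, and it ignores any covariance between the two terms. The function name `variance_partition`, the variable names, and all simulated values are hypothetical and chosen only for illustration.

```python
import numpy as np


def variance_partition(components):
    """Return each named term's share of the summed component variances.

    `components` maps a term name to its per-sample contribution to the
    linear predictor. Covariance between terms is ignored (an assumption).
    """
    variances = {name: float(np.var(values)) for name, values in components.items()}
    total = sum(variances.values())
    return {name: v / total for name, v in variances.items()}


# Simulated toy data: samples nested within hosts (hypothetical sizes).
rng = np.random.default_rng(0)
n_hosts, samples_per_host = 20, 5
host_of_sample = np.repeat(np.arange(n_hosts), samples_per_host)

# Host-level effect: one draw per host, shared by all of that host's samples.
host_effect = rng.normal(0.0, 1.0, size=n_hosts)[host_of_sample]

# Sample-level covariate (e.g. a diet score) with an assumed slope of 0.5.
diet = rng.normal(0.0, 1.0, size=host_of_sample.size)
diet_effect = 0.5 * diet

shares = variance_partition({"host": host_effect, "diet": diet_effect})
print(shares)
```

In a fitted hierarchical model, the per-sample contributions would come from estimated effects rather than simulated ones, and the shares would typically be computed per taxon and summarized across posterior draws.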