DOI: 10.1145/3097983.3098042
research-article
Open Access

Estimation of Recent Ancestral Origins of Individuals on a Large Scale

Published: 13 August 2017

ABSTRACT

The last ten years have seen an exponential growth of direct-to-consumer genomics. One popular feature of these tests is the report of a distant ancestral inference profile: a breakdown of the regions of the world where the test-taker's ancestors may have lived. While current methods and products generally focus on the more distant past (e.g., thousands of years ago), we have recently demonstrated that by leveraging network analysis tools such as community detection, more recent ancestry can be identified. However, using a network analysis tool like community detection on a large network with potentially millions of nodes is not feasible in a live production environment where hundreds or thousands of new genotypes are processed every day. In this study, we describe a classification method that leverages network features to assign individuals to communities in a large network corresponding to recent ancestry. We recently launched a beta version of this research as a new product feature at AncestryDNA.

Supplemental Material

curtis_ancestral_origins.mp4 (mp4, 375.5 MB)

Published in

KDD '17: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
August 2017
2240 pages
ISBN: 9781450348874
DOI: 10.1145/3097983

Copyright © 2017 Owner/Author

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

KDD '17 Paper Acceptance Rate: 64 of 748 submissions, 9%
Overall Acceptance Rate: 1,133 of 8,635 submissions, 13%
