Advertisement
Report| Volume 32, ISSUE 6, P1412-1419.e3, March 28, 2022

Download started.

Ok

Ancestral genomic contributions to complex traits in contemporary Europeans

Open ArchivePublished:February 08, 2022DOI:https://doi.org/10.1016/j.cub.2022.01.046

      Highlights

      • Ancient groups differentially contributed to complex traits in contemporary Europeans
      • In contemporary Estonians 11 out of 27 traits show association with some ancestry
      • Hunter-Gatherer and Yamnaya ancestries divergently influence cholesterol levels
      • Post-admixture selection is not necessary to have trait-ancestry associations

      Summary

      The contemporary European genetic makeup formed in the last 8,000 years when local Western Hunter-Gatherers (WHGs) mixed with incoming Anatolian Neolithic farmers and Pontic Steppe pastoralists.
      • Lazaridis I.
      • Patterson N.
      • Mittnik A.
      • Renaud G.
      • Mallick S.
      • Kirsanow K.
      • Sudmant P.H.
      • Schraiber J.G.
      • Castellano S.
      • Lipson M.
      • et al.
      Ancient human genomes suggest three ancestral populations for present-day Europeans.
      • Haak W.
      • Lazaridis I.
      • Patterson N.
      • Rohland N.
      • Mallick S.
      • Llamas B.
      • Brandt G.
      • Nordenfelt S.
      • Harney E.
      • Stewardson K.
      • et al.
      Massive migration from the steppe was a source for Indo-European languages in Europe.
      • Allentoft M.E.
      • Sikora M.
      • Sjögren K.G.
      • Rasmussen S.
      • Rasmussen M.
      • Stenderup J.
      • Damgaard P.B.
      • Schroeder H.
      • Ahlström T.
      • Vinner L.
      • et al.
      Population genomics of Bronze Age Eurasia.
      This encounter combined genetic variants with distinct evolutionary histories and, together with new environmental challenges faced by the post-Neolithic Europeans, unlocked novel adaptations.
      • Racimo F.
      • Woodbridge J.
      • Fyfe R.M.
      • Sikora M.
      • Sjögren K.G.
      • Kristiansen K.
      • Vander Linden M.
      The spatiotemporal spread of human migrations during the European Holocene.
      Previous studies inferred phenotypes in these source populations, using either a few single loci
      • Olalde I.
      • Allentoft M.E.
      • Sánchez-Quinto F.
      • Santpere G.
      • Chiang C.W.K.
      • DeGiorgio M.
      • Prado-Martinez J.
      • Rodríguez J.A.
      • Rasmussen S.
      • Quilez J.
      • et al.
      Derived immune and ancestral pigmentation alleles in a 7,000-year-old Mesolithic European.
      • Mathieson I.
      • Lazaridis I.
      • Rohland N.
      • Mallick S.
      • Patterson N.
      • Roodenberg S.A.
      • Harney E.
      • Stewardson K.
      • Fernandes D.
      • Novak M.
      • et al.
      Genome-wide patterns of selection in 230 ancient Eurasians.
      • Saag L.
      • Vasilyev S.V.
      • Varul L.
      • Kosorukova N.V.
      • Gerasimov D.V.
      • Oshibkina S.V.
      • Griffith S.J.
      • Solnik A.
      • Saag L.
      • D’Atanasio E.
      • et al.
      Genetic ancestry changes in Stone to Bronze Age transition in the East European plain.
      or polygenic scores based on genome-wide association studies,
      • Cox S.L.
      • Ruff C.B.
      • Maier R.M.
      • Mathieson I.
      Genetic contributions to variation in human stature in prehistoric Europe.
      • Ju D.
      • Mathieson I.
      The evolution of skin pigmentation-associated variation in West Eurasia.
      • Berens A.J.
      • Cooper T.L.
      • Lachance J.
      The genomic health of ancient hominins.
      and investigated the strength and timing of natural selection on lactase persistence or height, among others.
      • Mathieson I.
      • Lazaridis I.
      • Rohland N.
      • Mallick S.
      • Patterson N.
      • Roodenberg S.A.
      • Harney E.
      • Stewardson K.
      • Fernandes D.
      • Novak M.
      • et al.
      Genome-wide patterns of selection in 230 ancient Eurasians.
      ,
      • Berg J.J.
      • Coop G.
      A population genetic signal of polygenic adaptation.
      ,
      • Racimo F.
      • Berg J.J.
      • Pickrell J.K.
      Detecting Polygenic Adaptation in Admixture Graphs.
      However, how ancient populations contributed to present-day phenotypic variation is poorly understood. Here, we investigate how the unique tiling of genetic variants inherited from different ancestral components drives the complex traits landscape of contemporary Europeans and quantify selection patterns associated with these components. Using matching individual-level genotype and phenotype data for 27 traits in the Estonian biobank
      • Leitsalu L.
      • Haller T.
      • Esko T.
      • Tammesoo M.L.
      • Alavere H.
      • Snieder H.
      • Perola M.
      • Ng P.C.
      • Mägi R.
      • Milani L.
      • et al.
      Cohort profile: Estonian biobank of the Estonian genome center, university of Tartu.
      and genotype data directly from the ancient source populations, we quantify the contributions from each ancestry to present-day phenotypic variation in each complex trait. We find substantial differences in ancestry for eye and hair color, body mass index, waist/hip circumferences, and their ratio, height, cholesterol levels, caffeine intake, heart rate, and age at menarche. Furthermore, we find evidence for recent positive selection linked to four of these traits and, in addition, sleep patterns and blood pressure. Our results show that these ancient components were differentiated enough to contribute ancestry-specific signatures to the complex trait variability displayed by contemporary Europeans.

      Keywords

      Results

      We identified 27 complex traits of interest, based on information availability in the Estonian Biobank
      • Leitsalu L.
      • Haller T.
      • Esko T.
      • Tammesoo M.L.
      • Alavere H.
      • Snieder H.
      • Perola M.
      • Ng P.C.
      • Mägi R.
      • Milani L.
      • et al.
      Cohort profile: Estonian biobank of the Estonian genome center, university of Tartu.
      (EstBB) and genome-wide association studies (GWAS) catalog.
      • Buniello A.
      • MacArthur J.A.L.
      • Cerezo M.
      • Harris L.W.
      • Hayhurst J.
      • Malangone C.
      • McMahon A.
      • Morales J.
      • Mountjoy E.
      • Sollis E.
      • et al.
      The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019.
      EstBB contains matching genotype and phenotype information for individuals from a relatively homogeneous population, which contains all three ancestry components found in Europe, with the proportion of remnant Hunter-Gatherer ancestry among the highest in Europe and an additional minor (<5%) Siberian component associated with Iron Age movements.
      • Tambets K.
      • Yunusbayev B.
      • Hudjashov G.
      • Ilumäe A.M.
      • Rootsi S.
      • Honkola T.
      • Vesakoski O.
      • Atkinson Q.
      • Skoglund P.
      • Kushniarevich A.
      • et al.
      Genes reveal traces of common recent demographic history for most of the Uralic-speaking populations.
      ,
      • Saag L.
      • Laneman M.
      • Varul L.
      • Malve M.
      • Valk H.
      • Razzak M.A.
      • Shirobokov I.G.
      • Khartanovich V.I.
      • Mikhaylova E.R.
      • Kushniarevich A.
      • et al.
      The Arrival of Siberian Ancestry Connecting the Eastern Baltic to Uralic Speakers further East.
      In order to associate specific ancient European ancestry components with predicted phenotypes, we introduce covA, a measure of the relative similarity between any contemporary individual and the ancestries that contributed to its genetic makeup. For each sample in the EstBB, we computed its covA with each of the ancestral source populations, focusing on genomic regions potentially connected to each trait. We then used covA as a predictor to model traits, also in comparison with the same statistic computed for the whole genome. Finally, we test if those regions associated with the genetic contribution from a specific ancestry experienced a post-admixture selective pressure on top of the observed local unbalance in contributing ancestries.

      covA measures similarity with ancestral groups

      Here we introduce covA, the covariance between allele frequency (p) in a contemporary individual i (i.e., its allele dosage) and a given ancestral population j with respect to the contemporary and ancient average frequencies ( p C and p A respectively):
      c o v A ( i , j ) = ( p i p C ) ( p j p A )
      (1)


      The covA statistic is expected to be high when the allele frequencies of the individual i and the ancestry j are similar in comparison with the differences within the contemporary population and across the ancestries that contributed to its genetic makeup. Furthermore, covA can be computed averaging over the contribution of multiple single nucleotide polymorphisms (SNPs), either across the whole genome or for specific regions of interest.
      In order to test the potential of covA to distinguish between genetic contributions from different ancestries, we simulated polygenic traits in a modern population composed of three ancestral groups and verified that when predicting simulated traits, covA estimated coefficient correlates well with their ancestral specificity (Pearson’s correlation coefficient ρ = 0.919-0.937, Figure S1A). See STAR Methods for further discussion of c o v A properties and simulation details, including the definition of ancestral specificity.
      For each EstBB individual, we computed genome-wide covA between the individual and each of the ancestries among Western Hunter-Gatherers (WHG), Neolithic Farmers from Anatolia (Anatolia_N), Yamnaya Pastoralists from the Pontic Steppes (Yamnaya), and Siberian (Siberia). We defined these ancestry groups based on genetic and chronological proximity to a set of identified focal individuals (see STAR Methods and Table S1 for a list of the ancient genomes assigned to each group). As expected, covAs calculated on the different ancestries are strongly interdependent because they include as term the average ancestral frequency ( p A ) and because of varying grades of similarity among the ancestries for historical demographic reasons (see covA joint distributions in Figures S1C–S1F). In particular, covA tends to be negatively correlated between different ancestral components with the exception of Yamnaya and WHG, reflecting complex demographic relationships between the two, involving WHG-like Eastern Hunter Gatherer ancestry presence in Yamnaya.
      • Haak W.
      • Lazaridis I.
      • Patterson N.
      • Rohland N.
      • Mallick S.
      • Llamas B.
      • Brandt G.
      • Nordenfelt S.
      • Harney E.
      • Stewardson K.
      • et al.
      Massive migration from the steppe was a source for Indo-European languages in Europe.
      ,
      • Allentoft M.E.
      • Sikora M.
      • Sjögren K.G.
      • Rasmussen S.
      • Rasmussen M.
      • Stenderup J.
      • Damgaard P.B.
      • Schroeder H.
      • Ahlström T.
      • Vinner L.
      • et al.
      Population genomics of Bronze Age Eurasia.
      ,
      • Damgaard P.B.
      • Marchi N.
      • Rasmussen S.
      • Peyrot M.
      • Renaud G.
      • Korneliussen T.
      • Moreno-Mayar J.V.
      • Pedersen M.W.
      • Goldberg A.
      • Usmanova E.
      • et al.
      137 ancient human genomes from across the Eurasian steppes.
      Furthermore, although the Estonian population is considered relatively genetically uniform, some geographic differences exist with the southeastern inland counties having higher haplotype sharing with Latvians, Lithuanians, and Russians compared with the rest of the country, as recently shown in Pankratov et al., 2020.
      • Pankratov V.
      • Montinaro F.
      • Kushniarevich A.
      • Hudjashov G.
      • Jay F.
      • Saag L.
      • Flores R.
      • Marnetto D.
      • Seppel M.
      • Kals M.
      • et al.
      Differences in local population history at the finest level: the case of the Estonian population.
      This result is also mirrored in our analyses with median covA for WHG being higher in southeastern inland counties (see Figure S1G). Conversely, as shown by median covA for the Siberian component in Figure S1I, the Siberian ancestry seems to be more abundant in northeast Estonia, consistently with Finnish ancestry shown by Pankratov and colleagues.
      • Pankratov V.
      • Montinaro F.
      • Kushniarevich A.
      • Hudjashov G.
      • Jay F.
      • Saag L.
      • Flores R.
      • Marnetto D.
      • Seppel M.
      • Kals M.
      • et al.
      Differences in local population history at the finest level: the case of the Estonian population.
      Yamnaya and Anatolia_N covAs are instead more evenly distributed (Figures S1H and S1J).

      Phenotype-associated genomic regions show specific ancestry similarity patterns

      We examined 27 complex traits (31 if considering separate classes of pigmentation) for which we had sufficient records in the Estonian Biobank (see Table S2). We corrected and adjusted them for confounding covariates, including sex, age, genotyping platform, and others as specified in Table S2. Because our analysis relies on SNPs overlapping between ancient and contemporary genetic data, a portion of the genetic influence over these traits—especially when conveyed by rare alleles—might elude our experimental setting.
      • Yang J.
      • Bakshi A.
      • Zhu Z.
      • Hemani G.
      • Vinkhuyzen A.A.
      • Lee S.H.
      • Robinson M.R.
      • Perry J.R.
      • Nolte I.M.
      • van Vliet-Ostaptchouk J.V.
      • et al.
      LifeLines Cohort Study
      Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index.
      Nevertheless, our dataset captures a genetic basis for most of them, as confirmed by the trait heritability measured in our sample (Figure 1).
      Figure thumbnail gr1
      Figure 1Traits and their heritability
      All traits analyzed and their estimated heritability after covariate adjustment. Bars indicate standard errors of the estimate. Numbers in parentheses indicate the number of unrelated samples for which phenotypic information was available for each trait.
      See for partial heritability estimates on candidate regions.
      We defined three sets of candidate regions for any given trait by considering windows of 5kb, 50kb, or 500kb centered around GWAS catalog
      • Buniello A.
      • MacArthur J.A.L.
      • Cerezo M.
      • Harris L.W.
      • Hayhurst J.
      • Malangone C.
      • McMahon A.
      • Morales J.
      • Mountjoy E.
      • Sollis E.
      • et al.
      The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019.
      hits for corresponding trait categories (see STAR Methods and Table S3). As shown in Figure S2, these genomic regions harbor a higher heritability intensity ( h 2 / Mb) than the whole genome, supporting their suitability as candidate regions for the traits of interest.
      Next, we used covAs computed on the candidate regions as a predictor to model traits and asked whether they showed significantly different regression coefficients when compared to 50 size-matching random genomic sets; this was found true in 11 out of 27 traits (double-sided Z-test, Benjamini-Hochberg false discovery rate (FDR) = 0.05), see Z-scores in Figure 2. This analysis has the advantage of automatically controlling for virtually all potential confounders that apply to the genome in its entirety—e.g., social, economic, and cultural statuses—thus allowing us to not include any such covariates in the model. In addition, this analysis pinpoints genetic signals that are likely to be functionally connected to the trait. Among others, blood cholesterol levels are shown to be positively correlated with similarity to Yamnaya in cholesterol-associated regions with respect to the rest of the genome, while the opposite is true for WHG.
      Figure thumbnail gr2
      Figure 2Ancestry-trait association on candidate regions
      (A) Z-scores of covA coefficients, the color refers to the ancestry tested.
      (B) Z-scores of coefficients associated with covA independent components (IC) computed with whole genome-based covA PC loadings. Each color is associated with one of the three ICs. For each trait we show the Z-score of the standardized coefficient associated with candidate regions against a distribution of 50 random genomic regions of matching size. Candidate regions are determined around GWAS hits for appropriate traits as windows with three different widths: 5 (small dot), 50 (medium dot), and 500 (large dot) kilobases. Pastel dots are deemed not significant at Benjamini-Hochberg FDR = 0.05, p value from double-sided Z-test; asterisks mark traits to be considered significant according to (B); dotted lines correspond to absolute Z-scores = 2 .
      (C) Loading matrix for genome-wide covAs and their PCs, used to transform covAs into their ICs. The three genome-wide (GW) PCs accounted for 0.498, 0.327, and 0.175 covAs variance, respectively. PCs and relative ICs can be interpreted as axes defined by 2 or 3 covAs.
      See for an interpretation of significant cases.
      Since covA exhibits a high correlation across ancestries, we avoided implementing a model with largely multicollinear predictors including covA for all ancestries and instead adopted separate models for each ancestry, complementing them with a regression on covA independent components (ICs) (Figure 2B). We used the loadings from a principal component (PC) analysis on whole genome covAs (Figure 2C) to transform region-specific covAs into ICs. This, though not returning actual PCs in each candidate region, drastically reduces the collinearity (highest variance inflation factor = 1.62 in hair color 50kb candidate regions), while allowing simpler interpretation and, crucially, cross-region comparisons required for Z-scores computation. While covAs (Figure 2A) highlight the overall excess or lack of certain ancestries in relation with a given phenotype but are largely intertwined, ICs (Figure 2B) can be interpreted as independent axes defined by 2 or 3 covAs. We therefore adopted ICs to discriminate significant ancestry-trait associations, as they are independent variables in a comprehensive predictive model. Significant results, interpreted in light of the ICs, are summarized in Table S4 and discussed below. Among others, this analysis confirms the association between cholesterol levels and the Yamnaya-WHG axis previously mentioned.

      Comparison with genome-wide ancestry similarity

      We followed up the association between phenotypes and local excess or lack of a given ancestry and explored whether a similar pattern held at whole genome level by computing genome-wide covAs. Here, being unable to correct for environmental confounders with a Z-score approach and avoiding genotype-based PCs as covariates in order not to hinder potential genome-wide signals, we run the risk of obtaining spurious ancestry-trait associations. This is due to uneven ancestry similarity across Estonia concurrent with geographically associated socio-economic differences that can potentially confound genotype-phenotype associations. Although the confounding effect of population structure is minimized by the inclusion of a relatively uniform population, small differences related to historical reasons
      • Pankratov V.
      • Montinaro F.
      • Kushniarevich A.
      • Hudjashov G.
      • Jay F.
      • Saag L.
      • Flores R.
      • Marnetto D.
      • Seppel M.
      • Kals M.
      • et al.
      Differences in local population history at the finest level: the case of the Estonian population.
      are still visible in covA (see Figures S1G–S1J). Therefore, we include a city/countryside residency covariate in the models, defined as 1 for people living in the wealthiest and most populous county (Harju county) and 0 otherwise, and a covariate for educational attainment, which is a good proxy for family socioeconomic status.
      • Liu H.
      Genetic architecture of socioeconomic outcomes: Educational attainment, occupational status, and wealth.
      ,
      • Morris T.T.
      • Davies N.M.
      • Hemani G.
      • Smith G.D.
      Population phenomena inflate genetic associations of complex social traits.
      This control allows us to suggest a significant influence of genome-wide ancestry on 16 traits out of 27, as shown in Figure S3, even when geographical and social stratification is present (coefficient p value significant at Benjamini-Hochberg FDR = 0.05). Again, covA-based PCs were used to interpret significant results.
      Interestingly, we do not always observe concordance between the region-specific and genome-wide results, as shown in Figure 3, pointing to the fact that region specific trends are not always sufficient to drive genome-wide signals to significance or might even arise in a contrasting genomic background. This is especially true for less polygenic traits (e.g., pigmentation) but also for more polygenic ones, as indicated by height association with WHG. On the other hand, we also find genome-wide ancestry-trait connections that are not exacerbated in candidate regions, thus losing Z-score significance. This can occur for a single ancestry (e.g., Anatolia_N or Siberia and height) or cause the loss of trait associations altogether, as for alcohol consumption, depression, sleep duration, social jetlag, diopters, pulse pressure, and creatinine levels. Finally, we observed that genome-wide covAs for WHG and Yamnaya tend to be linked to most phenotypes in a similar fashion, in contrast with results found in candidate regions where the two ancestries behave in a more independent manner (Figure S3).
      Figure thumbnail gr3
      Figure 3Comparison between genome-wide and region-specific ancestry-trait associations
      (A and B) The y axis represent Z-scores of covA coefficients, for covA computed on candidate regions (CR) of 5 kilobases as in . X-axes represent genome-wide (GW) covA estimated coefficients: we report b e t a effect sizes for continuous traits in (A) and Odds Ratios for categorical traits in (B). Independent models are run for different covAs. Colors label the ancestry tested, while inner and outer color intensity represents significance of CR covA Z-score and GW covA coefficients, respectively. Pastel colors indicate not significant results at Benjamini-Hochberg FDR = 0.05 (double-sided Z-test p value for CR covA Z-score or double-sided coefficient p value for GW covA coefficients). Labels indicate selected outlying ancestry-trait associations.
      GW covA coefficients are also reported in .

      Selection signatures at candidate regions with ancestry-trait association

      So far, we only explored associations between a given trait and a local or genome-wide excess of a given ancestry. The observed local admixture imbalance points to a role of that ancient contribution in explaining a given phenotype. However, these results alone do not show whether after the admixture event the incoming genetic material also underwent a selective sweep within the recipient population, altering population-wide allele frequencies as investigated in Mathieson et al., 2015.
      • Mathieson I.
      • Lazaridis I.
      • Rohland N.
      • Mallick S.
      • Patterson N.
      • Roodenberg S.A.
      • Harney E.
      • Stewardson K.
      • Fernandes D.
      • Novak M.
      • et al.
      Genome-wide patterns of selection in 230 ancient Eurasians.
      In other words, the local admixture imbalances we detected so far are not necessarily transferred to the whole population.
      We independently asked whether the phenotype-associated regions above also exhibit signs of recent natural selection. We applied CLUES
      • Stern A.J.
      • Wilton P.R.
      • Nielsen R.
      An approximate full-likelihood method for inferring selection and allele frequency trajectories from DNA sequence data.
      to the list of GWAS hits used as index for our candidate regions to obtain per-SNP evidence of recent (up to 500 generations ago) natural selection and to see which phenotypes show enrichment in SNPs with strong selection signals compared to a random set of GWAS hits. Out of the genomic regions responsible for ancestry-trait association shown in Figure 2, pigmentation-related SNPs (eye and hair color) showed extremely high CLUES logLR values (Figure 4A; Figure S4) in accordance with previous results,
      • Mathieson I.
      • Lazaridis I.
      • Rohland N.
      • Mallick S.
      • Patterson N.
      • Roodenberg S.A.
      • Harney E.
      • Stewardson K.
      • Fernandes D.
      • Novak M.
      • et al.
      Genome-wide patterns of selection in 230 ancient Eurasians.
      ,
      • Ju D.
      • Mathieson I.
      The evolution of skin pigmentation-associated variation in West Eurasia.
      ,
      • Key F.M.
      • Fu Q.
      • Romagné F.
      • Lachmann M.
      • Andrés A.M.
      Human adaptation and population differentiation in the light of ancient genomes.
      as well as SNPs related to BMI and cholesterol, pointing to ongoing or recent selection at these loci. Diastolic blood pressure (DBP) and sleep-related SNPs also showed the same extreme signature, but the candidate regions encompassing them did not reach significance in ancestry-trait association.
      Figure thumbnail gr4
      Figure 4Selection signatures
      (A) CLUES log likelihood ratios (logLR) values distribution for GWAS hits for six selected phenotypes. For each phenotype, at most 100 top SNPs with highest logLR values and the corresponding ranks from the random GWAS hits distribution are shown. Gray dots show mean values for each rank in the background distribution while the whiskers show the 5-95 percentile range. The logLR values for tested SNPs are shown in red or blue depending on whether the value lies above the 95th percentile of the values from the background distribution with a given rank. Number of tested SNPs for each phenotype are shown in panel titles. Sleep indicates SNPs connected to all sleep traits as indicated in . See also .
      (B) Maximum likelihood estimates of derived allele frequency trajectories for top three SNPs with highest logLR values for each phenotype. When more than one SNPs come from the same locus, only the top-scoring SNP is shown.
      See also and .
      The recent and putatively ongoing nature of the inferred selective pressure on the six traits shown in Figure 4A is further exemplified by the steep increase in derived allele frequencies over time inferred for the top three SNPs of each trait and shown in Figure 4B. These include some loci previously shown to be selected in West Eurasians (rs4988235 at MCM6/LCT,
      • Bersaglieri T.
      • Sabeti P.C.
      • Patterson N.
      • Vanderploeg T.
      • Schaffner S.F.
      • Drake J.A.
      • Rhodes M.
      • Reich D.E.
      • Hirschhorn J.N.
      Genetic signatures of strong recent positive selection at the lactase gene.
      pigmentation-related SNPs at HERC2/OCA2, TYRP1, TYR, TPCN2,
      • Ju D.
      • Mathieson I.
      The evolution of skin pigmentation-associated variation in West Eurasia.
      ,
      • Key F.M.
      • Fu Q.
      • Romagné F.
      • Lachmann M.
      • Andrés A.M.
      Human adaptation and population differentiation in the light of ancient genomes.
      ,
      • Pickrell J.K.
      • Coop G.
      • Novembre J.
      • Kudaravalli S.
      • Li J.Z.
      • Absher D.
      • Srinivasan B.S.
      • Barsh G.S.
      • Myers R.M.
      • Feldman M.W.
      • Pritchard J.K.
      Signals of recent positive selection in a worldwide sample of human populations.
      and rs653178 at ATXN2
      • Ding K.
      • Kullo I.J.
      Geographic differences in allele frequencies of susceptibility SNPs for cardiovascular disease.
      ) and some others yet to be explored. In particular, rs17630235, associated with BMI and DBP, is an expression quantitative trait locus (QTL) in several epithelial tissues
      GTEx Consortium
      The GTEx Consortium atlas of genetic regulatory effects across human tissues.
      for ALDH2, an aldehyde dehydrogenase known for its role in the alcohol metabolism.
      • Wang W.
      • Wang C.
      • Xu H.
      • Gao Y.
      Aldehyde dehydrogenase, liver disease and cancer.
      Although this selective signal might be due to rs17630235 proximity with ATXN2, it is tempting to speculate about the changed role of ALDH2 in a post-neolithic society, which made available several fermentable substrates. Other selected SNPs include rs74555583 and rs11539148, both associated with sleep patterns (chronotype). Most notably, the latter is a missense variant in the catalytic domain of QARS1, for which also functions as splicing QTL.
      GTEx Consortium
      The GTEx Consortium atlas of genetic regulatory effects across human tissues.
      QARS1 itself encodes an enzyme involved in the glutaminyl-tRNA synthesis and, when mutated, leads to microcephaly, cerebral-cerebellar atrophy, and seizures.
      • Zhang X.
      • Ling J.
      • Barcia G.
      • Jing L.
      • Wu J.
      • Barry B.J.
      • Mochida G.H.
      • Hill R.S.
      • Weimer J.M.
      • Stein Q.
      • et al.
      Mutations in QARS, encoding glutaminyl-tRNA synthetase, cause progressive microcephaly, cerebral-cerebellar atrophy, and intractable seizures.

      Discussion

      Here we combined existing knowledge on genotype-phenotype associations and the availability of ancient genomes to assess the impact of ancient migrations on the phenotypic landscape of contemporary Europeans. We leveraged on traits measured in living individuals, complementing previous works where phenotypes were inferred for ancient genomes instead. As a whole, the most affected traits include pigmentation and anthropometric traits together with blood cholesterol levels, caffeine consumption, heart rate, and age at menarche.
      Importantly, while our genome-wide results highlight an overall excess of an ancestry in the carriers of a given phenotype, this is not necessarily mirrored at the genetic loci for which the genotype-phenotype association is ascertained in the literature. A genome-wide excess can completely explain a regional signal, leading to non-significant Z-scores, and even indicate a different direction for the same ancestry. While the first scenario can be due to the extreme polygenicity of a trait, possibly coupled with an inaccurate tagging of the actual functional regions by the GWAS catalog hits, the second might indicate an incomplete correction of non-genetic factors in the genome-wide analysis. Indeed, it is possible that place of residence and educational attainment alone cannot fully account for confounding environmental effects such as socioeconomic status. Conversely, candidate region Z-scores are disentangled from background confounders and virtually free from collinearity issues when they also agree with the relevant ICs. In this light, we here chose to report and discuss results showing region-specific significance for covAs and matching ICs (as reported in Table S4), thus refraining from making inferences on traits such as eye pigmentation in Yamnaya, among others.
      WHG ancestry in present day individuals is linked to lower cholesterol levels, higher BMI, and putatively contributed brown hair and light eye color to the contemporary Estonian population. This last association has been previously described based on the HERC/OCA2 haplotypes found in ancient WHG samples.
      • Olalde I.
      • Allentoft M.E.
      • Sánchez-Quinto F.
      • Santpere G.
      • Chiang C.W.K.
      • DeGiorgio M.
      • Prado-Martinez J.
      • Rodríguez J.A.
      • Rasmussen S.
      • Quilez J.
      • et al.
      Derived immune and ancestral pigmentation alleles in a 7,000-year-old Mesolithic European.
      ,
      • Key F.M.
      • Fu Q.
      • Romagné F.
      • Lachmann M.
      • Andrés A.M.
      Human adaptation and population differentiation in the light of ancient genomes.
      In addition, loci associated with these features also appear to have undergone selection in Estonians. Other region-specific associations for this ancestry include decreased hip circumference and increased caffeine consumption and heart rate.
      An enriched Yamnaya ancestry is linked to a strong build, with tall stature (in agreement with previous studies
      • Mathieson I.
      • Lazaridis I.
      • Rohland N.
      • Mallick S.
      • Patterson N.
      • Roodenberg S.A.
      • Harney E.
      • Stewardson K.
      • Fernandes D.
      • Novak M.
      • et al.
      Genome-wide patterns of selection in 230 ancient Eurasians.
      ,
      • Cox S.L.
      • Ruff C.B.
      • Maier R.M.
      • Mathieson I.
      Genetic contributions to variation in human stature in prehistoric Europe.
      ) and increased hip and waist circumferences, both at genome-wide and region-specific levels, but also to black hairs and high-cholesterol concentrations when focusing on candidate regions. The associations of Yamnaya and WHG ancestries to respectively higher and lower cholesterol levels, together with the observed signatures of selection at loci connected to cholesterol and BMI, add a new component to our understanding of post-neolithic dietary adaptation
      • Saag L.
      • Vasilyev S.V.
      • Varul L.
      • Kosorukova N.V.
      • Gerasimov D.V.
      • Oshibkina S.V.
      • Griffith S.J.
      • Solnik A.
      • Saag L.
      • D’Atanasio E.
      • et al.
      Genetic ancestry changes in Stone to Bronze Age transition in the East European plain.
      ,
      • Buckley M.T.
      • Racimo F.
      • Allentoft M.E.
      • Jensen M.K.
      • Jonsson A.
      • Huang H.
      • Hormozdiari F.
      • Sikora M.
      • Marnetto D.
      • Eskin E.
      • et al.
      Selection in Europeans on Fatty Acid Desaturases Associated with Dietary Changes.
      ,
      • Mathieson S.
      • Mathieson I.
      FADS1 and the Timing of Human Adaptation to Agriculture.
      with potential implications to disease risk and outcomes in present-day populations.
      Anatolia_N enrichment in trait-related genomic regions is connected with a reduced BMI-corrected waist-to-hip ratio, reduced BMI, light (but not green) eyes and fair hair, increased age at menarche, and reduced heart rate. Notably, covA(i,Anatolia_N) has a substantial weight only in IC2, the single IC that reaches significance when predicting heart rate, suggesting a prominent role for this ancestry in determining this trait.
      Finally, the Siberian ancestry is connected with dark hair pigmentation, higher heart rate, lower caffeine consumption, and most prominently, green eye color and lower age at menarche. Importantly, while the results connected to the Siberian ancestry are not of broad applicability to all European populations, covA(i,Siberia) and relative ICs received effect-sizes with mixed significance in all the previous traits except for age at menarche and pigmentation, suggesting that other ancestries might have a larger impact. In other words, we do not find other phenotypes that can be best explained by similarity with Siberia, implying that the presence of this ancestry in the Estonian genome does not significantly affect the inference based on the other, pan-European ancient components.
      A general caveat about significance levels observed in this study is that as we refrain from reducing interdependent traits by arbitrary choices, even testing multiple alternatives of the same trait, we expose ourselves to inflated false negatives. We deemed it best to acknowledge and control this risk by avoiding overly stringent multiple testing corrections as Bonferroni and adopting the Benjamini-Hochberg procedure to control FDR instead. In addition, as highly significant traits tend to have higher heritability, it is likely that our analysis might not have enough statistical power for poorly heritable traits. Nevertheless, as we are able to highlight ancestry-trait associations for caffeine consumption ( h 2 = 0.087 ± 0.009 ) , brown hair color ( h 2 = 0.079 ± 0.009 ) , and even heart rate ( h 2 = 0.044 ± 0.009 ) , this condition should be limited only to the very few traits exhibiting lower heritabilities.
      Importantly, our inferences are applicable to contemporary individuals of European ancestry, where the phenotypes were collected. Conversely, using them to extrapolate features of ancient populations, although tempting, should be done with caution due to the interaction of their genetic legacy with a radically different lifestyle and environment. Furthermore, when seeking a biological interpretation of our results, it should be kept in mind that certain narrowly defined, contemporary phenotypes such as caffeine consumption may point to broader biological pathways.
      Taken together, our results show that the ancient components that form the contemporary European landscape were differentiated enough at a functional level to contribute ancestry-specific signatures on the phenotypic variability displayed by contemporary individuals, regardless of which target population one may examine. In particular, when looking at Estonians, for 11 out of 27 traits surveyed here we could confirm a significant relationship between presence of a given ancestry in genetic regions associated with a given phenotype and how this is expressed by contemporary individuals. While showing that both autochthonous (WHG) and incoming groups contributed genetic material that shapes the phenotype landscape observed today, we also demonstrated that a subset of these loci further underwent positive selection in the last 500 generations. Although not determining whether the selected alleles (and phenotypes) were predominantly contributed by the autochthonous or incoming groups, by connecting genotypic ancestry and complex traits measured in a large dataset, our results reveal both neutral and adaptive consequences of the post-neolithic admixture events on the European phenotype landscape.

      STAR★Methods

      Key resources table

      Tabled 1
      REAGENT or RESOURCE SOURCE IDENTIFIER
      Deposited data
      Estonian genetic and phenotypic data Estonian Biobank https://genomics.ut.ee/en/access-biobank
      AADR and HumanOrigins dataset Allen Ancient DNA Resource https://reich.hms.harvard.edu/allen-ancient-dna-resource-aadr-downloadable-genotypes-present-day-and-ancient-dna-data
      GWAS hits GWAScatalog on 20/11/2020 https://www.ebi.ac.uk/gwas/
      EQTL and sQTL data GTEx portal on 18/10/2021 https://www.gtexportal.org/home/
      1000 Genomes strict mask
      The 1000 Genomes Project Consortium
      A global reference for human genetic variation.
      http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/supporting/accessible_genome_masks/
      Software and algorithms
      PLINK 1.9
      • Chang C.C.
      • Chow C.C.
      • Tellier L.C.
      • Vattikuti S.
      • Purcell S.M.
      • Lee J.J.
      Second-generation PLINK: rising to the challenge of larger and richer datasets.
      https://www.cog-genomics.org/plink2
      Eigensoft-6.1.4
      • Patterson N.
      • Price A.L.
      • Reich D.
      Population Structure and Eigenanalysis.
      https://alkesgroup.broadinstitute.org/EIGENSOFT/
      msprime
      • Peter B.M.
      Admixture, population structure, and f-statistics.
      https://github.com/tskit-dev/msprime
      LDAK 5.0
      • Reich D.
      • Thangaraj K.
      • Patterson N.
      • Price A.L.
      • Singh L.
      Reconstructing Indian population history.
      https://dougspeed.com/ldak/
      Relate
      • Speidel L.
      • Forest M.
      • Shi S.
      • Myers S.R.
      A method for genome-wide genealogy estimation for thousands of samples.
      https://myersgroup.github.io/relate/
      CLUES
      • Stern A.J.
      • Wilton P.R.
      • Nielsen R.
      An approximate full-likelihood method for inferring selection and allele frequency trajectories from DNA sequence data.
      https://github.com/35ajstern/clues
      Analysis pipeline This paper https://bitbucket.org/dmarnetto/anccontributions

      Resource availability

      Lead contact

      Further information and requests for resources and reagents should be directed to and will be fulfilled by the lead contact, Davide Marnetto ([email protected]).

      Materials availability

      This study did not generate new unique reagents.

      Experimental model and subject details

      We relied on a set of 50,341 individuals from the Estonian Biobank
      • Leitsalu L.
      • Haller T.
      • Esko T.
      • Tammesoo M.L.
      • Alavere H.
      • Snieder H.
      • Perola M.
      • Ng P.C.
      • Mägi R.
      • Milani L.
      • et al.
      Cohort profile: Estonian biobank of the Estonian genome center, university of Tartu.
      with coupled genotypic and phenotypic data (contemporary Estonian sample set). Among them, 49,799 samples were extracted for the following analyses as described in ”Method details” section, including 32,709 adult (age 18) females and 17,090 adult males, with median age 43. The Estonian Biobank managed recruitment, inclusion and exclusion criteria, and obtained informed consent for all participants in the study. Data were accessed upon approval of the Estonian Committee on Bioethics and Human Research: Approval Number 285/T-13 obtained on 17/09/2018 by the University of Tartu Ethics Committee.

      Method details

      Sample selection and ancient European grouping

      After removing second-degree relatives (pi-hat > 0.25 ) from the contemporary Estonian sample set (n = 50,341) we obtained a subset of 37,952 individuals and used it as a scaffold to perform a PC Analysis (PCA) with Eigensoft-6.1.4. Other individuals were projected on the same PCA space. Outliers identified in this process (with parameters numoutlieriter: 5 numoutlierevec: 10 outliersigmathreshold: 6) were discarded. Samples that on the first round of genome-wide covAs were more distant than 8 Interquartile Ranges (IQR) from the upper or lower quartile against any of the ancestries were also discarded, resulting in 49,799 individuals included in our sample set. For each trait of interest we first removed individuals with missing data for traits and covariates and subsequently discarded second-degree relatives.
      To define ancestral European groups we started from the Allen Ancient DNA Resource (AADR) V44.3 merged with present-day individuals typed on the Human Origins array (see Data Availability section). From this set we defined a manually curated core set for each ancestral group, then performed a PCA on a space defined by modern Eurasian and North African individuals west of Iran (included), where the ancient samples were projected. We expanded these core sets to other individuals from AADR dataset using multi-dimensional ellypses with diameters equal to 3 core set SDs. We used 4 dimensions: the annotated dating and the first 3 PCs generated above. With this process we selected 90 WHG, 92 Anatolia_N, 74 Yamnaya S1. In addition, from the ones available from the same dataset, we took 7 samples as representative of the broader Siberian ancestry, assuming any Siberian individual would be equidistant to the other ancestral European groups: S_Even-3.DG, S_Even-1.DG, S_Even-2.DG, Bur1.SG, Bur2.SG, Kor1.SG and Kor2.SG. 957,869 SNPs remained in our dataset after merging the contemporary and ancient sets.

      covA definition

      covA is the covariance in allele frequency (p) within a contemporary individual i (i.e., its allele dosage) with the ancestral group of interest j, computed respectively against the allele frequency p C of the contemporary population C and the average frequency p A in all the A ancient groups:
      c o v A ( i , j ) = ( p i p C ) ( p j p A )
      (2)


      When comparing covA with outgroup f 3 ( i , j ; Y o r u b a ) ,
      • Reich D.
      • Thangaraj K.
      • Patterson N.
      • Price A.L.
      • Singh L.
      Reconstructing Indian population history.
      where j is one of the four ancestral groups, the statistics are different but strongly correlated (data not shown): this is expected when the f 3 outgroup population is an outlier to all populations, contemporary and ancestral, considered in covA, as in f 3 ( i , j ; Y o r u b a ) . Indeed c o v A ( i , j ) has a strong relationship with f-statistics,
      • Peter B.M.
      Admixture, population structure, and f-statistics.
      i.e., c o v A ( i , j ) = f 4 ( i , C ; j , A ) = f 3 ( i , j , A ) f 3 ( C , j , A ) where C is the contemporary population (Estonians in our case) and A is an ideal population with p = p A . Nevertheless, as opposed to f-statistics, which include allele frequencies in groups that portray actual populations, c o v A ( i , j ) includes p A , an average allele frequency which only serves as balanced comparison for the ancestries under analysis. In relation with our aim, this constitutes an advantage of covA, which does not take into account drift or selection occurred in the branch that connects the outgroup population with the internal node shared by the other populations under analysis. Related to this, see our simulation in Figures S1A and S1B. See also covA distributions in Estonia in Figures S1C–S1F.

      Candidate genomic regions

      We downloaded GWAS hits from GWAS catalog
      • Buniello A.
      • MacArthur J.A.L.
      • Cerezo M.
      • Harris L.W.
      • Hayhurst J.
      • Malangone C.
      • McMahon A.
      • Morales J.
      • Mountjoy E.
      • Sollis E.
      • et al.
      The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019.
      (date of download: 20/11/2020) and then extracted for each trait a set of hits connected to it filtering on the reported trait (”TRAIT/DISEASE” field) or selecting the appropriate trait in the Experimental Factor Ontology (EFO) field, as specified in Table S3. Then we took windows of 5, 50 and 500 Kbs centered on the selected hits and merged them where overlapping, obtaining three sets of candidate regions for each trait.

      Quantification and statistical analysis

      Simulation

      We used msprime
      • Kelleher J.
      • Etheridge A.M.
      • McVean G.
      Efficient coalescent simulation and genealogical analysis for large sample sizes.
      to simulate a contemporary population of 5000 individuals with the following composition: 50% EUR1, 30% EUR2, 20% EUR3. These are simulated groups which separated 10kya and admixed 3kya. Each genome consists of 1000 independent loci of 50kb. We then simulated complex traits as following: we extract randomly one causal SNP in the central quartiles of each window, among those with MAF > 0.1 , and assign standard-distributed effect sizes. We forced an ancestry-trait interaction by enforcing uniformly distributed correlations between the effect sizes and the causal allele frequency specificity, i.e., the allele frequency difference between the contributing ancestry and the other ancestries. As we modeled a completely symmetrical admixture among the ancestral groups, for each simulated trait we could directly measure its shift in the ancestry j as a proxy for the true ancestry-trait association. This directional shift is described as average trait value for that ancestral group standardized over all ancestral individuals ( Z ¯ j ) . See also Figures S1A and S1B.

      Phenotypes treatment and heritability

      Continuous traits were treated as specified in Table S2 and regressed against the covariates according to the same table. Individuals with traits or covariates more distant than 4 IQRs from the upper or lower quartile were considered as outliers and discarded. After adjusting traits as described, their heritability was computed using LDAK 5.0.
      • Speed D.
      • Holmes J.
      • Balding D.J.
      Evaluating and improving heritability models using summary statistics.
      First we computed a kinship matrix with the LDAK-Thin Model: we thinned down SNPs on the non-related sample set defined above with parameters–window-prune 0.98–window-kb 100, then used–calc-kins-direct with the resulting weights and–power 0.25. Finally we estimated heritability using REML solver.

      Testing for ancestry-trait association with covA and covA-based PCs

      We fitted each standardized trait t i with a model including one standardized covA for each ancestry j and estimated its coefficient: t i = β j c o v A j + ε i . We adopted a logistic regression for categorical traits, which were transformed to {0, 1} where 1 stands for the specified category and 0 for all the others. covAs were computed on the corresponding candidate region for each trait. In addition, each trait was regressed against three PC-transformed covAs: t i = β 1 P C 1 + β 2 P C 2 + β 3 P C 3 + ε i . Notably, we transformed all covAs using the loadings obtained from a PC analysis run on whole genome covAs, thus obtaining components that were largely independent, yet not strictly principal. These Independent Components (ICs) were standardized and included together as predictors. To evaluate association we used coefficient Z-scores computed against the same parameter extracted from 50 random genomic sets with matching size (double-sided Z-test). Sample sizes are reported in Figure 1 and Table S2.
      In the genome-wide analysis, we adopted similar steps, performing individual regressions for all the covAs and coupling this with a model including all covAs PCs, but socioeconomic variables were added as covariates in all models as described in the result section. Note that in this analysis ICs are not needed anymore, but actual PCs are used. Then, the standardized coefficient (β or effect size), or the Odds Ratio (OR) were directly used to assess ancestry-trait association for continuous and categorical traits respectively (double-sided coefficient p value from glm R function). This analysis was restricted to samples for which socioeconomic covariates were defined, i.e., 38,996 samples (including relatives): the actual sample size for this analysis is therefore less than reported in Figure 1 and Table S2.
      Significance was evaluated at Benjamini-Hochberg False Discovery Rate = 0.05 on the results from IC/PC regressions. See comments on the implications of this definition in the ”Discussion” section.

      Testing for signals of positive selection

      In order to test individual SNPs for signatures of positive selection we utilized the Relate/CLUES pipeline.
      • Stern A.J.
      • Wilton P.R.
      • Nielsen R.
      An approximate full-likelihood method for inferring selection and allele frequency trajectories from DNA sequence data.
      ,
      • Speidel L.
      • Forest M.
      • Shi S.
      • Myers S.R.
      A method for genome-wide genealogy estimation for thousands of samples.
      This was applied on a curated subset of 1800 unrelated samples; further details on its application are described in ”Relate/CLUES application” section. CLUES was run once for each of the 14,712 unique GWAS hits for traits analyzed here with a derived allele frequency (DAF) above 1% and passing the 1000 Genomes strict mask (SNP size for each trait given as number in Figures 4 and S4). To obtain an expected distribution we randomly sampled 10,000 GWAS hits from the GWAS catalog meeting the same conditions and ran CLUES for positions not present among the 14,712 SNPs. Next, for each phenotype we compared its distribution of the logLR values to that of random GWAS hits. We took 1000 random subsets (with replacement) from the 10,000 logLR values each of the same length as the number of GWAS hits for a given phenotype and ranked the logLR values from lowest to highest within each subset. In this way we obtained 1000 values for each logLR rank from 1 to N where N is the number of SNPs analyzed for a given phenotype. For each rank we calculated the mean and the 5 t h and 95 t h percentiles. Finally, we rank SNPs within each trait and compare each logLR value to the mean and 5 t h 95 t h percentiles range for the corresponding rank of the background distribution. As we are interested in deviations in the higher ranks we focus on the top 100 ranks for each phenotype. Such an approach is conservative as we are testing not against presumably neutral SNPs but against random GWAS hits that are shown to be enriched in signals on natural selection compared to random SNPs in the genome.
      • Speidel L.
      • Forest M.
      • Shi S.
      • Myers S.R.
      A method for genome-wide genealogy estimation for thousands of samples.

      Relate/CLUES application

      We built local genealogical trees by applying Relate v1.1.4
      • Speidel L.
      • Forest M.
      • Shi S.
      • Myers S.R.
      A method for genome-wide genealogy estimation for thousands of samples.
      to 2420 phased whole genome sequences from the Estonian Biobank.
      • Pankratov V.
      • Montinaro F.
      • Kushniarevich A.
      • Hudjashov G.
      • Jay F.
      • Saag L.
      • Flores R.
      • Marnetto D.
      • Seppel M.
      • Kals M.
      • et al.
      Differences in local population history at the finest level: the case of the Estonian population.
      ,
      • Kals M.
      • Nikopensius T.
      • Läll K.
      • Pärn K.
      • Tõnis Sikka T.
      • Suvisaari J.
      • Salomaa V.
      • Ripatti S.
      • Palotie A.
      • Metspalu A.
      • et al.
      Advantages of genotype imputation with ethnically matched reference panel for rare variant association analyses.
      We used the GRCh37_e71 ancestral genome to polarize alleles and applied the strict callability mask from The 1000 Genomes Project.
      The 1000 Genomes Project Consortium
      A global reference for human genetic variation.
      We used the GRCh37 recombination map,
      The 1000 Genomes Project Consortium
      A global reference for human genetic variation.
      mutation rate of 1.25x10-8 and effective population size of 30000. We then removed: a) 115 related samples; b) PCA outliers (2.5% in both directions along 1st and 2nd PCs both for Estonian-only PCA and when projecting Estonian samples on a “European” PC space); c) singleton count outliers (top/bottom 2.5%); d) samples with pairwise total IBD sharing 166.2 cM with more than one other sample; d) 57 random samples. These filtering steps lead to a set of 1800 samples and are described in Pankratov et al., 2020.
      • Pankratov V.
      • Montinaro F.
      • Kushniarevich A.
      • Hudjashov G.
      • Jay F.
      • Saag L.
      • Flores R.
      • Marnetto D.
      • Seppel M.
      • Kals M.
      • et al.
      Differences in local population history at the finest level: the case of the Estonian population.
      We estimated coalescence rate through time (EstimatePopulationSize.sh module) with a random subset of 100 samples (for which subtrees were extracted with RelateExtract SubTreesForSubpopulation), mutation rate 1.25x10-8, generation time 28 years, number of iterations 5, tree dropping threshold 0.5 and time bins defined as 10x years ago where x changes from 2 to 7 with an increment of 0.1. We ran CLUES
      • Stern A.J.
      • Wilton P.R.
      • Nielsen R.
      An approximate full-likelihood method for inferring selection and allele frequency trajectories from DNA sequence data.
      using Relate trees for the 1800 individuals as input. We first sampled branch length (SampleBranchLengths.sh module) for the tree containing a SNP of interest (200 samples). We removed 100 of 200 trees as burn-in and selected every 5th tree for importance sampling by CLUES. We focused on the time window between 0 and 500 generations ago.

      Data and code availability

      This paper analyzes existing, publicly available data. The accession numbers for these datasets are listed in the key resources table. Data from Estonian Biobank are under managed access and subject to approval of the Estonian Committee on Bioethics and Human Research. All original code has been deposited at https://bitbucket.org/dmarnetto/anccontributions and is publicly available as of the date of publication.
      Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.

      Acknowledgments

      This work is supported by the European Union through the European Regional Development Fund, project no. 2014-2020.4.01.16-0024, MOBTT53 (D.M., K.P., L.M., and L.P.); MOBEC008 (V.P., M.Mondal, M.Metspalu, and A.E.); 2014-2020.4.01.16-0030 (F.M. and M.Metspalu); 2014-2020.4.01.15-0012 (M.Metspalu); through the Horizon 2020 research and innovation program grant no. 810645 (V.P., M.Mondal, M.Metspalu, and A.E.); and through the Horizon 2020 MSCA Initial Training Network, grant no. 765937 (R.C.). L.S. and M.Metspalu are supported by the Estonian Research Council through PUT PRG243. S.M. is supported by the STARS@UNIPD 2019 Consolidator Grant for the project CircadianCare. We would like to thank Paolo Provero and Fernando Racimo for insightful discussion and comments. Most of the analysis was run on the High-Performance Computing Center at the University of Tartu. The Estonian Biobank Research Team includes Mari Nelis, Lili Milani, Tōnu Esko, Andres Metspalu, and Reedik Mägi.

      Author contributions

      D.M. and L.P. conceived and designed the study; A.E. contributed in the statistical design; D.M. and V.P. performed data analysis; M.Mondal, F.M., K.P., L.V., L.M., and L.P. contributed to data analysis; S.M. and R.C. provided analyses and expertise about sleep traits; F.M., L.S., L.L., and M.Metspalu contributed with ancient genetics expertise; D.M. and L.P. drafted the manuscript; all authors reviewed and approved the submitted paper.

      Declaration of interests

      The authors declare no competing interests.

      Supplemental information

      • Table S1: Ancestral reference samples, related to STAR Methods

        The table contains for each ancient sample used in the study: ID, ancestral group assignation, and if the sample is part of the core set for that group.

      • Table S2: Traits analyzed, related to STAR Methods

        The table contains the traits analyzed and how they were encoded, corrected, and adjusted for covariates. In addition, for each trait we include sample size and case number for categorical traits.

      References

        • Lazaridis I.
        • Patterson N.
        • Mittnik A.
        • Renaud G.
        • Mallick S.
        • Kirsanow K.
        • Sudmant P.H.
        • Schraiber J.G.
        • Castellano S.
        • Lipson M.
        • et al.
        Ancient human genomes suggest three ancestral populations for present-day Europeans.
        Nature. 2014; 513: 409-413
        • Haak W.
        • Lazaridis I.
        • Patterson N.
        • Rohland N.
        • Mallick S.
        • Llamas B.
        • Brandt G.
        • Nordenfelt S.
        • Harney E.
        • Stewardson K.
        • et al.
        Massive migration from the steppe was a source for Indo-European languages in Europe.
        Nature. 2015; 522: 207-211
        • Allentoft M.E.
        • Sikora M.
        • Sjögren K.G.
        • Rasmussen S.
        • Rasmussen M.
        • Stenderup J.
        • Damgaard P.B.
        • Schroeder H.
        • Ahlström T.
        • Vinner L.
        • et al.
        Population genomics of Bronze Age Eurasia.
        Nature. 2015; 522: 167-172
        • Racimo F.
        • Woodbridge J.
        • Fyfe R.M.
        • Sikora M.
        • Sjögren K.G.
        • Kristiansen K.
        • Vander Linden M.
        The spatiotemporal spread of human migrations during the European Holocene.
        Proc. Natl. Acad. Sci. USA. 2020; 117: 8989-9000
        • Olalde I.
        • Allentoft M.E.
        • Sánchez-Quinto F.
        • Santpere G.
        • Chiang C.W.K.
        • DeGiorgio M.
        • Prado-Martinez J.
        • Rodríguez J.A.
        • Rasmussen S.
        • Quilez J.
        • et al.
        Derived immune and ancestral pigmentation alleles in a 7,000-year-old Mesolithic European.
        Nature. 2014; 507: 225-228
        • Mathieson I.
        • Lazaridis I.
        • Rohland N.
        • Mallick S.
        • Patterson N.
        • Roodenberg S.A.
        • Harney E.
        • Stewardson K.
        • Fernandes D.
        • Novak M.
        • et al.
        Genome-wide patterns of selection in 230 ancient Eurasians.
        Nature. 2015; 528: 499-503
        • Saag L.
        • Vasilyev S.V.
        • Varul L.
        • Kosorukova N.V.
        • Gerasimov D.V.
        • Oshibkina S.V.
        • Griffith S.J.
        • Solnik A.
        • Saag L.
        • D’Atanasio E.
        • et al.
        Genetic ancestry changes in Stone to Bronze Age transition in the East European plain.
        Science Advances. 2021; 7: eabd6535
        • Cox S.L.
        • Ruff C.B.
        • Maier R.M.
        • Mathieson I.
        Genetic contributions to variation in human stature in prehistoric Europe.
        Proc. Natl. Acad. Sci. USA. 2019; 116: 21484-21492
        • Ju D.
        • Mathieson I.
        The evolution of skin pigmentation-associated variation in West Eurasia.
        Proc. Natl. Acad. Sci. USA. 2021; 118 (e2009227118)
        • Berens A.J.
        • Cooper T.L.
        • Lachance J.
        The genomic health of ancient hominins.
        Hum. Biol. 2017; 89: 7-19
        • Berg J.J.
        • Coop G.
        A population genetic signal of polygenic adaptation.
        PLoS Genet. 2014; 10: e1004412
        • Racimo F.
        • Berg J.J.
        • Pickrell J.K.
        Detecting Polygenic Adaptation in Admixture Graphs.
        Genetics. 2018; 208: 1565-1584
        • Leitsalu L.
        • Haller T.
        • Esko T.
        • Tammesoo M.L.
        • Alavere H.
        • Snieder H.
        • Perola M.
        • Ng P.C.
        • Mägi R.
        • Milani L.
        • et al.
        Cohort profile: Estonian biobank of the Estonian genome center, university of Tartu.
        Int. J. Epidemiol. 2015; 44: 1137-1147
        • Buniello A.
        • MacArthur J.A.L.
        • Cerezo M.
        • Harris L.W.
        • Hayhurst J.
        • Malangone C.
        • McMahon A.
        • Morales J.
        • Mountjoy E.
        • Sollis E.
        • et al.
        The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019.
        Nucleic Acids Res. 2019; 47: D1005-D1012
        • Tambets K.
        • Yunusbayev B.
        • Hudjashov G.
        • Ilumäe A.M.
        • Rootsi S.
        • Honkola T.
        • Vesakoski O.
        • Atkinson Q.
        • Skoglund P.
        • Kushniarevich A.
        • et al.
        Genes reveal traces of common recent demographic history for most of the Uralic-speaking populations.
        Genome Biol. 2018; 19: 139
        • Saag L.
        • Laneman M.
        • Varul L.
        • Malve M.
        • Valk H.
        • Razzak M.A.
        • Shirobokov I.G.
        • Khartanovich V.I.
        • Mikhaylova E.R.
        • Kushniarevich A.
        • et al.
        The Arrival of Siberian Ancestry Connecting the Eastern Baltic to Uralic Speakers further East.
        Curr. Biol. 2019; 29: 1701-1711.e16
        • Damgaard P.B.
        • Marchi N.
        • Rasmussen S.
        • Peyrot M.
        • Renaud G.
        • Korneliussen T.
        • Moreno-Mayar J.V.
        • Pedersen M.W.
        • Goldberg A.
        • Usmanova E.
        • et al.
        137 ancient human genomes from across the Eurasian steppes.
        Nature. 2018; 557: 369-374
        • Pankratov V.
        • Montinaro F.
        • Kushniarevich A.
        • Hudjashov G.
        • Jay F.
        • Saag L.
        • Flores R.
        • Marnetto D.
        • Seppel M.
        • Kals M.
        • et al.
        Differences in local population history at the finest level: the case of the Estonian population.
        Eur. J. Hum. Genet. 2020; 28: 1580-1591
        • Yang J.
        • Bakshi A.
        • Zhu Z.
        • Hemani G.
        • Vinkhuyzen A.A.
        • Lee S.H.
        • Robinson M.R.
        • Perry J.R.
        • Nolte I.M.
        • van Vliet-Ostaptchouk J.V.
        • et al.
        • LifeLines Cohort Study
        Genetic variance estimation with imputed variants finds negligible missing heritability for human height and body mass index.
        Nat. Genet. 2015; 47: 1114-1120
        • Liu H.
        Genetic architecture of socioeconomic outcomes: Educational attainment, occupational status, and wealth.
        Soc. Sci. Res. 2019; 82: 137-147
        • Morris T.T.
        • Davies N.M.
        • Hemani G.
        • Smith G.D.
        Population phenomena inflate genetic associations of complex social traits.
        Science Advances. 2020; 6: eaay0328
        • Stern A.J.
        • Wilton P.R.
        • Nielsen R.
        An approximate full-likelihood method for inferring selection and allele frequency trajectories from DNA sequence data.
        PLoS Genet. 2019; 15: e1008384
        • Key F.M.
        • Fu Q.
        • Romagné F.
        • Lachmann M.
        • Andrés A.M.
        Human adaptation and population differentiation in the light of ancient genomes.
        Nat. Commun. 2016; 7: 10775
        • Bersaglieri T.
        • Sabeti P.C.
        • Patterson N.
        • Vanderploeg T.
        • Schaffner S.F.
        • Drake J.A.
        • Rhodes M.
        • Reich D.E.
        • Hirschhorn J.N.
        Genetic signatures of strong recent positive selection at the lactase gene.
        Am. J. Hum. Genet. 2004; 74: 1111-1120
        • Pickrell J.K.
        • Coop G.
        • Novembre J.
        • Kudaravalli S.
        • Li J.Z.
        • Absher D.
        • Srinivasan B.S.
        • Barsh G.S.
        • Myers R.M.
        • Feldman M.W.
        • Pritchard J.K.
        Signals of recent positive selection in a worldwide sample of human populations.
        Genome Res. 2009; 19: 826-837
        • Ding K.
        • Kullo I.J.
        Geographic differences in allele frequencies of susceptibility SNPs for cardiovascular disease.
        BMC Med. Genet. 2011; 12: 55
        • GTEx Consortium
        The GTEx Consortium atlas of genetic regulatory effects across human tissues.
        Science. 2020; 369: 1318-1330
        • Wang W.
        • Wang C.
        • Xu H.
        • Gao Y.
        Aldehyde dehydrogenase, liver disease and cancer.
        Int. J. Biol. Sci. 2020; 16: 921-934
        • Zhang X.
        • Ling J.
        • Barcia G.
        • Jing L.
        • Wu J.
        • Barry B.J.
        • Mochida G.H.
        • Hill R.S.
        • Weimer J.M.
        • Stein Q.
        • et al.
        Mutations in QARS, encoding glutaminyl-tRNA synthetase, cause progressive microcephaly, cerebral-cerebellar atrophy, and intractable seizures.
        Am. J. Hum. Genet. 2014; 94: 547-558
        • Buckley M.T.
        • Racimo F.
        • Allentoft M.E.
        • Jensen M.K.
        • Jonsson A.
        • Huang H.
        • Hormozdiari F.
        • Sikora M.
        • Marnetto D.
        • Eskin E.
        • et al.
        Selection in Europeans on Fatty Acid Desaturases Associated with Dietary Changes.
        Mol. Biol. Evol. 2017; 34: 1307-1318
        • Mathieson S.
        • Mathieson I.
        FADS1 and the Timing of Human Adaptation to Agriculture.
        Mol. Biol. Evol. 2018; 35: 2957-2970
        • The 1000 Genomes Project Consortium
        A global reference for human genetic variation.
        Nature. 2015; 526: 68-74
        • Chang C.C.
        • Chow C.C.
        • Tellier L.C.
        • Vattikuti S.
        • Purcell S.M.
        • Lee J.J.
        Second-generation PLINK: rising to the challenge of larger and richer datasets.
        GigaScience. 2015; 4: 7
        • Patterson N.
        • Price A.L.
        • Reich D.
        Population Structure and Eigenanalysis.
        PLoS Genetics. 2006; 2: e190
        • Peter B.M.
        Admixture, population structure, and f-statistics.
        Genetics. 2016; 202: 1485-1501
        • Reich D.
        • Thangaraj K.
        • Patterson N.
        • Price A.L.
        • Singh L.
        Reconstructing Indian population history.
        Nature. 2009; 461: 489-494
        • Speidel L.
        • Forest M.
        • Shi S.
        • Myers S.R.
        A method for genome-wide genealogy estimation for thousands of samples.
        Nat. Genet. 2019; 51: 1321-1329
        • Kelleher J.
        • Etheridge A.M.
        • McVean G.
        Efficient coalescent simulation and genealogical analysis for large sample sizes.
        PLoS Comput. Biol. 2016; 12: e1004842
        • Speed D.
        • Holmes J.
        • Balding D.J.
        Evaluating and improving heritability models using summary statistics.
        Nat. Genet. 2020; 52: 458-462
        • Kals M.
        • Nikopensius T.
        • Läll K.
        • Pärn K.
        • Tõnis Sikka T.
        • Suvisaari J.
        • Salomaa V.
        • Ripatti S.
        • Palotie A.
        • Metspalu A.
        • et al.
        Advantages of genotype imputation with ethnically matched reference panel for rare variant association analyses.
        bioRxiv. 2019; https://doi.org/10.1101/579201