The fine-scale genetic structure of the British population

Abstract

Fine-scale genetic variation between human populations is interesting as a signature of historical demographic events and because of its potential for confounding disease studies. We use haplotype-based statistical methods to analyse genome-wide single nucleotide polymorphism (SNP) data from a carefully chosen geographically diverse sample of 2,039 individuals from the United Kingdom. This reveals a rich and detailed pattern of genetic differentiation with remarkable concordance between genetic clusters and geography. The regional genetic differentiation and differing patterns of shared ancestry with 6,209 individuals from across Europe carry clear signals of historical demographic events. We estimate the genetic contribution to southeastern England from Anglo-Saxon migrations to be under half, and identify the regions not carrying genetic material from these migrations. We suggest significant pre-Roman but post-Mesolithic movement into southeastern England from continental Europe, and show that in non-Saxon parts of the United Kingdom, there exist genetically differentiated subgroups rather than a general ‘Celtic’ population.

Figure 1: Clustering of the 2,039 UK individuals into 17 clusters based only on genetic data.
Figure 2: European ancestry profiles for the 17 UK clusters.
Figure 3: Major events in the peopling of the British Isles.

Accession codes

Data deposits

Genotype data, as well as location information at county level (or aggregated across counties where there are small numbers of samples associated with a particular county), will be made available by the WTCCC access process, via the European Genotype Archive (https://www.ebi.ac.uk/ega/) under accession numbers EGAS00001000672 and EGAD00010000632.

Acknowledgements

We thank J. Cheshire for his advice. We thank the UK Office for National Statistics, the National Records of Scotland, and the Northern Ireland Statistics and Research Agency for providing the boundaries used for the UK maps. We note that census output is Crown copyright and is reproduced with the permission of the Controller of HMSO and the Queen’s Printer for Scotland. We further acknowledge the provision of maps from Eurostat, which are copyright EuroGeographics for the administrative boundaries. We acknowledge support from the Wellcome Trust (072974/Z/03/Z, 088262/Z/09/Z, 075491/Z/04/Z, 075491/Z/04/A, 075491/Z/04/B, 090532/Z/09/Z, 084818/Z/08/Z, 095552/Z/11/Z, 085475/Z/08/Z, 098387/Z/12/Z, 098386/Z/12/Z), the Academy of Finland (257654) and the Australian National Health and Medical Research Council (APP1053756). P.D. was supported in part by a Wolfson-Royal Society Merit Award.

Author information

Contributions

W.B. conceived and directed the PoBI project. P.D. directed the analysis and sample genotyping. B.W., A.B., T.D., K.H., E.C.R. and W.B. collected the UK (PoBI) samples and extracted DNA. IMSGC provided the European samples’ genotypes and geographical information. Sample genotyping and quality control were performed by WTCCC2 for both the UK and European genotype data. S.L., G.H., S.M. and P.D. performed the major analyses with contributions from B.W., D.D., D.J.L., D.F., C.F., M.R., M.P. and W.B. M.R. and B.C. provided historical and archaeological information and context. G.H. made Extended Data Fig. 2. S.L. produced all the other figures. P.D., S.L., B.W., G.H., S.M., M.R. and W.B. wrote the manuscript. All authors reviewed the manuscript.

Corresponding author

Correspondence to Peter Donnelly.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Extended data figures and tables

Extended Data Figure 1 The effect of setting a threshold on the confidence of cluster assignment for the genetic clusters in the UK inferred by the fineSTRUCTURE analysis.

The UK map depicts the clustering of the 2,039 UK individuals into 17 clusters on the basis of genetics alone. See Fig. 1 for further details. Here a threshold is set on the measurement of confidence used for assigning individuals to clusters (see Methods). This measure is defined on the interval [0, 1], where the value 1 is interpreted as meaning complete certainty of cluster assignment and 0 as being complete lack of certainty. The plot illustrates the effect of setting a threshold of 0.7 so that a UK individual is only assigned to a cluster if the measure of assignment for that individual is greater than 0.7. All of the samples that have small, faded symbols are assigned to their clusters with confidence greater than 0.7. Those samples for which the assignment is less confident (that is, the measure is less than or equal to 0.7) are plotted with large, bold symbols. The table shows the number of individuals with confidence measure above and below the 0.7 threshold together with the total for each UK cluster. The slight discrepancy between the totals in this figure and Supplementary Information Fig. 1.16 is due to differences in the method for assigning individuals to clusters (see Methods). The threshold of 0.7 was chosen for illustrative purposes only. Similar patterns relate to other thresholds. Contains OS data © Crown copyright and database right 2012. © EuroGeographics for some administrative boundaries.
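The thresholding described in this caption can be sketched in a few lines. This is only an illustration: the cluster labels and confidence values below are invented, and the function name `tabulate_by_confidence` is hypothetical. The confidence measure itself is defined in the Methods and is simply taken as given here.

```python
from collections import defaultdict

THRESHOLD = 0.7  # illustrative threshold from the caption


def tabulate_by_confidence(assignments, threshold=THRESHOLD):
    """Count individuals per cluster above and at-or-below a confidence threshold.

    assignments: iterable of (cluster_label, confidence) pairs, confidence in [0, 1].
    Returns {cluster: {"above": n, "at_or_below": n, "total": n}}.
    """
    table = defaultdict(lambda: {"above": 0, "at_or_below": 0, "total": 0})
    for cluster, confidence in assignments:
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        # An individual counts as confidently assigned only if strictly above the threshold.
        key = "above" if confidence > threshold else "at_or_below"
        table[cluster][key] += 1
        table[cluster]["total"] += 1
    return dict(table)


# Invented example data (not from the paper).
example = [
    ("cluster_A", 0.99),
    ("cluster_A", 0.65),
    ("cluster_B", 0.70),
    ("cluster_B", 0.93),
]
counts = tabulate_by_confidence(example)
```

Because the caption treats a measure "less than or equal to 0.7" as less confident, an individual sitting exactly on the threshold falls into the `at_or_below` column.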

Extended Data Figure 2 Convergence of the algorithm implemented in fineSTRUCTURE.

The fineSTRUCTURE clustering algorithm was run twice on the UK samples (a) and twice on the European samples (b) to assess convergence. The displayed heatmap depicts the proportion of sampled MCMC iterations for which each pair of UK individuals is assigned to the same cluster. The values above and below the diagonal represent two different runs of fineSTRUCTURE. Individuals are ordered along each axis according to the inferred tree from the fineSTRUCTURE run above the diagonal, with tick marks on the axes at the middle of each cluster. Comparison between runs is made by comparing the plot above the diagonal (run two) with that below the diagonal (run one). The high degree of symmetry in the plot confirms the similarity between the runs and hence that each MCMC run has converged to very similar clusters.
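The quantity shown in this heatmap can be illustrated with a small sketch. It assumes that each sampled MCMC iteration is available as a list of cluster labels, one per individual; the function names and the toy samples are hypothetical and are not part of fineSTRUCTURE.

```python
def coclustering_proportions(samples):
    """Proportion of sampled iterations in which each pair shares a cluster.

    samples: list of iterations; each iteration is a list of cluster labels,
    one per individual, in a fixed individual order.
    """
    n = len(samples[0])
    counts = [[0] * n for _ in range(n)]
    for labels in samples:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(samples) for c in row] for row in counts]


def combine_runs(run_one, run_two):
    """Put run one below the diagonal and run two above it, as in the caption."""
    n = len(run_one)
    return [
        [run_two[i][j] if j > i else run_one[i][j] for j in range(n)]
        for i in range(n)
    ]


def max_asymmetry(matrix):
    """Largest absolute difference between mirrored entries (0 means perfect symmetry)."""
    n = len(matrix)
    return max(abs(matrix[i][j] - matrix[j][i]) for i in range(n) for j in range(i + 1, n))


# Toy samples for four individuals (invented; label values are arbitrary).
samples_run_one = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1]]
samples_run_two = [[5, 5, 7, 7], [5, 5, 7, 7], [5, 5, 7, 7]]

p_one = coclustering_proportions(samples_run_one)
p_two = coclustering_proportions(samples_run_two)
combined = combine_runs(p_one, p_two)
asymmetry = max_asymmetry(combined)
```

Only co-membership matters, so the two runs may use different label values. A small `asymmetry` corresponds to the near-symmetric heatmap that the caption takes as evidence that the runs agree.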

Extended Data Figure 3 Application of standard methods for detecting population structure to the UK data.

a, Genome-wide principal component analysis of the UK samples. The UK samples plotted against all pairs of principal component axes, for the first five axes, as determined in the genome-wide principal components analysis. Each individual is depicted by a symbol representing the district from which it was collected. The labels of the sample collection districts are interpreted as follows: CUM, Cumbria; LIN, Lincolnshire; NEA, north east England; OXF, Oxfordshire; YOR, Yorkshire; CHE, Cheshire; NTH, Northamptonshire; NOT, Nottinghamshire; DOR, Dorset; SUS, Sussex; NOR, Norfolk; WOR, Worcestershire; DEV, Devon; SPE, south Pembrokeshire; COR, Cornwall; NWA, north Wales; ARG, Argyll and Bute; NPE, North Pembrokeshire; BAN, Banff and Buchan; NIR, Northern Ireland; ORK, Orkney; SUF, Suffolk; LEI, Leicestershire; FOD, Forest of Dean; HER, Herefordshire; HAM, Hampshire; DER, Derbyshire; LAN, Lancashire; KEN, Kent; GLO, Gloucestershire. b, Clustering the UK samples using the program ADMIXTURE. ADMIXTURE was applied in three scenarios, corresponding to different preset values for K, the number of clusters into which the UK samples are divided. Here K = 2, 3 and 17 (see Methods). A map is shown for each value of K. Each symbol on the map corresponds to one of the sampled individuals and is plotted at the centroid of their grandparents’ birthplace. Each cluster is represented by a unique combination of colour and plotting symbol, with individuals depicted with the symbol of the cluster to which they are assigned. The ellipses centred on each cluster give a sense of the extent of the cluster by showing the 90% probability region of the two-dimensional t-distribution (5 degrees of freedom) which best fits the locations of the individuals in the cluster. Contains OS data © Crown copyright and database right 2012. © EuroGeographics for some administrative boundaries.

Extended Data Figure 4 Potential recent shared ancestry in the genetic clusters in the UK inferred by the fineSTRUCTURE analysis.

a, The UK map on the left depicts the clustering of the 2,039 UK individuals into 17 clusters on the basis of genetics alone. See Fig. 1 for further details. Pairwise identity by descent (IBD) within clusters and across the whole UK sample for all of the 2,039 UK individuals is shown to the right. For each of the inferred UK clusters a box and whisker plot shows the distribution of the pairwise IBD statistic (see Methods). Each box is filled by the colour of the cluster to which it relates, and the outlier points have the same shape as the cluster to which they relate. For comparison the distribution of the pairwise IBD statistic across the whole UK sample is shown on the far right, with the box coloured grey. The light grey horizontal lines indicate the upper and lower quartiles of the IBD statistic’s distribution for the whole UK sample. Along the x axis the number of samples in the associated cluster is shown. The y axis gives the value of the pairwise IBD statistic. b, The same information as a but with 53 clusters of UK individuals. Note that only clusters of size 4 or less depart substantially from the average relatedness. Contains OS data © Crown copyright and database right 2012. © EuroGeographics for some administrative boundaries.

Extended Data Figure 5 Population structure in the European samples.

a, Number of samples derived from each European sampling region. The 6,209 European samples used for the analyses were sampled from ten countries and various locations within each country. Each sample has a specific sampling location (often a city, but in some cases only a whole country). The numbers shown give the number of samples derived from a particular location. Some numbers are depicted out of position for clarity. In these cases a line leads from the number to the actual location. Where the sample locations are well-localized (for example, the city of sampling is known) the box surrounding the number is white. When only information about the country of sampling is known the box is coloured yellow. The numbers are overlain on a faded version of the pie charts from panel b for easy reference. b, European population structure inferred by fineSTRUCTURE. The 6,209 European samples divided into 51 genetic groups (represented by colours and labelled with a subset of the numbers between 1 and 145) using fineSTRUCTURE. For clarity the colour space has been skewed to emphasize the differences between groups 1 to 18 as these groups are the major contributors to the ancestry profiles of the UK clusters. Each sample has a specific sampling location (often a city, but in some cases only a country, see panel a). The pie charts are located at these sampling locations, and depict the proportion of the samples from that location assigned to each of the 51 genetic groups. Each genetic group also has a label number, which is displayed for the larger sectors of each of the pie charts. The area of the pie chart is proportional to the number of samples from that location. Pie charts with black borders correspond to well-localized samples. In contrast, for samples where only the country of sampling is known, they are combined in a single pie chart for the country, which is shown with white borders. 
Some pie charts are depicted out of position for clarity; in these cases a line leads from the chart to the actual location. © EuroGeographics for the administrative boundaries.

Extended Data Figure 6 European ancestry profiles of the UK clusters.

a, The map of the UK shown relates to the map with 17 UK clusters shown in Fig. 1. Ellipses indicate the extent of the UK clusters as in Fig. 1. The pie charts represent the ancestry profile of the UK clusters from Fig. 1. Each pie chart is plotted at the centroid of the corresponding cluster, although some pie charts have been moved for clarity; in the cases where the relocation is substantial a red line leads from the pie chart to the centroid. The sectors of the pie charts are coloured with the colours of the European genetic groups (for the larger sectors the number of the European group is also given). They indicate the ancestry profiles of each UK cluster, namely the proportion of the cluster ancestry that is best represented by each of the European groups. The magnitude of the angle of a sector is proportional to the contribution of that European group to the ancestry profile of the associated UK cluster. The symbols in the grey bar to the left of the map represent the UK clusters as in Fig. 1. The bar chart in the left part of the plot depicts the same ancestry profiles of the UK clusters in a different way. Each row represents a UK cluster (arranged roughly north to south) with the symbols for the clusters from Fig. 1 indicated at each end of the row. Each column represents a European group, with group numbers listed with a three letter prefix that, for clarity, relates to the country or countries where the cluster is most represented. The colour of each bar also indicates the European group to which the bar relates. Confidence intervals (95%) obtained from 1,000 bootstraps of the ancestry profile analysis (see Methods) are indicated on each bar. b, Renormalized ancestry profiles of the UK clusters illustrating possible early European contributions to the UK population. A representation of the relative contributions to the UK clusters from the three European groups (GER6-W. Germany, BEL11-Belgium, and FRA14-NW France) hypothesized to be the major contributors to the earliest migrations into the UK after the last ice age from which DNA survives to the present in substantial proportions (see Supplementary Note). Interpretation of the map, pie charts and bar chart is as for a. In this case, however, the proportions were renormalized to sum to 1 for the contributions from GER6, BEL11 and FRA14. Contains OS data © Crown copyright and database right 2012. © EuroGeographics for some administrative boundaries.
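The renormalization in panel b amounts to restricting an ancestry profile to the three named groups and rescaling so the retained proportions sum to 1. A minimal sketch follows. The group labels GER6, BEL11 and FRA14 come from the caption, while the function name, the `OTHER` entry and all numbers are invented for illustration.

```python
def renormalise(profile, groups=("GER6", "BEL11", "FRA14")):
    """Restrict an ancestry profile to `groups` and rescale it to sum to 1.

    profile: dict mapping European group label -> proportion of ancestry.
    Groups missing from the profile are treated as contributing 0.
    """
    subset = {g: profile.get(g, 0.0) for g in groups}
    total = sum(subset.values())
    if total <= 0:
        raise ValueError("none of the selected groups contributes to this profile")
    return {g: value / total for g, value in subset.items()}


# Invented profile for one hypothetical UK cluster (not values from the paper).
profile = {"GER6": 0.20, "BEL11": 0.10, "FRA14": 0.10, "OTHER": 0.60}
renormalised = renormalise(profile)
```

With these made-up inputs, GER6 accounts for about half of the renormalized profile and the other two groups for about a quarter each, even though together the three make up only 40% of the full profile.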

Extended Data Figure 7 More major events in the peopling of the British Isles.

See Supplementary Note for further details. a, The arrival of agriculture and subsequent migrations from 4000–2500 BC. b, The major Iron Age tribes of Britain around the year 40. © EuroGeographics for the administrative boundaries (coastlines).

Extended Data Figure 8 Application of GLOBETROTTER to infer simulated admixture 40 generations ago between groups from Northern Germany (GER3, 25%) and Italy (ITA36, 75%).

Twenty-five admixed individuals were simulated, and the individuals used to construct these simulated individuals were then removed from the list of potential donors (see Methods). Left barplot, and map: the barplot shows the true population and proportion contributed for each of the two admixing groups. The map shows, for each of the European sampling locations, the true proportion of individuals sampled from that location assigned to each of the admixing groups, coloured according to the barplot. Central three plots: example curves constructed by GLOBETROTTER to infer admixture times, and infer details of admixing groups (see Methods and Supplementary Note). For each pair of populations A and B (A can be the same as or different from B) the points show the empirical probability, relative to under independence, as a function of genetic distance x, that two positions separated by distance x correspond to ancestry donated by population A, and by population B, respectively. The green line shows GLOBETROTTER fitted exponential decay curves for the underlying (that is, expected) value of this relative probability estimate. Under a model of a single admixture event occurring g generations ago, this probability decays at a rate g according to theory, providing an estimate of the admixture time (and 95% CI) shown overlaying the curve ITA36 versus GER3. If ancestries A and B associate with the same admixing group, for example, whenever A = B the fitted curve will have a negative slope, as seen for the GER3 versus GER3 plot. If a positive slope is seen, as for the ITA36 versus GER3 plot, this implies these populations contribute to the two different respective admixing groups. Right bar-plot, and map: GLOBETROTTER produces an inference of the genetic composition of (haplotypes carried by) the two admixing groups, as a mixture of (haplotypes carried by) populations actually sampled. 
This mixture inference jointly uses curves for pairs of sampled populations, and the overall haplotypic makeup of different sampled populations, including the admixed group. The bar-plot shows the inferred mixture representation (dominated in each case by the true admixing groups) and estimated admixture proportion (24%, close to the truth of 25%), with more red/blue populations respectively giving a larger contribution. The map shows populations inferred as contributing to the first (pink/red shades) or second (blue shades) admixing group, respectively, as for the left map, with populations coloured according to the bar-plot. This shows populations falsely inferred as contributing material to the admixing groups were still sampled, mainly, from locations close to those of the true admixing groups. We caution that in this setting of admixture between genetically similar European groups, estimation of admixture fraction is very uncertain (see Methods) (for example, contributing populations are often impossible to definitively assign to a ‘side’ of the event). For further details of the analysis, for example, tests for admixture presence in this simulation, see Methods and Supplementary Note. © EuroGeographics for the administrative boundaries.
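The link between the decay rate of these curves and the admixture time can be illustrated with a toy fit. This sketch is not GLOBETROTTER's fitting procedure. It assumes a single-event curve of the form y(x) = 1 + a·exp(−g·x), with genetic distance x measured in Morgans and a long-range value of 1 (the probability is reported relative to independence), and it estimates g by ordinary least squares on log|y − 1|. The function name and the synthetic data are hypothetical.

```python
import math


def fit_decay(xs, ys, baseline=1.0):
    """Fit y = baseline + a * exp(-g * x) by least squares on log|y - baseline|.

    xs: genetic distances (assumed here to be in Morgans).
    ys: relative probabilities; all must lie strictly on one side of the baseline.
    Returns (a, g): amplitude and decay rate (generations under the assumed model).
    """
    devs = [y - baseline for y in ys]
    if any(d == 0 for d in devs) or len({d > 0 for d in devs}) != 1:
        raise ValueError("points must lie strictly on one side of the baseline")
    sign = 1.0 if devs[0] > 0 else -1.0
    logs = [math.log(abs(d)) for d in devs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxl = sum((x - mean_x) * (l - mean_l) for x, l in zip(xs, logs))
    slope = sxl / sxx              # slope of log|y - baseline| against x equals -g
    intercept = mean_l - slope * mean_x
    return sign * math.exp(intercept), -slope


# Noise-free synthetic curve with g = 40 generations (invented data).
xs = [0.01 * k for k in range(1, 21)]
ys = [1.0 + 0.3 * math.exp(-40.0 * x) for x in xs]
a_hat, g_hat = fit_decay(xs, ys)
```

In this parameterization a positive amplitude gives a curve that decreases towards 1 (the negative-slope case in the caption), and a negative amplitude gives a curve that increases towards 1 (the positive-slope case). Real curves are noisy, so a log-linear fit like this one would be fragile on real data.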

Extended Data Figure 9 Application of GLOBETROTTER to infer details of admixture in the UK clusters.

a, Inferring admixture in a population of 1,044 UK individuals from central and southern England. Left hand plot: the bold red squares show mean grandparental birthplace for each individual in this cluster. Central three plots: example curves constructed by GLOBETROTTER to infer admixture times, and infer details of admixing groups (see Methods and Supplementary Note). For each pair of populations A and B (A can be the same as or different from B) the points show the empirical probability, relative to under independence, as a function of genetic distance x, that two positions separated by distance x correspond to ancestry donated by population A, and by population B, respectively. The green line shows GLOBETROTTER fitted exponential decay curves for the underlying (that is, expected) value of this relative probability estimate. Under a model of a single admixture event occurring g generations ago, this probability decays at a rate g according to theory, providing an estimate of the admixture time (and 95% CI) shown overlaying curves SFS31 versus GER3 and SFS31 versus SFS31. If ancestries A and B associate with the same admixing group, for example, whenever A = B the fitted curve will have negative slope, as seen for the GER3 versus GER3 plot. If a positive slope is seen, as for the SFS31 versus GER3 plot, this implies these populations contribute to the two different respective admixing groups. Right bar-plot, and map: GLOBETROTTER inference shows one possibility for the genetic composition of (haplotypes carried by) the two unsampled historical admixing groups, as a mixture of (haplotypes carried by) populations actually sampled. This mixture inference jointly uses curves for pairs of sampled populations, and the overall haplotypic makeup of different sampled populations, including the admixed group. 
The bar-plot shows the inferred mixture representation (with largest contributions in each case by GER3/DEN18, sampled most frequently from northern Germany and Denmark, and SFS31/ITA52, sampled mainly from southern France and Spain and northern Italy) and estimated admixture proportion (34%), with more intense red/blue populations respectively implying a larger contribution. The map shows populations inferred as contributing to the first (pink/red shades) or second (blue shades) admixing group respectively, with populations coloured according to the bar-plot. We caution that in this setting of admixture between genetically similar European groups, estimation of admixture fraction is very uncertain (see Methods and Supplementary Note) (for example, contributing populations are often impossible to definitively assign to a side of the event), so that other closely related scenarios, for example, a somewhat lower admixture fraction from a more completely ‘GER3’-like group than that inferred, are likely consistent with the GLOBETROTTER results seen. b, Inferring admixture in a population of 51 UK individuals from Orkney. Left hand plot: the bold purple squares show mean grandparental birthplace for each individual in this cluster. Central three plots: example curves constructed by GLOBETROTTER to infer admixture times, and infer details of admixing groups (see Methods and Supplementary Note). For each pair of populations A and B (A can be the same as or different from B) the points show the empirical probability, relative to under independence, as a function of genetic distance x, that two positions separated by distance x correspond to ancestry donated by population A, and by population B, respectively. The green line shows GLOBETROTTER fitted exponential decay curves for the underlying (that is, expected) value of this relative probability estimate. 
Under a model of a single admixture event occurring g generations ago, this probability decays at a rate g according to theory, providing an estimate of the admixture time (and 95% CI) shown overlaying curves NOR90 versus FRA12 and NOR90 versus NOR90. If ancestries A and B associate with the same admixing group, for example, whenever A = B the fitted curve will have negative slope, as seen for the NOR90 versus NOR90 plot. If a positive slope is seen, as for the NOR90 versus FRA12 plot, this implies these populations contribute to the two different respective admixing groups. Right bar-plot, and map: GLOBETROTTER inference shows one possibility for the genetic composition of (haplotypes carried by) the two unsampled historical admixing groups, as a mixture of (haplotypes carried by) populations actually sampled. This mixture inference jointly uses curves for pairs of sampled populations, and the overall haplotypic makeup of different sampled populations, including the admixed group. The bar-plot shows the inferred mixture representation (with largest contribution in each case by GER3/NOR90, sampled most frequently from northern Germany and Norway, and FRA12/FRA14, both sampled mainly from France) and estimated admixture proportion (42%), with more intense red/blue populations respectively implying a larger contribution. The map shows populations inferred as contributing to the first (pink/red shades) or second (blue shades) admixing group respectively, with populations coloured according to the bar-plot. We caution that in this setting of admixture between genetically similar European groups, estimation of admixture fraction is very uncertain (see Methods and Supplementary Note) (for example, contributing populations are often impossible to definitively assign to a ‘side’ of the event). 
In particular, inspection of curves involving GER3 does not yield a clear ‘side’ of the event for this population, unlike the NOR90 versus FRA12 case that implies French-like and Norwegian-like haplotype presence must occur mainly in distinct admixing groups. Therefore the GER3 component might in fact capture haplotypes for either (or both) the French-like or Norwegian-like admixing groups, and the inferred scenario shows only one possibility. Contains OS data © Crown copyright and database right 2012. © EuroGeographics for some administrative boundaries.

Supplementary information

Supplementary Information

This file contains Supplementary Notes, Supplementary References and Supplementary Figure 1. (PDF 23911 kb)

Supplementary Table 1

This table contains pairwise FST values for the UK sample collection districts. For each pair of the 30 UK sample collection districts the table gives the pairwise FST value. The standard errors on these estimates (not shown for clarity of exposition) have a mean of 0.0001 and a maximum of 0.0003. The labels of the sample collection districts are interpreted as follows: CUM = Cumbria; LIN = Lincolnshire; NEA = North East England; OXF = Oxfordshire; YOR = Yorkshire; CHE = Cheshire; NTH = Northamptonshire; NOT = Nottinghamshire; DOR = Dorset; SUS = Sussex; NOR = Norfolk; WOR = Worcestershire; DEV = Devon; SPE = South Pembrokeshire; COR = Cornwall; NWA = North Wales; ARG = Argyll and Bute; NPE = North Pembrokeshire; BAN = Banff and Buchan; NIR = Northern Ireland; ORK = Orkney; SUF = Suffolk; LEI = Leicestershire; FOD = Forest of Dean; HER = Herefordshire; HAM = Hampshire; DER = Derbyshire; LAN = Lancashire; KEN = Kent; GLO = Gloucestershire. (XLSX 50 kb)

Supplementary Table 2

This table contains pairwise FST values for the UK clusters. For each pair of the 17 UK clusters used in the main analysis (labelled approximately from north to south) the table gives the pairwise FST value. The standard errors on these estimates (not shown for clarity of exposition) have a mean of 0.0001 and a maximum of 0.0003. (XLSX 38 kb)

Supplementary Table 3

This table shows robustness of the inferred UK clusters. For each pair of the 17 UK clusters used in the main analysis (labelled approximately from north to south) the table gives the total variation distance between the copying vectors (TVDCV) associated with the pair (see Methods for details). The TVDCV statistic is interpreted as a measure of the differentiation of the pair of clusters, based on genetic ancestry. Using the TVDCV statistic, one can calculate the p-value from a permutation test of the null hypothesis that, given the cluster sizes, the individuals in the two clusters are assigned randomly to each cluster. Based on 1,000 permutations for each pair, all the pairwise comparisons of clusters give p-values below 0.001, confirming that the actual clusters are capturing real ancestry differences. (XLSX 42 kb)
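
The permutation test described above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that the total variation distance between two vectors is half the sum of absolute differences, that a cluster's copying vector is the mean of its members' vectors, and that the p-value is the fraction of permutations with a statistic at least as large as the observed one. The function names and the toy data are hypothetical.

```python
import random


def tvd(a, b):
    """Total variation distance between two vectors that each sum to 1."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def permutation_pvalue(rows_a, rows_b, n_perm=1000, seed=0):
    """Return (observed TVD, p-value) for two clusters of individuals.

    Individuals are reassigned at random to two groups with the original
    cluster sizes, and the TVD between group means is recomputed each time.
    """
    rng = random.Random(seed)
    observed = tvd(mean_vector(rows_a), mean_vector(rows_b))
    pooled = list(rows_a) + list(rows_b)
    n_a = len(rows_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        stat = tvd(mean_vector(pooled[:n_a]), mean_vector(pooled[n_a:]))
        if stat >= observed:
            hits += 1
    return observed, hits / n_perm


# Toy data: two clearly separated clusters of 20 individuals each.
cluster_a = [[0.9, 0.1] for _ in range(20)]
cluster_b = [[0.1, 0.9] for _ in range(20)]
print(permutation_pvalue(cluster_a, cluster_b))
```

With 1,000 permutations the smallest non-zero p-value this procedure can return is 0.001, so a reported value below 0.001 corresponds to no permutation reaching the observed statistic.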

Supplementary Table 4

This table contains European ancestry profiles of the UK clusters. For each of the 17 UK clusters used in the main analysis (rows, labelled approximately from north to south) the table gives the ancestry profile point estimates (with 95% confidence intervals derived by bootstrapping shown in brackets) for 20 of the 51 groups obtained in the European clustering analysis (columns, labelled by European group number). Only groups that contribute at least 1% to the ancestry profile of at least one UK cluster are shown. (XLSX 64 kb)

Supplementary Table 5

This table contains differences between the ancestry profiles of the UK clusters. For each pair of the 17 UK clusters used in the main analysis (labelled approximately from north to south) the table gives the total variation distance between the ancestry profiles (TVDAP) associated with the pair (see Methods for details). The TVDAP statistic is interpreted as a measure of the differentiation of the pair of clusters, based on genetic ancestry. Using the TVDAP statistic, one can calculate the p-value from a permutation test of the null hypothesis that, given the cluster sizes, the individuals in the two clusters are assigned randomly to each cluster. The calculated p-values, based on 1,000 permutations for each pair, are shown in brackets. (XLSX 42 kb)

Supplementary Table 6

This table shows robustness of the ancestry profiles. The table gives the inferred ancestry profiles for 18 clusters, simulated under various demographic scenarios and using two different simulation approaches (here labelled ‘Real Data’ and ‘Forwards’, see Methods for details). Each simulation assumes the cluster is the result of a single admixture of two populations (samples from which are derived from the clusters we used in our main analyses), in the proportions given (50:50; 25:75; 10:90; labelled 50, 25, 10 respectively). For each of the simulated clusters the table gives the ancestry profile point estimates (with 95% confidence intervals derived by bootstrapping shown in brackets) for the 51 groups obtained in the European clustering analysis (columns, labelled by European group number). See Methods and Supplementary Note for more details. (XLSX 56 kb)

Supplementary Table 7

This table contains correlations between European groups’ contributions to the UK ancestry profiles. Displayed are pairwise correlations (Pearson’s r) of each European group’s contributions to the ancestry profiles of each of the 17 UK clusters used in our main analysis. Here only values for European groups that contribute at least 1% to the ancestry profile of at least one UK cluster are shown. a, Ordered by European group numbers. b, Grouped into clusters according to similar patterns of correlation coefficients. Note that there are various scenarios which can give rise to these correlations, so that strong correlations between contributions from two European groups do not necessarily imply that the two groups contributed ancestry through the same migration event(s) (see Methods and Supplementary Note for examples of this). (XLSX 52 kb)

Cite this article

Leslie, S., Winney, B., Hellenthal, G. et al. The fine-scale genetic structure of the British population. Nature 519, 309–314 (2015). https://doi.org/10.1038/nature14230

  • DOI: https://doi.org/10.1038/nature14230
