Volume 51, Issue 3 p. 391-408
Full Paper
Free Access

Assessing selection signatures within and between selected lines of dual-purpose black and white and German Holstein cattle

S. Naderi

S. Naderi

Institute of Animal Breeding and Genetics, Justus-Liebig University Giessen, Ludwigstr. 21b, Giessen, Germany

Search for more papers by this author
M. H. Moradi

M. H. Moradi

Department of Animal Sciences, Arak University, Shahid Beheshti Street, Arak, Iran

Search for more papers by this author
M. Farhadian

M. Farhadian

Department of Animal Science, University of Tabriz, 29 Bahman Boulevard, Tabriz, Iran

Search for more papers by this author
T. Yin

T. Yin

Institute of Animal Breeding and Genetics, Justus-Liebig University Giessen, Ludwigstr. 21b, Giessen, Germany

Search for more papers by this author
M. Jaeger

M. Jaeger

Institute of Animal Breeding and Genetics, Justus-Liebig University Giessen, Ludwigstr. 21b, Giessen, Germany

Search for more papers by this author
C. Scheper

C. Scheper

Institute of Animal Breeding and Genetics, Justus-Liebig University Giessen, Ludwigstr. 21b, Giessen, Germany

Search for more papers by this author
P. Korkuc

P. Korkuc

Albrecht Daniel Thaer Institute for Agricultural and Horticultural Sciences, Humboldt University Berlin, Invalidenstr. 42, Berlin, D-10115 Germany

Search for more papers by this author
G. A. Brockmann

G. A. Brockmann

Albrecht Daniel Thaer Institute for Agricultural and Horticultural Sciences, Humboldt University Berlin, Invalidenstr. 42, Berlin, D-10115 Germany

Search for more papers by this author
S. König

Corresponding Author

S. König

Institute of Animal Breeding and Genetics, Justus-Liebig University Giessen, Ludwigstr. 21b, Giessen, Germany

Address for correspondence

S. König, Institute of Animal Breeding and Genetics, Justus-Liebig University Giessen, Ludwigstr. 21b, Giessen, Germany.

E-mail: [email protected]

Search for more papers by this author
K. May

K. May

Institute of Animal Breeding and Genetics, Justus-Liebig University Giessen, Ludwigstr. 21b, Giessen, Germany

Search for more papers by this author
First published: 25 February 2020
Citations: 12

Summary

The aim of this study was to detect selection signatures considering cows from the German Holstein (GH) and the local dual-purpose black and white (DSN) population, as well as from generated sub-populations. The 4654 GH and 261 DSN cows were genotyped with the BovineSNP50 Genotyping BeadChip. The geographical herd location was used as an environmental descriptor to create the East-DSN and West-DSN sub-populations. In addition, two further sub-populations of GH cows were generated, using the extreme values for solutions of residual effects of cows for the claw disorder dermatitis digitalis. These groups represented the most susceptible and most resistant cows. We used cross-population extended haplotype homozygosity methodology (XP-EHH) to identify the most recent selection signatures. Furthermore, we calculated Wright’s fixation index (FST). Chromosomal segments for the top 0.1 percentile of negative or positive XP-EHH scores were studied in detail. For gene annotations, we used the Ensembl database and we considered a window of 250 kbp downstream and upstream of each core SNP corresponding to peaks of XP-EHH. In addition, functional interactions among potential candidate genes were inferred via gene network analyses. The most outstanding XP-EHH score was on chromosome 12 (at 77.34 Mb) for DSN and on chromosome 20 (at 36.29–38.42 Mb) for GH. Selection signature locations harbored QTL for several economically important milk and meat quality traits, reflecting the different breeding goals for GH and DSN. The average FST value between GH and DSN was quite low (0.068), indicating shared founders. For group stratifications according to cow health, several identified potential candidate genes influence disease resistance, especially to dermatitis digitalis.

Introduction

Selection contributes to selection signatures at specific genome segments, potentially controlling fitness (Qanbari & Simianer 2014). In this regard, selection increases the allele frequencies of beneficial mutations, but also reduces genetic variation in chromosomal segments linked to selected loci, so-called ‘selective sweeps’ (Smith & Haigh 1974). For the identification of selection signatures, the following statistical approaches have been applied: the extended haplotype homozygosity (EHH) (Sabeti et al. 2002), the integrated haplotype score (Voight et al. 2006) and the cross-population extended haplotype homozygosity (XP-EHH) (Sabeti et al. 2007). For the detection of most recent signatures of selection and recent breed differentiation, the concept requires sub-populations (Sabeti et al. 2007). Applying XP-EHH, Chen et al. (2016) identified signatures of recent selection in Chinese Holstein and Simmental cattle populations. Chromosomal segments under positive selection pressure overlapped with a set of important genes involved in biological processes. Lee et al. (2014) used XP-EHH methodology and detected 250 genes in regions with ‘outlier SNP’, including the alpha1casein (CSN1S1) and the beta-casein gene (CSN2). Both genes have a substantial impact on milk protein quality in Holstein cattle.

Assessing the variation of marker allele frequencies in different populations is a further tool to discover genome-wide signatures (Holsinger & Weir 2009). One specific strategy is to use a large number of SNP markers across the genome and to compare specimens from different populations (Gholami et al. 2014). In this regard, Wright’s fixation index (FST) is the most popular method (Wright 1949), as applied for the detection of selection signatures in several recent studies (Moradi et al. 2013; Gholami et al. 2014).

The local dual-purpose German black pied cattle (DSN) comprise the founder breed of the current German Holstein (GH) population. The DSN breed has favored a dual-purpose breeding goal, including both meat and milk traits, and adaptation to grassland systems. In contrast, in GH cattle, there has been a strong selection focus on milk yield and on dairy character over decades (Brade & Brade 2013; Jaeger et al. 2016). In consequence, selection intensities for functional traits in GH have been quite low (König et al. 2007). Selection on milk production traits in GH has been intensified through the implementation of artificial insemination schemes since 1960 (Skjervold & Langholz 1964). The breeding and selection focus on milk yield stimulated imports of specialized Holstein Friesian dairy lines from North America. A comparison of the modern GH population with its founder breed, i.e. the DSN population, is a unique opportunity for the detection of recent selection signatures. In this specific case, we hypothesize that selection signatures can be identified that are due to selection pressure on chromosomal segments associated with milk production traits.

Additionally, a further intra-breed comparison considering the current West German-DSN and East German-DSN sub-populations will contribute to a deeper understanding of selection mechanisms. From a breeding history perspective, the division of Germany after the Second World War also separated the DSN population, preventing any exchange of animals until reunification in 1990. During the period of separation, production system characteristics and breeding goal definitions differed in the two German states (Brade & Brade 2013; Jaeger et al. 2016).

The lack of precise health data hampered the inclusion of health traits into breeding goals (Gernand et al. 2013). Thus, natural selection was the only force influencing adaptive evolution over these decades. The on-farm installation of electronic health-recording systems combined with large-scale cow genotyping led to artificial health trait selection (Egger-Danner et al. 2014). As indicated by Nielsen et al. (2007), recent selection on disease resistance has influenced the pattern of specific chromosomal segments and allele frequencies of potential candidate genes. As a consequence, stratification of GH populations according to disease incidence for the most important claw disorder dermatitis digitalis (DD; Gernand et al. 2013) allows the detection of recent selection signatures for health and adaptation. For both sub-grouping strategies (i.e. geographic characteristics and health status), the XP-EHH method might be appropriate to detect recent selection influence.

A selection signature indicates functional genes contributing to complex traits (Wang et al. 2010; Evangelou et al. 2014). A complex trait, such as claw disease susceptibility, mostly relies on cumulative gene effects, as well as on multiple gene interactions in functional pathways (Eleftherohorinou et al. 2009). The term ‘interacting’ addresses genes with possible direct or indirect impact on other gene expressions, and in causality, on the biological output. For instance, in immunogenetics, the pattern of gene variants involved in a functional pathway influences the immune response intensity of a host to specific pathogens (Hill 2006).

In consequence, the aims of this study were (i) to detect selection signatures considering the GH and the DSN populations, (ii) to detect selection signatures considering sub-populations stratified according to geographical characteristics and disease incidences for DD and (iii) to infer biological pathways of genes that are located close to or within selection signature segments.

Materials and methods

Genotype samples and quality control

GH (4654 cows) and DSN (261 animals) cattle were genotyped with the BovineSNP50 Genotyping BeadChip. With regard to DSN cow selection for genotyping, we only considered cows with 100% DSN breed percentages. Genetic breed percentages were calculated utilizing our newly developed algorithm (Jaeger et al. 2018). The application of the algorithm indicated for 46% of the registered DSN cows larger gene percentages of other breeds (mainly Holstein Friesian), explaining the comparably small number of genotyped DSN cows in the present study. Nevertheless, regarding the impact of sample size for selection signature analyses, Ma et al. (2015) reported that a rather limited value (N = 30 gametes, equivalent to 15 diploid individuals) is sufficient to reach reasonable power when applying XP-EHH.

With regard to SNP quality control criteria, SNP markers with a MAF lower than 0.01, and with a large deviation from HWE (< 10−6), were excluded. All SNPs had a call rate larger than 95%. After SNP-data editing, the final dataset considered 39 917 SNPs from the 4915 cows. The software plink version 1.7 (Purcell et al. 2007) was used for the SNP filtering processes.

Population stratifications

GH and DSN cows

From the total dataset of 4654 genotyped first-lactation GH cows, we considered a random sample of 2076 from 28 different herds (because of computation time). The 28 GH herds represent the database when implementing the cow training set for genomic selection in Germany. These herds are located in the German federal states of Brandenburg and Mecklenburg–West Pomerania. Nevertheless, they represent broad spectra of herd characteristics phenotypically, genetically and genomically, in order to avoid possible genotyping by environmental interactions in genetic evaluations (Yin & König 2018). Accordingly, the 261 genotyped DSN cows represent a variety of herd and geographical descriptors (see Fig. 1 and some more description details in the paragraph ‘West-DSN and East-DSN cows’). Most obvious are phenotypic trait differences in milk yield between GH and DSN cows. The 305-day lactation level for the genotyped 2076 GH cows was 11 264 kg, but only 7009 kg for the genotyped 261 DSN cows.

Details are in the caption following the image
Distribution of German black pied cattle (DSN) herds across Germany. Symbols represent the DSN percentage per herd (indicated in the legend with <10% DSN, 10–50% DSN, 50–75% DSN, >75% DSN).

Phenotypes and genotypes are available on request for reproduction of the results upon signing of a material transfer agreement.

Healthy and sick GH cows

GH cows were allocated to two different groups according to their susceptibility/resistance to DD. In this regard, we focused on the first 200 days in milk, representing the period of natural energy deficiency after calving. At least one entry for DD implied a score of 1 (sick); otherwise, a score of 0 (healthy) was assigned. In the 200 day period, only a limited number of cows had an entry in the disease database for a new DD diagnosis, and repeated entries showed ongoing treatment after earlier diagnosis. In the next step, binary DD data were analyzed using the following generalized linear mixed model with a logit link function:
urn:x-wiley:02689146:media:age12925:age12925-math-0001
where πrst is the probability of occurrence for DD of cow t in calving-year-season r and herd s; φ is the overall mean effect; γr is the fixed effect of calving-year-season; and λs is the fixed effect of the herd.

The solutions for all cows represent the random cow effects plus further random residual effects, named DD residuals. The residuals followed a Gaussian distribution. In order to generate cow groups with obvious contrasts in disease susceptibility, cows with the most extreme DD residuals were allocated either to group A or to group B (250 cows in each group). The 5% upper tail (= group A) included the DD-resistant cows with residuals in the range from −0.88 to −0.31 (mean value −0.44). The 5% lower tail (= group B) included the DD-susceptible cows with residuals in the range from 0.75 to 0.99 (mean value 0.83).

West-DSN and East-DSN cows

DSN sub-population creation was based on the geographical location of the herd (i.e. former East vs. former West Germany). The map in Fig. 1 indicates the location of DSN herds, along with the percentage of DSN cows within herds. The majority of West-DSN farms were located in north-west Germany (East Friesland area) and around Hannover and Bremen. These small-scale herds (average herd size 49 cows) represent an intensive grazing system in the coastal marshlands. The East-DSN sub-population was a random sample from one large-scale herd (herd size 856 cows), representing an indoor production system. The West-DSN sub-population included 172 cows and the East-DSN sub-population 89 cows.

Pedigree-based genetic relationships between and within populations and sub-populations were calculated using the software package cfc (Sargolzaei et al. 2006). For further illustrations regarding genomic compositions and population stratification, a PCA was carried out (R package SNPRelate, Zheng et al. 2012).

Identification of selection signatures

As introduced by Sabeti et al. (2007), XP-EHH was performed to identify recent signatures of selection. XP-EHH is based on EHH values and evaluates LD decay across the genome. For a bi-allelic SNP with alleles A and a, EHH is defined as follows (Sabeti et al. 2002):
urn:x-wiley:02689146:media:age12925:age12925-math-0002
where nA and na are the numbers of haplotypes with alleles A and a respectively, ni is the count for the ith haplotype within a sub-population and hx represents the number of distinct haplotypes in a genomic region up to a distance x from the core locus. All SNP markers located in an interval of 1 Mb in both directions from a given core SNP were considered. Afterwards, EHH was integrated within these bounds (for the entire interval from the core SNP up to the distance x) considering both sub-populations. In this regard, calculation of XP-EHH followed the principles defined by Sabeti et al. (2007). Extreme positive values reflect selection in sub-population 1, and extreme negative values selection in sub-population 2. The software fastphase version 1.2.3 (Scheet & Stephens 2006) was used to reconstruct haplotypes for each chromosome in both sub-populations, because phased haplotypes are required to calculate XP-EHH.
According to MacEachern et al. (2009), the FST was calculated as follows:
urn:x-wiley:02689146:media:age12925:age12925-math-0003
where HT and HS are the expected heterozygosities for the overall total population and for sub-populations respectively.

Gene annotation and functional gene analysis

Gene annotation from the Ensembl database (www.ensembl.org/biomart/martview) focussed on chromosomal segments close to peaks for extreme XP-EHH values. The ‘peak definition’ corresponded to XP-EHH values beyond the upper and lower 1% of the observed genome-wide distribution of XP-EHH. A window of 250 kbp downstream and upstream of each core SNP was considered for potential candidate gene identifications. Such window is quite large, but was considered by Maiorano et al. (2018) for Angus beef cattle and by Klein et al. (2019) for Holstein dairy cattle. Additionally, the cattle QTL database (https://www.animalgenome.org/cgi-bin/QTLdb/BT/search) was used to compare the detected regions with previously identified QTL. The QTL database contains cattle QTL from previous pubications, allowing QTL search by specified chromosomal locations. Hence, we focussed on a comparison and discussion considering cattle QTL as identified for specific traits with detected regions under positive selection.

In addition, we studied functional interactions among proteins encoded by candidate genes. In this regard, we used the STRING database (https://string-db.org/). This database stores information for experimentally validated protein interactions and protein networks (Szklarczyk et al. 2016). Constructed networks were clustered using the K-means algorithm, in order to define functional modules. In this regard, we tested different cluster numbers ranging from 2 to 10. Based on the size of clusters, and considering functional relativeness, three clusters was the optimal number for all networks.

Results

Genetic relationships

The average coefficient of pedigree based relationships was 0.043 (±0.035) for the total GH population, and 0.049 (±0.039) for the healthy and 0.047 (±0.034) for the sick GH sub-populations. With regard to dual-purpose DSN cows, the average pedigree-based relationship coefficient was 0.021 (±0.046) for the total DSN dataset, 0.089 (±0.064) for the East- and 0.028 (±0.049) for the West-DSN sub-populations. The average pedigree-based relationship between the sick and healthy GH sub-populations was 0.041 (±0.027) and 0.0001 (±0.0001) between the GH and DSN populations, and 0.038 (±0.029) between East- and West-DSN sub-populations.

Selection signatures

GH and DSN populations

The Manhattan plot for standardized XP-EHH scores across the genome is shown in Fig. 2. Negative XP-EHH values reflect selection in the DSN population. According to XP-EHH scores and a threshold for the top 0.1 percentile for negative XP-EHH scores (corresponds to XP-EHH <−2.9), we found evidence of selection on 14 different chromosomes (Fig. 2). An extremely negative value for XP-EHH (XP-EHH = −4.09) was identified on chromosome 12 at 77.34 Mb. In addition, negative XP-EHH scores were calculated for chromosome 9 from 13.63 to 13.94 Mb, and for chromosome 23 from 5.21 to 5.52 Mb. The regions under positive selection and the annotated genes considering the window 250 kb downstream and upstream of each core SNP are presented for DSN in Table 1.

Details are in the caption following the image
Extended haplotype homozygosity methodology (XP-EHH) scores for each SNP for the German Holstein (GH; positive values) and the DSN population (negative values).
Table 1. The chromosomal segments under positive selection in the German black pied cattle (DSN) population and the annotated potential candidate genes within a window 250 kb downstream and upstream of each core SNP
Chromosome Segment (in mega base pairs) No. of core SNP Potential candidate genes
1 150.61–151.12 2 CHAF1B, CLDN14, SIM2, HLCS, HGNC, RIPPLY3
3 24.28 1 WARS2, TBX15
4 0.25, 19.89–19.85 3 THSD7A, TMEM106B, VSTM2A
5 0.65, 6.49, 7.17 3 TSPAN8, NAV3, ZDHHC17, CSRP2, E2F7
6 97.58 1 C4orf22, BMP3, PRKG2
8 11.00 1 SCARA5, PBK, ESCO2, CCDC25, SCARA3, CLU, TMEM215
9 13.63–13.94 4 CD109
10 87.99 1 C14orf1, TTLL5, TGFB3
11 89.23 1
12 6.76, 70.06–70.09, 77.31–77.61 12 LOC515333, UGGT2
14 64.58, 53.02 2 RRM2B, CSMD3
15 25.18 1 C15H11orf71, RBM7, REXO2, NXPE4, ZBTB16
18 65.40–65.62 3 ZNF211, ZSCAN4, LOC509810, LOC100124497, LOC104968476
23 5.21–5.52 4 GFRAL, HCRTR2, FAM83B, 5S_rRNA

The regions with the highest positive XP-EHH scores reflecting selection in the GH population (Fig. 2) are located on chromosome 10 (31.46–37.83 Mb and 68.60–68.89 Mb), chromosome 20 (36.29–38.42 Mb and 69.43–69.66 Mb) and chromosome 2 (131.30–131.34 Mb). The regions under positive selection and the annotated genes considering the window 250 kb downstream and upstream of each core SNP for GH are presented in Table 2. The positive selection signatures in the two segments on chromosome 10 included several genes (i.e. SNORA74, C15orf41, MEIS2, TMCO5A, snoU13, SPRED1, SNORA70 and PELI2). The potential candidate genes on chromosome 20 are NIPBL, bta-mir-2360, NUP155, WDR70, GDNF, SNORA17, EGFLAM and IRX1. The extended selection signature on chromosome 2 spanned almost 0.021 Mb. The most important potential candidate genes for this chromosomal segment are LDLRAD2, HSPG2, CDC42 and WNT4.

Table 2. The chromosomal segments under positive selection in the German Holstein (GH) population and the annotated potential candidate genes within a window 250 kb downstream and upstream of each core SNP
Chromosome Segment (in mega base pairs) No. of core SNP Potential candidate genes
10

31.46–37.83

68.60–68.89

24 U2, SNORA74, C15orf41, MEIS2, U4, TMCO5A, snoU13, SPRED1, SNORA70, U7, PELI2
20

36.29–38.42

69.43–69.66

12 NIPBL, bta-mir-2360, NUP155, WDR70, GDNF, U6, SNORA17, EGFLAM, IRX1
2 131.30–131.34 3 LDLRAD2, HSPG2, CDC42, WNT4

East-DSN vs. West-DSN sub-populations

XP-EHH values for East-DSN and West-DSN sub-populations are shown in Fig. 3. Again, a threshold for the top 0.1 percentile for negative and positive XP-EHH scores was defined. Negative XP-EHH scores indicate selection in the East-DSN population. The most extended selection signature for East-DSN was identified on chromosome 6 (92.97–94.43Mb), spanning a segment of almost 1.46 Mb. The regions under positive selection, and the annotated genes considering the window 250 kb downstream and upstream of each core SNP, are listed in Table 3.

Details are in the caption following the image
XP-EHH scores for each SNP for the West-DSN (positive values) and East-DSN sub-populations (negative values).
Table 3. The chromosomal segments under positive selection in the East-DSN sub-population and the annotated potential candidate genes within a window 250 kb downstream and upstream of each core SNP
Chromosome Segment (in mega base pairs) No. of core SNP Potential candidate genes
6 92.97–94.43 28 ART3, NUP54, SCARB2, STBD1, SHROOM3, SEPT11, CCNI, CCNG2, CXCL13, CNOT6L, MRPL1
11 82.83 1
13 45.84, 34.16 2 ZEB1
15 41.78,   GALNT18
16 4.93, 5.25, 66.22 3 C16H1orf116, YOD1, PFKFB2, C4BPA, SMG7, NCF2, ARPC5, APOBEC4
21 25.09–25.56 2 SH3GL3, RASGRF1, CTSH, MORF4L1

XP-EHH scores beyond the threshold for the top 0.1 percentile for positive signals indicate targets of recent selection in the West-DSN sub-population (Fig. 3). Extreme positive values for XP-EHH are located on chromosome 1 (at 110.72 Mb), chromosome 4 (at 120.64 Mb), chromosome 5 (at 76.33 Mb), chromosome 7 (at 80.27 Mb), chromosome 8 (between 61.35 and 61.45 Mb), chromosome 10 (between 62.40 and 82.57 Mb), chromosome 12 (between 67.07 and 67.95 Mb), chromosome 14 (between 57.63 and 57.87 and between 84.55 and 84.61 Mb), chromosome 16 (between 47.30 and 48.45 Mb), chromosome 17 (between 26.39 and 26.41 Mb), chromosome 20 (at 6.69 Mb and between 67.01 and 67.49 Mb) and chromosome 21 (between 9.74 and 9.78 Mb and at 34.06 Mb). The genes of interest located in these genomic regions are shown in Table 4.

Table 4. The chromosomal segments under positive selection in the West-DSN sub-population and the annotated potential candidate genes within a window 250 kb downstream and upstream of each core SNP
Chromosome Segment (in mega base pairs) No. of core SNP Potential candidate genes
1 110.72 1 RF00100
4 120.64 1 VIPR2
5 76.33 1 CYTH4, ELFN2, MFNG, USP18, ALG10
7 80.27 1 RFAM
8 61.35–61.45 2 MELK
10 62.40, 71.69, 82.57 3 DUT, RF00139, SLC12A1
12 67.07–67.95 7 GPC5
14

57.63–57.87

84.55–84.61

7 TRHR, TMEM74, SNTB1
16 47.30–48.45 5 DNAJC11, THAP3, PHF13
17 26.39–26.41 2 RF00003
20 6.69, 67.01–67.49 3 GFM2, NSA2, FAM169A,
21 9.741–9.78, 34.06 3 SIN3A,

Sick vs. healthy GH sub-populations

XP-EHH scores for healthy and sick GH cows are shown in Fig. 4. The top 0.1 percentile for positive XP-EHH values reflects selection signals in the healthy sub-population. The most extended selection signatures indicating recent selection in the healthy sub-population were identified on chromosomes 16 (26.99–27.62 Mb), 22 (46.85–49.25 Mb) and 23 (48.99–49.76 Mb). The genomic regions with extreme peaks for XP-EHH for the healthy sub-population include the annotated genes as listed in Table 5.

Details are in the caption following the image
XP-EHH scores for each SNP for the healthy (positive values) and sick GH sub-populations (negative values).
Table 5. The chromosomal segments under positive selection in the healthy GH sub-population and the annotated potential candidate genes within a window 250 kb downstream and upstream of each core SNP
Chromosome Segment (in mega base pairs) No. of core SNP Potential candidate genes
16 26.99–27.62 11 TAF1A, MIA3, AIDA, BROX, AUH, TLR5, SUSD4, CCDC1, CAPN8
22 46.85–49.25 22 CACNA1D, CACNA2D3, LRTM1, SELENOK, ACTR8, IL17RB, CHDH
23 48.99–49.76 6 F13A1, NRN1, FARS2, LYRM4, PPP1R3G, RPP40, CDYL

The detected selection signatures in the sick GH sub-population correspond to extreme negative XP-EHH values (Fig. 4). The most obvious effect was on chromosome 15, spanning a segment of almost 5.8 Mb and harboring the genes SLC35C1, CRY2 and MAPK8IP1. The three genes RORA, ICE2 and ANXA2 are located in a 1.2 Mb segment on chromosome 10. The detected segment on chromosome 16 encompasses the genes DIEXF, IRF6, C1orf74, TRAF3IP3, HSD11B1, G0S2, LAMB3 and CAMK1G. The regions under positive selection and the full list of annotated genes are presented in Table 6.

Table 6. The chromosomal segments under positive selection in the sick GH sub-population and the annotated potential candidate genes within a window 250 kb downstream and upstream of each core SNP
Chromosome Segment (in mega base pairs) No. of core SNP Potential candidate genes
10 48.54–49.74 18 RORA, ICE2, ANXA2
12 22.21–23.06 3 FOXO1, COG6, LHFP, SNORA48, NHLRC3, PROSER1, STOML3
13 62.53–63.18 2 KIF3B, ASXL1, NOL4L, NOL4L, COMMD7, DNMT3B, MAPRE1, SUN5, BPIFB2
15 76.86–82.76 11 SLC35C1, CRY2, MAPK8IP1, C11orf94, PEX16, LARGE2, PHF21A, CREB3L1, DGKZ
16 75.49–75.86 2 DIEXF, IRF6, C1orf74, TRAF3IP3, HSD11B1, G0S2, LAMB3, CAMK1G
24 18.78–19.11, 33.30 4 LAMA3, NKRD29, NPC1, C24H18orf8, IOK3, TMEM241

Discussion

Genetic relationships

Between-group relationships indicate a clear genetic separation between DSN and GH. Genetic relationships between healthy and sick GH cows as well as between East- and West-DSN sub-populations were larger, probably owing to there being shared founder animals in recent generations. Accordingly, also based on genomic marker data (Fig. 5), only GH can be clearly distinguished from DSN based on the first two principal components (PCA1 and PCA2). The GH sub-populations (including healthy and sick cows) and the DSN sub-populations (including West-DSN and East-DSN cows) are allocated to opposite clusters. The healthy and sick GH sub-populations are grouped together. In analogy, it was not possible to clearly separate the West- and East-DSN sub-populations based on PCA1 and PCA2. The percentage of the total variation explained by these two principal components was 33.4%.

Details are in the caption following the image
PCA considering GH and DSN sub-populations (East_DSN, DSN cows from East Germany; West_DSN, DSN cows from West Germany; Holstein, GH cows, Healthy, healthy GH sub-population; Sick, sick GH sub-population).

Selection signatures and annotated potential candidate genes

GH vs. DSN populations

The focus of the present study was to detect recent signatures of selection in the DSN and GH populations. DSN is the founder breed of the modern GH population, but represents a breeding goal including both trait categories meat and milk. The divergent breeding goals contributed to genomic differentiation between the two breeds. None of the identified chromosomal segments under recent selection in the GH population overlapped with selection signature segments in the DSN population. Hence, our findings refer to the fact that different breeding strategies with focus on different traits affected different loci in these two populations. On the other hand, the average FST between German Holsteins and DSN was 0.068 ± 0.051, confirming genetic relationships in distant generations. Both criteria, FST and XP-EHH, reflect population differentiation, but they address different periods on the time scale (Sabeti et al. 2007). XP-EHH mostly detects recent positive selection signals, but FST values refer to selection signals from the past.

Network N1 shows physiological pathways for genes overlapping with signatures of positive selection in the DSN population (Fig. 6). Associations between genes included in network N1 and dairy cattle breeding goal traits have been identified in previous studies. For instance, CLU is suggested to be one of the most influential candidate genes affecting milk protein content (Wang et al. 2012; Li et al. 2016). Regarding animal health, CLU is induced in the mammary gland under stress and plays an anti-inflammatory role (Guenette et al. 1994; Humphreys et al. 1999; Piantoni et al. 2010). Accordingly, Silkensen et al. (1994) found increased CLU expressions during stress periods in sick animals. Swanson et al. (2009) verified increased CLU expressions during mastitis episodes in lactating dairy cows. PRKG2, the hub gene from the third cluster (in blue) in network N1, also influences production and conformation traits in cattle populations. Boegheim et al. (2017) identified strong LD between a causative mutation in PRKG2 with several QTL for dairy cattle production traits. Genetic hitchhiking with neighboring genes contributed to increased homozygosity for this specific chromosomal segment. Selection on a mutation within the kinase domain of PRKG2 probably contributed to dwarfism in Angus cattle (Koltes et al. 2009).

Details are in the caption following the image
Network N1, interactions between the identified genes for the DSN population.

WARS2 is one of the genes in network N1, which is involved in the regulation of subcutaneous fat mobilization and fat disposition in human visceral adipose tissue (Schleinitz et al. 2014). Accordingly, Cesar et al. (2014) identified associations between specific fatty acid compositions and WARS2 in Brazilian Nellore steers. WARS2 encodes an essential enzyme to catalyze amino acylation of the tRNA and carries a signal peptide to regulate mitochondrial import (McKusick 2007).

The pronounced XP-EHH scores on chromosome 20 in the GH population confirm selection signatures as identified in Israeli Holsteins (Glick et al. 2012) and GH bull dams (Qanbari et al. 2011). Both populations have been intensively selected on milk yield over decades. Zhao et al. (2015) applied integrated haplotype score methodology in Holstein dairy cattle, and in agreement with results from our study, reported strong selection signatures on chromosome 20. Viitala et al. (2006) reported two significant QTL associated with protein yield, fat percentage and protein percentage in Finnish Ayrshire dairy cattle, which are located close to the regions with the highest positive XP-EHH scores in the GH population. The segregating QTL from this chromosomal region (45–65 cM) includes the GHR gene. The GHR receptor plays a key role in the regulation of growth hormone levels in the mammary gland (Viitala et al. 2006).

Physiological pathways for genes overlapping with signatures of positive selection in the GH population are shown in Network N2 (Fig. 7). NIPBL is a potential candidate gene overlapping with signatures of positive selection on chromosome 20. NIPBL is one hub gene in this network and was included in the first cluster of the network (in blue). Based on a genome-wide scan in Brazilian sheep breeds, NIPBL was identified as a candidate gene involved in various biological functions, such as immunity, nervous system development and reproduction (Gouveia et al. 2017). Furthermore, NIPBL influenced fat percentage and protein percentage in Chinese dairy cattle (Jiang et al. 2014). Pellino E3 Ubiquitin Protein Ligase Family Member 2 (PELI2) overlapped with selection signatures on chromosome 10. Gutiérrez-Gil et al. (2015) identified PELI2 as a candidate gene within core selective sweep regions, regulating the toll-like receptor signaling pathway in dairy cattle. This pathway contributes to immune response. Yang et al. (2014) reported similar PELI2 pathway contributions in pigs.

Details are in the caption following the image
Network N2, interactions between the identified genes for the German Holstein population.

CDC42 overlapped with selection signatures on chromosome 2, and was the hub gene of the second cluster of the network N2 (in green). CDC42 is involved in several pathway activations: the MAPK signaling pathway, the JAK-STAT signaling pathway and the T cell receptor-signaling pathway, e.g. initiating lactation processes in mice (Yamaji et al. 2013). Accordingly, Arun et al. (2015) reported that the JAK-STAT pathway plays an essential role in the co-ordination of gene transcriptions regulating the lactation cycle. The MAPK signaling pathway is involved in cell proliferation, affects hyperplastic growth (Chang 2007) and influences residual feed intake (Rolf et al. 2012).

The annotated genes involved in the third cluster of networks N2 (in red, Fig. 7) did not significantly activate a pathway. However, HSPG2, one of the genes involved in this cluster, contributes to the matrix metalloproteinase pathway (Do et al. 2017). Key functions of matrix metalloproteinases address the regulation of mammary epithelial cell functions, cell proliferation and cell differentiation (Uría et al. 1997).

East-DSN vs. West-DSN sub-populations

The identified regions under recent selection in the West-DSN sub-population did not overlap with the selection signatures in the East-DSN sub-population. Only chromosomes 16 and 21 contributed to selection signatures in both West and East sub-populations, but different chromosomal segments. Interestingly, the XP-EHH scores revealing recent selection signals were quite high, whereas the FST value between the two sub-populations was low (average 0.0085 ± 0.0024). The results underline the close relationship between these two sub-populations in the past, but different breeding strategies in East and West Germany after the Second World War.

The extended selection signature for East-DSN on chromosome 6 confirms the detected QTL for milk and meat quality traits. For instance, Buitenhuis et al. (2016) and Cai et al. (2018) detected several significant QTL influencing milk kappa-casein percentages in Danish Holsteins and in Nordic Holstein cattle respectively. Regarding meat quality and carcass traits, Mateescu et al. (2017) and Michenet et al. (2016) detected SNP significantly associated with marbling score in Angus cattle and weaning weight. In addition, for a chromosomal segment from 4.93 to 5.25 Mb on chromosome 6, several QTL for body weight and carcass quality were reported (McClure et al. 2010).

Interactions between the annotated genes for the East-DSN sub-population are illustrated in network N3 (Fig. 8). The network is separated into three clusters. Genes from the third cluster (in green) regulate the actin cytoskeleton and the Fc gamma R-mediated phagocytosis pathways. The actin cytoskeleton and the Fc gamma R-mediated phagocytosis pathways contribute to metabolic processes, energy mobilization and functions of the immune system (Rolf et al. 2012). Some genes overlapped with selection signatures on chromosome 16 for the East-DSN sub-population. For example, YOD1 regulates protein levels in the endoplasmic reticulum (Lee et al. 2017). PFKFB2 was identified as a candidate gene related to lipid and phospholipid metabolism (Hue & Rider 1987; Buchanan et al. 2016).

Details are in the caption following the image
Network N3, interactions between the identified genes for the East DSN sub-population.

The extended selection signatures in the West-DSN sub-population on chromosomes 8, 12, 16, 17 and 21 harbor several known QTL, mostly associated with body weight and milk composition (MacNeil & Grosz 2002; McClure et al. 2010; Michenet et al. 2016). Regarding meat quality traits, Gutiérrez-Gil et al. (2008) found a QTL on chromosome 16 between 22.5 and 49.5 Mb associated with juiciness. McClure et al. (2010) detected a QTL on chromosome 17 between 22.6 and 27.1 Mb associated with marbling score. Two QTL associated with milk alpha and kappa-casein percentage on chromosomes 17 and 21 (Schopen et al. 2009) overlapped with the selection signatures identified in the present study.

Interactions between the annotated genes for the West-DSN sub-population are illustrated in network N4 (Fig. 9). Network N4 was separated in only two clusters (owing to the low number of annotated genes for the West-DSN sub-population). Two major gene families (ALG and STT) were involved in the second cluster (in green) of network N4. Both genes activate the N-Glycan biosynthesis. Accordingly, Palombo et al. (2018) suggested ALG12 to be a candidate gene for long-chain fatty acid productions in Italian Holstein cows. USP18 (hub gene of the network) is involved in many biological pathways and in various immunological processes (Honke et al. 2016). Lindholm-Perry et al. (2016) detected USP18 gene expressions only for animals with low feed intake and high daily gain. Similarly, Lee et al. (2015) detected USP18 differences in gene expressions between chicken populations with either low or high residual feed intake. Furthermore, Magalhães et al. (2016) reported the influence of USP18 on carbohydrate and lipid metabolism.

Details are in the caption following the image
Network N4, interactions between the identified genes for the West DSN sub-population.

Sick vs. healthy German Holstein sub-populations

The identified selection signatures in the healthy GH sub-population on chromosome 16 confirm the identified QTL for health traits. For instance, Smetko et al. (2015) found a pronounced population differentiation between local African breeds in the same segment on chromosome 16. The population differentiation was associated with trypanosomiasis susceptibility. Cole et al. (2011) identified a QTL for claw health indicator traits, i.e. foot angle and rear leg quality, in this segment on chromosome 16. In the identified selection signature segment on chromosome 22, Kolbehdari et al. (2008) found significant SNP markers associated with bone quality. Interactions between the identified genes for the healthy GH sub-population are illustrated in network N5 (Fig. 10). The first cluster of the network (in red) included seven genes. Nevertheless, those genes did not significantly activate a pathway. Six genes (hub gene FARS) are involved in the second cluster of the network (in green). McCarthy et al. (2010) defined FARS2 as a ‘down-regulating gene’ in high-yielding Holstein dairy cows during the energy-deficiency period in early lactation. We hypothesize strong associations between energy deficiency and the occurrence of claw health. In this regard, Collard et al. (2000) found a significant correlation between energy balance traits and laminitis. Boettcher et al. (1998) identified lameness as the most important disease for cows with a negative energy balance during the first 50 days in milk. The annotation list is enriched with genes of biological interest involved in wound healing pathways. According to Shearer et al. (2015) and Wilson-Welder et al. (2015), these genes might affect dermatitis digitalis, because claw lesions in dairy cattle follow a common wound healing process. The third cluster of network N5 (in blue) considers four genes (hub gene CACNA2D3). The annotated genes from this cluster activate the MAPK signaling and oxytocin signaling pathways. Rolf et al. (2012) reported the impact of the MAPK pathway on residual feed intake in Angus cattle. Regarding causalities with health, Wassmuth et al. (2000) found pronounced genetic correlations between feed efficiency and incidences of claw disorders.

Details are in the caption following the image
Network N5, interactions between the identified genes for the healthy GH sub-population.

The selection signature segments for the sick GH sub-population include several QTL for claw disorders and for feet and leg conformation traits. In this regard, van der Spek et al. (2015) and Wu et al. (2016) identified significant SNPs on chromosome 10. Buitenhuis et al. (2007) detected a QTL on chromosome 15 at 78.0 Mb for bone quality and a QTL for hock quality.

The interactions between the annotated genes are shown in network N6 (Fig. 11). The genes involved in the first two clusters of the network (in red and in blue) did not activate a pathway significantly. The third cluster (in green) includes six genes (hub gene CRY1), which are involved in the activation of the circadian rhythm pathway (Wang et al. 2015). Interestingly, overexpression of CRY1 was associated with lower blood glucose concentrations (Zhang et al. 2010). Subsequently, animals with low blood glucose concentrations had a higher risk for the occurrence of claw disorders during the first two months in lactation (Wilhelm et al. 2017).

Details are in the caption following the image
Network N6, interactions between the identified genes for the sick GH sub-population.

None of the identified regions under recent selection in the sick GH sub-population overlapped with identified regions in the healthy sub-population. Such a finding shows that selection affected different loci in both sub-populations. Only chromosome 16 (but different chromosomal segments) showed selection signatures in both healthy and sick sub-populations.

Conclusion

Application of XP-EHH methodology successfully identified several putative selection signature segments in local DSN and in GH cattle sub-populations. Selection signatures could be explained in the context of (sub)population selection strategies. The divergent breeding goals contributed to genomic differentiation between the two breeds. None of the identified regions under recent selection in the GH population overlapped with such regions in the DSN population. Similarly, recent selection signatures in the sick GH sub-population did not overlap with selection signatures in the healthy GH sub-population. Such a finding refers to the fact that selection affected different loci in both sub-populations. Identified selection signature segments overlapped with important potential candidates genes, influencing diseases and production traits. For instance, CLU was detected as a hub gene in the DSN population and is one the most promising potential candidate genes affecting milk protein concentration. USP18 was identified as a candidate gene in the West-DSN sub-population, influencing the lipid and carbohydrate metabolism. FARS2, known to be an important gene for the regulation of energy balances in high-yielding Holstein dairy cows, was detected in the healthy German Holstein sub-population.

Acknowledgements

This work was financially supported by the German Federal Ministry of Food and Agriculture based on a decision of the Parliament of the Federal Republic of Germany, granted by the Federal Office for Agriculture and Food (grant numbers FKZ 2818BM090 and FKZ 2818BM091).

      The full text of this article hosted at iucr.org is unavailable due to technical difficulties.