Introduction

The human Y chromosome carries the largest non-recombining region in the human genome, shows genetic variation that is sensitive to drift, follows a unique inheritance pattern and is specific to males; its polymorphism has therefore been widely studied by researchers interested in human evolution and by forensic geneticists (Jobling and Tyler-Smith 2003; Butler 2003). Depending on the time scale of the population history events under study, different types of the polymorphic markers abundant on the Y chromosome are available for research. Analysis of slowly evolving Y-chromosomal biallelic polymorphisms has enabled deeper insight into prehistoric population movements and colonisation waves in Europe (Semino et al. 2000; Rootsi et al. 2004). On the other hand, Y-chromosomal short tandem repeats (Y-STRs) are characterised by a relatively high mutation rate and seem to be much more suitable for genetic studies of more recent events. Although the combination of Y-chromosomal microsatellites and biallelic polymorphisms yields the highest level of resolution and a means of clarifying complex genetic histories (Weale et al. 2001, 2002; Das et al. 2004), Y-STR data alone also provide very useful information for analyses of interpopulation diversity and have been widely applied in resolving differentiation of various human populations (Kayser et al. 2001; Ploski et al. 2002; Caglià et al. 2003; Roewer et al. 2005; Immel et al. 2006).

The Indo-European linguistic family has the largest number of speakers of the recognised families of languages in the world today. In this family, Slavic languages form a group of closely related languages with close to 250 million speakers worldwide (Schenker 1995). In Europe, Slavs are the most numerous ethnic and linguistic group of peoples, residing chiefly in eastern and southeastern Europe but also extending across northern Asia to the Pacific Ocean. The early medieval great migrations in Europe utterly changed the ethnic and linguistic situation of the continent and spread the Slavic settlement in the fifth to sixth centuries over the major part of Eastern Europe, leading to the ethno-cultural subdivision of the primarily united Proto-Slavic community (Encyclopædia Britannica 2006). Nowadays, from the linguistic, cultural and geographic point of view, Slavs are customarily divided into three major subgroups: Eastern Slavs (Belarusians, Russians, Ukrainians), Western Slavs (Poles, Slovaks, Czechs, Lusatians), and Southern Slavs (Slovenes, Croats, Bosnians, Montenegrins, Serbs, Macedonians, Bulgarians).

Since the ethno-cultural subdivision of Slavs and the formation of modern Slavic nations took place quite recently, and significant differences in Y-STR haplotype distribution exist even between closely related human populations (Roewer et al. 1996), hypervariable Y-chromosomal microsatellites seem to be the markers of choice for the study of mutual relations between different Slavic ethnic groups. So far, it has been demonstrated that Slavic paternal lineages defined by Y-STR haplotypes form a separate branch in a phylogenetic tree of European populations (Roewer et al. 2001, 2005). However, no comprehensive analysis of interpopulation Y-STR haplotype variation between different Slavic groups has been available. This study provides a phylogenetic overview of the closely related Central-Eastern European populations of Poland, Slovakia and Belarus in order to analyse their relationship with each other and with other Slavic populations, and to investigate how these relations reflect the Slavs’ historical migrations.

Materials and methods

Eighteen Y-chromosomal microsatellite loci (DYS19, DYS388, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS426, DYS437, DYS438, DYS439, DYS460, GATA H4.1, DYS385 a/b, and YCAII a/b) were genotyped in randomly selected, unrelated Poles (n = 208), Slovaks (n = 164) and Belarusians (n = 196) by means of a multiplex polymerase chain reaction (PCR) and capillary electrophoresis, as previously described (Rębała and Szczerkowska 2005). The Belarusian population included samples from three distinct regions: 86 males from southern Belarus, 57 males from central Belarus, and 53 males from northern Belarus. Additionally, all Belarusian samples were genotyped at the M46 (Tat) locus by a PCR restriction fragment-length polymorphism (RFLP) method (Kayser et al. 2005). The products of amplification of the M46 locus were digested with Hsp92II restriction endonuclease (Rybakowski et al. 2002), separated by polyacrylamide gel electrophoresis, and visualised by silver staining. The Y-STR haplotype data for 20 other Slavic (n = 2,937) and nine neighbouring non-Slavic (n = 1,428) populations were obtained either from the Y chromosome haplotype reference database (http://www.ystr.org) (Roewer et al. 2001) or from the literature (Pepinski et al. 2004a, b; Lauc et al. 2005; Lovrečić et al. 2005; Marjanovic et al. 2005; Peričić et al. 2005; Spiroski et al. 2005; Lessig et al. 2006; Rodig et al. 2007).

Analysis of molecular variance (AMOVA) was performed with the use of Arlequin 3.1 software (Excoffier et al. 2005) to calculate matrices of pairwise F ST and R ST values between populations. Associated probability values were estimated from 10,100 permutations. Linearised F ST and R ST values (Slatkin 1995) were applied to build a neighbour-joining tree using the options NEIGHBOUR and DRAWTREE in the PHYLIP package (Felsenstein 2004), and to conduct a multidimensional scaling analysis using the STATISTICA 7.1 software (StatSoft). In all calculations, DYS389 was considered as a haplotype of two independent loci: DYS389I (repetitive stretches: p + q) and DYS389II-I (repetitive stretches: m + n) (Rolf et al. 1998), while microvariants, null alleles and locus multiplications were treated as missing data.
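The two data transformations described above can be illustrated with a minimal Python sketch. It is not the Arlequin or PHYLIP implementation: the function names are hypothetical, the linearisation follows the commonly cited form F ST / (1 − F ST) (Slatkin 1995), and the clamping of negative estimates to zero before tree building is an assumption made here for illustration only.

```python
from typing import Tuple


def linearised_distance(f_st: float) -> float:
    """Slatkin's linearised distance, F_ST / (1 - F_ST)."""
    if f_st >= 1.0:
        raise ValueError("F_ST must be below 1")
    # Assumption (not stated in the text): small negative estimates,
    # which arise from sampling error, are treated as zero.
    f = max(f_st, 0.0)
    return f / (1.0 - f)


def split_dys389(dys389_i: int, dys389_ii: int) -> Tuple[int, int]:
    """Return (DYS389I, DYS389II-I) as two independent repeat counts.

    The DYS389II amplicon contains the DYS389I repeats, so the
    independent second locus is obtained by subtraction.
    """
    return dys389_i, dys389_ii - dys389_i


# F_ST reported for Belarus, Poland and Slovakia (18 loci)
print(linearised_distance(0.0100))
# Hypothetical haplotype with DYS389I = 13 and DYS389II = 29
print(split_dys389(13, 29))
```

For F ST values as small as those reported in this study, the linearised distance differs only marginally from the raw estimate, so the transformation mainly matters for the more divergent population pairs.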

Results and discussion

AMOVA of 18-locus Y chromosome haplotypes revealed significant differences between the populations of Belarus, Poland and Slovakia (F ST = 0.0100; P < 0.0001) as well as between the three Belarusian subpopulations (F ST = 0.0084; P = 0.0034). Analysis of pairwise F ST values for 18 Y-STRs demonstrated that heterogeneity within the Belarusian population was caused by differences between northern and central Belarus while the remaining pairwise comparisons did not yield statistically significant F ST values. The most outstanding populations were those of Poland and northern Belarus, while populations of central Belarus, southern Belarus and Slovakia were genetically indistinguishable. The Polish population was shown to be distinct from all studied populations (P = 0.01 for comparison with Slovakia, P < 0.005 for all other comparisons), while in the case of northern Belarus, the only insignificant F ST value was for a comparison with the southern part of the country.

Our results have demonstrated that the Y-STR haplotype distribution does not reflect the linguistic and/or historical affiliations between Slavic populations. The Polish paternal lineages revealed by Y-chromosomal microsatellite haplotype analysis were previously reported to be distant from a number of non-Slavic European populations and from Slavic-speaking Muscovites (Ploski et al. 2002), but no Slavic populations inhabiting Poland’s geographic neighbours were taken into consideration. Although Poles and Slovaks speak very closely related languages, and the majority of Poles and Belarusians shared a common state for over half a millennium, the Polish Y-STR haplotype heritage was shown to be distinct from that of both neighbouring Slavic nations. On the other hand, analysis of Y chromosome haplotypes defined by 18 loci has revealed genetic homogeneity between Slovaks and two subpopulations of Belarus, although the two populations are geographically distant and isolated, speak languages belonging to separate branches of the Slavic language group, and have never shared common state borders throughout their histories.

Since for the majority of other Slavic populations, only nine-locus haplotypes (DYS19, DYS389I, DYS389II-I, DYS390, DYS391, DYS392, DYS393, and DYS385 a/b) were available, only these loci were used for further analysis. A comparison of our Polish haplotypes defined by the selected nine loci with data for six Polish regional subpopulations from the Y chromosome haplotype reference database confirmed previously observed homogeneity of Polish subpopulations (F ST = −0.0003; P = 0.66) (Ploski et al. 2002), and in AMOVA, all Polish samples were combined. Results revealed four clusters of Slavic populations connected by a network of statistically insignificant F ST values (P > 0.05): (1) all Western-Slavic and Eastern-Slavic populations, Slovenes and western Croats; (2) Lusatians; (3) Southern-Slavic northern Croats and Bosnians; (4) Southern-Slavic Serbs, Macedonians, and Bulgarians (Table 1). However, at the significance level of 0.01, only one such cluster involving all Slavic populations was disclosed. The distinctiveness of Southern-Slavic populations was observed as a separate branch in a neighbour-joining tree, while multidimensional scaling has displayed a nucleus of seven genetically indistinguishable populations with very small relative genetic distances, which involved population samples of Eastern-Slavic (Ukrainian, Russian, Belarusian), Western-Slavic (Slovak) as well as Southern-Slavic (Slovene) origin (Fig. 1).
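One way to read "clusters connected by a network of statistically insignificant F ST values" is as connected components of a graph whose edges join population pairs with P above the chosen threshold. The sketch below illustrates that reading only; the function name, the population labels and the P values are hypothetical and are not taken from Table 1.

```python
from collections import defaultdict


def homogeneity_clusters(populations, p_values, alpha=0.05):
    """Group populations linked by non-significant pairwise tests.

    p_values maps (pop_a, pop_b) to the P value of their pairwise F_ST.
    Two populations are linked when P > alpha; clusters are the
    connected components of the resulting graph.
    """
    neighbours = defaultdict(set)
    for (a, b), p in p_values.items():
        if p > alpha:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen = set()
    clusters = []
    for pop in populations:
        if pop in seen:
            continue
        # Depth-first traversal collects everything reachable from pop
        stack = [pop]
        seen.add(pop)
        component = []
        while stack:
            current = stack.pop()
            component.append(current)
            for nxt in sorted(neighbours[current]):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        clusters.append(sorted(component))
    return clusters


# Hypothetical toy data, for illustration only
toy_pops = ["A", "B", "C", "D"]
toy_p = {
    ("A", "B"): 0.30, ("B", "C"): 0.08, ("A", "C"): 0.02,
    ("C", "D"): 0.03, ("A", "D"): 0.001, ("B", "D"): 0.002,
}
print(homogeneity_clusters(toy_pops, toy_p, alpha=0.05))
print(homogeneity_clusters(toy_pops, toy_p, alpha=0.01))
```

In the toy data, lowering the threshold from 0.05 to 0.01 merges the two clusters into one, which mirrors how the number of clusters reported here depends on the significance level chosen.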

Table 1 Y-STR haplotype pairwise F ST values below the diagonal and corresponding P values above the diagonal (P values > 0.05 are bold) for 19 Slavic populations
Fig. 1 Two-dimensional plot obtained from multidimensional scaling, and a neighbour-joining tree, based on pairwise F ST values for nine-locus Y-STR haplotypes observed in 19 Slavic and nine non-Slavic populations. Ellipses are traced around genetically homogeneous northern Slavic populations and clusters of non-Slavic populations with known ethno-historical affiliations. Arrows indicate historically proved directions of migrations. A dotted line connects populations with a disputed direction of migration, which inhabit areas designated according to the various sources as the Slavic homeland

Since there is no clear consensus over the accuracy of different statistical parameters estimating genetic distances between populations in studies using microsatellite markers, both a classical allele frequency-based differentiation estimator (F ST) and its stepwise mutation model-based analogue (R ST) are commonly reported (Balloux and Lugon-Moulin 2002). Therefore, we applied both distance methods to assess genetic relations between various Slavic and non-Slavic populations. In accordance with results obtained for autosomal STRs in various human populations (Pérez-Lezaun et al. 1997; Destro-Bisol et al. 2000) and for Y-chromosomal microsatellites in sub-Saharan Africans (Caglià et al. 2003), the pattern of Y-STR interpopulation diversity among Slavs and neighbouring populations, based on F ST values, appeared to be congruent with known ethno-historical relationships, while that based on R ST values revealed unexpected and unconvincing population affinities.

Both the multidimensional scaling plot and the neighbour-joining tree, based on the F ST values, revealed genetic proximity between related populations: (1) Germans from Bavaria and Saxony, (2) Italians from Latium and Veneto, (3) Turks from Anatolia and Bulgaria, and (4) Balts from Latvia and Lithuania (Fig. 1). The F ST-based results were also consistent with expectations in the case of three isolated Slavic populations: (1) Lusatians from southeastern Germany, who are descendants of Slavic tribes that have inhabited the lands between the Elbe and Oder rivers since the fifth century (Encyclopædia Britannica 2006), (2) Polish Belarusians, who colonised parts of Podlachia (northeastern Poland) in the fifteenth to sixteenth centuries after arrival from the Hrodna region (Wiśniewski 1964), and (3) a community of Russian settlers (Old Believers), who arrived in Podlachia in the eighteenth century from the Pskov and Novgorod regions (Grek-Pabisowa 1968). Each of the three ethnic groups was shown to be homogeneous with only one of all compared populations (Table 1), representing its population of origin. Lusatians revealed Y-STR homogeneity with the neighbouring population inhabiting the areas from which they are thought to have migrated, i.e. with Poles. In the case of Podlachian Belarusians, the population of origin was the geographically closest population of central Belarus (which includes the city of Hrodna), while in the case of Polish Old Believers it was Russians from the Novgorod region.

On the other hand, in the R ST-based multidimensional scaling plot and the neighbour-joining tree, only the two Baltic populations were clearly isolated from unrelated ethnicities, while the separation of German, Italian and Turkish populations was no longer visible (Fig. 2). Moreover, statistically insignificant P values for R ST genetic distances were observed between historically, linguistically, and geographically unrelated populations such as Turks and Italians from Latium, or Belarusians and Germans from Saxony. In the case of the three isolated Slavic populations, the genetic homogeneity restricted to the populations of origin, as observed for F ST values, was lost in the R ST method. Thus, the F ST genetic distances reflect relationships between the compared populations much better than their stepwise-based analogues. This indicates that the genetic differences in the Y-STR haplotype distribution between the Slavic populations and their nearest neighbours are caused mainly by drift, which can be explained by the fact that the time elapsed since the differentiation of the compared populations has been too short for mutations to have had an appreciable effect on Y-chromosome variation (Pérez-Lezaun et al. 1997; Caglià et al. 2003). Therefore, in further analysis, we focused only on the results obtained by the classical allele frequency-based approach for estimation of Y-STR genetic distances.

Fig. 2 Two-dimensional plot obtained from multidimensional scaling, and a neighbour-joining tree, based on pairwise R ST values for nine-locus Y-STR haplotypes observed in 19 Slavic and nine non-Slavic populations. An ellipse is traced around a cluster of populations with known ethno-historical affiliations. Lu Lusatians; Po Poles; Slvk Slovaks; BeN, BeC, BeS, BePdl Belarusians from northern Belarus, central Belarus, southern Belarus, Podlachia; RuPdl, RuNov, RuMos, RuVla Russians from Podlachia (Old Believers), Novgorod region, Moscow region, Vladivostok region; Ukr Ukrainians; Slvn Slovenes; CroW, CroN Croats from western Croatia, northern Croatia (Zagreb region); Bo Bosnians; Se Serbs; Ma Macedonians; Bu Bulgarians; Lit Lithuanians; Lat Latvians; GeBav, GeSax Germans from Bavaria, Saxony (Dresden region); ItLat, ItVen Italians from Latium, Veneto; Hun Hungarians; TuAn, TuBu Turks from Anatolia, Bulgaria

Comprehensive analysis of Slavic Y-chromosomal microsatellite haplotypes on a European scale confirmed previous observations for 18 Y-STR loci in the Polish, Slovak and Belarusian populations: no relationship between the customary linguistic division of Slavs and the Y-STR haplotype distribution was disclosed. The most apparent genetic distance was found between the northern (Eastern and Western) and Southern Slavs, who at the end of the ninth century were separated by the invasion of Finno-Ugric Hungarians. The AMOVA showed that the variation observed between the two population groups was 4.3% (F CT = 0.0428; P = 0.0003), which was higher than the level of genetic variance among populations within the groups (1.2%; F SC = 0.0130; P < 0.0001). This difference was even more profound when the R ST-based distance method was applied: genetic variation between the two population clusters was 19.8% (R CT = 0.1984; P = 0.0002) while the interpopulation variance within the groups was only 1.8% (R SC = 0.0228; P < 0.0001). The observed northern Slavic Y-STR genetic homogeneity extends from Slovakia and Ukraine to parts of Russia and Belarus, but also involves the Southern-Slavic populations of Slovenia and western Croatia, and is most probably due to a homogeneous genetic substrate inherited from the ancestral Slavic population. However, due to the Y-STR proximity of the linguistically and geographically Southern-Slavic Slovenes and western Croats to the northern Slavic branch, the observed genetic differentiation cannot simply be explained by the separation of the two Slavic-speaking groups by the non-Slavic Romanians, Hungarians, and German-speaking Austrians. A similar difference has been previously reported between Bulgarians and a few other Slavic populations (Roewer et al. 2005), and our results demonstrate that other Southern-Slavic populations, namely Macedonians, Serbs, Bosnians, and northern Croats, are genetically distinct from their northern linguistic relatives as well. Roewer et al. (2005) suggested that these differences might be explained by the admixture of Y chromosomes of Finno-Ugric and Turkic-speaking peoples who had invaded and settled in the Danube basin and the Balkans. However, we found that the only population that revealed an insignificant F ST value in comparison with the Finno-Ugric Hungarians was the population of western Croatia, and this putative admixture did not significantly affect the Y-STR proximity of western Croats to Eastern and Western Slavs (Table 1). All other F ST values obtained for comparisons of nine-locus haplotypes of Hungarians and of Turks from Bulgaria and Anatolia with those of 19 Slavic populations appeared to be statistically significant (P = 0.01 for a comparison of Hungarians with the population of central Belarus, P = 0.002 for a comparison of Bulgarian Turks with Bulgarians, P ≤ 0.0003 for all other comparisons). Thus, the contribution of the Y chromosomes of peoples who settled in the region before the Slavic expansion to the genetic heritage of Southern Slavs is the most likely explanation for this phenomenon. On the other hand, our results indicate no significant genetic traces of pre-sixth-century inhabitants of present-day Slovenia in the Slovene Y chromosome genetic pool.
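The variance percentages reported above for the comparison of northern and Southern Slavs can be related to the fixation indices through the standard decomposition used in hierarchical AMOVA. The sketch below is an arithmetic check under that assumption, not a reimplementation of Arlequin; the function name and dictionary keys are hypothetical.

```python
def amova_variance_shares(f_ct: float, f_sc: float) -> dict:
    """Percentage of total variance at each level of a two-level AMOVA.

    Uses the standard relations:
      among groups                    = F_CT
      among populations within groups = F_SC * (1 - F_CT)
      within populations              = (1 - F_SC) * (1 - F_CT)
    """
    shares = {
        "among_groups": f_ct,
        "among_populations_within_groups": f_sc * (1.0 - f_ct),
        "within_populations": (1.0 - f_sc) * (1.0 - f_ct),
    }
    return {level: 100.0 * value for level, value in shares.items()}


# Fixation indices reported in the text
fst_based = amova_variance_shares(f_ct=0.0428, f_sc=0.0130)
rst_based = amova_variance_shares(f_ct=0.1984, f_sc=0.0228)

for label, result in (("F ST-based", fst_based), ("R ST-based", rst_based)):
    print(label, {level: round(pct, 1) for level, pct in result.items()})
```

With the reported indices, the among-group and within-group shares round to 4.3% and 1.2% for the F ST-based analysis and to 19.8% and 1.8% for the R ST-based analysis, in agreement with the figures given in the text.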

Although the existence of the Balto-Slavic linguistic community, or at least territorial contiguity of Proto-Baltic and Proto-Slavic in the past, is generally accepted (Schenker 1995), AMOVA revealed significant differences in Y-STR distribution between Slavic and Baltic populations (P < 0.005 for all pairwise comparisons), which is likely to result from the previously observed difference in Y-chromosomal haplogroup distribution (Rosser et al. 2000). The Baltic populations are characterised by a high incidence of the Y-chromosomal haplogroup N3 (47% among Lithuanians, 32% among Latvians) (Rosser et al. 2000; Zerjal et al. 2001). Its distribution pattern in Slavic populations indicates that Proto-Slavs did not carry this lineage at a substantial frequency, since it is relatively rare among Slavs and has been observed at high frequency only in some Russian subpopulations (Malyarchuk et al. 2004). Since, at the significance level of 0.01, the only population that did not yield statistically significant F ST values for comparisons with other Slavic and both Baltic populations was the Slavic-speaking population of northern Belarus, we estimated haplogroup N3 frequencies in the three Belarusian subpopulations. The results suggest that the uniqueness of the northern Belarusian population is most likely due to the high incidence of Y chromosomes from the haplogroup N3 (18.9%), which is roughly twice its frequency in central and southern Belarus (8.8 and 8.1%, respectively). Therefore, although the early ethnogenesis of the Belarusian nation has customarily been linked to the gradual Slavicisation of the homogeneous Baltic substrate on the territory of present-day Belarus (Sedov 1970), only northern Belarus seems to be a transitional area between Baltic and Slavic settlement. Apart from Balts, the N3 Y chromosomes are also prevalent among Finno-Ugrians (Rosser et al. 2000; Zerjal et al. 2001), but it was found that Y-STR haplotypes from the haplogroup N3 differ between Baltic and Finno-Ugric populations, most likely due to two distinct migration waves of people carrying N3 Y chromosomes (Zerjal et al. 2001; Kasperavičiūtė et al. 2004). In contrast to the Finno-Ugric populations, where allele DYS19*14 is the most common among haplogroup N3 males, allele DYS19*15 is the most frequent in the Baltic haplogroup N3 (Kasperavičiūtė et al. 2004), and was also the most frequent allele in the haplogroup N3 in northern Belarus (60.0%), providing additional evidence for the presence of the Baltic substrate in the genetic pool of Belarusians from the northernmost part of the country.

Localisation of the Slavic homeland prior to the great Slavic expansion in the fifth to sixth centuries is one of the key problems of European history in the first millennium AD. Although it is assumed that prehistorically the original habitat of Slavs was Asia, from which they migrated in the third or second millennium BC to populate parts of Eastern Europe (Encyclopædia Britannica 2006), the debate concerning the European homeland of Slavs remains unresolved. Because Slavs unequivocally enter the records of history as late as the sixth century AD, when their expansion in Eastern Europe was already advanced, different theories concerning the Slavs’ geographic origin based on archaeological, anthropological and/or linguistic data have been formulated. Two such theories have gained the largest support among scholars (Schenker 1995), one placing the cradle of Slavs in the watershed of the Vistula and Oder rivers (present-day Poland), and the other locating it in the watershed of the middle Dnieper (present-day Ukraine). Our results indicate that, using the population-of-origin approach based on the AMOVA, as many as nine (P > 0.05) or ten (P > 0.01) populations can be traced back to the lands of present-day Ukraine, including Eastern-Slavic Russians and Belarusians, Western-Slavic Poles and Slovaks, and Southern-Slavic Slovenes and Croats. On the other hand, the Polish population gave insignificant F ST values in pairwise comparisons with only one (i.e. Ukrainians) or three (i.e. Ukrainians, Slovaks, and Lusatians) populations (P > 0.05 or P > 0.01, respectively). Moreover, the Y-STR genetic distance between Poles and Belarusians, who are geographic neighbours (Table 1), excludes significant gene flow between the two populations and localisation of the Belarusians’ ancestors in present-day Poland.

In conclusion, we have demonstrated that Y-STR haplotype distribution divides Slavs into two genetically distant groups: one encompassing all Western Slavs, Eastern Slavs, Slovenes and western Croats, and the other involving all remaining Southern Slavs. Many northern Slavic populations are genetically indistinguishable with regard to nine-locus Y-STR haplotype variation, and this homogeneity extends from the Alps to the upper Volga, and even as far as the Pacific Ocean (eastern Russia), regardless of the linguistic, cultural and historical affiliations of the Slavic ethnicities. The example of Slovaks and Belarusians shows that this homogeneity is likely to extend to other Y-chromosomal microsatellites as well. Results of the interpopulation Y-STR haplotype analysis exclude a significant contribution of ancient tribes inhabiting present-day Poland to the gene pool of Eastern and Southern Slavs, and suggest that the Slavic expansion started from present-day Ukraine, thus supporting the hypothesis that places the earliest known homeland of Slavs in the basin of the middle Dnieper. To our knowledge, this is the first report on the use of genetic markers in solving the question of the localisation of the Slavic homeland.