Reports

Genetic Support for Proposed Patterns of Relationship among Lowland South American Languages

Departamento de Genética, Instituto de Biociências, Universidade Federal do Rio Grande do Sul, Caixa Postal 15053, 91501-970 Porto Alegre, RS, Brazil (all authors)/Departamento de Estatística, Instituto de Matemática, Universidade Federal do Rio Grande do Sul, Campus do Vale, 91540-000 Porto Alegre, RS, Brazil ([email protected]) (Callegari-Jacques). 25 v 05

Comparison of different sets of markers to unravel the past of human populations is an established procedure in both anthropology and genetics. Language characteristics can be easily quantified, and the field of comparative linguistics has a respectable past (Trask 1996; Nichols 1997, 2002; Campbell 1997). Therefore, it is only natural that evolutionary geneticists have turned to linguistics to evaluate the population relationships that they have been obtaining with genetic markers.

Quantitative genetic-linguistic population comparisons have been conducted for some three decades. A landmark paper for South American populations was that of Spielman, Migliazza, and Neel (1974), which extensively examined regional differences among the Yanomama from both the linguistic and the genetic point of view. At the worldwide level, the studies of Cavalli-Sforza et al. (1988; Cavalli-Sforza, Minch, and Mountain 1992; reviewed in Cavalli-Sforza 1997) and the recent reanalysis by Nettle and Harris (2003) are notable. Although these investigations have been severely criticized (Bateman et al. 1990a,b), their validity has been confirmed in some cases (see discussion in Nettle and Harris 2003). All of these studies used blood group and protein genetic markers. At the DNA level the patterns of agreement are less clear, but a strong correspondence has been reported at the world level between Y-chromosome markers and linguistics (Poloni et al. 1997, Barbujani 1997).

There is no consensus among linguists about the relationships among South American Indian languages, whose classification presents several difficulties. First, this subcontinent exhibits considerably more linguistic diversity than North and Central America combined. Second, significant historical linguistic research has been conducted on only a few of its language families and isolates. Third, the dominant tendency has been to present broad, large-scale classifications, while historical research on individual language families has received less attention (Campbell 1997:170).

Our group has long been interested in evolutionary genetics in South America (Salzano et al. 1977, Black et al. 1983, Callegari-Jacques and Salzano 1989, Callegari-Jacques et al. 1993, Fagundes et al. 2002), and we have often relied on the linguistic classifications proposed by various scholars, aware that they disagree about the relationships between language families. Through our efforts and those of others, a large amount of information on protein genetic markers in these populations has been compiled, and we wondered whether these data could help clarify the classification of native languages in this subcontinent. The objective of this report, which differs from that of our previous work, is to evaluate how well the available genetic data agree with the alternatives proposed by three eminent linguists (C. Loukotka, J. H. Greenberg, and A. D. Rodrigues) for the relationships among the four most important lowland South American native language families, namely, Maipure, Carib, Tupi, and Ge.

The Maipure or Arawak family is the largest in the New World. It covers a wide geographic region, including languages spoken in Central America, the Gran Chaco, and the area from the Andes to the mouth of the Amazon River. Maipure used to be regarded as a major subgroup of Arawak, but Campbell (1997) has suggested that the terms are now interchangeable. Carib designates a family found on the northern coast of South America and the Lesser Antilles. Most of the groups speaking Carib languages live in the Guiana region, but there are also Carib speakers south of the Amazon (Rodrigues 1994). Tupi languages occur over a vast region of South America, but their dispersion may have begun 3,000-5,000 years ago in the region between the Madeira and Xingu Rivers in what is now central-western Brazil (Urban 1992). The Ge languages are spoken mainly in the eastern and central areas of the Brazilian Central Plateau. Their point of dispersion may have been somewhere between the headwaters of the San Francisco and Araguaia Rivers, with a split leading to southern and northern expansions (Urban 1992).

Loukotka (1968) proposed that the Maipure family was most closely related to Tupi and that Ge was more remotely related to this pair of families than was Carib. Rodrigues (1985, 1994, 2000, personal communication) has linked Carib with Tupi and regards Maipure as less closely related to this pair than Ge is. Greenberg (1987) groups Maipure with Tupi and Carib with Ge. A schematic representation of these ideas is displayed in figure 1.

Fig. 1.

Schematic representation of the relationships among the four main lowland South American native linguistic families according to Greenberg, Loukotka, and Rodrigues.

Subjects, Methods, and Analysis

A databank for the evaluation of South American native genetic variability was started by one of us (SMCJ) in 1982, using information from previous decades, and has been kept up to date ever since. We began by identifying the populations studied for blood group and protein genetic traits that fell into the four linguistic families of interest. The available data consisted of 37 genetic systems (including blood groups, electrophoretic markers, immunoglobulins, and histocompatibility loci [HLA]), which we will call classical, and 13 autosomal short-tandem-repeat polymorphisms (STRPs), which directly assess variability at the DNA level. Much less information was available for the latter set of markers, but we decided to see whether the STRP results would support the conclusions obtained with the first set. The hypotheses of Greenberg, Loukotka, and Rodrigues concerning language relationships were tested against the genetic data using a method developed by Cavalli-Sforza and Piazza (1975) to reconstruct the evolutionary history of populations, as follows. Suppose that a population (e.g., South American Indians) has several levels of nested subdivisions (e.g., language classifications). Individuals are nested within languages, which in turn can be grouped into larger units such as stocks or families. The hierarchy can be represented by a bifurcating tree in which the terminal nodes are populations defined according to the same criterion (e.g., linguistic families) and the internal nodes (bifurcating points) are hypothetical populations from which the terminal nodes could have been derived.

Consider now that allele frequencies at several loci have been determined in the populations defined as terminal nodes of the tree. Cavalli-Sforza and Piazza (1975) showed that the expected covariance of gene frequencies between two existing populations equals the expected variance associated with the internal node at which their lineages last join (their most recent common ancestor). This theorem allows estimation of the variance associated with any internal node of the hierarchy, whether or not the population at that node has been observed. They also showed that the variance of an extant population equals the sum of the amounts of evolution that occurred in the transitions from the ancestor common to all populations, through all intermediate nodes on the line of descent, to the population considered. The theory assumes independence of effects across hierarchical levels but makes no assumptions about gene frequency transformations. Such transformations have been used to stabilize the variance (for instance, the arcsine transformation [Cavalli-Sforza and Piazza 1975]) or to ease interpretation of the variance-covariance matrix (Urbanek, Goldman, and Long 1996) by rendering its elements equal to coefficients of gene identity (Nei 1972).
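The covariance theorem can be illustrated with a short Python sketch. It assumes a rooted tree written as nested 2-tuples of population names and a hypothetical dictionary of branch variances (the amount of evolution accumulated on the branch above each node). The expected covariance of two populations is then the summed variance from the root down to their most recent common ancestor, and each population's variance is the sum along its whole lineage. The arcsine transformation is included for completeness. Tree and branch values are illustrative only; this is not the COVAR/TREES software used in the study.

```python
import math


def arcsine_transform(p):
    """Angular transformation of an allele frequency: arcsin(sqrt(p))."""
    return math.asin(math.sqrt(p))


def leaf_paths(tree):
    """Map each leaf to the list of nodes (below the root) on its lineage."""
    paths = {}

    def walk(node, path):
        if isinstance(node, tuple):
            for child in node:
                walk(child, path + [child])
        else:
            paths[node] = path

    walk(tree, [])
    return paths


def expected_covariance(tree, branch):
    """Expected variance-covariance matrix implied by a rooted tree.

    branch maps every non-root node (a leaf name or a subtree tuple) to the
    variance accumulated on the branch above it (hypothetical values).
    Entry (i, j) sums branch variances from the root down to the deepest
    node shared by populations i and j; the diagonal sums the full lineage.
    """
    paths = leaf_paths(tree)
    leaves = sorted(paths)
    matrix = []
    for a in leaves:
        row = []
        for b in leaves:
            shared = 0.0
            for node_a, node_b in zip(paths[a], paths[b]):
                if node_a != node_b:
                    break  # the two lineages have diverged
                shared += branch[node_a]
            row.append(shared)
        matrix.append(row)
    return leaves, matrix


# Illustrative three-family tree ((Carib, Tupi), Ge) with made-up branch variances.
TREE = (("Carib", "Tupi"), "Ge")
BRANCH = {("Carib", "Tupi"): 0.02, "Carib": 0.01, "Tupi": 0.03, "Ge": 0.04}
LEAVES, MATRIX = expected_covariance(TREE, BRANCH)
```

Here Carib and Tupi share only the internal branch above their common ancestor, so their expected covariance equals that branch's variance (0.02), while Ge, which diverges at the root, has zero expected covariance with both.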

These are the general bases for testing different models of bifurcating trees with genetic information. The fit of the data to a given tree can be evaluated by a goodness-of-fit test using a likelihood ratio statistic (lambda), which compares the observed variance-covariance matrix with the corresponding theoretical matrix estimated by maximum likelihood (Cavalli-Sforza and Piazza 1975). With a large number of traits, lambda approximately follows a chi-square distribution with degrees of freedom equal to s(s + 1)/2 minus the number of fitted parameters, where s is the number of external nodes (populations for which there is actual genetic information). A specific tree (e.g., a particular pattern proposed for the relationships among languages) constitutes a null hypothesis that is rejected if the test's P value is equal to or less than a chosen significance level (0.05 in this study).
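The degrees-of-freedom rule and the rejection criterion can be written out directly. The sketch below computes the chi-square upper-tail probability for integer degrees of freedom with the Python standard library only (via the recurrence Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)). The value of lambda and the number of fitted parameters passed in are hypothetical, since the parameter count depends on the tree model and the rate assumption.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("degrees of freedom must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, k = math.exp(-half), 2                # Q(x; 2) = exp(-x/2)
    else:
        q, k = math.erfc(math.sqrt(half)), 1     # Q(x; 1) = erfc(sqrt(x/2))
    while k < df:
        # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
        q += math.exp((k / 2) * math.log(half) - half - math.lgamma(k / 2 + 1))
        k += 2
    return q


def tree_fit_test(lam, n_external, n_params, alpha=0.05):
    """Return (df, P value, rejected?) for the lambda goodness-of-fit statistic.

    df = s(s + 1)/2 - number of fitted parameters, with s external nodes.
    """
    df = n_external * (n_external + 1) // 2 - n_params
    p = chi2_sf(lam, df)
    return df, p, p <= alpha
```

For example, with s = 4 language families there are 4 × 5 / 2 = 10 distinct variances and covariances; a model with 8 fitted parameters (a hypothetical count) leaves 2 degrees of freedom, and lambda = 10.0 then gives P = exp(-5) ≈ 0.0067, so that tree would be rejected at the 0.05 level.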

To obtain a broader picture of how other models fit our data, we tested all possible rooted bifurcating trees for four sets of populations (language families). There are 15 such trees (Cavalli-Sforza and Piazza 1975), 3 of which are the models suggested by Greenberg, Loukotka, and Rodrigues. Two of the remaining 12 are the trees inferred for our data using Nei's (1972) standard and Nei et al.'s (1983) DA genetic distances with the neighbor-joining algorithm (Saitou and Nei 1987). These two topologies also provide a test of the treeness of the genetic data, which is an assumption of the whole analysis: if treeness is rejected for the genetic markers, no tree will fit the genetic data well (e.g., Hunley and Long 2005).
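The count of 15 rooted bifurcating trees for four groups follows from the (2n - 3)!! = 1 × 3 × 5 rule, and it can be checked by enumeration. The sketch below builds every rooted topology by attaching each new leaf to every existing branch (or above the current root), then checks that the three linguistic proposals are among them. The nested-tuple encodings of the proposals are our reading of the text describing figure 1, not code from the original study.

```python
def rooted_trees(leaves):
    """Every rooted bifurcating topology on the labelled leaves, as nested 2-tuples."""
    trees = [leaves[0]]
    for leaf in leaves[1:]:
        # attach the new leaf to every branch of every tree built so far
        trees = [t for old in trees for t in insert_everywhere(old, leaf)]
    return trees


def insert_everywhere(node, leaf):
    """Yield each tree obtained by adding `leaf` as sister to one subtree of `node`."""
    yield (node, leaf)  # on the branch above this node (a new root if node is the root)
    if isinstance(node, tuple):
        left, right = node
        for new_left in insert_everywhere(left, leaf):
            yield (new_left, right)
        for new_right in insert_everywhere(right, leaf):
            yield (left, new_right)


def canonical(node):
    """Order-free form, so that (a, b) and (b, a) compare equal."""
    if isinstance(node, tuple):
        return frozenset(canonical(child) for child in node)
    return node


# Our encoding of the three proposals as rooted trees.
MODELS = {
    "Greenberg": (("Maipure", "Tupi"), ("Carib", "Ge")),
    "Loukotka": ((("Maipure", "Tupi"), "Carib"), "Ge"),
    "Rodrigues": ((("Carib", "Tupi"), "Ge"), "Maipure"),
}

ALL_TREES = rooted_trees(["Maipure", "Carib", "Ge", "Tupi"])
print(len(ALL_TREES))  # 15
```

Of the 15 trees, 3 are balanced (two pairs) and 12 are ladder-like; Greenberg's proposal is balanced, while those of Loukotka and Rodrigues are ladders.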

Details about the classical genetic systems employed are provided in table 1; the populations considered are those that speak languages classified as Maipure, Carib, Ge, or Tupi. Gene frequencies were obtained by direct counting or, for systems with dominance, by maximum likelihood procedures. For the HLA data the square-root method was used, followed by appropriate adjustments. Each language family was represented by the average gene frequency obtained from a variable number of tribes, depending on the information available for each system. Using an average gives a better genetic characterization of the language family, since it reduces the effect of drift that would be present if individual tribes were considered.
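Direct gene counting and the family-level average can be sketched as follows. Weighting each tribe by its number of individuals is our assumption about how the weighted average mentioned in the results was formed; maximum-likelihood estimation for systems with dominance and the square-root method for HLA are not shown. The genotype counts and frequencies below are hypothetical.

```python
def allele_frequency_by_counting(genotype_counts, allele):
    """Direct gene counting at a codominant locus.

    genotype_counts maps genotypes such as ("A", "B") to numbers of people;
    each person contributes one gene per position in the genotype tuple.
    """
    copies = sum(n * genotype.count(allele) for genotype, n in genotype_counts.items())
    genes = sum(n * len(genotype) for genotype, n in genotype_counts.items())
    return copies / genes


def family_average(samples):
    """Average allele frequency for a language family.

    samples is a list of (n_individuals, allele_frequency) pairs, one per
    tribal sample; each frequency is weighted by its sample size (assumed).
    """
    total = sum(n for n, _ in samples)
    return sum(n * p for n, p in samples) / total


# Hypothetical example: 30 AA, 40 AB, 30 BB individuals in one tribe.
P_A = allele_frequency_by_counting({("A", "A"): 30, ("A", "B"): 40, ("B", "B"): 30}, "A")
# Two hypothetical tribes of one family: 100 people at p = 0.2 and 300 at p = 0.6.
FAMILY_P = family_average([(100, 0.2), (300, 0.6)])
```

Weighting by sample size lets large, well-studied tribes dominate the family mean; an unweighted mean would instead give each tribe equal influence.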

Table 1

Number of Samples and Individuals Studied According to Four Linguistic Families and 37 Blood Group and Protein Systems

Genetic System(a)    Maipure (Individuals, Samples)    Carib (Individuals, Samples)    Ge (Individuals, Samples)    Tupi (Individuals, Samples)
ABO 3,622 28 7,675 43 2,543 14 3,243 33
ACP 1,232 6 2,742 9 1,083 8 2,386 21
AK 1,256 7 1,862 8 1,064 9 2,397 21
ALB 1,320 6 2,814 11 2,181 10 2,412 20
CA2 1,099 4 1,490 6 1,351 8 1,522 13
CHE1 118 1 945 5 1,065 9 2,036 19
CP 1,141 5 1,698 7 1,646 10 2,099 18
DIEGO 2,261 17 4,083 35 2,091 10 1,404 22
DUFFY 2,272 17 5,172 37 2,127 12 2,087 29
ESA 1,092 4 1,546 6 1,214 8 1,452 14
ESD 1,183 5 1,798 8 1,466 10 1,834 17
G6PD 234 5 785 10 405 8 1,011 21
GC 1,159 7 2,671 9 2,350 10 973 10
GLO 211 3 1,094 7 361 5 1,283 13
GM 2,291 11 2,805 11 1,152 3 1,708 16
GPI 1,036 4 1,093 3 716 2 268 2
HBA 2,468 19 3,981 32 2,206 12 2,687 31
HBA2 765 1 1,014 7 1,168 0 1,756 16
HBFET 75 1 320 2 494 1 150 2
HLA-A 40 1 439 5 498 5 1,144 12
HLA-B 40 1 439 5 498 5 1,144 12
HLA-C 40 1 302 3 498 5 1,078 11
HP 2,297 14 4,396 24 2,392 13 2,563 27
KELL 2,211 16 4,687 27 2,242 12 2,180 28
KIDD 1,772 12 3,125 19 1,423 6 886 13
KM 2,024 10 2,961 14 1,647 6 1,628 18
LEWIS 1,306 8 2,173 8 1,193 3 277 6
LUTHERAN 418 4 1,810 10 174 1 152 2
MDH 1,050 5 1,173 4 659 2 268 2
MNSs 2,192 16 4,508 24 1,953 9 1,818 23
P 2,141 16 4,985 30 2,044 10 2,204 27
PEPA 1,187 5 1,496 6 788 3 1,462 15
PGD 1,462 7 2,914 11 238 10 2,293 21
PGM1 1,244 6 2,509 9 1,452 10 2,314 20
PGM2 1,190 5 1,840 7 1,034 4 1,900 17
RH 2,361 18 5,283 34 2,374 13 2,318 28
TF 2,446 16 4,085 23 2,354 13 2,863 27
                 
Mean 1,358 8 2,560 14 1,355 8 1,654 17
Median 1,232 6 2,173 9 1,214 9 1,756 18
Modal class of No. of Samples: Maipure 1-5; Carib 6-10; Ge 6-10; Tupi 16-20
Range 40-3,622 1-28 302-7,675 2-43 174-2,543 1-14 150-3,243 2-33
No. of tribes: Maipure 14; Carib 15; Ge 5; Tupi 24

a Information about these systems can be found in Roychoudhury and Nei (1988) and Vogel and Motulsky (1996). The allele frequency data can be obtained from S. M. Callegari-Jacques on request.


Information about variability at the DNA level in Native Americans is much more limited, but allele frequencies for 13 STRP loci (CSF1PO, D3S1358, D5S818, D7S820, D8S1179, D13S317, D16S539, D18S51, D21S11, FGA, TH01, TPOX, and VWA) were available for one Carib (Wai Wai, n = 29), one Ge (Xavante, n = 33), and three Tupi (Gavião, Suruí, and Zoró, total n = 77) tribes (Hutz et al. 2002). An additional Maipurean tribe, the Wayuu, had been studied for STRPs (Guarino et al. 1999), but because the set of loci typed did not match the one above, this sample was not included in the study.

All calculations needed to test the different hypotheses were done with the COVAR.EXE and TREES.EXE programs, kindly provided by J. C. Long of the Department of Human Genetics and Center for Statistical Genetics, University of Michigan Medical School. The user supplies the allele frequencies and the tree models to be tested. The programs offer two kinds of data transformation (arcsine and gene-identity) and allow either constant or variable rates of evolution.

Table 1 presents the number of individuals and of samples tested for the classical systems in populations of the four linguistic families (see the appendix for the list of tribes). The number of individuals and samples varies widely across the 37 genetic systems; the mean number of individuals per system ranged from 1,355 (Ge) to 2,560 (Carib), and the mean number of samples from 8 (Maipure and Ge) to 17 (Tupi). The largest number of individuals studied for a single system was 17,083 (ABO, all four families combined), and each language family was represented by the weighted average of the gene frequencies obtained in its samples. The populations were spread across the continent (fig. 2) without obvious differences in range, although the Maipure and Carib occupy a more northern area.

Fig. 2.

Geographic distribution of the populations studied for 37 classical genetic systems, according to linguistic classification. Triangles, Maipure; circles, Carib; crosses, Ge; inverted triangles, Tupi.

When comparing the people of these four language families, it is important to verify that they do not differ significantly in interethnic admixture. The relevant data are presented in table 2. The calculations were performed using three methods. The method developed by Szathmary and Reed (1978) relies on individual markers that are present only in non-Indian ethnic groups. Long and Smouse's (1983) method estimates the amount of admixture with a weighted-least-squares solution that takes multinomial sampling into account in all populations involved. Chakraborty's (1985) method estimates the contribution of the parental populations to the hybrid from the expected relationship between gene identities (in Nei's [1972] sense) among populations. Different combinations of systems were used, depending on the requirements of each method. The estimates obtained with the different approaches do not differ significantly and indicate an admixture rate of at most 7%.
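The three admixture methods behind table 2 are not reproduced here. As a simpler illustration of the underlying idea, the sketch below uses a generic dihybrid least-squares estimator, a swapped-in technique that is not the Szathmary and Reed, Long and Smouse, or Chakraborty procedure. It assumes that, for every allele, the hybrid frequency is p_h = m·p_other + (1 - m)·p_native, so m can be estimated by regressing p_h - p_native on p_other - p_native across alleles. All frequencies are hypothetical.

```python
def dihybrid_admixture(p_hybrid, p_native, p_other):
    """Least-squares estimate of the proportion m contributed by the
    non-native parental population, assuming
    p_hybrid = m * p_other + (1 - m) * p_native for every allele.
    Inputs are equal-length lists of allele frequencies (unweighted)."""
    num = 0.0
    den = 0.0
    for ph, p1, p2 in zip(p_hybrid, p_native, p_other):
        num += (ph - p1) * (p2 - p1)
        den += (p2 - p1) ** 2
    if den == 0:
        raise ValueError("parental populations have identical frequencies")
    m = num / den
    return min(max(m, 0.0), 1.0)  # keep the estimate inside [0, 1]


# Hypothetical two-allele locus: natives 0.9/0.1, other parent 0.5/0.5,
# and a hybrid population built with 10% non-native contribution.
M_HAT = dihybrid_admixture([0.86, 0.14], [0.9, 0.1], [0.5, 0.5])
```

With real data, alleles that differ most between the parental populations carry the most information, which is one reason the published methods restrict or weight the systems used.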

Table 2

Estimated Genetic Contribution (G) and Poisson Distribution 95% Confidence Intervals of Different Parental Populations to Four South American Linguistic Groups as Measured by Three Alternative Methods and Different Markers

Method and Parental Population    Proportion of Admixture in Different Language Groups: Maipure    Carib    Ge    Tupi
Szathmary and Reed (1978)
8 informative systems(a)
Native 0.962 (0.955-0.967) 0.948 (0.943-0.953) 0.978 (0.973-0.982) 0.980 (0.976-0.984)
European 0.038 (0.033-0.045) 0.052 (0.047-0.057) 0.022 (0.018-0.027) 0.020 (0.016-0.024)
10 informative systems(b)
Native 0.975 (0.971-0.979) 0.959 (0.955-0.963) 0.987 (0.984-0.990) 0.980 (0.976-0.983)
African 0.025 (0.021-0.029) 0.041 (0.037-0.045) 0.013 (0.010-0.016) 0.020 (0.017-0.024)
Long and Smouse (1983)
15 polymorphic systems(c)
Native 0.986 (0.966-1.000) 0.979 (0.950-1.000) 0.992 (0.957-1.000) 0.993 (0.971-1.000)
African 0.014 (0.000-0.034) 0.021 (0.000-0.050) 0.008 (0.000-0.043) 0.007 (0.000-0.029)
Chakraborty (1985)
29 systems(d)
Native 0.980 (0.929-1.000) 0.979 (0.936-1.000) 0.934 (0.807-1.000) 0.980 (0.888-1.000)
European 0.000 (0.000-0.000) 0.008 (0.000-0.039) 0.066 (0.000-0.193) 0.000 (0.000-0.000)
African 0.020 (0.000-0.071) 0.013 (0.000-0.074) 0.000 (0.000-0.000) 0.020 (0.000-0.112)

a ABO, ACP, AK, G6PD, GM, KELL, LUTHERAN, and RH, in a total of 10 markers.

b ABO, CA2, CP, GM, G6PD, HBA, LUTHERAN, PEPA, PGD, and RH, in a total of 12 markers.

c ACP, DI, ESD, DUFFY, GC, GLO, GM, HP, KIDD, KM, LEWIS, MNSs, P, PGM1, and RH. No trihybrid admixture was identified with this method.

d Those listed in table 1 except ALB, CHE1, ESA, GPI, HBA2, HBFET, HLA-C, and MDH.


Results

The P values obtained in testing the hypotheses of Greenberg, Loukotka, and Rodrigues with the classical genetic systems are given in table 3. Three data sets were analyzed: (1) the more comprehensive set of 37 genetic systems; (2) the 18 more polymorphic loci (those in which the frequency of the most common allele is less than 0.90), to minimize the effect of genetic drift; and (3) a reduced set of 15 systems excluding HLA, designed to reduce both the effects of drift and the possible effect of selection on HLA. For each data set, four calculations were performed, combining the type of gene frequency transformation (arcsine or gene-identity) with the evolutionary rate assumed (variable or constant).
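The selection rule for the second and third data sets can be expressed as a small filter. This sketch assumes a table of pooled allele frequencies per system (the example values are hypothetical, not the study's data); a system is kept when its most common allele is below 0.90, and HLA systems can optionally be dropped as in the 15-system set.

```python
def polymorphic_systems(freqs, threshold=0.90, exclude_prefix=None):
    """Names of systems whose most common allele has frequency below threshold.

    freqs maps a system name to {allele: frequency}; systems whose name
    starts with exclude_prefix (e.g. "HLA") are skipped when it is given.
    """
    keep = []
    for name, alleles in freqs.items():
        if exclude_prefix and name.startswith(exclude_prefix):
            continue
        if max(alleles.values()) < threshold:
            keep.append(name)
    return sorted(keep)


# Hypothetical pooled frequencies for three systems.
EXAMPLE = {
    "ABO": {"O": 0.95, "A": 0.05},        # nearly monomorphic: dropped
    "RH": {"R1": 0.6, "R2": 0.4},         # polymorphic: kept
    "HLA-A": {"A2": 0.4, "A24": 0.6},     # polymorphic, but HLA
}
```

Calling `polymorphic_systems(EXAMPLE)` keeps HLA-A and RH, mirroring the 18-system set, while `exclude_prefix="HLA"` leaves only RH, mirroring the 15-system set.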

Table 3

P Values Obtained in Testing Three Alternative Patterns of Relationships among South American Native Linguistic Families Using Blood Group and Protein Genetic Data

Genetic Systems and Data Transformation Considered    Proposed Linguistic Relationship and Evolutionary Rate: Greenberg (Variable, Constant)    Loukotka (Variable, Constant)    Rodrigues (Variable, Constant)
All 37 systems(a)
Arcsine 0.4207 0.0099 0.4777 0.4756 0.5993 0.0202
Gene identity 0.4012 <0.0001 0.4835 0.0515 0.5270 <0.0001
18 polymorphic systems(b)
Arcsine 0.3480 0.0044 0.3650 0.4491 0.7036 0.0068
Gene identity 0.3780 0.0002 0.6653 0.2287 0.4532 0.0003
15 polymorphic systems, except HLA(c)
Arcsine 0.0203 0.0024 0.0214 0.0450 0.1489 0.0003
Gene identity 0.0051 0.0003 0.0108 0.0155 0.1085 0.0001

a The genetic systems are listed in table 1.

b ACP, DIEGO, DUFFY, ESD, GC, GLO, GM, HLA-A, HLA-B, HLA-C, HP, KIDD, KM, LEWIS, MNSs, P, PGM1, RH.

c ACP, DIEGO, DUFFY, ESD, GC, GLO, GM, HP, KIDD, KM, LEWIS, MNSs, P, PGM1, RH.


The null hypothesis was rejected in 15 of 18 tests assuming constant evolutionary rates. There is a poor fit for all three alternative linguistic proposals when the genetic data are considered (P values ranging from < 0.0001 to 0.045 plus a borderline case of 0.052). This observation agrees with previous results obtained for such markers in South American natives (e.g., Bortolini et al. 1998), suggesting that the evolutionary rate is variable for different groups.

When the tests assumed variable rates, however, the fit was better irrespective of the transformation (arcsine or gene-identity) applied to the allele frequencies. For the first and second sets of data, none of the three linguistic hypotheses is rejected (P values ranging from 0.348 to 0.704), but a different picture emerges for the third set of genetic systems. Here both Greenberg's and Loukotka's proposals are rejected at the 0.05 level of significance, irrespective of the transformation used (P values ranging from 0.005 to 0.021). Rodrigues's hypothesis is not rejected with either the arcsine (P = 0.149) or the gene-identity (P = 0.109) transformation of the data.

The agreement of the linguistic models with the much more variable STRP data was examined with a limited sample of 13 loci and five tribes. In this analysis, the Carib and Ge speakers are represented by a single tribe each, while the three Tupi populations studied live close to one another and belong to the same linguistic subgroup, Tupi-Mondé. No data for these loci were available for the Maipure, so the hypotheses of Loukotka and Rodrigues could not be distinguished. Greenberg's model assuming constant rates of evolution was rejected with both types of transformation (P from < 0.0001 to 0.005); none of the other models (Greenberg, variable rate; Loukotka/Rodrigues, variable rate; Loukotka/Rodrigues, constant rate) was rejected (P values ranging from 0.184 to 0.932).

To investigate the agreement of tree models other than the three considered here, we tested the remaining 12 tree models using the third set of classical genetic systems and the arcsine transformation. For all 12 models, the P values were larger than that obtained for Greenberg's model (P = 0.0203); 11 exceeded the P value observed for Loukotka's tree (P = 0.0214), and 6 were larger than P = 0.149, the figure observed for Rodrigues's proposal. Trees #14 and #15 corresponded to the dendrograms obtained using two well-known genetic distances and the neighbor-joining algorithm. These models were not rejected (P = 0.526 and 0.171, respectively), which shows that at least one tree fits the genetic data (the P values for the same models using all 37 genetic systems were 0.593 and 0.571, respectively).
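The two distances behind trees #14 and #15 can be computed from allele frequencies as sketched below, using the textbook definitions: Nei's (1972) standard distance D = -ln(J_XY / sqrt(J_X J_Y)), where the J terms are per-locus gene identities averaged over loci, and the DA distance of Nei, Tajima, and Tateno (1983), 1 minus the mean over loci of the summed square roots of x_i·y_i. The neighbor-joining step that turns a distance matrix into a tree is not shown, and the frequencies in the example are hypothetical.

```python
import math


def nei_standard_distance(x, y):
    """Nei's (1972) standard genetic distance.

    x and y are lists of loci; each locus is a list of allele frequencies
    given in the same allele order for both populations.
    """
    r = len(x)
    jx = sum(sum(p * p for p in locus) for locus in x) / r
    jy = sum(sum(q * q for q in locus) for locus in y) / r
    jxy = sum(sum(p * q for p, q in zip(lx, ly)) for lx, ly in zip(x, y)) / r
    return -math.log(jxy / math.sqrt(jx * jy))


def nei_da_distance(x, y):
    """DA distance of Nei, Tajima, and Tateno (1983)."""
    r = len(x)
    shared = sum(sum(math.sqrt(p * q) for p, q in zip(lx, ly)) for lx, ly in zip(x, y))
    return 1.0 - shared / r


# Hypothetical single-locus example: one population at 0.5/0.5, the other fixed.
POP_X = [[0.5, 0.5]]
POP_Y = [[1.0, 0.0]]
```

For this example the standard distance is 0.5 ln 2 ≈ 0.347 and DA is 1 - sqrt(0.5) ≈ 0.293; both are 0 for identical populations. DA is often preferred for tree topology, while D has a simpler relation to divergence time under drift-mutation assumptions.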

The same calculations were done for the STRP data, which were available for only three language families: Carib, Ge, and Tupi. With three families, only one bifurcating tree remained beyond those implied by the linguists' proposals: one linking Tupi and Ge, with Carib in a more distant position. For this pattern, not envisaged by these linguists, the P values obtained with the arcsine and gene-identity transformations were 0.667 and 0.685, respectively.
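Why only one arrangement remains can be checked by pruning the four-family models to the three families with STRP data. The sketch below removes Maipure from each proposal (encoded as nested tuples according to our reading of the text describing figure 1), collapses the node left with a single child, and compares the results with the three possible rooted trees for Carib, Ge, and Tupi.

```python
MODELS = {
    "Greenberg": (("Maipure", "Tupi"), ("Carib", "Ge")),
    "Loukotka": ((("Maipure", "Tupi"), "Carib"), "Ge"),
    "Rodrigues": ((("Carib", "Tupi"), "Ge"), "Maipure"),
}


def prune(node, drop):
    """Remove leaf `drop` and collapse any internal node left with one child."""
    if not isinstance(node, tuple):
        return None if node == drop else node
    kept = [c for c in (prune(child, drop) for child in node) if c is not None]
    if not kept:
        return None
    return kept[0] if len(kept) == 1 else tuple(kept)


def canonical(node):
    """Order-free form, so that (a, b) and (b, a) compare equal."""
    if isinstance(node, tuple):
        return frozenset(canonical(child) for child in node)
    return node


def three_taxon_trees(a, b, c):
    """The three rooted bifurcating trees on three labelled leaves."""
    return [((a, b), c), ((a, c), b), ((b, c), a)]


PRUNED = {name: canonical(prune(tree, "Maipure")) for name, tree in MODELS.items()}
UNPROPOSED = [
    t for t in three_taxon_trees("Carib", "Ge", "Tupi")
    if canonical(t) not in PRUNED.values()
]
```

Once Maipure is removed, the proposals of Loukotka and Rodrigues reduce to the same tree ((Carib, Tupi), Ge), Greenberg's becomes ((Carib, Ge), Tupi), and the only tree left over is ((Ge, Tupi), Carib).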

Discussion and Conclusions

Analogies between linguistic and genetic variability should be drawn with caution. For instance, the type of variation, and the way it is transmitted, differ. Biological inheritance is Mendelian (it follows strict parent-offspring connections), while language transmission is Lamarckian: it can occur both vertically (from parents to children) and horizontally (between unrelated people). Status differences (economic, political) among the carriers of an innovation also play an important role in linguistic evolution, an extreme case being the subjugation of a whole group by another, with forced total replacement of its language. Linguistic characteristics can therefore change much faster than genetic ones.

Despite these differences, there are factors common to linguistic and genetic evolution that are responsible for the congruence observed in many cases. Human populations, in spreading, often split. The resulting isolation of two or more groups favors diversification of both genes and phonemes, and the degree of this diversification is generally a function of the time elapsed since separation.

Of the world's 5,000-6,000 languages, some 150 separate stocks have been identified in the Americas (Nichols 1997); there are 48 families and 70 isolated languages in the region comprising lower Central America plus South America (J. Sherzer, http://www.ailla.utexas.org/site/welcome.html). Regarding the four South American linguistic families considered here, Rodrigues (1985) has argued for a strong Tupi-Carib relationship, relying on phonological correspondences of consonants and vowels as well as Tupi-Carib cognates among personal affixes, case and other affixes, grammatical particles and words, kinship terms, body parts and parts of plants, elements of nature, qualities, cultural and noncultural items, actions and states, and animals and plants.

The bifurcating pattern of splits assumed for both genetic and linguistic evolution prompted us to use the treeness test developed by Cavalli-Sforza and Piazza (1975) to measure how well different patterns of linguistic relationship fit the available genetic data. This method is based on the assumption that populations that split according to a given pattern of fission and evolve independently afterward have a patterned variance-covariance matrix that reflects the history of these fissions. Multivariate techniques such as this one can be used to test whether an observed matrix is compatible with a given tree, using any kind of quantitative data. The assumption that the genetic data themselves exhibit treeness is important, because if they do not, there is no basis for testing models suggested by other kinds of data. The treeness of the genetic markers considered in this study was evaluated and not rejected.

In the analysis of the subset of genetic systems selected to minimize the effects of genetic drift and selection in large samples from four lowland language families, Rodrigues's hypothesis was the only one of the three not rejected. Other possible tree arrangements of the language families, not considered by the three linguists, were identified but are irrelevant to the present inquiry.

Greenberg (1987) classified all Native American languages into three large groups: Eskimo-Aleut (accepted by other linguists), Na-Dene (the position of Haida is disputed), and Amerind (mostly rejected [Campbell 1997:326]). The idea of an Amerind unit has been criticized by several linguists (e.g., Campbell 1997 and references therein) and by geneticists as well (Bolnick et al. 2004, on the basis of Y-chromosome variation data). Hunley and Long (2005) have recently reanalyzed the relationship between genetics and Greenberg's language classification of Native North American populations using mitochondrial DNA data and rejected Greenberg's classification. Loukotka's methodology has been criticized by Campbell (1997:81), but Terrence Kaufman (cited by him) found his classification practically error-free as far as genetic groupings are concerned. Rodrigues's proposals do not seem to have received much criticism.

These analyses should be interpreted as follows. Genetic data support tree configurations other than those proposed by Greenberg, Loukotka, and Rodrigues. This is not surprising, because different kinds of data may produce somewhat different estimates of the history of the populations. Among the three possibilities proposed by these scholars, that of Rodrigues has the best genetic support.

Appendix: Tribes Included in the Present Study

Linguistic Family and Tribe No. of Samples
1. Maipure  
Arawak 3
Baniwa 1
Campa 2
Goajiro 4
Guahibo 3
Iana Indians 1
Mehinaku 1
Palikour 2
Paraujano 1
Piro 1
Tariana 1
Wapishana 5
Waurá 1
2. Carib  
Apalai-Wayana 1
Arara 1
Bacairi 1
Carib 6
Galibi 2
Macushi 5
Makiritare 2
Panare 1
Pemon 2
Pijao 3
Tiriyo 3
Wai Wai 1
Wayana 6
Yabarana 1
Yupa 4
3. Ge  
Cayapo 8
Kaingang 5
Krahô 1
Xavante 2
Xokleng 1
4. Tupi  
Araweté 1
Asurini-Koatinemo 1
Asurini-Trocará 1
Awá-Guajá 1
Cayuá 1
Chiriguano 1
Cinta Larga 1
Emerillon 1
Gavião 1
Guarani 2
Guarayu 1
Guayaki (Aché) 2
Karitiana 1
Mundurucu 2
Parakanã 2
Poturujara 1
Sateré-Mawé 1
Sirionó 2
Suruí 1
Tapiete 1
Tenharim 1
Urubu-Kaapor 1
Wayampi 3
Zoró 1

References Cited

  • Barbujani, G. 1997. DNA variation and language affinities. American Journal of Human Genetics 61:1011-14.

  • Bateman, R., I. Goddard, R. O'Grady, V. A. Funk, R. Mooi, W. J. Kress, and P. Cannell. 1990a. Speaking of forked tongues: The feasibility of reconciling human phylogeny and the history of language. Current Anthropology 31:1-24.

  • Bateman, R., I. Goddard, R. O'Grady, V. A. Funk, R. Mooi, W. J. Kress, and P. Cannell. 1990b. On human phylogeny and linguistic history: Reply to comments. Current Anthropology 31:177-83.

  • Black, F. L., F. M. Salzano, L. L. Berman, Y. Gabbay, T. A. Weimer, M. H. L. P. Franco, and J. P. Pandey. 1983. Failure of linguistic relationships to predict genetic distances between the Waiãpi and other tribes of Lower Amazonia. American Journal of Physical Anthropology 60:327-35.

  • Bolnick, D. A., B. A. Shook, L. Campbell, and I. Goddard. 2004. Problematic use of Greenberg's linguistic classification of the Americas in studies of Native American genetic variation. American Journal of Human Genetics 75:519-23.

  • Bortolini, M. C., C. Baptista, S. M. Callegari-Jacques, T. A. Weimer, and F. M. Salzano. 1998. Diversity in protein, nuclear DNA, and mtDNA in South Amerinds: Agreement or discrepancy? Annals of Human Genetics 62:133-45.

  • Callegari-Jacques, S. M., and F. M. Salzano. 1989. Genetic variation within two linguistic Amerindian groups: Relationship to geography and population size. American Journal of Physical Anthropology 79:313-20.

  • Callegari-Jacques, S. M., F. M. Salzano, J. Constans, and P. Maurieres. 1993. Gm haplotype distribution in Amerindians: Relationship with geography and language. American Journal of Physical Anthropology 90:427-44.

  • Campbell, L. 1997. American Indian languages: The historical linguistics of Native America. New York: Oxford University Press.

  • Cavalli-Sforza, L. L. 1997. Genes, peoples, and languages. Proceedings of the National Academy of Sciences, U.S.A. 94:7719-24.

  • Cavalli-Sforza, L. L., and A. Piazza. 1975. Analysis of evolution: Evolutionary rates, independence, and treeness. Theoretical Population Biology 8:127-65.

  • Cavalli-Sforza, L. L., A. Piazza, P. Menozzi, and J. Mountain. 1988. Reconstruction of human evolution: Bringing together genetic, archaeological, and linguistic data. Proceedings of the National Academy of Sciences, U.S.A. 85:6002-6.

  • Cavalli-Sforza, L. L., E. Minch, and J. L. Mountain. 1992. Coevolution of genes and languages revisited. Proceedings of the National Academy of Sciences, U.S.A. 89:5620-24.

  • Chakraborty, R. 1985. Gene identity in racial hybrids and estimation of admixture rates, in Genetic microdifferentiation in human and other animal populations. Edited by Y. R. Ahuja and J. V. Neel. New Delhi: Indian Anthropological Association.

  • Fagundes, N. J. R., S. L. Bonatto, S. M. Callegari-Jacques, and F. M. Salzano. 2002. Genetic, geographic, and linguistic variation among South American Indians: Possible sex influence. American Journal of Physical Anthropology 117:68-78.

  • Greenberg, J. H. 1987. Language in the Americas. Stanford: Stanford University Press.

  • Guarino, F. D., L. Federle, R. A. H. Van Oorschot, I. Briceno, J. E. Bernal, S. S. Papiha, M. S. Schanfield, and R. J. Mitchell. 1999. Genetic diversity among five Native American tribes of Colombia: Evidence from nine autosomal microsatellites, in Genomic diversity: Applications in human population genetics. Edited by S. S. Papiha, R. Deka, and R. Chakraborty. New York: Kluwer Academic/Plenum Publishers.

  • Hunley, K., and J. C. Long. 2005. Gene flow across linguistic boundaries in Native North American populations. Proceedings of the National Academy of Sciences, U.S.A. 102:1312-17.

  • Hutz, M. H., S. M. Callegari-Jacques, S. E. M. Almeida, T. Armborst, and F. M. Salzano. 2002. Low levels of STRP variability are not universal in American Indians. Human Biology 74:791-806.

  • Long, J. C., and P. E. Smouse. 1983. Intertribal gene flow between the Yecuana and Yanomama: Genetic analysis of an admixed village. American Journal of Physical Anthropology 61:411-22.

  • Loukotka, C. 1968. Classification of South American Indian languages. Los Angeles: Latin American Center, University of California.

  • Nei, M. 1972. Genetic distance between populations. American Naturalist 106:283-92.

  • Nei, M., F. Tajima, and Y. Tateno. 1983. Accuracy of estimated phylogenetic trees from molecular data. 2. Gene frequency data. Journal of Molecular Evolution 19:153-70.

  • Nettle, D., and L. Harris. 2003. Genetic and linguistic affinities between human populations in Eurasia and West Africa. Human Biology 75:33134.

  • Nichols, J. 1997. Modeling ancient population structures and movement in linguistics. Annual Review of Anthropology 26:359-84.

  • Nichols, J. 2002. The first American languages, in The first Americans: The Pleistocene colonization of the New World. Edited by N. G. Jablonski. San Francisco: California Academy of Sciences.

  • Poloni, E. S., O. Semino, G. Passarino, A. S. Santachiara-Benerecetti, I. Dupanloup, A. Langaney, and L. Excoffier. 1997. Human genetic affinities for Y-chromosome P49a,f/TaqI haplotypes show strong correspondence with linguistics. American Journal of Human Genetics 61:1015-35.

  • Rodrigues, A. D. 1985. Evidence for Tupi-Carib relationships, in South American Indian languages: Retrospect and prospect. Edited by H. E. M. Klein and L. R. Stark. Austin: University of Texas Press.

  • Rodrigues, A. D. 1994. Línguas brasileiras: Para o conhecimento das línguas indígenas. 2d edition. São Paulo: Edições Loyola.

  • Rodrigues, A. D. 2000. Ge-Pano-Carib x Jê-Tupi-Karib: Sobre relaciones lingüísticas prehistóricas en Sudamérica, in Actas del I Congreso de Lenguas Indígenas de Sudamérica. Edited by L. Miranda. Lima: Universidad Ricardo Palma.

  • Roychoudhury, A. K., and M. Nei. 1988. Human polymorphic genes: World distribution. Oxford: Oxford University Press.

  • Saitou, N., and M. Nei. 1987. The neighbor-joining method: A new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 4:406-25.

  • Salzano, F. M., J. V. Neel, H. Gershowitz, and E. C. Migliazza. 1977. Intra- and intertribal genetic variation within a linguistic group: The Ge-speaking Indians of Brazil. American Journal of Physical Anthropology 47:337-47.

  • Spielman, R. S., E. C. Migliazza, and J. V. Neel. 1974. Regional linguistic and genetic differences among Yanomama Indians: The comparison of linguistic and biological differentiation sheds light on both. Science 184:637-44.

  • Szathmary, E. J. E., and T. E. Reed. 1978. Calculation of the maximum amount of gene admixture in a hybrid population. American Journal of Physical Anthropology 48:29-34.

  • Trask, R. L. 1996. Historical linguistics. London: Arnold.

  • Urban, G. 1992. A história da cultura brasileira segundo as línguas nativas, in História dos índios no Brasil. Edited by M. Carneiro da Cunha. São Paulo: Companhia das Letras.

  • Urbanek, M., D. Goldman, and J. C. Long. 1996. The apportionment of dinucleotide repeat diversity in Native Americans and Europeans: A new approach to measuring gene identity reveals asymmetric patterns of divergence. Molecular Biology and Evolution 13:943-53.

  • Vogel, F., and A. G. Motulsky. 1996. Human genetics: Problems and approaches. Berlin: Springer-Verlag.