Wednesday, March 14, 2012

Afro-Asiatic Languages and Uniparental Genetic Markers

There are several major groups of languages within the Afro-Asiatic family: Semitic (e.g. Arabic and Hebrew), Chadic (e.g. Hausa), Berber (e.g. Tuareg), Omotic, Cushitic, and Egyptian (i.e. ancient Egyptian and its descendant, Coptic). Discerning the prehistory of these languages is not easy. Neither linguistic features nor genetics offers a comprehensive answer for the family as a whole. This post looks at Afro-Asiatic uniparental genetic markers (and was mostly written about six months ago, but never posted as it awaited some editing and polishing).

An analysis of autosomal Afroasiatic genetics appears in a recent post at Ethio Helix. In the Principal Component Analysis done for this dataset, "Component 1 separates Berber/Semitic/Egyptian speakers from Chadic speakers, with Ethiopian Semitic/Cushitic speakers plotting somewhere in between, but closer to the former in this separation. Component 2, separates Ethiopians+Egyptians from the rest. Component 3 Separates the Mozabites from the Rest, with Ethiopians again retaining an intermediate position." The post also presents an Admixture analysis with K=5 for this dataset, plotted on a chart that separates the clusters in two dimensions, with the distance between clusters based on Fst distance: "The biggest separation for both Axis is for the cluster I nicknamed Cushitic, while the Berber, Semitic and Mozabite clusters appear pretty close, with the Mozabites looking a bit isolated." Berber and Egyptian are close; Cushitic and Chadic are roughly as far from each other as the Egyptian-Berber-Mozabite cluster is from the Cushitic cluster.

Honestly, autosomal data, rather than clarifying the relationships, seems mostly just to muddy the waters further. The notion that Egyptians are more closely linked to Ethiopian populations than non-Ethiopian populations does emerge, however. Also, Egyptians do seem to be more genetically diverse than some of the other populations autosomally. They seem to have genetic components that don't overlap with other populations including the main "Egyptian" component, at a fairly high frequency.

Chadic Languages and Y-DNA R1b-V88

Chadic-speaking men have the most distinctive genetic profile.

Although human Y chromosomes belonging to haplogroup R1b are quite rare in Africa, being found mainly in Asia and Europe, a group of chromosomes within the paragroup R-P25(*) are found concentrated in the central-western part of the African continent, where they can be detected at frequencies as high as 95%. Phylogenetic evidence and coalescence time estimates suggest that R-P25(*) chromosomes (or their phylogenetic ancestor) may have been carried to Africa by an Asia-to-Africa back migration in prehistoric times. Here, we describe six new mutations that define the relationships among the African R-P25(*) Y chromosomes and between these African chromosomes and earlier reported R-P25 Eurasian sub-lineages. The incorporation of these new mutations into a phylogeny of the R1b haplogroup led to the identification of a new clade (R1b1a or R-V88) encompassing all the African R-P25(*) and about half of the few European/west Asian R-P25(*) chromosomes. A worldwide phylogeographic analysis of the R1b haplogroup provided strong support to the Asia-to-Africa back-migration hypothesis. The analysis of the distribution of the R-V88 haplogroup in >1800 males from 69 African populations revealed a striking genetic contiguity between the Chadic-speaking peoples from the central Sahel and several other Afroasiatic-speaking groups from North Africa. The R-V88 coalescence time was estimated at 9.2-5.6 [corrected] kya, in the early mid Holocene. We suggest that R-V88 is a paternal genetic record of the proposed mid-Holocene migration of proto-Chadic Afroasiatic speakers through the Central Sahara into the Lake Chad Basin, and geomorphological evidence is consistent with this view.

From Cruciani, et al., "Human Y chromosome haplogroup R-V88: a paternal genetic record of early mid Holocene trans-Saharan connections and the spread of Chadic languages." Eur J Hum Genet. 2010 Jul;18(7):800-7. Epub 2010 Jan 6.

I am deeply skeptical of the date suggested (Dienekes' expedient of dividing estimates by three seems more on target, and the entire business of dating Y-chromosomes by mutation rates is suspect), but the linguistic-genetic link is clear.
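To make the size of that adjustment concrete, here is a rough arithmetic sketch in Python. It assumes, purely for illustration, that the divide-by-three rule of thumb is applied directly to both ends of the published 9.2-5.6 kya coalescence range. The function name `rescale_age_range` is hypothetical, and this is not a method taken from Cruciani et al.

```python
def rescale_age_range(upper_kya, lower_kya, divisor=3.0):
    """Divide both ends of an age range (thousands of years ago) by a constant.

    The divisor of 3 is the rule of thumb mentioned in the text; treating it
    as a flat rescaling of the published range is an assumption of this sketch.
    """
    return upper_kya / divisor, lower_kya / divisor


# Published R-V88 coalescence range quoted in the abstract above.
published = (9.2, 5.6)
rescaled = rescale_age_range(*published)

print("published: %.1f-%.1f kya" % published)
print("rescaled:  %.1f-%.1f kya" % rescaled)
```

Under that assumption, the range shrinks from 9.2-5.6 kya to roughly 3.1-1.9 kya, which shows how sensitive any historical interpretation is to the choice of mutation rate.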

What is notable about this result?

Chadic is one of the major Afro-Asiatic language families, so its origins are important to understanding the origin of the Afro-Asiatic languages generally, a major unresolved issue.

The genetic case is strong that Chadic has its roots in an Asia-to-Africa back-migration. The low levels of admixture between Chadic and non-Chadic speakers until very recently (the Fulani and Hausa are in the process of merging in ethnic identity to some extent right now) suggest that this genetically distinct population arrived in the region more recently than the speakers of the other languages found there.

Since language tends to track Y-DNA more than mtDNA, the strong Y-DNA traces suggest an origin for Chadic languages outside of Africa.

This branch of the Y-DNA R haplogroup is quite basal, so dating this split also puts a minimum date on the expansion of the rest of the Y-DNA R haplogroup, which is predominant in much of West Eurasia. Haplogroup R1b, which is common among Chadic speakers in its V88 subtype, is, however, generally associated with Western Europe and, tentatively, with Indo-Europeans, although the break of the African V88 lineage from the other R1b lineages found in Europe would have happened a very long time ago.

Still, how did any R1b haplogroup end up in a linguistically Afro-Asiatic population in the Sahel? This is a mystery.

R1b is not commonly found in Egyptians, Ethiopians, or Berbers.

Indo-Europeans are not attested historically anywhere in Mesopotamia, the Levant, or North Africa until the classical Greeks and Romans expanded, except at the fringes of the Hittite Empire of the Bronze Age.

Indeed, there is a plausible argument that there weren't even any Indo-Europeans (and quite possibly very few men with Y-DNA R1b haplogroups) in Western Europe until the early Bronze Age, with Indo-Europeans possibly arising with the Bell Beaker culture. The Central European Neolithic, for example, up the Danube, is associated with Y-DNA haplogroup R1a (which is absent in Chadic populations), not R1b.

Anatolia is associated with Y-DNA haplogroup J2, which is absent in Chadic populations. South Semitic peoples are associated with Y-DNA haplogroup J1, which is absent in Chadic populations. One would think that migrations to Africa from either area would have included J2 or J1 in the Y-DNA mix of the population, although founder effects with a small founding population of related men could account for their absence as well.

R1b is not exclusively Indo-European, as it is found in the linguistically non-Indo-European Basque people (whose high levels of lactose tolerance suggest a pastoralist, dairy-utilizing economy for their ancestors), and in Central Asia (which was basically European, genetically and in phenotype, as far as the fringes of Mongolia in the early Bronze Age), but no one suggests that the Chadic and Basque languages have any connection whatsoever.

Anatolia and the Dead Sea are plausible sources based on the presence of basal Y-DNA haplogroup R1b types there. But we don't really know how or when the genetic outlier pocket of population in the Dead Sea area, for example, ended up there. It could be derived from refugees from someplace else who finally found a place to settle in the Dead Sea area at some undocumented point in antiquity.

Overview of Y-DNA in Afro-Asiatic Linguistic Families

There are population genetic links to Eurasia in many Afro-Asiatic populations. On the Y-DNA side, haplogroups T (common in Egypt, Ethiopia and Somalia), J1 (associated with Arabia and other Semitic populations including Ethio-Semitic populations), and R1b-V88 (associated with Chadic language populations) all probably have origins outside Africa. On the mtDNA side, haplogroups M1 and U6 are both probably Eurasian in origin. There are mtDNA similarities between Cushitic and Chadic populations, although their Y-DNA profiles are quite distinct. Also associated with Afro-Asiatic languages is Y-DNA haplogroup E3b (E1b1b in current nomenclature), which is strongly associated with Egypt and probably has an African origin.

As we look at Afro-Asiatic linguistic origins, it looks like Chadic is an outlier genetically within the Afro-Asiatic linguistic family, but each of the Afro-Asiatic linguistic families has a pretty distinctive uniparental genetic marker profile. As Wikipedia notes (footnotes omitted):

The migration of Afroasiatic languages from their original homeland is often thought to have also involved the movements of significant numbers of people. Therefore, attempts have been made to associate Afroasiatic language groups with genetic markers.

The most commonly cited genetic marker in recent decades has been the Y chromosome, which is passed from father to son along paternal lines in un-mixed form, and therefore gives a relatively clear definition of one human line of descent from common ancestors.

Several branches of humanity's Y DNA family tree have been proposed as having an association with the spread of Afroasiatic languages.

Frequency of haplogroup E1b1b in select Afro-Asiatic speakers
Language (region where historically spoken) -- Frequency
Cushitic -- 32–81%
Egyptian languages -- 36–60%
Berber languages -- 40–91%
Semitic languages -- 7–29%
Omotic languages -- 50%

1. Haplogroup E1b1b is thought to have originated in East Africa. In general, Afroasiatic speaking populations have relatively high frequencies of this haplogroup, with the notable exception of Chadic speaking populations. Christopher Ehret and Shomarka Keita have suggested that the geography of the E1b1b lineage coincides with the distribution of Afroasiatic languages.

2. Haplogroup J1c3 (Y-DNA), formerly known as "J1e", is actually a more common paternal lineage than E1b1b in most Semitic speaking populations, but this is associated with Middle Eastern origins and has apparently been spread from there after the original dispersion of Afroasiatic.

3. Haplogroup R1b1a (R-V88), and specifically its sub-clade R-V69, has a very strong relationship with Chadic speaking populations. Unlike other Afroasiatic speakers, Chadic language speakers have low frequencies of Haplogroup E1b1b.

Frequency of Sub-Haplogroup R1b1a in select Afro-Asiatic speakers
Chadic languages -- 28.6-95.5%
Berber languages -- 0-26.9%
Semitic languages -- 0-40%
Egyptian languages -- About 14% of Sudanese Copts had R1b although they were not typed for the V88 marker which defines R1b1a

This was announced in 2010 by Cruciani et al. The majority of R-V88 was found in northern and central Africa, in Chadic speaking populations. It is less common in neighbouring populations.

The authors of Cruciani, et al. (2010) also found evidence of high concentration in Western Egypt and evidence that the closest related types of R1b are found in the Middle East, and to a lesser extent southern Europe. They proposed that an Eastern Saharan origin for Chadic R1b would agree with linguistic theories such as those of Christopher Ehret, that Chadic and Berber form a related group within Afroasiatic, which originated in the area of the Sahara.

The genetics make quite a strong case that Semitic languages are Southwest Asian in origin, probably Levantine, and that Chadic peoples have at least patrilineal genetic origins outside Africa and are relatively recent arrivals in Africa through male-dominated migrations. But this doesn't resolve the question of Afro-Asiatic linguistic origins, because the uniparental genetic markers don't point to a single common genetic origin for Afro-Asiatic language speakers across all of the subfamilies that make up the larger grouping.

Y-DNA Haplogroup T in Afro-Asiatic Linguistic Populations

There is also a good case that Y-DNA haplogroup T, found in a somewhat hard-to-characterize mix of populations including many Afro-Asiatic populations, reflects a back-migration to Africa from Southwest Asia. The Wikipedia survey of its frequency in Africa shows 26 instances in non-Afro-Asiatic peoples: 15 (of 256) in the Spanish Canary Islands among people who trace their roots to North Africa, 6 (of 34) in the Bantu-speaking Lemba of South Africa, who claim to have Jewish roots, and 3 (of 17) in the Niger-Congo-speaking Fulbe of northern Cameroon, right on the boundary between Afro-Asiatic and Niger-Congo populations. The remaining seven instances comprise three isolated cases in small Bantu populations and four small percentages in Nilotic populations. These come from samples taken mostly on the eastern coast of Africa, which would have been exposed to Y-DNA T-rich Somalis through the sea trade, or near the western boundary of Afro-Asiatic populations that are relatively rich in Y-DNA T. By comparison, there were 108 instances in Afro-Asiatic populations.
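As a quick arithmetic aid, the three sample counts quoted above can be converted to percentage frequencies. This is a minimal sketch that uses only the counts given in the paragraph; the dictionary labels and the helper name `frequency_percent` are illustrative choices, not part of any cited dataset.

```python
# (carriers of haplogroup T, sample size), as quoted in the paragraph above.
samples = {
    "Canary Islands": (15, 256),
    "Lemba": (6, 34),
    "Fulbe (northern Cameroon)": (3, 17),
}


def frequency_percent(carriers, sample_size):
    """Return the share of carriers in a sample as a percentage."""
    return 100.0 * carriers / sample_size


frequencies = {
    name: frequency_percent(*counts) for name, counts in samples.items()
}

for name, pct in frequencies.items():
    print(f"{name}: {pct:.1f}%")
```

The small sample sizes for the Lemba and Fulbe (34 and 17 men) mean their percentages are far less stable than the Canary Islands figure, so the raw counts are worth keeping in view alongside the frequencies.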

Wikipedia on Haplogroup T also notes that:

Since haplogroup T is not associated with the R1, G and J lineages that entered Africa from Eurasia relatively recently, Luis et al. (2004) suggest that the presence of the clade on the African continent may, like R1* representatives, point to an older introduction from Asia.

The Levant rather than Southern Arabia appears to have been the main route of entry, as the Egyptian and Turkish haplotypes are considerably older in age (13,700 ybp and 9,000 ybp, respectively) than those found in Oman (only 1,600 ybp).

According to the authors, the spotty modern distribution pattern of haplogroup T within Africa may therefore represent the traces of a more widespread early local presence of the clade. Later expansions of populations carrying the E1b1b, E1b1a, G and J NRY lineages may have overwhelmed the T clade-bearers in certain localities.

Egypt, Ethiopia and Somalia are the dominant locations of Y-DNA haplogroup T in Africa. The dispersal pattern of Y-DNA haplogroup T is far less clear-cut than that of Y-DNA R-V88, suggesting a deeper time depth. There are only a handful of instances, apart from the Canary Islands, that are west of Egypt. It is not really a characteristic Berber marker (the one Berber population that showed it was in Lower Egypt, and it was also found in Benghazi, near the Egyptian border with Libya). Instead, it marks Coptic, Cushitic and Omotic populations.

Y-DNA Haplogroup E1b1b in Afro-Asiatic Linguistic Populations

All of the Y-DNA T rich populations are also rich in Y-DNA E1b1b.

The Berbers, who lack much Y-DNA T, are even more rich in E1b1b.

On the other hand, E1b1b seems to have deep origins in Africa. Y-DNA haplogroup E is the dominant black African haplogroup (excluding Khoisan and Pygmy populations), within which E1b1a (particularly associated with West Africa), E1a (particularly in West Africa and Sudan) and E2 (pan-African, but especially Eastern and Southern Africa) are particularly common. There have been proposals that Y-DNA haplogroup E arrived in Africa through back-migration, but I find them highly implausible.

Specific E1b1b subhaplogroups have associations with different groups. E1b1b1b (E-V257) is associated with Berbers. E1b1b1c1 (E-M34) is mostly Semitic. E1b1b1d (E-V6), E1b1b1f (E-V42), E1b1b1g (E-V92), and E1b1b2 (E-V16/E-M281) are all associated with Ethiopia. E1b1b1e* is found in Southern and Eastern Africa. Roots in Ethiopia or the vicinity seem likely.

Afroasiatic mtDNA Evidence

In contrast to the evidence from paternally inherited Y DNA, where Cushitic and Chadic linguistic populations have very different profiles from each other, a recent study has shown that a branch of mitochondrial haplogroup L3 links the maternal ancestry of Chadic speakers from the Sahel with Cushitic speakers from East Africa.

Other mitochondrial lineages that are associated with Afroasiatic include mitochondrial haplogroups M1 and haplogroup U6. Gonzalez et al. 2007 suggest that Afroasiatic speakers may have dispersed from East Africa carrying the subclades M1a and U6a1.

mtDNA haplogroups M1 and U6, and Y-DNA haplogroups R-V88 and J1, are back migrations from Southwest Asia when found in Africa, and there are also clearly non-African genetic signatures in Ethiosemitic-speaking populations. Ethiosemitic languages probably all have a common origin very early in the historic era.
