Source: here.

Sons of the Indus: The Indians

The past decade has seen a revolution in archaeogenetics. Access to autosomal DNA from our prehistoric hominid forefathers has supplied the final piece of the puzzle for many questions about our own ethno-cultural genesis. This is partly because sequencing a genome has become far cheaper since the early days of the Human Genome Project (HGP), from roughly $100 million per genome in 2001 to just under $1,000 around 2016, and partly because of improvements in recovering aDNA (ancient DNA) from samples. The literature on this has piled up since 2015 and is now vast. We must make use of it to understand the origins of the Indian people. This post will simply tabulate and collate the recent discoveries. The title will justify itself once the post is read in full.

Agro-revolution

It’s beyond the scope of this post to explain all of human prehistory, but we can note one pattern. The agricultural (Neolithic) revolution, around 11,000 ybp (years before present, counted from 1950 CE), launched a wave of migrants {who had developed farming, domesticated animals & started forming sedentary societies} into previously purely nomadic aboriginal hunter-gatherer (HG) societies, who were descendants of the Paleolithic Homo sapiens of the region.+++(5)+++ The farmers eventually replaced the HGs or mixed with them, spreading agriculture, technology, language, culture and religion as part of the “Neolithic Package”. We see this happen in Europe, with farmers from Asia Minor replacing the aboriginal HGs starting ~7000 BCE, and we see it all over Asia (Yellow River farmers expanding into Tibet, Yangtze River farmers expanding into SE Asia and carrying the Austroasiatic languages to the region, or even Japan, with a three-way admixture of Yayoi and Kofun period farmers with Jomon-like HGs).

HGs

India was no exception here. The aboriginal[1] HGs of the country are now termed the “AASI” (Ancient Ancestral South Indian) and are presumably the descendants of the Paleolithic Homo sapiens of the region. Their lineage is deeply diverged from other Eurasian lineages and is very closely related to the HG populations of SE Asia (Hoabinhian HG, Vietnam), the aboriginals of Australia and the group that gave rise to the indigenous populations of the Andaman Islands (Onge, Jarawa).+++(Not really- “AASI are not “close” to the andamanese or se asia negrito groups. divergence is 35,000 years ago or so last i saw in reich lab supplements. they are closest.”)+++ It’s important to note that AASI is a “ghost population”: its ancestry and cladal position are inferred and reconstructed, but we don’t currently have any AASI whole genome sequences. As a result, the Onge are often used as a proxy for Indian HGs, or sometimes the Paniya (a South Indian tribal group with virtually no West Eurasian ancestry) might fit better, as this paper suggests. +++(no - “they have some west eurasian ancestry via 25% ~IVC or so.”)+++ Keep in mind that Indian HGs are not the same group that spawned the Onge or even the SEA HGs, but are closely related to them. So at best, we have imperfect proxies or reconstructions of supposed AASI ancestry. This will do for now. India’s harsh climate, which degrades aDNA, and poor archaeology make sampling much harder. Combine this with cremation practices and you can begin to see why this is a big problem for us. AASI ancestry is found in most Indian groups along a North-South cline and a caste-based cline that transcends and even breaks regional boundaries. So we have horizontal as well as vertical differentiation. Later on, we will see the same pattern when it comes to Steppe Pastoralist ancestry in India.

Fig 2. A Paniya laborer from Kerala. The Paniya are considered to be the best proxy of the Indian Hunter Gatherers. They speak a South Dravidian language. Indians belonging to the Dalit/Tribal caste groups get anywhere from 45-65% of their ancestry from Paniya-like people.

Farmers

We now come to the second piece of the pie: the farmers. Who were the carriers of the “Neolithic Package” to India? The earliest Neolithic sites in the Indian subcontinent are at Mehrgarh (7000 BCE). It has long been thought that Neolithic migrants from Iran brought winter-rainfall crop farming (wheat, barley) to India and adapted it locally to the monsoon season before it spread across the country. This notion, however, has been challenged by recent genetic research that sequenced a female genome from Rakhigarhi (I6113) in India, belonging to the mature Indus Valley Civilization (IVC), and found that it completely lacked any Iranian Zagros farmer ancestry. This means the Iran-HG-like component of the IVC lineage diverged from Iranian HGs more than 10,000 ybp and comes from a presently unsampled population who might have always been indigenous to the region. It also means farming was likely developed independently in India.

Fig 3. Chronological representation of Iran-like ancestry and its relation to the Indus-Valley Cline.

These early farmers in the NW of the subcontinent would mix with the AHGs (AASI) and form the cline along which the Harappan population was set (70-90% Iran-like HG + 10-30% AHG). This would have been our ancestry in the mature IVC period (2600-2000 BCE).+++(4)+++

Fig 4. A reconstruction of a Harappan woman from the mature Indus Valley Civilization site at Rakhigarhi, Haryana. (4500 YBP). Most Indians receive nearly 45-55% of their ancestry from the Indus Valley people.

Steppe Pastoralist

We now come to the third (but not the final) piece of the pie: Steppe Pastoralist ancestry. This ancestry is also pretty ubiquitous among Indians but varies from as low as 3% to as high as 45% of total admixture in different groups. It likely entered the subcontinent around 2000-1800 BCE, in the Middle-Late Bronze Age (MLBA), through the inner mountain corridor from Central Asia, leading to multiple admixture events that culminated in the Steppe migrants mixing with the IVC locals to form the modern Indian cline (1300-1200 BCE). Steppe is a vague description, so we must be specific. The Central_Steppe_MLBA ancestry has a few components. The chief component is Yamnaya_EMBA (Early Bronze Age) ancestry at around 65-70%. The second major component is European_EarlyFarmer (25-30%), which is further broken down into Eastern HG (Baltic) and Anatolian farmer ancestry.+++(5)+++ The final component is West Siberian Hunter Gatherer ancestry (WSHG) at around 5-8%. Let us pause and be very clear here. The Steppe component actually introduces 4 different basal lineages into the Indian subcontinent.+++(5)+++ This can be explained as follows: a group of Yamnaya steppe herders entered Europe, mixed with the European aboriginals (European_EarlyFarmer) and then moved to Central Asia. On their way to India, they mixed further with Siberian HGs, probably in the Pamir corridor, and took on an additional lineage.
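To make the nesting concrete, here is a small sketch of how a group's total Steppe percentage splits into the sub-components quoted above. The midpoints of the quoted ranges are my own simplification (they are rescaled to sum to 1), and the 25% group in the demo is hypothetical, not a sample from any paper.

```python
# Midpoints of the ranges quoted in the text (an assumption for illustration).
STEPPE_MLBA_MIDPOINTS = {
    "Yamnaya_EMBA": 0.675,          # quoted as 65-70%
    "European_EarlyFarmer": 0.275,  # quoted as 25-30%
    "WSHG": 0.065,                  # quoted as 5-8%
}


def normalise(shares):
    """Rescale shares so that they sum to exactly 1."""
    total = sum(shares.values())
    return {name: value / total for name, value in shares.items()}


def decompose_steppe(steppe_fraction, shares=STEPPE_MLBA_MIDPOINTS):
    """Split a group's total Steppe fraction into the quoted sub-components."""
    if not 0.0 <= steppe_fraction <= 1.0:
        raise ValueError("steppe_fraction must lie between 0 and 1")
    return {name: steppe_fraction * weight
            for name, weight in normalise(shares).items()}


if __name__ == "__main__":
    # Hypothetical group carrying 25% Central_Steppe_MLBA ancestry.
    for name, value in decompose_steppe(0.25).items():
        print(f"{name}: {value:.1%}")
```

The point of the exercise is only that a group's Yamnaya-derived share is always smaller than its headline Steppe figure, since part of that figure is farmer and WSHG ancestry.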

Fig 5. A reconstruction of a Yamnaya man from Ishkinovka in the Southern Urals. (5300-4700 BP). Most upper-caste Indians get anywhere from 10-30% of their ancestry from the Yamnaya people.

All credit must go to this group for the reconstructions posted here. On Twitter, they can be followed here.[2]

When they came to India, they mixed with the Harappans to give us a total of 6 different lineages that formed the post-Harappan population.

Tibetan and Austroasiatic

Is this all? Not so. We must look at two additional lineages that are present in many Indians today. The first is found chiefly in some Northern (and NE) Indians. This is Tibetan Neolithic ancestry (Chokhopani, late Neolithic, 2700 YBP), which likely derives from ancient admixture with the Kirāṭa-like Sino-Tibetan tribesmen of Ladakh & Nepal.+++(4)+++ The second is Austroasiatic ancestry, which is found only in tribal groups such as the Munda and Ho in eastern India. This is because these groups are descendants of Austroasiatic migrants who came to India around 3000-3500 YBP.

Fig 6. A picture of Tibetan people. They speak Sino-Tibetan languages and most Northern/Eastern Indians get anywhere from 3-10% of their ancestry from Neolithic Tibet.

Proportions

We now have a good picture of the ancestral lineages that make up the Indian subcontinent. What are the proportions of these lineages in modern Indians like? Let’s look at the data provided by Dr. Vagheesh Narasimhan in his most recent breakthrough paper which you can view here.

1) North West India + Pakistan

This data is directly from Dr. Narasimhan’s paper. It gives us a k = 3 admixture chart. As we can see, the Northernmost populations have around 55-65% Indus_Valley ancestry, 22-30% Western_Steppe_Herder ancestry and 10-20% AASI ancestry. The generic Punjabi sample has around 33% AASI ancestry and was collected from non-Jatt, non-Khatri migrants in the UK. This might reflect the backward-caste admixture of Punjab. (It actually matches the Punjabi_Lahore samples perfectly.) Amongst the NW Indian groups, the Khatri have the highest Steppe admixture at around 27% and the lowest AASI admixture at 13.8%.
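As a quick sanity check on numbers like these: in a k = 3 chart, the three components should account for the whole genome, so any two quoted figures imply the third. The sketch below assumes the components sum to exactly 100% and treats "around 27%" as 27.0; the function name is mine.

```python
def implied_third_share(steppe_pct, aasi_pct):
    """Return the Indus_Valley percentage implied by the other two components."""
    remainder = 100.0 - steppe_pct - aasi_pct
    if remainder < 0:
        raise ValueError("the two components already exceed 100%")
    return remainder


# Khatri figures quoted in the text: ~27% Steppe, 13.8% AASI.
khatri_ivc = implied_third_share(27.0, 13.8)
print(f"Implied Khatri Indus_Valley share: {khatri_ivc:.1f}%")  # 59.2%
```

The implied value sits inside the 55-65% Indus_Valley range quoted for the Northernmost populations, so the figures are at least internally consistent.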

This model, however, does not look at the Tibetan ancestry, which I thought important to check in the Northern populations. Dr. Narasimhan’s model also misses out on some groups such as the Ror of Haryana, the Kohistani, the Kamboj and the Tibetans of Leh-Ladakh. Hence, we will look at a different G25 model (my own) this time and see the difference.

With this we see the Northernmost populations have around 55-60% IVC ancestry, 22-28% Sintashta ancestry, 10-15% AASI ancestry and 3-6% Tibetan ancestry on average. Notable exceptions are the Balti people of Gilgit-Baltistan and Ladakh, who have 26% Tibetan ancestry; the Rors of Haryana, who have around 36.5% Sintashta ancestry + 2.3% WSHG ancestry for a total of around 40% Steppe ancestry; and the average Punjabi from Lahore, who has nearly 40% AASI (Andamanese-like) ancestry.+++(4)+++ Note also how Kashmiri Pandits and Kohistanis have pretty high Tibetan ancestry at 6%+, which makes sense considering their proximity to Sino-Tibetan tribals.+++(4)+++
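For readers unfamiliar with this kind of modelling, the sketch below shows the general idea as I understand it: a target population's coordinates are approximated as a weighted mix of source coordinates, with weights between 0 and 1. This is a two-source toy with a closed-form least-squares answer, not the tool or settings used for the model above; the function name and all coordinates are made up for illustration.

```python
def fit_two_way(target, source_a, source_b):
    """Least-squares share of source_a in target, clipped to the range [0, 1].

    Models target ~ w * source_a + (1 - w) * source_b and solves for w.
    """
    diff = [a - b for a, b in zip(source_a, source_b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        raise ValueError("the two sources are identical")
    numer = sum((t - b) * d for t, b, d in zip(target, source_b, diff))
    # For a single weight, clipping the unconstrained optimum gives the
    # constrained optimum, because the squared error is convex in w.
    return min(1.0, max(0.0, numer / denom))


# Hypothetical 3-dimensional coordinates (real models use many more dimensions).
SOURCE_A = [0.10, -0.02, 0.05]
SOURCE_B = [-0.04, 0.08, 0.01]
# Build a target that is exactly 70% A and 30% B, then recover the share.
TARGET = [0.7 * a + 0.3 * b for a, b in zip(SOURCE_A, SOURCE_B)]

share_a = fit_two_way(TARGET, SOURCE_A, SOURCE_B)
print(f"Estimated share of source A: {share_a:.1%}")
```

With more than two sources there is no such one-line formula, and a proper constrained solver is needed. The choice of source populations also changes the answer, which is why the same group can show different percentages in different models.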

The Baltis are a Tibetan group, which explains their elevated Tibetan ancestry. The Ror are likely descended from a Kushan-like group that mixed with local Indo-Aryans, & the Punjabi from Lahore likely represent the average admixture of the lower castes (non-Jat, non-Ashraf) of the region.+++(4)+++

I am also of the opinion that both the Pashtuns and the Kalash can be better modelled after taking excess Anatolian Farmer, Tibetan & some CHG ancestry into account. Testing this hypothesis, we got a satisfactory model which showed that both Pashtun groups have a solid 7% excess Anatolian Farmer ancestry & an excess 1-3% Iran_Neolithic ancestry. Bear in mind that Central_Steppe_MLBA includes the West_SiberianHG component, but it is shown separately here since we used Sintashta as our source. Even then, Steppe ancestry here is slightly underestimated, as is Paniya (AASI).

2) Northern India + Central India + West India

There are too many populations from this region to cover individually, but hopefully this provides a good overview of the bulk of the Indo-Aryan-speaking ethnic groups of India. One can notice how savarṇa castes (first 4 castes) generally have much lower AASI ancestry than the scheduled castes (dalits) of the region. The Lohana (an upper-caste community) have the lowest AASI in the region at 8.9% and a solid 25% Steppe_Herder ancestry. Gujarati Brahmins are similar to the Lohana in this regard. The Bhumihar Brahmins of Bihar have the highest Steppe ancestry in the region at 28.3%.

3) Southern India + Deccan

In Southern India, we start noticing that Steppe ancestry collapses dramatically across every caste group while AASI rises fast, reaching as high as 65% in the Palliyar of Tamil Nadu. The Brahmins of South India are left with similar but slightly lower levels of Steppe ancestry than North Indian Brahmins, but they remain the only group in the region with 15-20% Steppe ancestry. One notable exception is the Coorgi, or Kodavas, who have relatively low AASI ancestry at 25.8% coupled with shockingly high IVC ancestry at 64.4%. The Kodavas were speculated to be “Scythians” in the colonial era due to their robust Caucasoid phenotype and martial tradition in the Kannada country. However, the likely answer is that the Kodava are direct descendants of late Indus_Valley migrants from Gujarat who moved into the Deccan & Karnataka.

I took the liberty to model the Iyer Brahmins of Tamil Nadu as Prof Narasimhan’s paper did not deal with them. These are the results I got.

The highest of the Iyer samples seemed to have around 15% Sintashta ancestry while the lowest had 7% Sintashta ancestry. Overall, a fairly high percentage of AASI-like ancestry at around 40-43%.

4) Eastern India + Nepal

Contrary to popular stereotypes, Nepalese Brahmins generally have fairly high Steppe_Herder ancestry (22-26%) and pretty low Tibetan ancestry (6-8%). Bengali Brahmins have slightly lower Steppe_Herder ancestry but much higher AASI ancestry, with a slight 2% Tibetan component (as expected). Bangladeshi Bengalis have very high AASI ancestry at nearly 60% and a higher 6% Tibetan component. The Brahmins and non-Brahmin Bengalis are very far apart.+++(5)+++ Unfortunately, no Kayastha or Vaidya samples are available to us yet, so we cannot test for Bhadralok distance.

Manipuri Brahmins are big outliers with 43% Tibetan and 11% South Chinese Farmer ancestry.+++(4)+++ The Juang, Asur and Bhumij are Austroasiatic-speaking tribal groups, hence the South Chinese Neolithic & heavy Paniya ancestry (they must have made first contact with unmixed AASI in eastern India).+++(4)+++ The Gond are Dravidian speakers, but it has long been suspected that they switched to that language family later and have Austroasiatic origins, which shows in the DNA.+++(5)+++

Conclusion

I feel we have now achieved our purpose and understood the origins of Indian ancestral lineages as well as their relative proportions in different ethnocultural groups across the country. The title of the post should make a lot of sense now. Indus_Valley ancestry is by far the most prevalent ancestry in the entire Indian subcontinent and generally makes up anywhere from 1/3 to 2/3 of our genome. It is normal for some Indian groups to have as much as 65% Indus ancestry and even at the lowest we have at least 35% Indus ancestry everywhere except the extremes of Sino-Tibetan India or in the Austroasiatic tribal lineages. Keeping this in mind, the name of our Republic is pretty apt.


A final note must be made. While I have indeed studied Human Genetics in college, I am no expert at it and certainly not a professional. I enjoy the field as a dilettante and have been interested in it since I was pretty young! If you are a professional and find some mistakes, feel free to comment. If you are not a professional and find some mistakes, feel free to comment as well :)

[1] Note that aboriginal does not refer to Australian Aboriginals or any other specific group here. It only refers to the local hunter-gatherers of the region. There can be European aboriginals, Indian aboriginals or Chinese aboriginals.

[2] If you run this account or group and are not okay with me reproducing your reconstructions here, please DM me or leave a comment.