What is Language Diversity?

Today I saw another article from a biologist throwing their hat into the linguistics ring. This time it was geneticist Sarah Tishkoff, who implies in the Christian Science Monitor that because humans in Africa are more genetically diverse, African languages must be more diverse too.

Tishkoff argues that “There’s just been a lot of time for cultural diversity, linguistic diversity, genetic diversity to accumulate in Africa.” At first glance this makes sense, but in reality languages can easily spread independently of the gene pool. For instance, most people of African descent in the U.S. speak a Germanic language (i.e., English). Across the Americas, most people of African descent speak a European language (English, Spanish, Portuguese, or French) or a creole based on one of these languages.

So… I will say that I (and probably linguist Salikoko Mufwene, who is quoted in the article) would dispute Tishkoff’s premise. In fact, the Christian Science Monitor itself notes that the trick is “how you define diversity.”

Greenberg Index

One measure is the “Greenberg Index,” which gives the probability that any two randomly chosen speakers will “have a different mother tongue.” In Papua New Guinea the figure is 99%, and in Cameroon it is 97%. These are impressive numbers, but they don’t measure how distinct the languages are from one another.
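The arithmetic behind such a figure fits in a few lines. This sketch assumes the usual formulation, in which the chance that two speakers drawn at random (with replacement) share a mother tongue is the sum of each language's squared population share, so the index is 1 − Σ p². The function name and the speaker counts are hypothetical placeholders, not census data.

```python
def greenberg_index(speakers: dict[str, int]) -> float:
    """Probability that two speakers drawn at random (with replacement)
    have different mother tongues: 1 - sum of squared population shares."""
    total = sum(speakers.values())
    return 1.0 - sum((n / total) ** 2 for n in speakers.values())


# Hypothetical populations, for illustration only.
one_language = {"A": 1000}
two_equal = {"A": 500, "B": 500}
many_small = {f"L{i}": 10 for i in range(100)}  # 100 equally sized languages

print(greenberg_index(one_language))          # 0.0
print(greenberg_index(two_equal))             # 0.5
print(round(greenberg_index(many_small), 2))  # 0.99
```

Note that the formula never looks at *which* languages are involved. A hundred equally sized languages score 0.99 whether they are close cousins or completely unrelated.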

Under this measure, speaking Italian vs. Spanish (relatively closely related) is given the same weight as speaking Spanish vs. Basque (completely unrelated). Italy, France and Spain have more linguistic diversity than we may initially realize, but most of the languages in question are descended from Latin. This is because most of Western Europe lay within the Roman Empire, and as a result almost all pre-Roman languages of the Western Empire have been lost. The pre-Roman languages that survived in Western Europe are Basque and some Celtic languages. Germanic and Finnic languages, spoken largely outside the empire’s borders, also survived this era.
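One way to make a diversity index sensitive to relatedness is to weight each pair of languages by how much they resemble each other, so that two speakers of close cousins count as only partly “different.” This sketch assumes a resemblance score r between 0 and 1 for each pair (1 for the same language, 0 for unrelated ones). The function name and the 0.8 score for Italian and Spanish are hypothetical placeholders, not measured values.

```python
from itertools import product


def weighted_diversity(speakers: dict[str, int],
                       resemblance: dict[frozenset, float]) -> float:
    """1 - sum over all language pairs (i, j) of p_i * p_j * r_ij.

    r_ii is taken as 1 (a language fully resembles itself); pairs missing
    from `resemblance` are treated as unrelated (r = 0).
    """
    total = sum(speakers.values())
    shares = {lang: n / total for lang, n in speakers.items()}
    same = 0.0
    for a, b in product(shares, repeat=2):
        r = 1.0 if a == b else resemblance.get(frozenset((a, b)), 0.0)
        same += shares[a] * shares[b] * r
    return 1.0 - same


# Hypothetical resemblance score; 0.8 is a placeholder, not a measurement.
r = {frozenset(("Italian", "Spanish")): 0.8}

print(round(weighted_diversity({"Italian": 50, "Spanish": 50}, r), 2))  # 0.1
print(round(weighted_diversity({"Spanish": 50, "Basque": 50}, r), 2))   # 0.5
```

With equal populations, the unweighted index would give 0.5 for both pairs. The weighted version separates the closely related pair from the unrelated one.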

As it turns out, most languages in Cameroon belong to just two language families: Niger-Congo and Afro-Asiatic. So although there are a lot of languages, they are generally related. In fact, the vast majority of the Afro-Asiatic languages in Cameroon are Chadic languages of the Biu-Mandara group, which is a very specific subgroup. I am by no means a Chadic expert, but I wouldn’t be surprised if some of them are as close as Spanish and Italian. Similarly, the Niger-Congo languages of Cameroon generally fall within the Atlantic-Congo branch, and many (169) are in the Bantoid branch. Again, these languages may be closely related.

Language Relatedness

If we are going to truly compare linguistic diversity to genetic diversity, then we DO need to factor in how many language families are used in a specific area. Each language family represents a “line of evolution” descending from a single proto-language, so the more language families in an area, the more proto-languages are represented. In that respect, Africa is not especially diverse compared to some areas such as the Americas. If language spread were tied exclusively to genetics, we would expect Africa to have the largest number of language families, but that is not what we find.

One interesting comparison is counting isolates (languages with no known relatives). Each isolate is effectively a family of one: a lineage whose proto-language is not currently widespread. According to Lyle Campbell (University of Hawaii), there are about 10 isolates in Africa (vs. one in Europe), but 20 in North America, six in Mexico and 55 in South America. That’s a lot of leftover languages in the Americas. Likewise, an overview of indigenous Mexican languages groups them into seven families, the same number as in Cameroon. The catch is that most indigenous languages in Mexico may have smaller speaker populations than those in Cameroon for political reasons, which would lower Mexico’s score on a speaker-based measure such as the Greenberg Index.
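Counting lineages rather than languages is simple bookkeeping. This sketch assumes each language is tagged with its family, with isolates marked separately and each counted as a lineage of its own. The function name, regions and labels are invented placeholders, not real data.

```python
from typing import Optional


def lineage_counts(languages: dict[str, Optional[str]]) -> dict[str, int]:
    """Count independent lineages in a region.

    `languages` maps each language to its family name; None marks an
    isolate, which is counted as a family with a single member.
    """
    families = {fam for fam in languages.values() if fam is not None}
    isolates = sum(1 for fam in languages.values() if fam is None)
    return {
        "languages": len(languages),
        "families": len(families),
        "isolates": isolates,
        "lineages": len(families) + isolates,
    }


# Hypothetical regions: many languages in few families vs. few languages
# spread across more lineages.
region_a = {f"A{i}": ("Family1" if i < 150 else "Family2") for i in range(200)}
region_b = {"B1": "Family3", "B2": "Family3", "B3": "Family4",
            "B4": None, "B5": None}

print(lineage_counts(region_a)["lineages"])  # 2
print(lineage_counts(region_b)["lineages"])  # 4
```

Region A has forty times as many languages, but region B represents twice as many independent lineages.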

The irony here is that the Americas are more diverse in terms of language families than Africa, even though they were settled much later (and, as expected, have less genetic diversity than Africa). Whatever the explanation, we need to be very careful about how we model the spread and evolution of languages vs. genes.

