Complex societies = simple languages

At least that was my take-home message from a new paper in PLoS One, Language Structure Is Partly Determined by Social Structure:

Background: Languages differ greatly both in their syntactic and morphological systems and in the social environments in which they exist. We challenge the view that language grammars are unrelated to social environments in which they are learned and used.

Methodology/Principal Findings: We conducted a statistical analysis of >2,000 languages using a combination of demographic sources and the World Atlas of Language Structures— a database of structural language properties. We found strong relationships between linguistic factors related to morphological complexity, and demographic/socio-historical factors such as the number of language users, geographic spread, and degree of language contact. The analyses suggest that languages spoken by large groups have simpler inflectional morphology than languages spoken by smaller groups as measured on a variety of factors such as case systems and complexity of conjugations. Additionally, languages spoken by large groups are much more likely to use lexical strategies in place of inflectional morphology to encode evidentiality, negation, aspect, and possession. Our findings indicate that just as biological organisms are shaped by ecological niches, language structures appear to adapt to the environment (niche) in which they are being learned and used. As adults learn a language, features that are difficult for them to acquire, are less likely to be passed on to subsequent learners. Languages used for communication in large groups that include adult learners appear to have been subjected to such selection. Conversely, the morphological complexity common to languages used in small groups increases redundancy which may facilitate language learning by infants.

Conclusions/Significance: We hypothesize that language structures are subjected to different evolutionary pressures in different social environments. Just as biological organisms are shaped by ecological niches, language structures appear to adapt to the environment (niche) in which they are being learned and used. The proposed Linguistic Niche Hypothesis has implications for answering the broad question of why languages differ in the way they do and makes empirical predictions regarding language acquisition capacities of children versus adults.

The details of the paper are almost total gibberish to me, since I'm pretty ignorant of the jargon of linguistics, but the gist seems rather evident in the figures:



Languages such as English, which spread with complex and expansive political orders, seem to exhibit a tendency toward simplicity. The reason behind this is straightforward: adult language learners have difficulties with morphological complexity. These data and analysis seem to resolve nicely the "paradox" that the most complex languages are generally found in small-scale "primitive" cultures. Since very few outsiders learn these languages, there's no need for them to be "dumbed down" for non-fluent adults.

Citation: Lupyan G, Dale R (2010) Language Structure Is Partly Determined by Social Structure. PLoS ONE 5(1): e8559. doi:10.1371/journal.pone.0008559


One case that does not correlate well is Chinese which is spoken by a large population but has a high complexity. (I could not find the article so I am assuming the complexity of Chinese is greater but I could be wrong). A similar argument could be given about Japanese.

What these languages have in common is that they were used by societies which were self contained and had little contact with outside groups.

It seems to me that the determining factor is not population but interaction with external groups. English, for example, seems from the graph to be far less complex than Spanish, even though the two have roughly the same number of speakers. However, English is far more widespread: it is spoken by people of many different cultures, and the immigration of foreigners to Britain and then to the USA forced the standards for communicating to be lowered.

A group that seldom has interaction with outsiders can evolve progressively more complex language structures without losing communication ability.

It would be interesting to compare the language of Ancient Egypt, a culture with relatively little contact with outsiders, with Hittite, for example, whose speakers required more trade for survival.

This is interesting work, and Lupyan has solid linguistic cred, but I question the simplifying methodology of equating morphology with complexity. Two points complicate this view:

1. A common view amongst typologists is that there is a trade-off: less morphology = more syntax & pragmatics (just as complex). Somehow, someway, all languages must accomplish the communicative acts necessary for human interaction. How they choose to do this is less relevant than the fact that they all must do it somehow. The simple claim that adult learners have trouble with morphological complexity may stand up, but that's a much less bold claim.

2. Grammaticalization: some historical linguists posit that languages tend to cycle through morphologically complex structures --> morphologically simple --> morpho complex --> etc. English, for example, used to be much more morphologically complex 800 years ago. Now, you might argue that as the population increased, its complexity decreased, as Lupyan predicts. On the other hand, Mandarin Chinese seems to be getting MORE morphological (see the use of the particle "le" 了, which seems to be developing more and more into a bound morpheme, if I understand the literature correctly). This is the opposite of his prediction.

I'm excited to see WALS being used for research purposes, but it does not, as far as I can tell, include historical information. How can we tie historical language changes to historical population changes? Not with WALS, as far as I can tell. Rather, we can only get a synchronic snapshot of the current state of things. How can we draw conclusions about change strictly from synchronic data?

I could be wildly off on this because, as of yet, I can't find this article on PLoS ONE. I'll keep looking. I'm generally in favor of the kind of research Lupyan does.

Just guessing, because the link is bad:

I think this might have something to do with use by non-native speakers, in particular when used as a language of administration and culture for speakers of many languages. They say it this way: "Languages used for communication in large groups that include adult learners appear to have been subjected to such selection."

Don't know exactly what "morphological complexity" means. Also, there are actually very few languages with a big geographical spread -- European languages, Turkish, maybe Chinese, Persian, and Hindi, maybe Malayo-Polynesian, and Bantu languages. So a comparison is being made between a few languages and all other languages.

Three of the widespread languages are developed from vulgar Latin, which is much less complex than literary Latin, maybe because of Latin's use by non-native speakers.

Languages with large populations which may break this generalization include Arabic, Javanese, German, Russian, and Japanese.

20 most spoken languages. 18 of them are South Asian or East Asian.

By John Emerson (not verified) on 20 Jan 2010

Reading this article immediately raised questions in my mind re. Arabic. Wiki says that about 280 million people speak it as a 1st language with almost that same number as a second - which seems a fairly large number.

But my understanding is that Arabic depends very much on inflection for accurate communication. I suspect that's why huge blog fights so dependably erupt over transcriptions of Arab / Muslim leaders' public remarks. It seems many Arabic statements can have completely opposite meanings - depending on who provides the translation.

I don't fully understand the article but isn't Arabic a pretty glaring counter-example?

By Ray in Seattle (not verified) on 20 Jan 2010

I should add that the premise as stated seems like a good one: that languages evolve and adapt to their social environment. But the article (the excerpt at least) only used population size and the presence of adult learners to justify that generalization. There are many dimensions to social environment.

By Ray in Seattle (not verified) on 20 Jan 2010

Japanese isn't a particularly complex spoken language. Intonation is not particularly important, the grammar is fairly simple. The only really difficult aspect of Japanese is the crazy writing system that makes becoming literate a life-long task.

Spoken Chinese has low complexity in the terms of the article -- virtually no morphology. The written language is extremely difficult, but any language written in that kind of script would be difficult.

But Chris may be on to something. Languages which are more complex in one way (morphology here) might be simpler in others, and vice versa.

On the other hand, it may be that some languages, spoken only by native speakers who learn during childhood, might pick up complexities which are non-functional ornaments in every sense except possibly enforcing group solidarity or something like that.

By John Emerson (not verified) on 20 Jan 2010

I'd be really interested to see a similar thing with regularisation of inflection.

Re: Arabic, it is a very widespread language but has famously regular and tractable inflective patterns (as do all Triconsonantal Root languages -

Indo-European languages tend to have relatively fusional and irregular inflection (compared to the trends for steppe area languages and Semitic [i.e. specifically Middle Eastern] languages), so I think they would probably exaggerate any trends that exist for a relationship between inflectional and social complexity and simplification.

Since people are asking, morphology means how word-pieces (morphemes) are put together. For example, you want to express something in the past, so you need the morpheme for the action and the morpheme for "past tense." How do you fuse those together?

In a more complex morphology, you might alter the root in some way -- stick on a prefix, a suffix, an infix, and maybe those would be different depending on whether the subject was singular or plural, 1st 2nd or 3rd person, etc.

In a simpler morphology, you'd have an all-purpose word that indicated past tense that you would put before or after the root -- not even a prefix or suffix, but something that could stand on its own in a sentence. The verb's root wouldn't have all those different conjugations for number, person, etc.

So that's all they mean by "complexity" -- it's specifically how complex this word-formation process is.

An independent test of their hypothesis is to do the same thing for phonological complexity -- or just the diversity of the phonetic inventory. That is, what sounds are required to speak the language fluently?

Their argument is that less complex systems are a benefit in large-scale populations because the many adult non-native learners, whose brains have already gelled into place as far as word-formation rules go, will find the simpler systems easier to pick up. It's like a pidgin.

The same should be true for sounds -- unless you have a knack for languages, you can't learn new sounds and sound rules after age 14 or so. That's why the strongest give-away that someone learned the language as an adult is that they have "an accent" -- they can't make the sounds in the right way, or they may have all the necessary sounds but can't follow sound rules (like "change the /d/ to a /t/ at the end of a word").

If the point is to integrate into the larger society, mastering sounds is more important than mastering word-formation, given that, as already noted, your "accent" is a more salient ethnic marker -- remember that 42,000 Ephraimites were slaughtered by the Gileadites for not being able to pronounce the "sh" sound in "shibboleth"!

Just re-read the post. I speak English, Spanish and Japanese, and have taught English and Spanish. English is certainly the hardest of the three to learn to speak. Japanese and Spanish native speakers can learn the opposite much more easily than either learns English, in my experience.


One case that does not correlate well is Chinese which is spoken by a large population but has a high complexity. (I could not find the article so I am assuming the complexity of Chinese is greater but I could be wrong). A similar argument could be given about Japanese.
What these languages have in common is that they were used by societies which were self contained and had little contact with outside groups.

I don't see how you can say either of those two things. Standard Chinese is the language of an empire, hardly an isolated society at all. Remember that there are, and have been, many other languages spoken in the same area by many different groups of people. (And from what I have heard from adult learners, Standard Chinese does seem to have a very low morphological complexity.)

Japanese is spoken by a more isolated population, though, and is very different in morphological complexity.


but I question the simplifying methodology of equating morphology with complexity

From what I can see from a quick scan this isn't what they're saying at all. Did you pick it up from razib's "tabloid sensationalist" headline?

John Emerson:

But Chris may be on to something. Languages which are more complex in one way (morphology here) might be simpler in others, and vice versa.

I should read through the paper properly, but it seemed that is one of the points that the authors make. So it's not just Chris.

Ray in Seattle:

Wiki says that about 280 million people speak it

This is just a pet peeve of mine, but what you are presumably referring to is called "Wikipedia" and not "Wiki". Wiki refers to something completely different.

The way I would think of it is that languages that employ purely inflectional strategies have ways of including grammatical information that just have to be learned, whereas languages that have purely lexical strategies take words which don't have a purely grammatical meaning and use them in a grammatical way by analogy (even if that analogy is, at times, strained), or else they just don't need or include that grammatical information.

One of the things that I, as a non-linguist, always find so difficult about discussing languages with different inflective complexity is that linguists always seem to talk about words like "the" and "was" as somehow being able to stand on their own as part of a sentence, when they obviously still don't make that much sense when they do. They certainly don't make that much more sense on their own than if you had spoken a Latin inflection on its own, aloud, with no other accompanying words. If they do make sense on their own, surely that's because they have a unique morpheme shape that doesn't require other words to disambiguate their meaning (unlike grammatical inflections), not because they are somehow "free"?

I think with accents, it's also true that people who learn second languages make consistent changes to the grammar as well (think of all the stereotypically different ways in which Slavic, Yiddish, Irish, &c. speakers form sentences - use of articles, pronouns, word order and so on), but these don't seem to be quite so simple or salient to talk about, probably because they are less frequent unless they apply to extremely common constructions and because they don't remind us so much of babytalk.

I've really enjoyed following this interesting discussion on the paper (I'm Lupyan's coauthor). Without addressing any one particular of the many points above (such as exceptions, the notion of complexity/redundancy, cultural structure vs. population, etc.), I simply wanted to share with you the fact that Gary and I have a fairly extensive collection of supporting information that can be accessed through links at the bottom of the PLoS article. These additional materials (not in the main body of the paper) may address some of these concerns (such as offering evidence for the learnability mechanism, etc.). Cheers, Rick Dale.

JFE: I'm not sure if that's true about Chinese because Chinese isn't one language. Spoken Chinese is a family of related languages that are not mutually intelligible. If you meant Mandarin then the number of speakers has historically been low. For example, my grandmother who is Chinese and born in the 1930s cannot speak Mandarin. My mother's Mandarin is far from fluent.

But Chris may be on to something. Languages which are more complex in one way (morphology here) might be simpler in others, and vice versa.

English is a classic example. Grammatically, it's one of the simplest Indo-European languages. But it has an unusually large number of phonemes, and spelling is quite a mess. The phonemes make spoken English so difficult for adults to learn. Imagine, for instance, the native speaker of Chinese or Japanese, who did not learn the distinction between L and R as a child. Or conversely, the Westerner encountering Chinese tones as an adult (in English tones are sometimes used for emphasis, but never to distinguish between concepts so different as "mother" and "horse"). The complexity of spelling comes in part from borrowing words from so many different languages, often with little or no modification to spelling when the source language also uses Latin characters.

By Eric Lund (not verified) on 21 Jan 2010

I've taught ESL and the words "the", "a", and "an" are very hard to teach and learn. There's no morphological complexity, but the usage is tricky and not at all intuitive.

By John Emerson (not verified) on 21 Jan 2010

@magetoo -- Having finally read the paper, I see that the authors never make the "morphology = complexity" claim explicitly, though their strict reliance on morphological features is a limitation. Nonetheless, objection withdrawn.

On the other hand, I renew my objection that they simply did not collect the sort of data that allows them to make the bold hypothesis about language evolution that has gotten this paper so much press, namely that social factors influence grammatical evolution. This may turn out to be true, but it will be discovered not based on snapshots of synchronic data of the sort in this study.

It's still an interesting study, mind you, and I think they are on to something. I just think they should have resisted the temptation to jump off the cliff of language evolution in the last section.

Thanks for your critique, Chris. I appreciate it. I should note though that our inferences regarding language evolution (perhaps we mean language change, here) are not supported only by our analyses -- our paper reviews a variety of work that is suggestive of language change in this direction (e.g., work on creole change, etc). However, I'm not sure how else the results we obtained could hold without having such a process in place. The results stand on their own as quite robust patterns across social groups and their languages -- how did it get like that? A story regarding linguistic change, across generations, with selective pressures induced by learnability constraints, seems like a natural explanation of the data (in short, our hypothesis is an explanation of the data, rather than being "shown conclusively by the data," something we probably would not aver). Rick Dale

One clarification about Chinese: it has a very low morphological complexity (no inflections at all!). There are neither noun declensions nor verb conjugations in Chinese, so it's a rather good example for their hypothesis.
What I doubt is that for the average adult language learner it's easier to learn tones, or strict (and often confusing) rules for word order different from the respective native tongue than learning a complex inflectional system.

By Kadphises (not verified) on 24 Jan 2010

"language structures are subjected to different evolutionary pressures in different social environments"

Yes, they are!

We humans started to talk more than one million years ago, and the Chinese started to create symbols for communication more than 5,000 years ago!

For the past 5,000 years, millions of Chinese have spoken or written Mandarin every day. Everyone may have had an idea to improve or refine Mandarin. In the end, Mandarin is the accumulation of billions of humans' wisdom.

Also, Mandarin has other features:

Japanese adopts a lot of Chinese characters, so some Japanese know the advantages of the Chinese language. A learned Japanese states that the Chinese language is very systematic and logical. He looks at Mandarin from a different angle.

Some people say the sound of Mandarin is poetic.

I am a published author. I'd say the writing of Chinese characters could be beautiful.

It's also very interesting to note that every Chinese character and pronunciation has a reason or logic behind it. That means we can learn Mandarin much more easily if we understand the reasons or logic behind it. (I can demonstrate how easy learning Mandarin could be!)

Actually, learning Mandarin can be easy, entertaining and fun!