Evolution of the Lexicon

I recently posted about the work by Pagel and colleagues regarding ancient lexicons. That work, recently revived in the press for whatever reasons such things happen, is the same project reported a while back in Nature. And, as I recall, I read that paper and promised to blog about it but did not get to it. Yet.

So here we go.

The tail does not wag the dog

The primary finding of the Pagel et al. study is this: When comparing lexicons from different languages, meanings that shared a common word in an ancestral language change over time more slowly if the word in question is used more often in day to day speech. The pattern was consistent enough that the authors call this a “law-like” property of language.

Greek speakers say “oura”, Germans “schwanz” and the French “queue” to describe what English speakers call a ‘tail’, but all of these languages use a related form of ‘two’ to describe the number after one.

You can do this yourself. Here is the English “horse” translated into two closely related and one more distantly related Indo-European languages:

Dutch: paard
German: Pferd
French: cheval

Not a lot of overlap, though a linguist would see the Dutch and German as similar, I suspect. Here, in contrast, is the English word “hand”:

Dutch: hand
German: Hand
French: main

The three Germanic languages are identical, and maybe that French word is not so different. Now let’s try for some more anatomy, with the English word “penis.”

Dutch: penis
German: Penis
French: penis

Wow. According to the purported law-like properties of language change … oh never mind, no way to draw any hard and fast conclusions at this point, I suspect. (I’ve left off the accents, and the pronunciations are more different than they look here.)

(Above results all obtained using Google Translate.)

Pagel et al. estimated the rates of change among vocabulary words for 200 different meanings across 87 Indo-European languages. The number of distinct cognate classes per meaning (sets of words descended from the same ancestral word) ranged from one to 46. From this analysis they calculated that the half-life of a word, on average, was probably a bit over five thousand years, with a very skewed distribution.
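If you want a feel for what a “half-life” means for a word, here is a minimal sketch. It assumes the simplest possible model, in which a word is replaced at a constant rate over time (exponential decay); that model and the function names are my assumptions for illustration, not the authors' actual method, and the 5,000 year figure is just the rough average quoted above.

```python
import math

def replacement_rate(half_life_years):
    """Per-year replacement rate implied by a half-life,
    assuming a constant-rate (exponential decay) model."""
    return math.log(2) / half_life_years

def retention_probability(half_life_years, elapsed_years):
    """Chance that the word for a meaning has NOT been replaced
    after elapsed_years, under the same assumed model."""
    return 0.5 ** (elapsed_years / half_life_years)

# Rough average half-life taken from the text (an approximation).
HALF_LIFE = 5000

for years in (1000, 5000, 10000):
    p = retention_probability(HALF_LIFE, years)
    print(f"after {years} years: {p:.2f} chance the word survives")
```

Under these assumptions, a word with an average half-life has an even chance of surviving five thousand years and a one in four chance of surviving ten thousand. Because the real distribution is very skewed, individual meanings can sit far from that average.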

Our findings, based on a sample of fundamental vocabulary items, identify a general mechanism of linguistic evolution, which is expected to operate across all languages and timescales and makes predictions about rates associated with specific meanings. To the extent that the structure and everyday functions of human verbal communication mean that some words will tend to be used more frequently in all languages, we expect these words to evolve slowly, and vice versa for infrequently used words. Combined with parts of speech, this simple factor allows us to account for about 50% of the variance in rates of lexical replacement throughout the 6,000- to 10,000-year history of Indo-European languages. Given the many social, cultural and cognitive factors that can influence language, it is striking that word-use frequency alone can explain such a large proportion of the historical variation in rates of evolution. The generality of this influence is suggested in the finding that estimates of the rate of lexical replacement in Indo-European languages are correlated with rate estimates in Bantu, Cushitic and Malayo-Polynesian.

A Tale of Two Disciplines

This research is partly based on, and partly demonstrates the validity of, the assumption that language change over time can be modeled as a tree-like pattern, much like genetic change over time is modeled to create species (or population) trees. (I hasten to add: I will be using terminology here that may annoy hard core cladists. I love annoying hard core cladists.) However, linguists have come to believe in recent decades that such research, beyond relatively simplistic grouping of very closely related languages that have diverged recently, is not worthwhile. Most linguists active today simply believe that the idea that time-deep language phylogenies can be built with any degree of reliability is utterly discredited.

The work by Pagel et al. seems to prove these linguists wrong, but the culture of incredulity is strong and seemingly unshakable. But I’d like to ask you to imagine what it might be like if things were just a little different in recent history.

( … harp music as everything becomes blurry, and the scene changes to a 1960s era lab with the large and furry figure of Charles Sibley holding four liquid filled test tubes in one hand, up to the light, gazing at them… Nearby, Jon Ahlquist is re-ordering a series of IBM punch cards that just got scrambled when they fell out of the box on the way back from the Batch Window at the computer center … )

“This is never going to work,” says Sibley. “This whole idea of using DNA to make a family tree of living species has too many problems. True, we came up with a number of plausible phylogenies, but the quick work of our colleagues in the fields of biogeography and morphology sure made quick work of our quick work!”

“I wish you would stop with the stupid puns,” intoned Ahlquist. “But as usual you’re right. This hybridization stuff kinda works but the results are not sufficiently resolved to sort out either really closely related species or very distant relationships. As for this in between scale of relationships, we can SEE those. We don’t need this extra expense. What are you doing with those colored liquids in those test tubes, anyway?”

“New martini. I call it ‘The Sarich,’” replied Sibley.

“Meh.”

( …. scene becomes blurry again, with harp music, and refocuses on a group of graduate students and a junior prof type sitting around a table in Nick’s Beef and Brew on Mass Avenue in Cambridge. These researchers are attached to Harvard’s Phylolinguistic Research Center, a new facility just built on the foundation of the recently torn down Peabody Museum…)

“So what if they don’t think it works!” said the one named Merritt. “We’ve been using Pagel’s phylogenetic method on languages for decades, and no one has questioned our ability to make deep phylogenies going back more than half way to the origin of human speech! All we’re trying to do here is to apply the same exact methods to the phylogeny of the mammals, using genes instead of words. Of course it will work!”

The group was interrupted as the waiter, Irv, came by with a large tray and efficaciously dealt out a half dozen Double Cheeseburger Specials as though they were mere playing cards. “Which one of you gets double tops….” he said as he glanced around. Then he noticed Big Tim, and remembered ….. right, double tops…. “Here you go. Enjoy.”

After a few minutes of passing around the ketchup and adjusting the French fries, the conversation resumed. Just then, the door opened and in came Mark, the group’s statistician. Whenever the door opens in this place, a mighty wind blows across all the tables in the general direction of the cook’s grill, where a 93,000 BTU open flame is constantly in use making more and more hamburgers, converting several cubic meters of oxygen into oxidized beef per minute.

(One day, a few years after this conversation, it just happened to occur that no one went in or out of Nick’s for a full hour and ten minutes. All of the oxygen was burned up at the grill and the entire retinue of diners, employees, and Nick himself suffocated, in what would later become known as the “Great Snuffing Out on Mass Ave.” But I digress…..)

Pagel sat down with the group and they started to talk again about the application of proven phylolinguistic methods to genetics.

“The problem with genetics,” someone said, “is that genes are under selection, unlike words.”

“Another problem,” someone else said, “is that we’re looking at genetic change across vastly different animals, with different metabolic rates and generation times.”

“… and in some cases,” someone else jumped in, “different systems of reproduction…”

“… right, and not even the same number of chromosomes across species, so linkage effects may be different….”

“Don’t worry.” Pagel spoke those words and took a bite of his meal. “Oh, did someone order beer by the way?”

Someone handed Pagel a beer to wash down his cheeseburger.

“Cheers,” Pagel said, “You don’t have to worry about most of that stuff. Most genes are highly conserved across organisms. The plurality, anyway. And other bits of DNA seem to change fairly quickly. You couldn’t find a better system than genetics to try the phylogenetic methods on. It will work better than with language, and it works pretty well with language.”

“Why didn’t they … the biologists … why didn’t they, I mean, shouldn’t they have… um, how come…” sputtered the one called Greg, just starting on his second cheeseburger and not quite sure if he was ready to speak up yet.

“Why don’t they get it? Why did they give up on this sort of thing forty years ago?” Pagel clarified. “Because their first few attempts used a technique that sucks, and because they had no idea how the numbers worked statistically. Now, with genetic sequencing we have excellent data, and we understand the numbers. This will be easy. You guys go collect the data and bring it back here. I’ll run it on my Android and we’ll have the paper out by dinner time…”

( … scene goes blurry, and all six of the scholars crowded into the high-backed wooden booth in Nick’s simultaneously chomp on the last bit of their cheeseburgers …. )

Well, I doubt it would have happened quite that way, but my point should be clear. Linguists gave up the ghost on phylogenetics when they ran into a number of problems. The method became “discredited” and no further work has been done with it. Meanwhile, in another discipline in which this sort of method can be used (genetics, in the real world) the approach continued to be developed. And now, practitioners of this method will be happy to apply these ideas to language, and teach the old boys a thing or two.

(Clarifications: 1) In “real life” the “phylogenetic method” was invented by Pagel and Harvey, but this is not the method being used to do language phylogenies. It is a wholly different thing. 2) No one ever really died of suffocation in Nick’s. 3) Irv would not have been that good of a waiter.)


Mark Pagel, Quentin D. Atkinson, Andrew Meade (2007). Frequency of word-use predicts rates of lexical evolution throughout Indo-European history. Nature, 449 (7163), 717-720. DOI: 10.1038/nature06176

Comments

  1. #1 Paper Hand
    March 7, 2009

    Now let’s try for some more anatomy, with the English word “penis.”

    Ah, but in this case, “penis” is a relatively recent borrowing from Latin. I don’t know what the Anglo-Saxons called it, but they didn’t call it “penis” (which, incidentally, meant “tail” in Latin – it originated as a colloquial term! – and is indirectly related to “pen” and “pencil” as well as “penicillin”)

    In this case, there is the common phenomenon of taboo replacement – a word becomes taboo so it’s replaced by another word, perhaps a borrowing or a word that originally meant something different, which itself eventually becomes taboo and needs to be replaced.

    These are the kind of phenomena that complicate historical linguistics. There’s a lot of “horizontal gene transfer” to use a biology-derived metaphor.

  2. #2 Greg Laden
    March 7, 2009

    Paper Hand: I had thought about using some of the other examples of this (taboo) that come to mind but thought better of it. In any event, yes, this is why Pagel et al. use more data than I did!

    The bigger picture: There is a list of things that “complicate” language phylogenies, but there is also a list of things that “complicate” genetic phylogenies. Having walked away from this methodology decades ago, it would be reasonable for linguists to have a second look leaving behind their assumptions.

    But they won’t. Which is fine. Other people will do this for them, because it is kinda fun.

  3. #3 the real me
    March 7, 2009

    Awww, Greg, stop being such a kun….but the schwanz was always the pecker when I heard it from that old bird, my grandfather.

  4. #4 wazza
    March 7, 2009

    Linguistic phylogeny has the same problems as memetics, and indeed can be regarded as a form of memetics (given that a word is an idea). Both are harder to follow than genetics because of the borrowing and hybridisation between widely varying forms. It’s as if all biology had as much propensity to horizontal gene transfer as bacteria do.

    Still, I think it’s an idea worth pursuing. It’ll just be harder than developing genetic phylogeny.

  5. #5 Iain
    March 8, 2009

    A million things to say here, not least that all those scientists will be dead by now from stomach cancer caused by all that charred meat! And the Peabody is still there.

    Have not checked the paper (yet) but do they deal with the situation where irregular verbs seem to be the common ones and not to be conserved between languages?

  6. #6 Notagod
    March 8, 2009

    Also, words can just appear out of nowhere. Such as christians wouldn’t want to refer to a deceased person as deceased or dead, they would want a word that isn’t scary to them. Of course, christians will want to find a new word, as time passes, when passed becomes scary to them. Natural processes are unlikely to produce that arbitrary strangeness when all the facts are known.

    As much as some would like a dat (dog-cat), it is very unlikely to occur naturally. However, it is possible that a cButtarsMor[m]onic could be produced somewhere within the lowerchristian species.

    Wouldn’t the language trees be all tied in rather unnatural knots? Could be interesting though.
