When I was an undergrad, I almost took a degree in linguistics because I was so fascinated by languages, especially by the rate and patterns of change that languages undergo. So of course, I was excited to read two fascinating papers that were published in this week's issue of Nature. These papers find that individual words evolve in a predictable manner and that their rate of evolution depends upon their frequency of use. Further, this predictability can be defined mathematically.

To test this hypothesis, one group of researchers from Harvard University set out to determine the rate at which irregular verbs change into "regularized verbs" and compared that to their frequency of use. Another group from the University of Reading, UK, compared the frequency of word use to the rate of replacement by another word for four different languages. Despite their different approaches, both groups report similar findings: the more frequently a word is used, the less likely it is to undergo change -- a phenomenon that mirrors the evolutionary rate for genes.

The study conducted by Erez Lieberman and Jean-Baptiste Michel and their team of evolutionary biologists, all from Harvard University, focused on conjugation changes to 177 irregular Old English verbs. They found that 145 of these irregular verbs were still irregular in Middle English, while only 98 remain irregular today.

An irregular verb is an action word that does not follow regular conjugation rules, so it has a peculiar past tense; examples include sing/sang/sung and go/went/gone. Regular verbs, by contrast, are simply modified by adding -ed to the infinitive to create the simple past and past participle forms; for example, talk/talked/talked. Interestingly, the team mentions that new verbs entering the English language universally follow regular conjugation rules; for instance, google/googled/googled.

Curiously, even though fewer than 3% of all English verbs are irregular, the ten most commonly used verbs are all irregular (see table at right for a representative sampling of irregular verbs in the study). Thus, there is a heavy bias towards irregular verbs maintaining their "irregularities" when they are used often, whereas infrequently used irregular verbs are regularized by adding -ed to the infinitive. It is unclear why the -ed rule emerged as dominant over the other seven (at least) classes of irregular verbs, but it is probably because the other rules, and their unusual conjugations, are easy to forget.

To study the evolution of irregular verbs, Lieberman and Michel's team first identified 177 irregular English verbs. They then used the English language CELEX corpus, a lexical and textual database containing 17.9 million words, primarily from American and British English text sources. CELEX determines the frequency of use for any given word within the database. Using these data, the team calculated the rate at which the 177 irregular verbs became "regularized" over time, and compared that to their frequency of use (figure 1a, below).

By leaving out frequency data for the most commonly used irregular verbs (be, have, come, do, find, get, give, go, know, say, see, take, think), which are most resistant to change, Lieberman and Michel's group found that the least frequently used irregular verbs changed, or regularized, fastest. In fact, there was a mathematical relationship that described this rate of change such that those verbs that were used 100 times less frequently evolved 10 times faster (figure 1b, above). Thus, the team used this mathematical relationship to predict the half-life for regularization for other irregular verbs (see table 1, above). According to their hypothesis, only 83 of the original 177 irregular verbs will retain their unusual conjugations by 2500.
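The scaling described above (a verb used 100 times less often regularizes 10 times faster) is what you would get if the regularization rate were inversely proportional to the square root of a verb's frequency. Here is a minimal Python sketch under that assumption. The function names and the reference half-life in the usage lines are hypothetical illustrations of mine, not code or values taken from the paper.

```python
import math


def relative_rate(frequency_ratio: float) -> float:
    """Regularization rate relative to a reference verb.

    Assumes rate is proportional to 1 / sqrt(frequency), so a verb used
    `frequency_ratio` times as often as the reference changes
    1 / sqrt(frequency_ratio) times as fast.
    """
    if frequency_ratio <= 0:
        raise ValueError("frequency_ratio must be positive")
    return 1.0 / math.sqrt(frequency_ratio)


def scaled_half_life(reference_half_life: float, frequency_ratio: float) -> float:
    """Half-life of a verb, given a reference verb's half-life.

    Under the same assumption, half-life grows with sqrt(frequency).
    """
    if frequency_ratio <= 0:
        raise ValueError("frequency_ratio must be positive")
    return reference_half_life * math.sqrt(frequency_ratio)


def fraction_still_irregular(years: float, half_life: float) -> float:
    """Expected fraction of similar verbs still irregular after `years`,
    assuming simple exponential decay with the given half-life."""
    return 0.5 ** (years / half_life)


if __name__ == "__main__":
    # A verb used 100 times less often than the reference verb.
    print(relative_rate(0.01))
    # Hypothetical reference half-life of 1,000 years (illustrative only).
    print(scaled_half_life(1000.0, 0.01))
    print(fraction_still_irregular(500.0, 1000.0))
```

Plugging in a frequency ratio of 1/100 gives a rate about 10 times higher and a half-life about 10 times shorter, which matches the relationship the team describes.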

Using these data, the team predicts the half-lives of 'be' and 'have' are 38,800 years, making these two verbs the most resistant to regularization. The team also predicts that the next irregular verb that will regularize is wed/wed/wed, because it is the least frequently used modern irregular verb. In fact, this verb is already being replaced by wed/wedded/wedded in the four major English-language dictionaries.

So based on this study, it is perhaps not surprising to learn that infrequently used irregular verbs evolved fastest. For instance, 'stode', the antiquated past tense for 'study,' was regularized to 'studied' and 'holp,' the past tense of 'help,' became the modern 'helped.'

In fact, another study published in the same issue of Nature revealed the same basic rate of linguistic evolution. That study, by Mark Pagel, an evolutionary bio-informaticist at the University of Reading, UK, and his group, also suggested that less commonly used words evolve faster. Pagel's group found this by comparing frequency of word use to the rate of replacement with a different word for four languages (Greek, Spanish, English and Russian). Further, based on their findings, Pagel's group estimates that frequency of word use explains half the overall rate of language evolution.

Interestingly, these linguistic data are analogous to the way that genes evolve: those genes that are expressed most frequently, such as the so-called "housekeeping genes", are least likely to change, whereas genes that encode rarer and more specialized functions are under less selective pressure and thus change faster.


"Quantifying the evolutionary dynamics of language" by Erez Lieberman, Jean-Baptiste Michel, Joe Jackson, Tina Tang & Martin A. Nowak. Nature 449:713-716 (11 October 2007 | doi:10.1038/nature06137).

"Frequency of word-use predicts rates of lexical evolution throughout Indo-European history" by Mark Pagel, Quentin D. Atkinson & Andrew Meade. Nature 449:717-720 (11 October 2007 | doi:10.1038/nature06176).

