The Effects of Color Names on Color Concepts, or Like Lazarus Raised from the Tomb

Note: This was originally posted at the old blog on August 14, 2005. Enjoy.

After I finally finished Language in Mind, about which I posted the other day, I went back and looked at some of the literature on linguistic relativity that I had read over the years, but had mostly forgotten. And since linguistic relativity has always been a favorite topic of mine, I thought I'd post a little more about it (it may not be a favorite topic of yours, but hey, this is my blog!). In the early days of cognitive science, the majority of the studies designed to test the Sapir-Whorf hypothesis, or other linguistic relativity hypotheses, looked at the effects of color terms on color concepts. Early on, much of this work produced promising results for supporters of the S-W hypothesis. But in 1972, E.R. Heider published a paper titled "Universals in color naming and memory" that effectively killed the Sapir-Whorf hypothesis1. Heider compared memory for colors in speakers of two different languages, English and the language of the Dugum Dani. The Dugum Dani is a remote hunter-gatherer tribe in New Guinea that has had little exposure to western culture. They have two basic color terms, compared to 11 in English. Thus, the comparison between the two presented a particularly strong test of the Sapir-Whorf hypothesis: if the speakers of a language with 2 color terms and the speakers of a language with 11 both have the same or highly similar color concepts, then we're justified in dropping the idea of linguistic relativity, at least for color. And that's what Heider found: English speakers and members of the Dugum Dani tribe displayed highly similar color memory. Other researchers found similar evidence in the comparisons of speakers of several other languages with varying numbers of color terms, but for all intents and purposes, linguistic relativity was dead after Heider.

Until the 1990s, that is. Looking at color terms presents a test of a particularly strong version of linguistic relativity. Color is a concrete, physically-defined category with a well-known neural basis. Thus, the explanation for Heider's data, which was widely accepted, was that color concepts are determined not by color names, but by the physiology of color perception. However, in the 90s, researchers began to think color was too strong a test, and that for more abstract domains, language does play a constraining role. Over the last several years, a growing body of evidence has shown that this is in fact the case. For abstract domains like time, number, space, and substance, language can be highly influential. But anyone who's really interested in linguistic relativity always has color in the back of his or her mind. If evidence for universal color categories can kill the Sapir-Whorf hypothesis, then evidence for cultural variance in color categories can bring it back to life.

But there are difficulties in studying cultural variation in color. You can't just study the speakers of languages spoken by people in industrialized nations, because there has been a lot of interaction between the speakers of those languages. You can't even study languages spoken exclusively by people in unindustrialized nations who have had a lot of exposure to western culture, because speakers of those languages tend to adopt color terms from western languages (especially from English). So, you have to find remote tribes that speak languages that have had relatively little outside influence. That takes money and time. Furthermore, there is always the problem of running the same experiment in multiple languages. You never know whether the experiment is exactly the same to speakers of different languages (and if you're an adherent of some version of linguistic relativity, you have to believe that it isn't!). That's a particularly big problem when you're studying members of remote hunter-gatherer tribes. Psychology experiments seem weird to American undergraduates who are taking a course about psychology and its experiments. Imagine how odd they must seem to hunter-gatherers who've never heard of psychology. But Heider's experiments suffered from these problems, too, so if there's reason to doubt any cross-cultural research on color concepts, then there's reason to doubt Heider's. For some, that doubt is all the motivation they need to do more research.

Enter Debi Roberson, and her colleagues. Roberson believes that there is evidence in Heider's data that Heider's conclusions may have been a bit hasty. For instance, Dani color memory was much worse than that of English speakers, even though the error patterns were highly similar. Heider has no explanation for this. Perhaps it is an indication that Dani color concepts really are different from those of English speakers. Armed with her doubt of Heider's data, Roberson set out to replicate his results, and further test the effects of color names on color concepts using new methods. For this post, I'll describe two of her methods: color memory, which attempts to replicate Heider's findings, and categorical perception.

The experiments on color memory go like this. First, you have to determine the number of basic color terms in a language. You do this by having people name Munsell color chips, which depict colors across the visible spectrum. You then determine the color names that were used to describe the bulk of the spectrum. In doing so, you get graphs that look like the following for English speakers (from Roberson et al. 2000)2:


The numbers at the top and on the side (which are hard to see, I know) are the numbers and labels for the Munsell chips. While you're eliciting color names, you also ask the participants to indicate the best example of each color (the most common answer to this question for each name is represented in the above graph by the dots). Roberson and her colleagues have done this for three cultures, English (in the graph above, which gives 10 basic color terms, as compared to the 11 that Heider found), as well as the Berinmo tribe, which is also from New Guinea, and the Himba tribe from Namibia. Both the Berinmo and Himba tribes have had very little exposure to western culture, and their languages lack any color terms borrowed from other languages. Both of them have five color terms. Here are the graphs of their color names, which you can compare to the graph of English color names above3:



The Berinmo and Himba graphs look somewhat similar, but are different enough for comparison. The numbers on these two graphs indicate the number of participants who said that that chip was the best example.

After you've got the naming data, you can do the memory task. The materials for the task include low saturation color chips (i.e., chips that are near naming boundaries, or otherwise far from the best examples of a color name) from the English color categories. The participants are shown a chip by itself for five seconds, and then the chip is covered. After thirty seconds, the participants are shown a full array of chips (40 total) and asked to identify the color they had just seen. The low saturation chips are chosen because they will produce high error rates. The key data are the sorts of errors participants make. If they tend to mistakenly choose other chips from the same color category in English, as English participants do, then we can infer that color categories are universal. However, if their errors tend to involve choosing chips that have the same name in Berinmo or Himba, then we can reason that color terms affect color perception and memory, and thus that color categorization is not universal. This would be strong evidence for linguistic relativity in color concepts. And that's what Roberson and her colleagues found. Berinmo participants tended to make errors consistent with their color names, while Himba participants made errors consistent with theirs. Neither made errors consistent with English color names (for Himba participants, the correlations were r = .559 between memory and Himba names and r = .036 between memory and English names; the pattern was similar for Berinmo speakers when comparing Berinmo and English names).
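For readers who think better in code, here is a minimal sketch of the logic behind the error analysis. It is not the analysis Roberson and her colleagues actually ran (they report correlations between naming and memory); it is a simplified stand-in that just counts how often a wrong choice stays inside the target chip's named category. The chip IDs, the labels, and the function name `within_category_rate` are all made up for illustration.

```python
def within_category_rate(errors, naming):
    """Fraction of memory errors in which the wrongly chosen chip
    has the same color name as the target chip under `naming`."""
    same = sum(1 for target, chosen in errors if naming[target] == naming[chosen])
    return same / len(errors)


# Hypothetical data: four chips, named under two invented naming systems.
naming_x = {"c1": "X1", "c2": "X1", "c3": "X2", "c4": "X2"}
naming_y = {"c1": "Y1", "c2": "Y2", "c3": "Y2", "c4": "Y2"}

# Hypothetical errors as (target chip, chip the participant picked instead).
errors = [("c2", "c3"), ("c3", "c4"), ("c2", "c4"), ("c1", "c2")]

rate_x = within_category_rate(errors, naming_x)
rate_y = within_category_rate(errors, naming_y)
print(rate_x, rate_y)
```

In this toy case the errors line up better with naming system Y than with X, which is the shape of the result described above: a group's errors track its own language's categories rather than someone else's.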

To provide further evidence that color names affect color categories, Roberson et al. (2000) and Roberson et al. (2005) conducted an experiment on categorical perception with the Berinmo and Himba. In categorical perception, within-category exemplars tend to be treated as more similar than between-category exemplars, even when the between-category exemplars are more similar physically. This is particularly interesting in color perception: a color exemplar classified as red will be treated as more similar to other exemplars of red, particularly the best examples of red, than to exemplars of neighboring colors, even if the exemplar falls on the physical boundary between the two colors and is thus closer, physically, to exemplars from the neighboring colors than it is to the best example of red. If Berinmo and Himba speakers demonstrate categorical perception effects consistent with their labels, but not with other labels (particularly English), then we can conclude that their color naming affects their color concepts.

To test this with Berinmo speakers, they tested participants on a category distinction present in English, but not Berinmo (green-blue) and one present in Berinmo, but not English ("nor" and "wor," as in the graph above). They presented participants with three Munsell color chips, and asked them which two were the most similar to each other. Two of the chips were highly physically similar, i.e., they were close to each other in the physical color space. One of those two chips also shared the same label as the third chip. Thus, you might have two "nor" chips and one "wor" chip, with the "wor" chip being physically more similar to one of the "nor" chips than the other "nor" chip is. If participants consistently answer that the two "nor" chips are more similar than the physically similar "nor" and "wor" chips, then they will have exhibited a categorical perception effect. Furthermore, if they do not exhibit a categorical perception effect for the green-blue distinction (i.e., they pick the more physically similar chips when the choices are two classified as green and one as blue), then we can conclude that it is the naming, rather than any universal physiological aspect of color perception, that is driving the categorical perception effect. And that's what Roberson et al. (2000) found for Berinmo. Roberson et al. (2005) found similar categorical perception effects for the Himba speakers.
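The scoring logic of the triad task can be sketched in a few lines. This is only an illustration under loud assumptions: I place the chips on a single made-up number line, whereas real Munsell chips vary in hue, value, and chroma, and the chip names, positions, and the function names below are all hypothetical rather than taken from the papers.

```python
from itertools import combinations


def closest_pair(positions):
    """Pair of chips with the smallest distance in the (assumed 1-D) color space."""
    return min(
        combinations(sorted(positions), 2),
        key=lambda pair: abs(positions[pair[0]] - positions[pair[1]]),
    )


def same_label_pair(labels):
    """Pair of chips that share a color name, or None if no two chips do."""
    for a, b in combinations(sorted(labels), 2):
        if labels[a] == labels[b]:
            return (a, b)
    return None


def classify_choice(choice, positions, labels):
    """Label a participant's 'most similar' pair as categorical, physical, or other.
    Assumes the triad was built so the two pairs differ, as in the task described."""
    chosen = tuple(sorted(choice))
    if chosen == same_label_pair(labels):
        return "categorical"
    if chosen == closest_pair(positions):
        return "physical"
    return "other"


# Hypothetical triad: a and b are physically close, b and c share a name.
positions = {"a": 0.0, "b": 1.0, "c": 3.0}
labels = {"a": "wor", "b": "nor", "c": "nor"}

print(classify_choice(("c", "b"), positions, labels))
print(classify_choice(("a", "b"), positions, labels))
```

A participant who keeps giving "categorical" answers on triads built around their own language's boundary, and "physical" answers on triads built around another language's boundary, is showing the pattern described above.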

It's interesting that the Berinmo and Himba tribes have the same number of color terms, as well, because that rules out one possible alternative explanation of their data. It could be that as languages develop, they develop a more sophisticated color vocabulary, which eventually approximates the color categories that are actually innately present in our visual systems. We would expect, then, that two languages that are at similar levels of development (in other words, they both have the same number of color categories) would exhibit similar effects, but the speakers of the two languages remembered and perceived the colors differently. Thus it appears that languages do not develop towards any single set of universal color categories. In fact, Roberson et al. (2004) reported a longitudinal study that implies that exactly the opposite may be the case4. They found that children in the Himba tribe, and English-speaking children in the U.S., initially categorized color chips in a similar way, but as they grew older and more familiar with the color terms of their languages, their categorizations diverged, and became more consistent with their color names. This is particularly strong evidence that color names affect color concepts.

It appears, then, that Roberson and her colleagues have laid their hands on the Sapir-Whorf hypothesis and raised it from the dead with their experiments using members of the Berinmo and Himba tribes. We should, of course, take these results with a healthy dose of skepticism, because they come from testing people in very different languages and cultures and comparing their results, which, as I said earlier, is a big problem. However, the growing body of evidence from Roberson and her colleagues' experiments is hard to deny. I don't know about you folks, but I find the revival of Sapir-Whorf incredibly exciting.

1 Heider, E.R. (1972). Universals in color naming and memory. Journal of Experimental Psychology, 93, 10-20.
2 Roberson, D., Davies, I. & Davidoff, J. (2000). Colour categories are not universal: Replications and new evidence from a Stone-age culture. Journal of Experimental Psychology: General, 129, 369-398.
3 The Himba graph is from Roberson, D., Davidoff, J., Davies, I. & Shapiro, L. (2005). Colour categories in Himba: Evidence for the cultural relativity hypothesis. Cognitive Psychology, 50, 378-411.
4 Roberson, D., Davidoff, J., Davies, I.R.L. & Shapiro, L.R. (2004). The development of color categories in two languages: A longitudinal study. Journal of Experimental Psychology: General, 133, 554-571.

UPDATE 10/16/06: I just re-added the images, which were showing up when I first reposted this, but subsequently disappeared. I hope that didn't lead to any confusion.

Great post. I'm waaay into the revival of the Sapir-Whorf hypothesis as well, and Lera Boroditsky is my personal favorite right now. My Psych of Language students are dissecting lots of this work right now and leading some discussion of it in class.

You describe Roberson's work well, and categorical perception is one way to test without so much memorial influence.

One quibble - isn't Heider (1972) actually Eleanor Rosch's maiden name, and thus the pronoun should be "she"...?

What do you make of the Piraha work? I think there are so many ways this has yet to be tested, and in fact, with the claim about their lack of numerosity, I was wondering if you couldn't put your scepticism of mirror neurons together with your interest in Sapir-Whorf and dream up some cool thought experiments...I have been doing precisely this...

I'm not sure about the color memory test. An explanation consonant with universal color perception and color memory is that all individuals perceived the same color, attached a linguistic label to it for convenient storage and retrieval, retrieved the linguistic label when asked to pick the chip, and then chose a chip in the acceptable range of that linguistic label. It doesn't follow that the initial perception varied, nor that memory / recall varied -- it could well be that humans recall memories more in linguistic / propositional terms ("COLOR of CHIP was RED") than by recalling an image from their visuospatial sketchpad, especially as linguistic / propositional terms are more discrete and definite (facilitating storage & retrieval) rather than hazy like values on a color continuum.

I think the same applies to categorical perception effects -- the visual stimuli are encoded in linguistic / propositional terms, and "similarity" is assessed by length of path distance in a propositional tree (or something similar), where two similarly-named colors would only need to travel up to the shared color name node to meet, whereas the physically more similar ones would have to traverse a longer path to meet.

So, if the conclusion is that naming correlates with conception -- that is, the storage, organization, and recall of information -- then sure. But this is a watered-down S-W hypothesis. The strong, interesting S-W hypothesis is that English and Berinmo speakers would literally perceive different colors, or would recall different images from their visuospatial sketchpad memory.

Digitizing the continuous is the hallmark of human thought, whether it's chunking colors under a common label, dicing up the speech stream into crisp words, etc. It facilitates information-processing, compared to what would be needed if we impressed stimuli on our memory like a silk-screen design rather than encode them in chunks. Occasionally border disputes will occur, but as far as I know, no human being can be made to think that complementary colors are physical neighbors, by attaching a common label to both. That would be the real test, but no language dices up the color continuum so wantonly that red and green or black and white share names, which goes against the strong S-W approach.

To be more concrete about red being chunked with green under a common label -- the original Berlin & Kay study on color naming showed that all languages have at least separate labels for "black" and "white." So, to the extent that we carve up the spectrum at all, we instinctively emphasize contrasting colors (rather than, say, "yellow" and "white"). The next color that shows up is "red" -- I don't have access to the Berlin & Kay study right now, but I'd assume that in such languages, green colors do not get chunked under "red" but rather under "black" or "white." Support for this hunch (again, the data would resolve it) comes from the fact that the next color to show up is "green," following the pattern of emphasizing contrasts. That indicates a universal queasiness about grouping red and green together, again belying the strong flavors of S-W.

Michael, you're right, wrong pronoun. And I haven't really thought about putting linguistic relativity and metaphor together, though given the fact that the mirror neuron folk want them to give us meaning, it would be pretty natural to do so. How are you connecting them?

Agnostic, I think almost everyone doing research in the field is thinking along the lines of a pretty weak S-W. It's linguistic relativity rather than linguistic determinism. Now, your explanation would be hard to sort out from an explanation in which perception actually differs, even if only slightly, or if there are only differences in conception. Given that no one that I know of has really specified a model laying out the way in which color language could affect color perception (rather than conception), I'm not sure which explanation is simpler, either.

There are top-down influences on color perception, of course, and color perception is never all that exact anyway (and varies, wildly, with context), so I do see plenty of room for linguistic and/or cultural influences.


Thanks for reposting this. The figures (graphs for English, Berinmo and Himba color maps) are not visible on this post, though I could find them on the August 14, 2005 post as well as elsewhere on the web. As per the figures, it is clear that both the Berinmo and the Himba have unambiguous terms corresponding to the English color terms red, yellow, pink and purple. Both also have a term for green-blue. It's not clear that Wor is a term unique to Berinmo, as it maps to English yellow very closely. Also Nor, which maps to blue-green, is a term for grue (green-blue) that is commonly found in other languages (that are at an earlier stage of development and have fewer color terms).

Thus, a developmental account of color terms cannot be ruled out. I had attempted exactly that in my earlier post on this matter, and would like to see your comments in the light of recent developments.

The fact that Berinmo and Himba had color terms for 5 colors puts them into the seventh developmental stage (as they definitely have the color terms black and white and thus actually have 7 color terms). It is somewhat surprising that they have terms for pink and purple, but no separate terms for blue and green, and this does complicate things a bit (because as per the strict developmental approach, they must at least distinguish between blue and green before moving on), but still a developmental account that balances the extreme universalistic (nature) and relativist (nurture) positions seems to best fit the evidence.

Link to my developmental account of color terms :…