Note: This was originally posted at the old blog on August 14, 2005. Enjoy.
After I finally finished Language in Mind, about which I posted the other day, I went back and looked at some of the literature on linguistic relativity that I had read over the years, but had mostly forgotten. And since linguistic relativity has always been a favorite topic of mine, I thought I’d post a little more about it (it may not be a favorite topic of yours, but hey, this is my blog!). In the early days of cognitive science, the majority of the studies designed to test the Sapir-Whorf hypothesis, or other linguistic relativity hypotheses, looked at the effects of color terms on color concepts. Early on, much of this work produced promising results for supporters of the S-W hypothesis. But in 1972, E.R. Heider published a paper titled “Universals in color naming and memory” that effectively killed the Sapir-Whorf hypothesis1. Heider compared memory for colors in speakers of two different languages, English and the language of the Dugum Dani. The Dugum Dani is a remote hunter-gatherer tribe in New Guinea that has had little exposure to western culture. They have two basic color terms, compared to 11 in English. Thus, the comparison between the two presented a particularly strong test of the Sapir-Whorf hypothesis: if the speakers of a language with 2 color terms and the speakers of a language with 11 both have the same or highly similar color concepts, then we’re justified in dropping the idea of linguistic relativity, at least for color. And that’s what Heider found: English speakers and members of the Dugum Dani tribe displayed highly similar color memory. Other researchers found similar evidence in the comparisons of speakers of several other languages with varying numbers of color terms, but for all intents and purposes, linguistic relativity was dead after Heider.
Until the 1990s, that is. Looking at color terms presents a test of a particularly strong version of linguistic relativity. Color is a concrete, physically-defined category with a well-known neural basis. Thus, the explanation for Heider’s data, which was widely accepted, was that color concepts are determined not by color names, but by the physiology of color perception. However, in the 90s, researchers began to think color was too strong a test, and that for more abstract domains, language does play a constraining role. Over the last several years, a growing body of evidence has shown that this is in fact the case. For abstract domains like time, number, space, and substance, language can be highly influential. But anyone who’s really interested in linguistic relativity always has color in the back of his or her mind. If evidence for universal color categories can kill the Sapir-Whorf hypothesis, then evidence for cultural variance in color categories can bring it back to life.
But there are difficulties in studying cultural variation in color. You can’t just study the speakers of languages spoken by people in industrialized nations, because there has been a lot of interaction between the speakers of those languages. You can’t even study languages spoken exclusively by people in unindustrialized nations who have had a lot of exposure to western culture, because speakers of those languages tend to adopt color terms from western languages (especially from English). So, you have to find remote tribes that speak languages that have had relatively little outside influence. That takes money and time. Furthermore, there is always the problem of running the same experiment in multiple languages. You never know whether the experiment is exactly the same to speakers of different languages (and if you’re an adherent of some version of linguistic relativity, you have to believe that it isn’t!). That’s a particularly big problem when you’re studying members of remote hunter-gatherer tribes. Psychology experiments seem weird to American undergraduates who are taking a course about psychology and its experiments. Imagine how odd they must seem to hunter-gatherers who’ve never heard of psychology. But Heider’s experiments suffered from these problems, too, so if there’s reason to doubt any cross-cultural research on color concepts, then there’s reason to doubt Heider’s. For some, that doubt is all the motivation they need to do more research.
Enter Debi Roberson and her colleagues. Roberson believes that there is evidence in Heider’s data that Heider’s conclusions may have been a bit hasty. For instance, Dani color memory was much worse than that of English speakers, even though the error patterns were highly similar. Heider has no explanation for this. Perhaps it is an indication that Dani color concepts really are different from those of English speakers. Armed with her doubts about Heider’s data, Roberson set out to replicate Heider’s results, and to further test the effects of color names on color concepts using new methods. For this post, I’ll describe two of her methods: color memory, which attempts to replicate Heider’s findings, and categorical perception.
The experiments on color memory go like this. First, you have to determine the number of basic color terms in a language. You do this by having people name Munsell color chips, which depict colors across the visible spectrum. You then determine the color names that were used to describe the bulk of the spectrum. In doing so, you get graphs that look like the following for English speakers (from Roberson et al., 2000)2:
The numbers at the top and on the side (which are hard to see, I know) are the numbers and labels for the Munsell chips. While you’re eliciting color names, you also ask the participants to indicate the best example of each color (the most common answer to this question for each name is represented in the above graph by the dots). Roberson and her colleagues have done this for three cultures, English (in the graph above, which gives 10 basic color terms, as compared to the 11 that Heider found), as well as the Berinmo tribe, which is also from New Guinea, and the Himba tribe from Namibia. Both the Berinmo and Himba tribes have had very little exposure to western culture, and their languages lack any color terms borrowed from other languages. Both of them have five color terms. Here are the graphs of their color names, which you can compare to the graph of English color names above3:
The Berinmo and Himba graphs look somewhat similar, but are different enough for comparison. The numbers on these two graphs indicate the number of participants who said that that chip was the best example.
After you’ve got the naming data, you can do the memory task. The materials for the task include low-saturation color chips (i.e., chips that are near naming boundaries, or otherwise far from the best examples of a color name) from the English color categories. The participants are shown a chip by itself for five seconds, and then the chip is covered. After thirty seconds, the participants are shown a full array of chips (40 total) and asked to identify the color they had just seen. The low-saturation chips are chosen because they will produce high error rates. What matters is the sorts of errors participants make. If they tend to mistakenly choose other chips from the same color category in English, as English participants do, then we can infer that color categories are universal. However, if their errors tend to involve choosing chips that have the same name in Berinmo or Himba, then we can reason that color terms affect color perception and memory, and thus that color categorization is not universal. This would be strong evidence for linguistic relativity in color concepts. And that’s what Roberson and her colleagues found. Berinmo participants tended to make errors consistent with their color names, while Himba participants made errors consistent with theirs. Neither made errors consistent with English color names (for Himba participants, the correlations between naming and memory were r = .559 for memory and Himba names and r = .036 for memory and English names; the Berinmo vs. English correlations for Berinmo speakers were similar).
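For readers who like to see the logic spelled out, here is a rough sketch of how you might ask which naming scheme a set of memory errors is consistent with. To be clear, this is not Roberson et al.'s analysis (they report correlations between naming and memory); it is a simpler stand-in that just counts how often a wrongly chosen chip shares a name with the chip that was actually shown. The chip IDs, the labels, the error list, and the function name `within_category_rate` are all made up for illustration.

```python
def within_category_rate(errors, naming):
    """Share of memory errors in which the wrongly chosen chip has the
    same name as the target chip under the given naming scheme."""
    if not errors:
        raise ValueError("no errors to score")
    same = sum(1 for target, chosen in errors if naming[target] == naming[chosen])
    return same / len(errors)


# Hypothetical chips c1..c6 and two invented naming schemes that draw
# their category boundary in different places (not real data).
naming_a = {"c1": "X", "c2": "X", "c3": "X", "c4": "Y", "c5": "Y", "c6": "Y"}
naming_b = {"c1": "P", "c2": "P", "c3": "Q", "c4": "Q", "c5": "Q", "c6": "Q"}

# Each error is (chip shown, chip wrongly picked from the array). Invented.
errors = [("c3", "c4"), ("c4", "c3"), ("c3", "c5"), ("c2", "c1")]

rate_a = within_category_rate(errors, naming_a)
rate_b = within_category_rate(errors, naming_b)
print(rate_a, rate_b)
```

In this toy case the errors mostly stay inside the categories of the second scheme and mostly cross the boundary of the first, which is the pattern you would expect from speakers whose memory follows the second scheme's names.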
To provide further evidence that color names affect color categories, Roberson et al. (2000) and Roberson et al. (2005) conducted an experiment on categorical perception with the Berinmo and Himba. In categorical perception, within-category exemplars tend to be treated as more similar than between-category exemplars, even when the between-category exemplars are more similar physically. This is particularly interesting in color perception: a color exemplar classified as red will be treated as more similar to other exemplars of red, particularly the best examples of red, than to exemplars of neighboring colors, even if the exemplar falls on the physical boundary between the two colors and is thus closer, physically, to exemplars from the neighboring colors than it is to the best example of red. If Berinmo and Himba speakers demonstrate categorical perception effects consistent with their labels, but not with other labels (particularly English), then we can conclude that their color naming affects their color concepts.
To test this with Berinmo speakers, they tested participants on a category distinction present in English but not Berinmo (green-blue) and one present in Berinmo but not English (“nol” and “wor,” as in the graph above). They presented participants with three Munsell color chips and asked them which two were the most similar to each other. Two of the chips were highly physically similar, i.e., they were close to each other in the physical color space. One of those two chips also shared the same label as the third chip. Thus, you might have two “nol” chips and one “wor” chip, with the “wor” chip being physically more similar to one of the “nol” chips than the other “nol” chip is. If participants consistently answer that the two “nol” chips are more similar than the physically similar “nol” and “wor” chips, then they will have exhibited a categorical perception effect. Furthermore, if they do not exhibit a categorical perception effect for the green-blue distinction (i.e., they pick the more physically similar chips when the choices are two classified as green and one as blue), then we can conclude that it is the naming, rather than any universal physiological aspect of color perception, that is driving the categorical perception effect. And that’s what Roberson et al. (2000) found for Berinmo. Roberson et al. (2005) found similar categorical perception effects for the Himba speakers.
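If the triad design is hard to picture, this little sketch may help. It classifies a single response as a "category" choice (the two chips with the same name) or a "physical" choice (the two chips closest together). Everything here is an assumption made for illustration: the chips are placed on a single made-up hue axis (real Munsell space has three dimensions), the labels "A" and "B" are placeholders, and `score_triad` is a hypothetical helper, not anything from the papers.

```python
from itertools import combinations


def score_triad(chips, chosen):
    """Classify one triad response.

    chips  : dict mapping chip id -> (hue_position, label)
    chosen : frozenset of the two chip ids the participant judged most similar
    Returns "category" if the chosen pair shares a label, "physical" if it is
    the physically closest pair, and "other" otherwise.
    """
    def distance(pair):
        a, b = tuple(pair)
        return abs(chips[a][0] - chips[b][0])

    pairs = [frozenset(p) for p in combinations(chips, 2)]
    closest = min(pairs, key=distance)

    a, b = tuple(chosen)
    if chips[a][1] == chips[b][1]:
        return "category"
    if chosen == closest:
        return "physical"
    return "other"


# One invented trial: "b" sits near the boundary, so it is physically
# closer to "c" (different name) than to "a" (same name).
trial = {"a": (1.0, "A"), "b": (4.0, "A"), "c": (5.0, "B")}

print(score_triad(trial, frozenset({"a", "b"})))  # same-name pair
print(score_triad(trial, frozenset({"b", "c"})))  # physically closest pair
```

A categorical perception effect would show up as a high proportion of "category" responses on triads that straddle a boundary in the participant's own language, and mostly "physical" responses on triads that straddle a boundary that exists only in someone else's.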
It’s also interesting that the Berinmo and Himba tribes have the same number of color terms, because that rules out one possible alternative explanation of their data. It could be that as languages develop, they develop a more sophisticated color vocabulary, which eventually approximates the color categories that are actually innately present in our visual systems. We would expect, then, that two languages that are at similar levels of development (in other words, they both have the same number of color categories) would exhibit similar effects, but the speakers of the two languages remembered and perceived the colors differently. Thus it appears that languages do not develop towards any single set of universal color categories. In fact, Roberson et al. (2004) reported a longitudinal study that implies that exactly the opposite may be the case4. They found that children in the Himba tribe, and English-speaking children in the U.S., initially categorized color chips in a similar way, but as they grew older and more familiar with the color terms of their languages, their categorizations diverged, and became more consistent with their color names. This is particularly strong evidence that color names affect color concepts.
It appears, then, that Roberson and her colleagues have laid their hands on the Sapir-Whorf hypothesis and raised it from the dead with their experiments using members of the Berinmo and Himba tribes. We should, of course, take these results with a healthy dose of skepticism, because they involve testing people from very different languages and cultures and comparing their results, which, as I said earlier, is a big problem. However, the growing body of evidence from Roberson and her colleagues’ experiments is hard to deny. I don’t know about you folks, but I find the revival of Sapir-Whorf incredibly exciting.
1 Heider, E.R. (1972). Universals in color naming and memory. Journal of Experimental Psychology, 93, 10-20.
2 Roberson, D., Davies, I. & Davidoff, J. (2000) Colour categories are not universal: Replications and new evidence from a Stone-age culture. Journal of Experimental Psychology: General, 129, 369-398.
3 The Himba graph is from Roberson, D., Davidoff, J., Davies, I. & Shapiro, L. (2005) Colour categories in Himba: Evidence for the cultural relativity hypothesis. Cognitive Psychology, 50, 378-411.
4 Roberson, D., Davidoff, J., Davies, I.R.L. & Shapiro, L. R. (2004) The Development of Color Categories in Two languages: a longitudinal study. Journal of Experimental Psychology: General, 133, 554-571.
UPDATE 10/16/06: I just re-added the images, which were showing up when I first reposted this, but subsequently disappeared. I hope that didn’t lead to any confusion.