Evolution for Everyone

In the storybook portrayal of science, theories are tested by experiments, which are conducted in laboratories so that the conditions can be rigorously controlled.

How would group selection be tested in the laboratory? Let’s begin with the thousands of selection experiments that have already been conducted in the laboratory at the individual level. A population of animals, such as fruit flies or chickens, is measured for a particular trait, such as bristle number or egg productivity. Individuals that score high or low (depending upon the desired direction of selection) for the trait are selected to breed the next generation. If the average value of the trait in the offspring generation shifts in the direction of selection, then the trait is heritable and there has been a response to selection. Over many generations, artificial selection can cause organisms to become completely different from their ancestors, as our domesticated plants and animals attest.

Group selection can be studied in the laboratory by a simple extension of the protocol outlined above. A population of groups is created, a particular trait is measured for the groups, and the highest (or lowest) scoring groups are used to breed the next generation. If the average value of the trait in the offspring generation shifts in the direction of selection, then group selection is proven to be efficacious, at least under the conditions of the laboratory experiment.

To make the procedure less abstract, consider my favorite group selection experiment, which I have written about in Evolution for Everyone. William Muir, an animal breeder at Purdue University, selected for egg productivity in hens in two different ways. Both involved housing hens in cages (groups), which is standard practice in the poultry industry. The first method involved selecting the most productive hen within each cage to breed the next generation of hens. The second method involved selecting the most productive cages and using all the hens from those cages to breed the next generation of hens. It might seem that this is a subtle difference, that the same trait (egg productivity) should be selected in both cases, and that the first method should be more efficacious. After all, eggs are produced by individual hens, so why not directly select the best? Why select at the group level, when even the best groups might have some individual duds?

The results told a completely different story. The first method caused egg productivity to perversely decline, even though the most productive hens were chosen each and every generation. The second method caused egg productivity to increase 160 percent in six generations, an astonishing response as artificial selection experiments go.

What happened? If you’ve been paying attention to my Truth and Reconciliation blogs, you’ll recognize a classic case of multilevel selection. Natural selection within groups is sensitive only to relative fitness, relentlessly favoring hens who lay more eggs than their neighbors. The first method favored the nastiest hens who achieved their productivity by suppressing the productivity of other hens. After six generations, Muir had produced a nation of psychopaths, who plucked and murdered each other in their incessant attacks. No wonder egg productivity plummeted! It would be hard to imagine a more graphic example of what I have called “the original problem” throughout this series of blogs; traits that are “for the good of the group” are not always locally advantageous within the group and require a process of group-level selection to evolve.

That’s why the second method worked. Selecting the most productive groups favored peaceful and cooperative hens, despite their selective disadvantage within groups. Moreover, group-level selection was sufficiently strong to successfully counteract selection within groups, which was taking place within cages for the second method, just as much as the first. Muir’s experiment proves the efficacy of group selection, at least under the conditions of the experiment.
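
The contrast between the two methods can be reproduced in a toy simulation. This is only a sketch under invented assumptions, not Muir's protocol or data: each hen carries a heritable "aggression" score between 0 and 1, her own aggression raises her egg count, and every cage-mate's aggression lowers it. The constants and the function names (`eggs`, `sibling_cage`, `select_within`, `select_between`, `run`) are hypothetical. With these numbers the most aggressive hen in a cage always lays the most eggs in that cage, while a cage's total output falls as its overall aggression rises.

```python
import random
from statistics import mean

# Every number below is an illustrative assumption, not a value from Muir's study.
N_CAGES = 40         # cages (groups) per generation
CAGE_SIZE = 6        # hens per cage
BASE = 10.0          # eggs laid by a hen in a cage with no aggression
GAIN = 1.0           # eggs a hen gains per unit of her own aggression
HARM = 1.0           # eggs a hen loses per unit of each cage-mate's aggression
SIB_SD = 0.05        # heritable scatter of siblings around their parent
TOP_FRACTION = 0.25  # share of cages kept under the group-level method
GENERATIONS = 6


def eggs(cage, i):
    """Eggs laid by hen i: raised by her own aggression, lowered by her cage-mates'."""
    cage_mates = sum(cage) - cage[i]
    return BASE + GAIN * cage[i] - HARM * cage_mates


def cage_total(cage):
    return sum(eggs(cage, i) for i in range(CAGE_SIZE))


def sibling_cage(parent, rng):
    """Fill one cage with siblings whose aggression (0 to 1) resembles the parent's."""
    return [min(1.0, max(0.0, rng.gauss(parent, SIB_SD))) for _ in range(CAGE_SIZE)]


def mean_eggs(cages):
    return mean(eggs(cage, i) for cage in cages for i in range(CAGE_SIZE))


def select_within(cages, rng):
    """Method 1: breed from the most productive hen in each cage (rng is unused)."""
    parents = []
    for cage in cages:
        best = max(range(CAGE_SIZE), key=lambda i: eggs(cage, i))
        parents.append(cage[best])
    return parents


def select_between(cages, rng):
    """Method 2: breed from every hen in the most productive cages."""
    ranked = sorted(cages, key=cage_total, reverse=True)
    kept = ranked[: int(N_CAGES * TOP_FRACTION)]
    pool = [hen for cage in kept for hen in cage]
    return rng.choices(pool, k=N_CAGES)  # one parent founds each new sibling cage


def run(select, seed=1):
    """Return mean eggs per hen for the starting flock and each later generation."""
    rng = random.Random(seed)
    cages = [sibling_cage(rng.random(), rng) for _ in range(N_CAGES)]
    history = [mean_eggs(cages)]
    for _ in range(GENERATIONS):
        parents = select(cages, rng)
        cages = [sibling_cage(parent, rng) for parent in parents]
        history.append(mean_eggs(cages))
    return history


if __name__ == "__main__":
    print("within-cage selection: ", [round(x, 2) for x in run(select_within)])
    print("between-cage selection:", [round(x, 2) for x in run(select_between)])
```

Both runs start from the same flock because they share a seed. Under these assumed parameters, mean eggs per hen falls under the first method and rises under the second, even though the first method picks the top layer in every cage.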

By the way, the groups of chickens were siblings. When some of my colleagues learn this fact, they shout “Aha! It’s kin selection, not group selection!” Wrong. The groups were siblings in both methods, so their kinship cannot explain the difference between the methods. As I relate in T&R XIII, everything about kin selection theory can be understood in terms of the parameters of multilevel selection theory. Creating groups of siblings caused the psychopaths to cluster in some groups and the peaceniks to cluster in other groups, providing lots of variation to select upon at the group level. Psychopaths still beat peaceniks within each group; the fact that they were siblings is beside the point.
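
The claim that psychopaths can beat peaceniks inside every group while peaceniks still gain in the population as a whole is easy to check with arithmetic. The sketch below uses made-up numbers: `BASE`, `BENEFIT`, `COST`, and the two group compositions are assumptions chosen for illustration, not measurements from the experiment.

```python
# Hypothetical numbers chosen only to illustrate the arithmetic.
BASE = 10     # offspring per hen in a group with no peaceniks
BENEFIT = 5   # extra offspring per hen, scaled by the group's peacenik fraction
COST = 1      # offspring a peacenik gives up by not fighting


def fraction(peaceniks, psychopaths):
    """Share of peaceniks in a group or population."""
    return peaceniks / (peaceniks + psychopaths)


def next_generation(peaceniks, psychopaths):
    """Return offspring counts (peaceniks, psychopaths) for one group."""
    p = fraction(peaceniks, psychopaths)
    w_psychopath = BASE + BENEFIT * p
    w_peacenik = w_psychopath - COST  # always lower than a psychopath in the same group
    return peaceniks * w_peacenik, psychopaths * w_psychopath


groups = [(80, 20), (20, 80)]  # one mostly peaceful group, one mostly aggressive
offspring = [next_generation(*g) for g in groups]

for before, after in zip(groups, offspring):
    print(f"within group: {fraction(*before):.3f} -> {fraction(*after):.3f}")

total_before = fraction(sum(g[0] for g in groups), sum(g[1] for g in groups))
total_after = fraction(sum(o[0] for o in offspring), sum(o[1] for o in offspring))
print(f"whole population: {total_before:.3f} -> {total_after:.3f}")
```

With these numbers the peacenik share drops inside each group (0.800 to 0.788 and 0.200 to 0.185) yet rises in the total population (0.500 to 0.517), because the mostly peaceful group contributes more offspring. The effect depends on the groups differing in composition, which is what housing siblings together provides.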

This experiment also raises important questions about what counts as an individual trait. Egg productivity seems like an individual trait because you can count the eggs coming out the hind end of a hen. The experiment reveals that egg productivity is in fact a highly social trait that depends upon the genetic composition of one’s group, not just the individual’s genetic composition. This example has profound consequences for how we think about human traits that seem individual but in fact are highly social.

This is only one of many experiments demonstrating the efficacy of group selection in the laboratory, for creatures as diverse as insects, plants, and vertebrates. In 1997, I organized a symposium on multilevel selection in my capacity as Vice President of the American Society of Naturalists, one of the premier evolution-oriented societies. The symposium took place at their annual meeting and was published as a special issue in the American Naturalist, arguably the premier journal for evolutionary research at the time. Among the speakers was John Maynard Smith, one of the premier evolutionists in the world and a chief critic of group selection, as I recount in T&R VIII and IX. I mention these credentials not to boast, but to emphasize how much the symposium occupied center stage in the world of evolutionary biology.

Another speaker at the symposium was Charles Goodnight, a student of Michael Wade, who conducted the first group selection experiments in the 1970s. Charles reviewed the literature and concluded that every group selection experiment conducted in the laboratory demonstrated an efficacious role for between-group selection, even when between-group selection was opposed by within-group selection. You can judge for yourself from the published version of Charles’ talk (co-authored with Laurie Stevens) titled “Experimental Studies of Group Selection: What Do They Tell Us About Group Selection in Nature?”, which is available on his website.

As Charles recently recounted to me, he was convinced that his talk would be a career-maker for himself and a turning point for acceptance of group selection. Why not, given the import of his conclusions for one of the most important controversies in evolutionary theory, the prominent forum, and the likes of John Maynard Smith in the audience? He was sorely disappointed. The laboratory evidence for group selection had virtually no impact on the acceptance of group selection by the evolutionary community at large. So much for the storybook portrayal of science.

The only legitimate reason to discount the results of a laboratory experiment is when the conditions are highly artificial, therefore irrelevant to real-world situations. But this is not the case for the group selection experiments, in which groups are formed much as they might form in nature. Moreover, the whole beauty of laboratory experiments is that conditions can be varied in a systematic fashion. If a critic thinks that the conditions of one experiment are artificial, the answer is to conduct another experiment, not to discount laboratory evidence entirely.

In fact, group selection is so efficacious in the laboratory that even the proponents of group selection were surprised. As often happens, the laboratory experiments revealed factors operating in real biological systems that were beyond the imagination of the theorists. In particular, the theorists had limited their models to traits with a simple genetic basis, such as selfish and altruistic genes that code directly for selfish and altruistic behaviors. Given this assumption, phenotypic variation among groups corresponds directly to genetic variation among groups, which in turn depends critically on the number of individuals initiating each group. The larger the initial group size, the less variation among groups for between-group selection to act upon. That is the entire import of kin selection and the early conclusion that group selection requires special conditions, such as small initial group size, to be efficacious.
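
The dependence on founder number in these simple models can be checked numerically. The sketch below assumes that each group is founded by `n_founders` individuals drawn at random from a large population in which a fraction `p` carry the altruistic gene. The function names and default values are illustrative. Under that assumption, the variance among groups in altruist frequency is the binomial sampling variance, p(1 - p)/n.

```python
import random
from statistics import pvariance


def among_group_variance(n_founders, p=0.5, n_groups=5000, seed=0):
    """Simulated variance among groups in altruist frequency at founding."""
    rng = random.Random(seed)
    freqs = []
    for _ in range(n_groups):
        altruists = sum(rng.random() < p for _ in range(n_founders))
        freqs.append(altruists / n_founders)
    return pvariance(freqs)


def expected_variance(n_founders, p=0.5):
    """Binomial sampling variance of a frequency based on n_founders draws."""
    return p * (1 - p) / n_founders


if __name__ == "__main__":
    for n in (2, 10, 50):
        print(n, round(among_group_variance(n), 4), expected_variance(n))
```

Going from 2 founders to 50 cuts the expected variance from 0.125 to 0.005, which is why these models concluded that group selection needs small founding groups.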

In the laboratory, groups vary substantially at the phenotypic level, even when they are initiated by large numbers of individuals, because the relationship between the genetic composition of a group and its phenotype is complex rather than simple. Even when groups initially vary by only a small amount, complex interactions within groups cause them to become more variable over time, a kind of “butterfly effect” that also accounts for why complex physical systems, such as the weather, are so variable.
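
For readers unfamiliar with the butterfly effect, the logistic map is a standard textbook example of it. The sketch below is an analogy only, not a model of any group or ecosystem in these experiments. The starting value, the size of the initial difference, and the function names are assumptions. Two trajectories that begin one part in a billion apart follow the same simple rule and still end up completely different.

```python
def logistic_trajectory(x0, r=4.0, steps=100):
    """Iterate x -> r * x * (1 - x), a simple rule with chaotic behavior at r = 4."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs


def divergence(x0=0.3, eps=1e-9, steps=100):
    """Gap at each step between two runs whose starting points differ by eps."""
    a = logistic_trajectory(x0, steps=steps)
    b = logistic_trajectory(x0 + eps, steps=steps)
    return [abs(u - v) for u, v in zip(a, b)]


if __name__ == "__main__":
    gaps = divergence()
    for step in (0, 10, 20, 30, 40, 50):
        print(step, gaps[step])
```

The gap starts near one billionth and grows until the two runs are unrelated, all within the first hundred steps.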

An experiment that I performed with my former student, William Swenson, will make this idea less abstract. We grew a fast-growing plant called Arabidopsis in small flowerpots. The soil was sterilized except for a slurry of six grams of unsterilized soil from a single well-mixed source. To be precise, to make the slurry, we placed unsterilized soil and sterilized water in a kitchen blender and blended it like crazy before delivering six grams of soil to each pot of sterilized soil. If you know anything about microbiology, you know that millions and millions of microbes comprising hundreds and hundreds of species are contained in a single gram of soil. Thus, the initial variation among pots in the genetic and species composition of the soil microbes was vanishingly small.

We grew the pots under constant environmental conditions until the plants were large enough to harvest. We weighed the biomass of the plants and performed a standard artificial selection experiment with a single twist. Instead of selecting the largest or smallest plants to breed for the next generation, we selected the soil from under the largest and smallest plants (in separate treatments) to make into a slurry and inoculate the next generation of pots. In other words, we were selecting at the level of whole microbial ecosystems rather than at the level of individual plants. Plant biomass was being used as a phenotypic trait of the ecosystem.

Even though the initial variation among pots was minuscule based on the large number of microbes colonizing each pot, variation did not stay minuscule because each pot was a complex biological system. Just as a butterfly flapping its wings can change the trajectory of a complex physical system such as the weather, each pot embarked upon a separate trajectory during the course of the first plant generation. This was apparent even to the naked eye; for example, some pots but not others developed a mat of algae over the surface of the soil. These differences made a difference for plant growth, so that by selecting the soil from beneath the largest and smallest plants, we were selecting microbial ecosystems that caused the plants to become large or small. Over a number of ecosystem “generations” (each comprising many microbial generations), the high and low selected lines diverged from each other, proof that variation among ecosystems was heritable. This work was published in one of the nation’s premier science journals, the Proceedings of the National Academy of Sciences. Once again, I say this not to boast, but to demonstrate that if multilevel selection experiments in the laboratory have failed to have an impact, it is not for lack of legitimacy or exposure. As with Goodnight’s review of the literature, however, our experiment had virtually no impact on attitudes about group selection by the evolutionary community at large.

Muir’s chicken experiment and our soil ecosystem experiment show that multilevel selection is not just an arcane scientific subject; it can be put to practical use. The eggs in your refrigerator come from group selected hens, regardless of what you might think of group selection. In a second set of experiments, Bill Swenson and I selected microbial ecosystems to degrade a toxic compound. In his current research, Bill Muir and his colleagues are using group selection to create strains of livestock that have not been inadvertently selected to make each other miserable. What else might we select groups and ecosystems to do?

Multilevel selection experiments in the laboratory vividly illustrate why a truth and reconciliation process is needed for the subject of group selection. The experiments are published in the best journals because a core group of evolutionists does understand their import. For them, the storybook portrayal of science actually takes place. For the evolutionary community at large, however, the rules governing the acceptance and rejection of group selection are a different story.


  1. #1 george
    November 5, 2009

    Does the chicken selection experiment demonstrate ‘group selection’, or does it just demonstrate that chickens, like many animals, experience increased stress when removed from their groups?

  2. #2 piker
    November 5, 2009

    “Plant biomass was being used as a phenotypic trait of the ecosystem.”

    Which Dr. Wilson has also referred to in a paper as “Artificial ecosystem selection.”

    Group selection or directed group selection?

  3. #3 Ewan R
    November 5, 2009

    Isn’t another take on the chicken experiment that in the ‘group selection’ method you are applying two pressures on the chickens rather than just the one:-

    In the individual selection method you pick a single phenotype – more eggs than neighbours, which obviously didn’t just select for ‘productivity’ – a chicken which destroys neighboring eggs/chickens will have more success but not due to any increase in productivity (just relative productivity).

    In the ‘group selection’ version chickens which are psychotic are individually selected against because they by virtue of their psychoticness end up in groups with less production – I wonder if the same result could have been achieved had the first experiment taken into account this psychosis. Arguably in this case (I think) you were still selecting individuals, albeit indirectly – by punishing nutjobs.

  4. #4 piker
    November 5, 2009

    The experiments didn’t select for psychopathic chickens, they made the more vulnerable chickens adopt psychopathic behavior strategies to their own detriment. Then punished them, somewhat like god supposedly made sinners and punished them for acquiescing to that making.

  5. #5 Guy
    November 5, 2009

    Ewan – Your question reflects a kind of criticism of group selection experiments that I have often seen. Every level of selection in a multilevel view is composed of lower level agents. Individuals, for example, can be decomposed into a group of genes. No matter at which level an experimenter imposes selection, lower level agents are caught up in the selected entities all the way down the scale of organizational levels. So when you select on groups of individuals there are particular individuals that get swept along, as well as particular genes, etc. The fact that some individuals were retained in the experiment and others were not does not mean that selection was imposed at the individual level in the process. Drift is the term we use to describe sampling without selection and an experimental design for group selection could impose selection at the group level and only drift at the individual level. It is also possible for group selection to entail biased selection at other levels. For example, group selection for high egg-producing chicken groups may, as a side-effect, also impose selection at the individual level for being social (or at least benign). How then should we interpret the process of selection overall? In my view this is pretty easy to answer given an experimental design. In the chicken experiment groups were selected to seed the next generation without regard to their composition, so it is a clear case of group selection inducing individual selection secondarily. I can see why this may be unconvincing to a staunch individual selectionist. They may claim that the entire effect is explainable by individual selection because of this induction. I personally think this is an irrational interpretation showing bias, rather than objectivity, but some degree of bias is universal.
    Perhaps it would be more convincing if an experiment could be devised where group selection induced only drift at the individual level (and below) to illustrate group selection in a more pure form. In the bigger picture of multilevel selection theory, however, I wouldn’t expect selection to act in an uncoupled way across levels very often. So the coupling observed in experiments like the one with chickens probably gives us a better understanding of how selection generally works across levels.

  6. #6 bob koepp
    November 5, 2009

    We can extend the analogy with the punishment of sinners 8^)

    Corresponding to original sin… the acquired psychopathic behaviors were inherited by the poor chicken’s offspring, unto the third and fourth generations…

    Wait, genetic inheritance doesn’t work that way. Oops. Never mind.

  7. #7 piker
    November 5, 2009

    bob koepp,
    Since Dr. Wilson has already conceded they were selecting for behaviors that were cultural adaptations, at least on other than the first group level (thus avoiding the mutual exclusivity problem among various others mentioned), the psychopathic behaviors were not genetically inherited and the analogy retains its consistency. Oops, never mind if that was beyond your capacity to understand.

  8. #8 bob koepp
    November 5, 2009

    piker – Lighten up a bit. What part of 8^) do you not understand?

  9. #9 piker
    November 5, 2009

    I confess I don’t process emoticons well, if at all, and have no idea what 8^ is supposed to represent. Based on your other commentary, my guess was that it wasn’t some term of endearment.

  10. #10 Alan Tabor
    November 5, 2009

    re: Ewan R sez “In the ‘group selection’ version chickens which are psychotic are individually selected.”

    This is correct. It’s always individuals that are selected. The issue is whether an individual trait such as 'altruism' can become a stable evolutionary strategy despite the existence of cheaters and, if so, under what conditions could that happen.

  11. #11 Alan Tabor
    November 5, 2009

    (weird…I’ll try again) re: Ewan R sez – “In the ‘group selection’ version chickens which are psychotic are individually selected.”

    This is correct and not the point at issue. It’s always individuals that are selected. What’s at issue is whether an individual trait such as ‘altruism’ can be an evolutionary stable strategy despite the existence of cheaters and under what conditions that trait might evolve. This model indicates under what conditions pro-social traits might win by allowing groups of individuals to out-perform other groups of individuals.

  12. #12 piker
    November 5, 2009

    Based on experience growing up on a farm, I’d hold that chickens are seldom if ever psychotic. In any case, this was about psychopathic behavior, which would be a behavioral anomaly, but not likely to be a heritable trait. And not prevalent as a behavior if chickens are allowed to select their own pecking orders. Some chickens will have a natural place in that order, others may have to fight for their place.
    Screw up that order deliberately, especially by putting a number of naturally dominant chickens together, and they fight, rules of the game out the coop. Why this behavior was repeated in a generational progression, I don’t know, except that once dominant traits were selected for by the experimenters this likely acted to throw the progression of the usual array of traits to the next generation out of balance. Causing more fighting to put the naturally dominant in more and more secondary or worse positions. Which such chickens will never completely accept as their due. Which makes them the most vulnerable to being driven a bit nuts as well. And driven that way by direction of the experimental operators, not by group selection per se.

  13. #13 David Sloan Wilson
    November 5, 2009

    I was using poetic license with the word psychotic. To be more accurate, the within-group selection experiment favored the most aggressive hens, aggressiveness is highly heritable, so high aggressiveness (first experiment) and low aggressiveness (second experiment) can be selected in six generations, which is the duration of the experiment. The genetic artificial selection story is pretty simple.

    One of the most difficult points to get about multilevel selection is that individual selection (=within-group selection) is defined entirely on the basis of local fitness differentials. This is in contrast to other theoretical frameworks, which average the fitness of genes and/or individuals across groups and/or time to derive what evolves in the total population. For example, gene fitness in selfish gene theory is the fitness of genes averaged across all contexts, not within groups. Individual selection in game theory is the fitness of individual strategies averaged across all groups of size N, not fitness within groups. Inclusive fitness theory calculates the effect of a behavioral act on self and others, weighted by their relatedness. All of these frameworks reach the same conclusion about what evolves in the total population, employing their own definitions of terms such as “individual selection”. These are truly different languages that use the same words, requiring a lot of sophistication to understand the meaning of words in their contexts. That’s why I keep stressing that there is a factual issue at stake regardless of the framework within which it is examined: Can a trait evolve globally when it is locally disadvantageous?

    A word about norms of discourse: We should all strive to be vigorous but cordial and respectful. I might be “aggressive” in critiquing various positions, but I haven’t belittled anyone with phrases such as “beyond your capacity to understand”, except maybe toward Richard Dawkins, who I must confess I love to hate. Name calling is hitting below the belt for scientific discourse. I suspect that all of us are smart and well informed in our own ways. If we disagree, it is for more interesting reasons than lacking basic intelligence. Fire away if you think that I fail to take my own advice.

  14. #14 piker
    November 5, 2009

    Hey, when the guy gives me a WTF sign and then some sort of finger icon, I felt it only fair to speculate as to his intelligence.
    And if I was somewhat correct about the chickens, was I not correct that the manipulation by the human experimenters somewhat compromises the experiment as to the effects to be expected from “manipulation” by groups that are left to their own selective devices?

    And how is this significantly different from other selective breeding for traits, such as those found highly heritable in dogs? Which, I might add were not selected out from their wolf ancestors before or since by any of the groups wolves are prone to belong to, and migrate between, etc.

  15. #15 David Sloan Wilson
    November 5, 2009

    To Piker: I was not trying to single you out. You’re right that the experimenter, not nature, defines fitness in an artificial selection experiment, and I myself make the point that artificial group selection is just good old artificial selection with a twist: we’re selecting the properties of groups rather than individuals. But that twist is exactly what we need to evaluate the claim that between-group selection is always weak compared to within-group selection. If groups composed in the laboratory show lots of phenotypic variation, for example, then groups composed in the field should also, regardless of whether the selection is natural or artificial. In this fashion, what we discover about artificial multilevel selection in the laboratory is relevant to natural multilevel selection in the field.

  16. #16 piker
    November 5, 2009

    Well have you considered that there may be a commonality in the general area of directed selection as well, as some here have already alluded to?

  17. #17 Bob O'H
    November 6, 2009

    As Charles recently recounted to me, he was convinced that his talk would be a career-maker for himself and a turning point for acceptance of group selection. Why not, given the import of his conclusions for one of the most important controversies in evolutionary theory, the prominent forum, and the likes of John Maynard Smith in the audience?

    Because it was too late by then – people had moved on. For example, this from 1999. It’s an old debate that I think most people are bored of.
