Update on paper access: You can get it here already.
Note: I’m going to put a link roundup (updated) at this post. End Note
Genomic surveys in humans identify a large amount of recent positive selection. Using the 3.9-million HapMap SNP dataset, we found that selection has accelerated greatly during the last 40,000 years. We tested the null hypothesis that the observed age distribution of recent positively selected linkage blocks is consistent with a constant rate of adaptive substitution during human evolution. We show that a constant rate high enough to explain the number of recently selected variants would predict (i) site heterozygosity at least 10-fold lower than is observed in humans, (ii) a strong relationship of heterozygosity and local recombination rate, which is not observed in humans, (iii) an implausibly high number of adaptive substitutions between humans and chimpanzees, and (iv) nearly 100 times the observed number of high frequency linkage disequilibrium blocks. Larger populations generate more new selected mutations, and we show the consistency of the observed data with the historical pattern of human population growth. We consider human demographic growth to be linked with past changes in human cultures and ecologies. Both processes have contributed to the
This paper has long been in the works, and there are many moving parts that go into making the machine operative. I’ll be focusing on more general evolutionary genetic theory and leave the nitty-gritty of genomic analysis to others (though I’ll survey that angle cursorily).
What’s the bottom line here? More mutations → more beneficial mutations → a higher rate of fixation of mutations → more shifts toward adaptive peaks → more rapid evolutionary (adaptive) change. Got it? The basic logic is pretty simple, but the most informative citation in my opinion in this work is to a 2003 Genetics paper, Models of Experimental Evolution: The Role of Genetic Chance and Selective Necessity. This is a theoretical biology paper heavy on technical math and simulation, but there’s plenty of intelligibility here. For example:
When the product Np, the population size times the per locus error rate, is small, the rate of evolution is limited by the chance occurrence of beneficial mutations; when Np is large and selective pressure is strong, the rate-limiting step is the waiting time while existing beneficial mutations sweep through the population.
One can think of “per locus error rate” as basically the mutation rate. The paper suggests that Np = 1 is a major threshold: below this point stochastic effects are powerful and the ambient rate of mutation is a major parameter shaping adaptive evolution (not neutral evolution, whose rate does not depend on population size). When Np > 1 stochastic effects are less salient, and selection is powerful enough to swamp random deviation so that it can drive beneficial mutations to fixation. The mutations exist in sufficient quantity, and selection operates powerfully enough, that substitution at loci by beneficial mutants is simply a matter of time; that is, the rate-limiting step is the time it takes for the selective sweep to complete through the differential reproduction of individuals who carry different alleles. Again, from the Genetics paper:
When Np is small, the waiting time for a beneficial mutation to occur may be long. When such a mutation does occur, it may be lost through drift or may be outcompeted by a different strain that does not share the same mutation; genetic chance is the rate-limiting step for evolution….
Alternatively, when Np is large, a substantial number of mutations are produced in every generation. In this case the entire neighborhood of genotype space surrounding the dominant genotype is explored–thoroughly and simultaneously…The stronger the selective pressure, the more likely it is that somewhere in this genetic neighborhood a genotype with some fitness advantage over the wild type exists. The fittest of such genotypes will necessarily outcompete its neighbors, and the rate-limiting step for evolution is the rate at which this genotype takes over the population and begins to explore a new genetic neighborhood. We note that these conditions hold, i.e., selective pressure is strong and Np is large, in a number of systems apart from experimental evolution, including the inhost evolution of human immunodeficiency virus, the evolution of antibiotic resistance, and possibly the development of cancer.
Why is the waiting time for a beneficial mutation long when Np is small? Let’s ignore p, assume it’s constant, and just vary N, the population size. Let’s imagine a simple model. Consider a population fixed at 1000 per generation. Now consider a theoretical distribution of mutations of various fitnesses, and let’s imagine that a proportion of 0.0001 of the mutations arising each generation have an expected selection coefficient of 0.10; that is, each confers a ~10% fitness boost for most of the period during which it increases in frequency (remember that as an allele rises toward an adaptive peak and replaces a previous variant, its own fitness advantage relative to the population mean will necessarily decrease). 10% is a very large selection coefficient, so the rareness of such mutations being produced by error processes in genetic transmission isn’t implausible. Additionally, let’s imagine that at a given locus there will be 1 mutation per generation per thousand replications. So,
1 mutation per generation X 0.0001 (the proportion of mutations with a 10% selection coefficient) = 0.0001 10% selection coefficient mutations per generation expected
0.0001 10% selection coefficient mutations? How does that work? Well, obviously mutations exist or they don’t, so the 0.0001 is simply the theoretical expectation around which there will be variance. On occasion such a mutation will manifest itself at that locus, but most of the time (in most generations) it won’t (the authors of the Genetics paper model this formally). In other words, in a population of 1000 individuals there likely won’t be a mutation of 10% selective benefit in a given generation. In fact, there’s a ~90% chance that across 1000 generations such a mutation won’t manifest within such a population!
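To make the arithmetic concrete, here is a minimal sketch of the toy model in Python. It assumes that the number of new beneficial mutants per generation is Poisson distributed around the expectation (my assumption, not something taken from either paper), and the function and parameter names are invented for illustration.

```python
import math

def expected_beneficial(pop_size, mutations_per_individual=1e-3,
                        beneficial_fraction=1e-4):
    """Expected new s = 0.10 mutations at the locus per generation.

    The defaults are the toy numbers from the text: 1 mutation per
    thousand replications, and 0.0001 of mutations carrying s = 0.10.
    """
    return pop_size * mutations_per_individual * beneficial_fraction

def prob_none(pop_size, generations):
    """Chance that no such mutation arises at all over the given span.

    Assumes (hypothetically) Poisson-distributed counts, so the
    probability of zero events is exp(-expected total).
    """
    return math.exp(-expected_beneficial(pop_size) * generations)

per_generation = expected_beneficial(1000)
no_mutant_in_1000_generations = prob_none(1000, 1000)
print(per_generation)
print(no_mutant_in_1000_generations)
```

With N = 1000 the expectation is 0.0001 per generation, and the chance of seeing no such mutant across 1000 generations comes out at roughly 0.90, which is where the ~90% figure comes from.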
Now let’s increase N by 5 orders of magnitude; this results in 100000 mutations per generation (remember, 1 mutation per thousand individuals):
100000 mutations per generation X 0.0001 (the proportion of mutations with a 10% selection coefficient) = 10 10% selection coefficient mutations per generation expected
So what changed? Note that because there are many more individuals there are many more replications, so more mutational events will occur. The expectation is now that there will be 100000 mutations out of a population of 100 million; more mutations means that there will be a greater absolute number of beneficial mutants, assuming that the expected proportion remains the same. The probability of fixation of a mutant of additive power (that is, two copies are twice as beneficial as one copy) is approximately 2s, where s is the selection coefficient. There is theoretically a 20% chance that any given one of the 10 mutants will fix, but of course the model is more complex than the assumptions which underlie this, because there are so many copies in any given generation that the allele isn’t expanding into a space of wild type until fixation but rather competing with other alleles. In any case, the implication is the same: there are so many mutants that the question is not if one will fix and substitute at the locus but when. Most of the time even a beneficial mutant will go extinct, but if you have enough production of these mutants then the likelihood that they will all go extinct is rather low. That results in the inevitability of their increase in frequency and eventual fixation.
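The same toy numbers can be pushed one step further. The sketch below uses the 2s approximation for the fixation probability and treats each new mutant as an independent trial; that independence is an assumption made purely for illustration, and it is exactly what breaks down when many beneficial alleles compete with one another. The names are hypothetical.

```python
def fixation_probability(s):
    """Approximate fixation probability of a new additive beneficial mutant."""
    return 2.0 * s

def prob_at_least_one_fixes(n_mutants, s):
    """Chance that not every one of n_mutants is lost.

    Assumes independent mutants (illustrative only; real sweeps interfere).
    """
    p_lost = 1.0 - fixation_probability(s)
    return 1.0 - p_lost ** n_mutants

N = 100_000_000                  # 1000 individuals scaled up by 5 orders of magnitude
MUTATIONS_PER_INDIVIDUAL = 1e-3  # 1 mutation per thousand replications
BENEFICIAL_FRACTION = 1e-4       # share of mutations with s = 0.10
S = 0.10

# Expected beneficial mutants arising in a single generation (about 10).
mutants_per_generation = round(N * MUTATIONS_PER_INDIVIDUAL * BENEFICIAL_FRACTION)

one_generation = prob_at_least_one_fixes(mutants_per_generation, S)
ten_generations = prob_at_least_one_fixes(10 * mutants_per_generation, S)
print(mutants_per_generation, one_generation, ten_generations)
```

Under these assumptions a single generation's crop of 10 mutants already has close to a 90% chance of containing an eventual winner, and after ten generations the chance that all of them are lost is negligible.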
What is at work here is the difference between theoretical distributions and the realized genotypes which selection is liable to see. Selection and adaptive evolution do not work upon a 1/10th of a mutation; most of the frequency distribution when the expectation is well below 1 is irrelevant and of equal increased adaptive value, that is, none. On the other hand, most of the frequency distribution when the expectation is 10 is going to be available for selection. If one expects 10 mutants per generation of beneficial nature, even deviations so that 1, 2 or 3 mutants are produced are subject to selective pressure and so can be brought to bear on adaptive evolution. Or, imagine a normally distributed trait where positive selection only operates on trait values 3 standard deviations above the median. In an idealized normal distribution only 0.135% of the frequency distribution will be above this value. In a population of 100 only 0.135 individuals will be above the threshold. When was the last time you met 0.135 individuals? In a population of 10000 13.5 individuals are expected to be above the threshold, even assuming imperfect correspondence between the theoretical and realized distribution one expects that some individuals may be above the trait value and so pass on their characteristics to the next generation. One can imagine the dynamics here as one where increased population size increases the realized variation (the range of the distribution will be larger), and we know that increased variation (additive genetic variance) results in a greater response to selection.
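The threshold example can be checked with a few lines of Python. This is only a sketch under the idealized assumptions in the paragraph above: a perfectly normal trait and individuals drawn independently. The helper names are mine.

```python
import math

def upper_tail(z):
    """Fraction of a standard normal distribution lying above z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def expected_above(pop_size, z=3.0):
    """Expected number of individuals above the threshold."""
    return pop_size * upper_tail(z)

def prob_at_least_one(pop_size, z=3.0):
    """Chance that at least one individual exceeds the threshold.

    Assumes independent draws from the idealized distribution.
    """
    return 1.0 - (1.0 - upper_tail(z)) ** pop_size

for n in (100, 10_000):
    print(n, round(expected_above(n), 3), round(prob_at_least_one(n), 4))
```

The expectations come out at about 0.135 and 13.5 individuals. The second function gives the more useful number: in a population of 100 there is only about a 13% chance that anyone at all sits above the threshold, while in a population of 10000 it is a near certainty.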
The authors of Recent acceleration of human adaptive evolution are claiming that these somewhat abstruse distinctions between idealized theoretical distributions and the discrete, clumpy nature of reality, as well as selection’s blindness to less than 1 of a kind, have resulted in some very peculiar evolutionary dynamics with important implications for our species. Since sometimes pictures are worth many words I’ve cropped out two figures from the paper and juxtaposed them. On top you can see human population growth over time on a log scale, so the increase really is much sharper than what you see. Below is a chart which displays the number of selected variants which began to rise in frequency at a particular time in the past for two populations, Africans (Yoruba) and Europeans (Utah Whites). The drop-off in selected variants of very recent vintage is expected, as they are currently going to be at such low frequency that they likely won’t be detectable by the methods being used (remember that we’re talking about sample sizes on the order of hundreds of people). But the authors point out that there seems to be a concomitant rise in adaptive mutations which began to be selected along with increased population size. Additionally, the greater time depth of African variants might be due to larger populations in the past vis-a-vis Europeans, who have only begun to outstrip the numbers of Africans with the rise of agriculture. Note that the population of Europeans over the last 50,000 years has increased by greater than 4 orders of magnitude! The equivalent values for Africans are only slightly more modest. This is the “N” above. We are now the most numerous large mammal on the face of this planet. Using the data above the authors imply that our species has been subject to somewhat more than 1/2 a substitution per year. Remember, a substitution is a replacement of one allele for another at a locus on a population-wide scale.
If this is correct, that means that right now alleles driven by selection are being fixed within our species every few years. According to Haldane’s Dilemma one would assume that for a given species 1 gene should be fixed every 300 generations! (That’s 0.00013 substitutions per year, at roughly 25 years per generation.) Either the results from tests for detecting natural selection are very wrong somewhere, or J.B.S. Haldane was working with faulty assumptions. This incredible rate of substitution may imply that the past is truly an alien landscape. Today we see variation across human populations as a function of space (i.e., the geographic races), but it might be important to consider that the dimension of time is just as critical and that historic populations in their current genetic and physical form are creatures of a new age.
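For scale, the two rates can be put side by side. The 25-year generation time below is my assumption (it is roughly what the 0.00013 figure implies), and the names are illustrative.

```python
def substitutions_per_year(generations_per_substitution, years_per_generation):
    """Convert 'one substitution every k generations' into a yearly rate."""
    return 1.0 / (generations_per_substitution * years_per_generation)

YEARS_PER_GENERATION = 25.0  # assumed human generation time, not from the paper

haldane_rate = substitutions_per_year(300, YEARS_PER_GENERATION)
inferred_rate = 0.5          # "somewhat more than 1/2 a substitution per year"

print(haldane_rate)
print(inferred_rate / haldane_rate)
```

On these numbers the inferred rate is several thousand times the rate suggested by Haldane’s limit, which is why one of the two has to give.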
In any case, the theory implies that the rate of substitution should naturally increase with increased effective population size. But the authors of the paper make the empirical case by assessing the data for the null hypothesis that the rate of substitution has been constant across human history, in other words, that the sweeps to fixation have been characterized by the same dynamics across time.
1) Heterozygosity is far greater than constant substitution rates would imply.
2) Because recombination breaks apart linkage blocks, variation in local rates of recombination should allow one to predict diversity, as homogeneous segments of DNA are shuffled (ancient linkage blocks generated by powerful selection events, where long segments of DNA are dragged upward in frequency, should be broken apart much more than recent blocks). No such strong relationship between heterozygosity and recombination rate is generally observed.
3) The rate of substitution inferred from the detected recent selection extrapolated to the chimp-human point of separation would imply a much greater genetic divergence than is empirically observed.
4) If the rate of substitution were constant, many more alleles should be approaching fixation, that is, ~100% frequency within the population. This is not so; rather, many more alleles are in transient states, between low and high frequency.
The word transient is critical: the authors make the argument that our species is in a state of transition from relatively low population sizes to large population sizes. This results in a concomitant change in the underlying evolutionary genetic parameters as stochasticity becomes less important than selection in determining the fate of beneficial mutations, which are naturally extant at higher frequencies recently than they were in the past. Much hinges on the empirical evolutionary genomic data, and the nature of the tests to detect selection events at any given time. Obviously if the tests employed are biased toward particular frequency ranges then that might pose problems. The main method is outlined in detail in an earlier PNAS paper, Global landscape of recent inferred Darwinian selection for Homo sapiens. Other bloggers of orders of magnitude more competence (who I will link to) have promised to address the evolutionary genomics in detail, so with that I’ll move on to some broader issues.
In the discussion the authors state:
…Demographic change may be the major driver of new adaptive evolution, but the detailed pattern must involve gene functions and gene-environment interactions.
Cultural and ecological changes in human populations may explain many details of the pattern. Human migrations into Eurasia created new selective pressures on features such as skin pigmentation, adaptation to cold, and diet…Over this time span, humans both inside and outside of Africa underwent rapid skeletal evolution…Some of the most radical new selective pressures have been associated with the transition to agriculture…For example, genes related to disease resistance are among the inferred functional classes most likely to show evidence of recent positive selection…Virulent epidemic diseases, including smallpox, malaria, yellow fever, typhus, and cholera, became important causes of mortality after the origin and spread of agriculture…Likewise, subsistence and dietary changes have led to selection on genes such as lactase….
The paper above focuses on an endogenous or intrinsic parameter, the number of mutations produced through the normal processes of genetic transmission, which have a discrete Mendelian manifestation and biophysical basis in DNA. In other words, one genetic parameter, the size of the breeding population, results in a change which would occur naturally ceteris paribus. Similarly, smaller population size will result in greater stochasticity. Increasing selection coefficients will result in more adaptive evolution. We know this from our armchairs through analysis of a priori Mendelian logic, comprehensible when understood in light of meiosis and the limitations of DNA repair mechanisms. But there is another angle too: exogenous or extrinsic factors, natural selection as we conventionally understand it. These exterior parameters frame the selection coefficients which drive mutations toward extinction or sweep them up to fixation. And, importantly, some of these parameters are strongly shaped by population size as well. Consider disease, which is likely strongly subject to density and interconnectedness. A world with more humans is a world with more hosts, and remember that many of the arguments for increased rates of substitution derive from pathogen models whose effective population size is contingent upon warm bodies available to infect! Additionally, human population size likely has a strong effect on the rate of cultural evolution, which itself is a major driver of natural selection through changes in physical and social environment. Consider lactase persistence, the canonical example of a recently selected trait subject to gene-culture coevolution. In The Human Web the historian William H. McNeill argues that increased population sizes and more interconnected cultural centers have resulted in a synergistic acceleration of cultural evolution and a more robust world-wide civilization.
Random natural disasters are less likely to cause the winking out of literate societies because there are multiple redundant centers of any given tradition, so that recolonization will almost certainly occur and recreation of a given trait de novo is not necessary. As our cultures have changed faster and faster, selective pressures have likely also revved up. This might change the proportion of new mutants which are beneficial, as well as the magnitude of their selection coefficients.
Finally, now that we are addressing culture, it might be important to consider the possibility that recent acceleration of human adaptive evolution may not apply simply to genetic processes! It may be that a Lunar Society can only exist in a population of a certain size where a critical mass of oddballs is in communication. It may be that men of singular and incomprehensible intellect such as Isaac Newton are only imaginable in the sample space of possibilities when our base population has increased enough so that impossible genius crosses the threshold to improbable genius. This summer I had lunch with some economists and they kept talking about the idea that as human population increases the number of innovations would increase and so the frontier of the rate of economic growth would naturally expand outward in value. Ultimately Nature is One, and insights and logic which bear fruit in one branch of science may be more relevant than we thought across the whole arc of creation.
Related: My post Important papers on recent human evolution has some links of note which can help bring you up to speed on the assumptions which suffuse the paper reviewed here.
Reference: Hawks J, Wang E. T., Cochran G. M., Harpending H. C., and Moyzis R. K., Recent acceleration of human adaptive evolution, PNAS (early online)