Selection, drift, disease and complexity, all rolled into one....

One of the great things about evolutionary genetics is the diversity of the cognitive toolkit one must draw on as a matter of course. Since R. A. Fisher's The Genetical Theory of Natural Selection (along with the contemporaneous work of Sewall Wright and J. B. S. Haldane) we've been habituated to thinking about evolutionary processes at an abstract level, which might allow us to make general deductive inferences from first principles. Genetic drift, selection, migration, etc., are parameters used to construct models that generate predictions and yield deeper insight. The discovery of DNA, and the elucidation of the biophysical substrate which concretely constrains the modes of inheritance, opened up the startling vistas of molecular evolutionary genetics. This discipline has allowed us to inspect how the predictions of evolutionary theory are borne out at a more fine-grained level. And today the genomics revolution is ramping up the size of data sets, while growing computational power enables more sophisticated extraction of the patterns and dynamics which emerge out of these discrete streams of information.

But this is all rather philosophical and abstract. Yesterday I posted on a paper which showed how selection and drift might have operated upon the frequency of an allele with disease implications, and so with a pragmatic impact on quality of life. Today I'd like to bring your attention to another paper which synthesizes these big-picture ideas in a way that bears on the practical details of daily life. It's titled Natural Selection on Genes that Underlie Human Disease Susceptibility, and here's the abstract:

What evolutionary forces shape genes that contribute to the risk of human disease? Do similar selective pressures act on alleles that underlie simple versus complex disorders ...Answers to these questions will shed light onto the origin of human disorders...and help to predict the population frequencies of alleles that contribute to disease risk, with important implications for the efficient design of mapping studies...As a first step toward addressing these questions, we created a hand-curated version of the Mendelian Inheritance in Man database (OMIM). We then examined selective pressures on Mendelian-disease genes, genes that contribute to complex-disease risk, and genes known to be essential in mouse by analyzing patterns of human polymorphism and of divergence between human and rhesus macaque. We found that Mendelian-disease genes appear to be under widespread purifying selection, especially when the disease mutations are dominant (rather than recessive). In contrast, the class of genes that influence complex-disease risk shows little signs of evolutionary conservation, possibly because this category includes targets of both purifying and positive selection.

To the right you see Figure 4 from this paper. It charts the cumulative distribution of Dn/Ds, the ratio of nonsynonymous to synonymous substitutions between humans and rhesus macaques, where Dn is the rate of nonsynonymous substitution and Ds the rate of synonymous substitution (each normalized by the number of sites at which such a change could occur). A substitution is the process by which one allele, or genetic variant, replaces another at a locus over time. A synonymous change leaves the encoded amino acid unchanged and so should not change function, while a nonsynonymous change alters the amino acid and so may; the difference comes down to the redundancy of the genetic code, in which several codons can specify the same amino acid. In this chart hOMIM refers to the "hand-curated version" of the Mendelian Inheritance in Man database. Complex refers to genes implicated in complex-disease susceptibilities (e.g., "You carry allele Scary, which means you have a 5% chance of developing a really scary disease by the age of 60"). Cancer, Essential (genes known to be essential in mouse) and Other are pretty straightforward, at least in terms of label.
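For those who want to see the synonymous/nonsynonymous distinction mechanically, here is a minimal Python sketch, assuming the standard genetic code. The helper names (`classify_change`, `dn_ds`) are hypothetical, and the ratio it computes is the naive one (observed differences per site, with no correction for multiple hits at the same site); it is not the estimation procedure used in the paper. Counting the number of synonymous and nonsynonymous sites, the denominators, is a separate step the sketch simply takes as given.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order at each of the
# three positions (first position varies slowest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_change(ref_codon: str, alt_codon: str) -> str:
    """Hypothetical helper: label a codon-to-codon change.

    Synonymous if both codons encode the same amino acid, otherwise
    nonsynonymous (changes to or from a stop codon land in the latter bin).
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return "synonymous" if ref_aa == alt_aa else "nonsynonymous"


def dn_ds(nonsyn_subs: int, nonsyn_sites: float, syn_subs: int, syn_sites: float) -> float:
    """Hypothetical helper: naive, uncorrected Dn/Ds.

    Dn = nonsynonymous differences per nonsynonymous site,
    Ds = synonymous differences per synonymous site.
    """
    dn = nonsyn_subs / nonsyn_sites
    ds = syn_subs / syn_sites
    return dn / ds
```

With these, `classify_change("CTT", "CTC")` is synonymous (both leucine), `classify_change("GAG", "GTG")` is nonsynonymous (glutamate to valine), and made-up counts of 3 nonsynonymous differences over 300 sites against 5 synonymous differences over 100 sites give a Dn/Ds of about 0.2, i.e., the kind of value that sits toward the left of the figure.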

Do the results fit our expectations? First, the x-axis represents the value of Dn/Ds, while the y-axis shows the cumulative proportion of genes with a Dn/Ds at or below that value. The curves converge on 1 because, by definition, a proportion can't exceed 100%. The Cancer and Essential distributions are shifted furthest to the left, which means that a higher proportion of those genes have low Dn/Ds. The Complex line rises most slowly and has the greatest proportion of genes with high Dn/Ds (the colors aren't clear at this resolution, but trust me, they are at higher resolution). hOMIM is somewhere in the middle. In short, distributions shifted to the left point to stronger purifying selection, which removes new nonsynonymous mutants before they can fix, while distributions shifted to the right point either to weaker constraint, closer to neutrality, or to positive selection, which drives new nonsynonymous mutants to fixation and so also raises Dn/Ds.
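To make "reading a cumulative distribution" concrete, here is a small Python sketch of an empirical cumulative distribution function, which is what each curve in the figure plots. The Dn/Ds values below are toy numbers for two hypothetical gene classes, invented for illustration and not taken from the paper.

```python
from bisect import bisect_right


def ecdf(values, x):
    """Fraction of values that are <= x: the height of a cumulative curve at x."""
    ordered = sorted(values)
    # bisect_right counts every element <= x, so ties at x are included.
    return bisect_right(ordered, x) / len(ordered)


# Toy Dn/Ds values for two hypothetical gene classes (illustrative only).
constrained = [0.02, 0.05, 0.08, 0.10, 0.15]  # mostly low Dn/Ds: curve sits to the left
relaxed = [0.10, 0.20, 0.30, 0.45, 0.60]      # higher Dn/Ds: curve sits to the right

for threshold in (0.1, 0.3, 0.6):
    print(threshold, ecdf(constrained, threshold), ecdf(relaxed, threshold))
```

At a threshold of 0.1 the constrained class is already at 0.8 while the relaxed class is at 0.2; both reach 1.0 by 0.6. A left-shifted curve is simply one that climbs toward 1 at lower values of Dn/Ds, which is what the Cancer and Essential lines do relative to Complex.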

I would have to say that the distributions here are not totally surprising based on other things we know; to a great extent this is empirical confirmation of rules of thumb which many hold because of a century of theoretical and experimental insight. For example, it is well known that complex traits which exhibit a continuous distribution and are highly heritable tend to have weak fitness implications. Conversely, Mendelian diseases are usually classified as diseases for a reason! Additionally, the authors find that genes underlying dominantly expressed diseases, where one copy results in the disease, have lower values of Dn/Ds than those underlying recessively expressed diseases, where two copies are necessary. This is what we would expect: when a low-frequency allele that only expresses in homozygotes is segregating within a population, most of its copies are carried by heterozygotes, who are not exposed to selection. In other words, under Hardy-Weinberg proportions there is little purging of such alleles unless their frequencies are quite high. To make the difference between complex-disease loci and Mendelian ones more concrete, think of it in a non-disease context. Height is a quantitative trait, while eye color seems quasi-Mendelian. HMGA2 is a height locus which explains 0.3% of the variation in height within a population, while the region around OCA2 seems to account for 75% of the variation in blue-brown eye color. In addition, the region around OCA2 may have been subject to selection, and this selection may explain the difference in eye color across populations. It seems unlikely that we'll find strong signatures of selection around height loci that explain the variation in height across populations.
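The Hardy-Weinberg argument can be checked with a few lines of arithmetic. Assuming random mating, a recessive allele at frequency q occurs in heterozygotes at frequency 2pq (one copy each) and in homozygotes at frequency q² (two copies each), so the share of its copies sitting in heterozygotes, invisible to selection, works out to p = 1 - q. A minimal Python sketch, with a hypothetical function name:

```python
def share_in_heterozygotes(q: float) -> float:
    """Hypothetical helper: under Hardy-Weinberg proportions, the fraction of
    copies of an allele at frequency q that are carried by heterozygotes."""
    p = 1.0 - q
    het_copies = 2 * p * q   # heterozygotes (frequency 2pq) carry one copy each
    hom_copies = 2 * q * q   # homozygotes (frequency q^2) carry two copies each
    return het_copies / (het_copies + hom_copies)  # algebraically equal to p


for q in (0.001, 0.01, 0.1, 0.5):
    print(f"q = {q}: {share_in_heterozygotes(q):.3f} of copies in heterozygotes")
```

At q = 0.01 about 99% of copies of a recessive disease allele are hidden in carriers, so selection "sees" only about 1% of them; a dominant allele, by contrast, is exposed in every carrier. Only as q climbs toward 0.5 and beyond does a substantial fraction of copies end up in homozygotes.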

As the human genome is mapped in greater detail, and the parameters (selection, drift, etc.) are teased apart, we will gain some understanding of the genetic past of our species as well as its current trajectory. Loci whose variants have large effects are the low-hanging fruit. Their parameters are going to be easier to examine; after all, it's one locus (or two, or six, etc.). On the other hand, when you are finding genes which account for 0.1% of the variance of a trait, and are happy to find even those, as with height, attention to detail and caution about confounds are necessary to separate the wheat from the chaff. If you read the above paper closely you'll see that the phrase "ascertainment bias" crops up over and over, but unfortunately it shows up far less often in popular press summaries of these sorts of results and inferences. The authors note that we are just at the beginning of mapping complex-disease susceptibilities. The bioethical conundrums which emerged last winter with the discovery of more numerous, smaller-effect breast cancer susceptibility loci, after the splash which BRCA1 made, are a taste of what is to come. One thing is for sure: the future is going to be more complex....

Cite: Current Biology, Vol 18, 883-889, 24 June 2008, Natural Selection on Genes that Underlie Human Disease Susceptibility, Ran Blekhman, Orna Man, Leslie Herrmann, Adam R. Boyko, Amit Indap, Carolin Kosiol, Carlos D. Bustamante, Kosuke M. Teshima, and Molly Przeworski.

Nonsynonymous refers to a change in the genome which should not change function, while a synonymous may, because of the manner in which they alter the coding of amino acids from DNA.

Dude, you know you're backwards, right? I'm suspending your popgen license for one week :)

haha. i have an excuse, i can't spell and was cutting and pasting back and forth. ergo, i say in reference to Dn/Ds:
In short, neutrality is a more powerful force in the category of genes which are shifted further to the right, while forms of selection are presumably more powerful (purifying selection which removes mutants, or positive selection which fixed new mutants) for distributions shifted to the left.

don't let the lying prose fool you! so i think i warrant a ticket, not a suspension.

This is a great little paper. I'm most intrigued by the right-shifting of the "complex disease" curve relative to all other classes of gene; I guess this could be the result of relaxed purifying selection in these genes, but the authors' suggestion that this may be due to pervasive positive selection on complex disease genes is pretty appealing.

Oh, and kudos to the authors for hand-curating the OMIM database - that can't have been much fun...