Mike the Mad Biologist

One of the mysteries of genome-wide association studies (“GWAS”) is the problem of ‘missing heritability’: quantitative genetics indicates that a trait (e.g., height, heart disease) has a significant genetic component, but the genetic variation we can link to that trait only explains a small amount of the suggested heritability. Christophe Lambert describes why he thinks GWAS hasn’t had that much success so far:

One major limitation is that the microarrays used in most major GWAS efforts to date employ common genetic variants originally identified in a rather small number of presumably healthy people (HapMap Phase I). Many high-profile and heavily researched diseases, such as Type 1 diabetes, are really not so common, appearing in 1 person out of, perhaps, 500-800. Why, then, should we expect that common genetic polymorphisms found in a handful of HapMap individuals would be linked to the causes of disease in the relatively small proportion of people who have Type 1 diabetes?

To the extent we do find linkage with disease, these are weak correlations:

Admittedly, we will find some additional signal if we use massive sample sizes, but we will still be missing the bulk of the heritability because of one important mathematical fact: correlation does not obey a transitive relationship. If A is correlated with B, and B with C, then A is not necessarily correlated to C, unless the correlation is perfect. The first generation of microarrays operated off the premise that nearby SNPs in linkage disequilibrium will be sufficiently correlated with the causative SNP to get in the ballpark of the causative variant.
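The non-transitivity point can be checked with a toy example. The sketch below uses three made-up vectors (`a`, `b`, `c` are illustrative names, not genotype data from any study) in which A and B are correlated, B and C are correlated, and yet A and C are uncorrelated. It also includes the general lower bound on the A-C correlation, r_AC >= r_AB * r_BC - sqrt((1 - r_AB^2)(1 - r_BC^2)), which follows from the correlation matrix having to be positive semidefinite.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x))
    sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y))
    return cov / (sd_x * sd_y)


def lower_bound_r_ac(r_ab, r_bc):
    """Smallest A-C correlation compatible with the given A-B and B-C correlations."""
    return r_ab * r_bc - math.sqrt((1 - r_ab ** 2) * (1 - r_bc ** 2))


# Toy data (hypothetical): b is built as the sum of a and c,
# so it is correlated with both, while a and c are orthogonal.
a = [1, -1, 1, -1]
c = [1, 1, -1, -1]
b = [ai + ci for ai, ci in zip(a, c)]

r_ab = pearson(a, b)
r_bc = pearson(b, c)
r_ac = pearson(a, c)

print("r(A,B) =", round(r_ab, 3))
print("r(B,C) =", round(r_bc, 3))
print("r(A,C) =", round(r_ac, 3))
print("lower bound on r(A,C):", round(lower_bound_r_ac(r_ab, r_bc), 3))
```

Only when one of the two correlations is perfect does the bound force A and C to be as correlated as the other pair: `lower_bound_r_ac(1.0, 0.5)` gives 0.5.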

I’m inclined to agree, but what I don’t understand is his argument that we should be looking for rare variants of strong effect:

Some who espouse the rare variant hypothesis say that there will be many, many rare variants that add up to explaining the missing heritability. I’m inclined to think that for most diseases there will be relatively few, and that we just haven’t found them yet….

For me there is a disconnect between demonstrating that thousands of variables in a regression model together have a high correlation with height, and concluding that therefore there must be thousands of weakly penetrant causes. Rather the data also seems consistent with there being many weak correlations with potentially quite few untyped causal variants….

Why wouldn’t multiple alleles of weak effect be plausible? Consider height: we all have it, just some of us have more of it than others. A classic population genetics approach would argue that if there’s any kind of stabilizing selection (i.e., don’t be too short or too tall), then we would expect to see the evolution of alleles of small effect, followed by modifiers of those alleles, leading to a big, ugly, hairy network of alleles (I do find it difficult to believe there are thousands, but dozens of alleles of weak effect wouldn’t surprise me at all).

If we move on to genetic disease (e.g., heart disease), these strike me as alleles that fit the nearly-neutral model (unless you want to make many, many adaptationist tradeoff stories) that have been kept from extinction by being linked to other strongly-selected genes. While to us, heart disease ‘genes’ that kill us in middle age are seen as highly deleterious, it’s not clear that they, from a fitness vantage point, are very costly. I could see a lot of weak alleles building up and being purged slowly, if at all.

Of course, data could resolve this.

Discuss (and tell me what obvious stuff I’m missing).


  1. #1 Josef Uyeda
    August 30, 2010

    Why not hundreds of genes of small effect? When I think of something like body size, that is a complex expression of phenotype generated over the development of an organism. It seems reasonable that basal metabolic rate expressed over the course of an individual’s development could affect body size at adulthood. If you accept that premise, then when I look at a network of metabolic pathways, I see a hell of a lot of mutational targets and a lot of ways to jiggle it one way or another. Sure, there are a few large-effect loci, but think about how the trait is expressed, and how complex it is, and I don’t think it’s at all unreasonable that it’s hundreds, maybe even one thousand.

  2. #2 John Hawks
    August 30, 2010

    To have the same explanatory power on heritability, a given rare variant must have a larger effect than a given common variant. Beyond that, it depends, since “strong” and “weak” are totally meaningless until we look at actual cases.

    We aren’t talking about Mendelian lethals, so the rare variants are mostly in the territory of slightly deleterious to nearly neutral. “Strong” is about the effect on the phenotype of interest, not on fitness.

    It’s then an empirical question — how many could there be?

    The number is not endless, because standing variation in humans isn’t very great. Beyond that, I don’t think “strong” means very much, other than if the alleles were common, they could be weaker and have the same explanatory power on heritability.

  3. #3 daedalus2u
    September 1, 2010

    You and all the other gene researchers are really not going to like the actual answer, which is why no one doing genetics of complex genetic “diseases” has allowed themselves to even think along those lines.

    The reason is that the missing heritability is “features” and not disorders.


    They are “features” from deep evolutionary time and so are deeply embedded in the genome, in multiple and very complex interactions that are very highly redundant and robust.

    The heart disease genes that kill people in what we now consider to be middle age helped their ancestors survive infancy, childhood, young adulthood (teens), real middle age (20s to 30s) and even helped in old age (40s).

    Virtually all of the complex genetic diseases are made worse by stress. This is because they are actually “features” of a more robust stress response. Virtually all stress responses are from deep evolutionary time and are so thoroughly integrated into physiology that they can’t be separated out. They only show up when an organism is under stress.

    The archetypal stress is running from a predator. Diverting ATP away from ongoing maintenance and toward escape is a fabulously valuable stress response when you are running from a bear. When you are trying to meet a deadline for your abusive boss, not so much.

  4. #4 Dr Rob Peers
    February 17, 2011

    I am a doctor, and have never heard of common heart disease being genetic. It has been strongly linked with a high ratio of dietary saturated to polyunsaturated fatty acids–the fatty Western diet, rich in dairy and meat fats and chocolate fat–a ratio that is reflected in cell membrane and mitochondrial membrane structure: polyunsaturate-deficient membranes cause insulin resistance (pre-diabetes) and mitochondrial hydrogen peroxide release (which oxidizes all tissues, including arteries and heart muscle). A few weak genes won’t do much, in the absence of this ubiquitous oxidizing diet.
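John Hawks's point in comment #2, that a rare variant needs a larger effect than a common one to explain the same share of heritability, can be made concrete with a rough sketch. It assumes the textbook additive model for a biallelic locus in Hardy-Weinberg proportions, where the variance contributed is 2p(1-p)a^2 for allele frequency p and per-allele effect a. The frequencies and effect sizes below are invented for illustration and are not estimates for any real trait.

```python
import math


def variance_explained(freq, effect):
    """Additive variance from one biallelic locus: 2 * p * (1 - p) * a^2.

    Assumes a purely additive effect and Hardy-Weinberg genotype proportions.
    """
    return 2 * freq * (1 - freq) * effect ** 2


def matching_effect(common_freq, common_effect, rare_freq):
    """Per-allele effect a rare variant needs to explain as much variance
    as a common variant with the given frequency and effect."""
    target = variance_explained(common_freq, common_effect)
    return math.sqrt(target / (2 * rare_freq * (1 - rare_freq)))


# Hypothetical numbers: a common allele at 30% frequency with an effect of
# 0.1 trait standard deviations, versus a rare allele at 0.1% frequency.
common_freq, common_effect = 0.3, 0.1
rare_freq = 0.001

rare_effect = matching_effect(common_freq, common_effect, rare_freq)
print("effect needed for the rare allele:", round(rare_effect, 2))
print("fold increase over the common allele:", round(rare_effect / common_effect, 1))
```

Under these assumptions the required effect grows roughly as one over the square root of the allele frequency, which is why "strong" only has meaning relative to how common the allele is.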