One of the mysteries of genome-wide association studies (“GWAS”) is the problem of ‘missing heritability’: quantitative genetics indicates that a trait (e.g., height, heart disease) has a significant genetic component, but the genetic variation we can link to that trait explains only a small fraction of the estimated heritability. Christophe Lambert describes why he thinks GWAS hasn’t had much success so far:
One major limitation is that the microarrays used in most major GWAS efforts to date employ common genetic variants originally identified in a rather small number of presumably healthy people (HapMap Phase I). Many high-profile and heavily researched diseases, such as Type 1 diabetes, are really not so common, appearing in 1 person out of, perhaps, 500-800. Why, then, should we expect that common genetic polymorphisms found in a handful of HapMap individuals would be linked to the causes of disease in the relatively small proportion of people who have Type 1 diabetes?
To the extent that we do find links to disease, they are weak correlations:
Admittedly, we will find some additional signal if we use massive sample sizes, but we will still be missing the bulk of the heritability because of one important mathematical fact: correlation does not obey a transitive relationship. If A is correlated with B, and B with C, then A is not necessarily correlated to C, unless the correlation is perfect. The first generation of microarrays operated off the premise that nearby SNPs in linkage disequilibrium will be sufficiently correlated with the causative SNP to get in the ballpark of the causative variant.
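To make the non-transitivity point concrete, here is a minimal sketch in Python. It is a purely statistical toy, not a genetic model: the variables `a`, `b`, and `c`, the sample size, and the seed are all my own assumptions for illustration. `a` and `c` are drawn independently, and `b` is their sum, so `b` is correlated with both while `a` and `c` remain essentially uncorrelated.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n=20000, seed=0):
    """Return (r_ab, r_bc, r_ac) for independent a, c and b = a + c."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    a = [rng.gauss(0, 1) for _ in range(n)]
    c = [rng.gauss(0, 1) for _ in range(n)]  # drawn independently of a
    b = [x + y for x, y in zip(a, c)]        # b depends on both a and c
    return pearson(a, b), pearson(b, c), pearson(a, c)


r_ab, r_bc, r_ac = simulate()
print(f"corr(A, B) = {r_ab:.3f}")
print(f"corr(B, C) = {r_bc:.3f}")
print(f"corr(A, C) = {r_ac:.3f}")
```

In this setup the first two correlations should each land near 1/√2 (about 0.71), while the third should hover near zero. One refinement worth noting: the general bound is corr(A, C) ≥ r_AB·r_BC − √((1 − r_AB²)(1 − r_BC²)), so two sufficiently strong correlations do force a positive third one even when neither is perfect; it is weak correlations that guarantee nothing.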
I’m inclined to agree, but what I don’t understand is his argument that we should be looking for rare variants of strong effect:
Some who espouse the rare variant hypothesis say that there will be many, many rare variants that add up to explaining the missing heritability. I’m inclined to think that for most diseases there will be relatively few, and that we just haven’t found them yet….
For me there is a disconnect between demonstrating that thousands of variables in a regression model together have a high correlation with height, and concluding that therefore there must be thousands of weakly penetrant causes. Rather the data also seems consistent with there being many weak correlations with potentially quite few untyped causal variants….
Why wouldn’t multiple alleles of weak effect be plausible? Consider height: we all have it, just some of us have more of it than others. A classic population genetics approach would argue that if there’s any kind of balancing selection (i.e., don’t be too short or too tall), then we would expect to see the evolution of alleles of small effect, followed by modifiers of those alleles, leading to a big, ugly, hairy network of alleles (I do find it difficult to believe there are thousands, but dozens of alleles of weak effect wouldn’t surprise me at all).
If we move on to genetic disease (e.g., heart disease), the alleles involved strike me as ones that fit the nearly-neutral model (unless you want to tell many, many adaptationist tradeoff stories) and that have been kept from extinction by being linked to other strongly-selected genes. While to us, heart disease ‘genes’ that kill us in middle age are seen as highly deleterious, it’s not clear that they are, from a fitness vantage point, very costly. I could see a lot of weak alleles building up and being purged slowly, if at all.
Of course, data could resolve this.
Discuss (and tell me what obvious stuff I’m missing).