David Goldstein, a geneticist at Duke, has critiqued the current focus on large-scale genome-wide association studies before. Now he is taking it to the next step: his group has a paper out which suggests that association studies have been relatively unfruitful in terms of bang-for-buck because they’re picking up “synthetic associations.” Rare Variants Create Synthetic Genome-Wide Associations:
It has long been assumed that common genetic variants of modest effect make an important contribution to common human diseases, such as most forms of cardiovascular disease, asthma, and neuropsychiatric disease. Genome-wide scans evaluating the role of common variation have now been completed for all common disease using technology that claims to capture greater than 90% of common variants in major human populations. Surprisingly, the proportion of variation explained by common variation appears to be very modest, and moreover, there are very few examples of the actual variant being identified. At the same time, rare variants have been found with very large effects. Now it is demonstrated in a simulation study that even those signals that have been detected for common variants could, in principle, come from the effect of rare ones. This has important implications for our understanding of the genetic architecture of human disease and in the design of future studies to detect causal genetic variants.
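To see how a common SNP can pick up a signal that really belongs to a rare variant, here is a minimal deterministic sketch. It is my own toy model, not the simulation from the paper. It assumes a single common SNP with alleles A and a, one rare causal allele that arose on an A-bearing haplotype, random (Hardy-Weinberg) pairing of haplotypes, and a penetrance model in which carrying the rare allele raises disease risk from a baseline f0 to f1. The function name `synthetic_association` and all parameter values are hypothetical. The function returns the expected frequency of A in cases and in controls and the resulting allelic odds ratio at the common SNP, even though A itself does nothing.

```python
from itertools import product


def synthetic_association(p_common, q_rare, f0, f1):
    """Expected allelic odds ratio at a common, non-causal SNP.

    Hypothetical toy model: the rare causal allele (frequency q_rare)
    sits only on haplotypes carrying the common allele A (frequency
    p_common). Disease risk is f1 for carriers of the rare allele and
    f0 for everyone else; A has no effect of its own.
    """
    # Haplotype classes: (carries A, carries rare causal allele, frequency)
    haplotypes = [
        (True, True, q_rare),
        (True, False, p_common - q_rare),
        (False, False, 1.0 - p_common),
    ]
    a_cases = a_controls = 0.0  # expected count of A alleles
    n_cases = n_controls = 0.0  # expected number of individuals
    for h1, h2 in product(haplotypes, repeat=2):
        geno_freq = h1[2] * h2[2]  # random pairing of haplotypes (HWE)
        risk = f1 if (h1[1] or h2[1]) else f0
        n_a = int(h1[0]) + int(h2[0])  # copies of A in this genotype
        a_cases += geno_freq * risk * n_a
        a_controls += geno_freq * (1.0 - risk) * n_a
        n_cases += geno_freq * risk
        n_controls += geno_freq * (1.0 - risk)
    freq_cases = a_cases / (2.0 * n_cases)
    freq_controls = a_controls / (2.0 * n_controls)
    odds_ratio = (freq_cases / (1.0 - freq_cases)) / (
        freq_controls / (1.0 - freq_controls)
    )
    return freq_cases, freq_controls, odds_ratio


if __name__ == "__main__":
    # Hypothetical numbers: a common SNP (30%) and a rare, highly
    # penetrant allele (0.2%) that happens to sit on the A background
    cases, controls, odds = synthetic_association(0.3, 0.002, f0=0.01, f1=0.5)
    print(f"A in cases: {cases:.3f}, in controls: {controls:.3f}, OR: {odds:.2f}")
```

With these made-up inputs the common allele comes out with an odds ratio of roughly 1.3, the modest kind of effect size typical of GWAS hits, while setting `q_rare` to zero makes the association vanish entirely.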
The conclusion in the discussion elaborates on the relevance:
… Under our model, the causal sites are both rare and relatively high-penetrant contributors to disease, and will therefore be unlikely to be detected in a small number of control samples. Finally, the focus of attention on genes that are near GWAS signals may be incomplete or misleading in that the actual causal sites may occur in many different genes surrounding the implicated common variant. It is also worth emphasizing that as few as one or two rare variants, at much lower frequency than the associated common SNP, can create a significant synthetic association. In such a case, sequencing a small number of cases that carry the “at risk” common variant might miss entirely the causal rare variants even if the correct genome region is resequenced. These considerations argue for caution in efforts to resequence around genome-wide associations and argue instead that genome-wide sequencing in carefully phenotyped cohorts might be a better use of resources.
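The warning about sequencing only a handful of at-risk cases can be made concrete with a back-of-the-envelope binomial calculation. This is not from the paper: it assumes that each sequenced case carrying the at-risk common allele independently has probability c of also carrying one particular rare causal variant, so the chance of seeing no carrier at all among n sequenced cases is (1 - c)^n. The function names `prob_miss` and `cases_needed` and the value of c are hypothetical.

```python
import math


def prob_miss(c, n):
    """Probability that none of n sequenced cases carries the rare variant.

    Hypothetical simplified model: each case independently carries the
    variant with probability c.
    """
    return (1.0 - c) ** n


def cases_needed(c, detect=0.95):
    """Smallest n giving at least `detect` probability of seeing one carrier."""
    return math.ceil(math.log(1.0 - detect) / math.log(1.0 - c))


if __name__ == "__main__":
    c = 0.02  # hypothetical: 2% of at-risk cases carry this particular variant
    print(f"P(miss) with 20 cases: {prob_miss(c, 20):.2f}")
    print(f"Cases needed for a 95% chance of one carrier: {cases_needed(c)}")
```

Under these assumptions, sequencing 20 at-risk cases misses the variant about two times in three, and roughly 150 cases are needed before the odds of seeing even one carrier reach 95%. If several different rare variants sit behind the same signal, each one has its own, smaller c, which makes the problem worse.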
PLoS thought that this paper was important enough to commission an accompanying article, Common Disease, Multiple Rare (and Distant) Variants:
The consequence, the authors suggest, is that sequencing near the SNP to find “the” causative gene will often be fruitless, and many causative genes will be missed if that is the only approach taken.
The alternative, whole-genome sequencing, is becoming increasingly practical, and offers the possibility of finding any variant, no matter how far away. But how will it be possible to pick out the needle of a causative variant in the haystack of genomic variability, if it is no longer right next to the signpost? Under the assumption that the variant exerts only a weak effect, it probably wouldn’t be. Weak effects are thought to be due to subtle changes that still retain functionality of the encoded protein, like a dimmer switch on a light bulb. The genome is loaded with these kinds of variants, and most of them won’t be involved in the disease.
But the weak-effect assumption may be wrong as well, since it rests on the assumption that the variant is common. If instead the variant is rare, its effect could be strong–not just contributing to the disease, but causing it–more like an on-off switch, but one that only a few people have. In that case, the sought-after variant is likely to be a classic kind of mutation–a nonsense sequence, for example–that is easy to find.
If this model is correct, it suggests that a SNP association in a GWA study may be pointing not at one gene, but lots of them; that these genes are likely to have stronger and perhaps easier-to-understand effects than presumed; and that finding these genes is likely to be simpler than has been the case so far. If the authors are right that some of the signals are synthetic, GWA results may be of particular value in interpreting the results of whole-genome sequencing studies. Focussing attention on regions of the genome that show GWA signals may help to identify likely the causal variants amongst the millions of variants identified in any sequencing study.
You probably know that the genetics of height and intelligence have not been revolutionized by genomic techniques. One assumption is that the effect sizes of height and IQ QTLs are too small to be detected by GWAS. On the other hand, to my knowledge these quantitative traits haven’t been elucidated very well by family-based linkage studies either, and those should pick up rare QTLs of large effect. In the paper, Goldstein points out that GWAS results haven’t been easily replicated across populations, and argues that the reason is that very rare alleles of large effect won’t be common enough to be shared across geographic locales.
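Why both rare alleles of large effect and common alleles of tiny effect can slip past a modest GWAS is easiest to see from a standard power approximation for a quantitative trait. This sketch assumes a purely additive biallelic locus with allele frequency p and per-allele effect a, measured in trait standard deviations, so the locus explains 2p(1 - p)a² of the variance. A single-SNP regression test on n individuals then has a chi-square noncentrality of roughly n·R²/(1 - R²), and power follows from a normal approximation at the conventional genome-wide threshold of 5 × 10⁻⁸. The function names and example loci are hypothetical, linkage is not modeled, and the normal approximation is rough when a rare allele has only a handful of carriers.

```python
from statistics import NormalDist


def variance_explained(p, a):
    """Fraction of trait variance from an additive biallelic locus (trait SD = 1)."""
    return 2.0 * p * (1.0 - p) * a ** 2


def gwas_power(n, p, a, alpha=5e-8):
    """Approximate power of a 1-df association test (normal approximation)."""
    r2 = variance_explained(p, a)
    ncp = n * r2 / (1.0 - r2)  # expected chi-square noncentrality
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)  # two-sided genome-wide threshold
    shift = ncp ** 0.5
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical loci: a rare allele of large effect vs. a common one of small effect
    for label, p, a in [("rare, large", 0.001, 1.0), ("common, small", 0.3, 0.1)]:
        for n in (500, 10_000):
            print(f"{label:14s} n={n:>7,}: R2={variance_explained(p, a):.4f}, "
                  f"power={gwas_power(n, p, a):.3f}")
```

Power depends only on the variance explained, not on the per-allele effect, so with a few hundred subjects both loci are essentially invisible. At ten thousand subjects the common, small-effect locus is usually detected while the rare, large-effect one still mostly is not.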
It seems that what is being argued here is that the genetic architecture of many traits of interest is going to fall in a blind spot, where the QTLs are more common and of lower effect size and penetrance than can be detected by linkage studies, but rarer than can be picked up usefully by GWAS of a few hundred individuals. The idea that quantitative traits like height and intelligence, or diseases such as schizophrenia, might be controlled by fewer QTLs of large effect is appealing in some ways because it can more easily explain the variance across siblings in very heritable traits. The smaller the number of relevant QTLs, the greater the expected sample variance. On the other hand, it is notable that one of Goldstein’s test cases in the paper is sickle-cell disease, which is one of the few undisputed cases of heterozygote advantage in human genetics. These associations don’t exist unperturbed by other evolutionary dynamics.
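One way to make the point about sibling variance concrete, under deliberately simple assumptions of my own, is to hold the total additive variance V fixed and spread it evenly over L unlinked loci, each at allele frequency 0.5. Within a family, each locus at which a parent is heterozygous adds a²/4 to the genetic variance among full siblings, so that within-family variance depends on how many loci happen to be heterozygous in the two parents. Its average is always V/2, but its variance from family to family works out to V²/(8L), growing as the number of loci shrinks. The hypothetical function `sib_variance_moments` below computes both moments by enumerating the binomial distribution of heterozygous parental loci.

```python
from math import comb


def sib_variance_moments(n_loci, total_var=1.0):
    """Mean and across-family variance of the within-family (sibling) variance.

    Hypothetical model: n_loci unlinked additive loci, each at allele
    frequency 0.5, with equal effects scaled so that their total
    additive variance is total_var.
    """
    a2 = 2.0 * total_var / n_loci  # squared per-allele effect: L * a^2 / 2 = V
    trials = 2 * n_loci  # each of two parents may be heterozygous at each locus
    mean = second = 0.0
    for h in range(trials + 1):
        prob = comb(trials, h) / 2 ** trials  # P(h heterozygous parental loci)
        w = h * a2 / 4.0  # segregation variance among sibs given h
        mean += prob * w
        second += prob * w * w
    return mean, second - mean * mean


if __name__ == "__main__":
    for n_loci in (5, 50, 500):
        m, v = sib_variance_moments(n_loci)
        print(f"L={n_loci:3d}: mean sib variance={m:.3f}, across families={v:.5f}")
```

With unit total variance, five loci give an across-family variance of 1/40, a hundred times the 1/4000 from 500 loci, so a few large-effect QTLs make some sibships far more (or less) variable than others even though the average is the same.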
In any case, here’s a quote from Goldstein:
“This tells us that we will surely need to turn to more comprehensive whole genome sequencing studies of more carefully selected subjects if we want to discover more meaningful relationships between genetic variation and disease,” says Goldstein. “While such studies are undoubtedly more complex, expensive and time-consuming, we really have no choice if we want to deepen our knowledge about the genetic underpinnings of human disease.”
Samuel P. Dickson, Kai Wang, Ian Krantz, Hakon Hakonarson, David B. Goldstein. Rare Variants Create Synthetic Genome-Wide Associations. PLoS Biology, 2010; 8 (1): e1000294 DOI: 10.1371/journal.pbio.1000294
Richard Robinson. Common Disease, Multiple Rare (and Distant) Variants. PLoS Biology, 2010; 8 (1): e1000293 DOI: 10.1371/journal.pbio.1000293