David Goldstein, a geneticist at Duke, has critiqued the current focus on large-scale genome-wide association studies before. Now he is taking it a step further: his group has a paper out which suggests that the reason association studies have been relatively unfruitful in terms of bang-for-buck is that they're picking up "synthetic associations." Rare Variants Create Synthetic Genome-Wide Associations:
It has long been assumed that common genetic variants of modest effect make an important contribution to common human diseases, such as most forms of cardiovascular disease, asthma, and neuropsychiatric disease. Genome-wide scans evaluating the role of common variation have now been completed for all common diseases using technology that claims to capture greater than 90% of common variants in major human populations. Surprisingly, the proportion of variation explained by common variation appears to be very modest, and moreover, there are very few examples of the actual variant being identified. At the same time, rare variants have been found with very large effects. Now it is demonstrated in a simulation study that even those signals that have been detected for common variants could, in principle, come from the effect of rare ones. This has important implications for our understanding of the genetic architecture of human disease and for the design of future studies to detect causal genetic variants.
The conclusion in the discussion elaborates on the relevance:
... Under our model, the causal sites are both rare and relatively high-penetrant contributors to disease, and will therefore be unlikely to be detected in a small number of control samples. Finally, the focus of attention on genes that are near GWAS signals may be incomplete or misleading in that the actual causal sites may occur in many different genes surrounding the implicated common variant. It is also worth emphasizing that as few as one or two rare variants, at much lower frequency than the associated common SNP, can create a significant synthetic association. In such a case, sequencing a small number of cases that carry the "at risk" common variant might miss entirely the causal rare variants even if the correct genome region is resequenced. These considerations argue for caution in efforts to resequence around genome-wide associations and argue instead that genome-wide sequencing in carefully phenotyped cohorts might be a better use of resources.
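The mechanism is easy to see in a toy two-site sketch of my own (not the paper's actual simulations; the frequencies and penetrances below are arbitrary): if a rare, high-penetrance causal variant happens to arise only on haplotypes carrying one allele of a common SNP, a case-control test at the non-causal common SNP lights up.

```python
# Toy two-site model (illustrative numbers, not the paper's simulations):
# a rare causal variant confined to haplotypes carrying the common SNP's
# minor allele makes the NON-causal common SNP show an association.
import random

random.seed(1)
N = 20_000  # haplotypes

# common, non-causal SNP at 30% frequency
common = [random.random() < 0.30 for _ in range(N)]
# rare causal variant arises only on the common-allele background (strong LD)
rare = [c and random.random() < 0.05 for c in common]
# penetrance: 50% for rare-variant carriers, 5% background rate
disease = [random.random() < (0.50 if r else 0.05) for r in rare]

# 2x2 table at the non-causal common SNP
a = sum(c and d for c, d in zip(common, disease))         # minor allele, case
b = sum(c and not d for c, d in zip(common, disease))     # minor allele, control
c2 = sum(not c and d for c, d in zip(common, disease))    # major allele, case
d2 = sum(not c and not d for c, d in zip(common, disease))

# chi-square statistic for the 2x2 table (1 df; 3.84 ~ p = 0.05)
n = a + b + c2 + d2
chi2 = n * (a * d2 - b * c2) ** 2 / ((a + b) * (c2 + d2) * (a + c2) * (b + d2))
print(f"chi-square at the non-causal common SNP: {chi2:.1f}")
```

With these made-up numbers the common SNP comes out strongly "associated" even though every causal allele sits at roughly 1.5% frequency: the synthetic-association effect in miniature.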
PLoS thought that this paper was important enough to commission an accompanying article, Common Disease, Multiple Rare (and Distant) Variants:
The consequence, the authors suggest, is that sequencing near the SNP to find "the" causative gene will often be fruitless, and many causative genes will be missed if that is the only approach taken.
The alternative, whole-genome sequencing, is becoming increasingly practical, and offers the possibility of finding any variant, no matter how far away. But how will it be possible to pick out the needle of a causative variant in the haystack of genomic variability, if it is no longer right next to the signpost? Under the assumption that the variant exerts only a weak effect, it probably wouldn't be. Weak effects are thought to be due to subtle changes that still retain functionality of the encoded protein, like a dimmer switch on a light bulb. The genome is loaded with these kinds of variants, and most of them won't be involved in the disease.
But the weak-effect assumption may be wrong as well, since it rests on the assumption that the variant is common. If instead the variant is rare, its effect could be strong--not just contributing to the disease, but causing it--more like an on-off switch, but one that only a few people have. In that case, the sought-after variant is likely to be a classic kind of mutation--a nonsense sequence, for example--that is easy to find.
If this model is correct, it suggests that a SNP association in a GWA study may be pointing not at one gene, but lots of them; that these genes are likely to have stronger and perhaps easier-to-understand effects than presumed; and that finding these genes is likely to be simpler than has been the case so far. If the authors are right that some of the signals are synthetic, GWA results may be of particular value in interpreting the results of whole-genome sequencing studies. Focussing attention on regions of the genome that show GWA signals may help to identify the likely causal variants amongst the millions of variants identified in any sequencing study.
You probably know that the genetics of height and intelligence have not been revolutionized by genomic techniques. One assumption is that the effect sizes of height & IQ QTLs are too small to be detected by GWAS. But on the other hand, to my knowledge these quantitative traits haven't been elucidated very well by family-based linkage studies either, which should pick up rare QTLs of large effect. In the paper Goldstein points out that GWAS hits haven't been easily replicated across populations, and argues that this is because very rare alleles of large effect won't be shared across geographic locales.
It seems that what is being argued here is that the genetic architecture of many traits of interest is going to sit in a blind spot: the QTLs are more common, and of lower effect size and penetrance, than can be detected by linkage studies, but rarer than can be picked up usefully by GWAS of a few hundred individuals. The idea that quantitative traits like height and intelligence, or diseases such as schizophrenia, might be controlled by fewer large-effect QTLs is appealing in some ways, because it can more easily explain the variance across siblings in very heritable traits. The fewer the relevant QTLs, the lumpier the variation you'd expect to see among siblings. On the other hand, it is notable that one of Goldstein's test cases within the paper is sickle-cell disease, which is one of the few undisputed cases of heterozygote advantage in human genetics. These associations don't exist unperturbed by other evolutionary dynamics.
In any case, here's a quote from Goldstein:
"This tells us that we will surely need to turn to more comprehensive whole genome sequencing studies of more carefully selected subjects if we want to discover more meaningful relationships between genetic variation and disease," says Goldstein. "While such studies are undoubtedly more complex, expensive and time-consuming, we really have no choice if we want to deepen our knowledge about the genetic underpinnings of human disease."
Samuel P. Dickson, Kai Wang, Ian Krantz, Hakon Hakonarson, David B. Goldstein. Rare Variants Create Synthetic Genome-Wide Associations. PLoS Biology, 2010; 8 (1): e1000294 DOI: 10.1371/journal.pbio.1000294
Richard Robinson. Common Disease, Multiple Rare (and Distant) Variants. PLoS Biology, 2010; 8 (1): e1000293 DOI: 10.1371/journal.pbio.1000293
Thank you for linking to those articles, it's exactly what I needed right now :)
The idea, then, is that rare variants with large effect may be responsible for some sites of weak association with diseases.
On the other hand, height and IQ are not diseases, and observation of the phenotypes makes them appear to vary smoothly across populations. These may be strong cases of many genes/alleles with moderate contributions to the phenotype.
mike, sure. but i've read it's hard to tell beyond 10 QTLs how many QTLs there are from the smoothness of a trait's distribution.
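That point is easy to check with a quick sketch of my own (arbitrary effect sizes and noise level, just to make the point): additive traits built from 10 versus 100 equal-effect loci both produce essentially Gaussian distributions, so a trait's smoothness alone says little about how many QTLs underlie it.

```python
# My own toy check: trait distributions from 10 vs. 100 additive loci
# are both smooth and near-Gaussian, so smoothness can't count QTLs.
import random
import statistics

random.seed(3)

def trait_sample(k, n=5000):
    """n individuals; k additive loci, effects scaled to fix genetic variance."""
    a = (1.0 / k) ** 0.5
    vals = []
    for _ in range(n):
        # two alleles per locus, each contributing effect a or 0
        g = sum(a * (random.randint(0, 1) + random.randint(0, 1)) for _ in range(k))
        vals.append(g + random.gauss(0, 0.5))  # modest environmental noise
    return vals

results = {}
for k in (10, 100):
    x = trait_sample(k)
    m, sd = statistics.mean(x), statistics.stdev(x)
    # excess kurtosis: ~0 for a Gaussian-shaped distribution
    results[k] = sum(((v - m) / sd) ** 4 for v in x) / len(x) - 3
    print(f"{k:>3} loci: excess kurtosis {results[k]:+.3f}")
```

Both runs land close to zero excess kurtosis, which is the "can't tell beyond ~10 QTLs" problem in miniature.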
Many diseases are likely the effects of smoothly continuous traits. As Falconer pointed out, for most cases the trait is not the disease, but the likelihood of developing it (perhaps interacting with other genetic or environmental factors). Also, just because we have defined a cluster of symptoms as a (singular) disease does not mean that there are not multiple etiologies, another confound for GWAS. Think autism, which is likely a result of many possible perturbations of development that result in the same effect on circuit architecture. I always feel like GWAS rest on naively simplistic assumptions about the relationships between genotype and phenotype.
The best justification for why we should actually be surprised by "common disease, common variant" was in a talk Andy Clark gave not long ago about super-deep sampling (1,000s of individuals) of human variation. Anyway, one upshot was that with the relatively recent human population expansion from the ancestral size to the current ~6.5 billion (or whatever), there is a fairly large tree that is very sparse for most of its history and crazily bushy only very recently. If you're thinking from a coalescent perspective and randomly throwing down mutations, most of those mutations will land in the bushy area and only a few will land in the sparse older branches.
So, most variants will be at low frequency and will have very recent coalescence times. In order to have a lot of "common disease/common variant" examples, you'd have to imagine a very orderly disease (disease A is caused by amino acid i being substituted for amino acid j at position k in peptide X, and in no other way, for one cartoon example). And even then, the shared allele would only be identical by state, not identical by descent. Perhaps this could be caused by hypermutable sites that generate a lot of convergence, or whatever.
Anyway, in reality we are stuck with tons of diseases like thalassemia, which can be caused by tons of different mutations producing similar symptoms, though some versions are worse than others, depending largely on the nature of the actual indel or substitution. And many mutations are unique to individuals, as you'd predict if you were thinking about how demography influences when and where segregating sites were dropped onto the coalescent tree.
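The coalescent intuition here can be sketched with a toy forward simulation of my own (inflated mutation rate, population doubling each generation instead of a realistic growth curve): after a recent expansion, nearly all segregating variants are young and therefore rare.

```python
# Toy Wright-Fisher simulation of the expansion argument: mutations dropped
# during the recent bushy growth phase vastly outnumber the old ones, so
# almost every segregating variant ends up at low frequency.
import random
from collections import Counter

random.seed(4)
MU = 0.2      # new mutations per haplotype per generation (inflated toy rate)
next_id = 0

def next_generation(pop, size):
    """Each child copies a random parent's mutation set, maybe adding a new one."""
    global next_id
    kids = []
    for _ in range(size):
        hap = set(random.choice(pop))
        if random.random() < MU:
            hap.add(next_id)
            next_id += 1
        kids.append(hap)
    return kids

pop = [set() for _ in range(100)]        # small ancestral population
for _ in range(150):                     # long stretch at small size
    pop = next_generation(pop, 100)
for _ in range(8):                       # recent rapid expansion (doubling)
    pop = next_generation(pop, 2 * len(pop))

# frequency of each segregating (non-fixed) variant in the final population
counts = Counter(m for hap in pop for m in hap)
freqs = [c / len(pop) for c in counts.values() if c < len(pop)]
rare = sum(f < 0.01 for f in freqs)
print(f"{rare} of {len(freqs)} segregating variants are below 1% frequency")
```

Most of the variants in the final population arose during the eight generations of growth, and nearly all of them sit below 1% frequency, exactly the "bushy recent branches" picture from the talk.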