David Goldstein on the failures of genome-wide association studies

The genome-wide association study has been the technique du jour in human genetics for much of the last two years. It's a pure brute force approach, surveying up to a million sites of common variation throughout the genomes of thousands of people at a time, some of whom suffer from a particular disease, and some of whom are healthy controls. The underlying principle is simple: if a genetic variant increases the risk of the disease in question, it will be more common in patients than in controls.
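To make that principle concrete, here is a minimal sketch of the test applied at a single variant: a chi-square test on a 2x2 table of allele counts in cases versus controls. The function name and the counts are hypothetical, chosen only for illustration. Real studies layer quality control, population-structure corrections and very strict significance thresholds on top of this, since the same test is repeated at up to a million sites.

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Returns (chi2, p_value, odds_ratio). All four counts must be positive.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [case_alt + case_ref, ctrl_alt + ctrl_ref]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if the allele were equally common in both groups.
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Odds of carrying the variant allele in cases relative to controls.
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio


# Hypothetical counts: 2,000 cases and 2,000 controls, two alleles per person.
# The variant allele is at 30% frequency in cases and 25% in controls.
chi2, p_value, odds_ratio = allelic_association(1200, 2800, 1000, 3000)
print(f"chi2={chi2:.2f}, p={p_value:.2e}, odds ratio={odds_ratio:.2f}")
```

Even a clearly significant difference like this one corresponds to a modest odds ratio, which is typical of the common variants these studies tend to turn up.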

The whole exercise is predicated on one major assumption: that common diseases, such as type 2 diabetes or breast cancer, are to a large extent caused by common genetic variants - variants with a frequency of 5% or greater in the general population.

This "common disease, common variant" (CDCV) hypothesis was one of the foundations of the HapMap project, a massive international collaboration that seeks to characterise patterns of common genetic variation in populations around the world. The HapMap project provided the catalogue of common variants required to perform genome-wide association studies; companies like Illumina and Affymetrix generated the chip-based technology needed to cost-effectively genotype hundreds of thousands of markers from a patient's DNA sample. Researchers were already collecting DNA samples from thousands of well-characterised disease patients for genetic studies. So long as the CDCV hypothesis was broadly correct, genome-wide association studies held the promise of unlocking the genetic factors underlying common diseases, providing predictive markers that could be used to identify at-risk individuals for early intervention before the onset of symptoms.

Over the last eighteen months, hundreds of millions of dollars have been poured into massive genome-wide association studies, often incorporating patient samples and research institutes from around the world. These studies have certainly taught us much about the genetic architecture of human traits; but sadly, things have not turned out as neatly as the CDCV proponents hoped.

In the New York Times yesterday, Nick Wade profiles highly-regarded geneticist David Goldstein of Duke University, who provides the most sober assessment I have yet seen in the mainstream press about the outcomes of the genome-wide association study frenzy:


"There is absolutely no question," he said, "that for the whole hope of personalized medicine, the news has been just about as bleak as it could be."

Of the HapMap and other techniques developed to make sense of the human genome, Dr. Goldstein said, "Technically, it was a tour de force." But in his view, this prodigious labor has produced just a handful of genes that account for very little of the overall genetic risk.

"After doing comprehensive studies for common diseases, we can explain only a few percent of the genetic component of most of these traits," he said. "For schizophrenia and bipolar disorder, we get almost nothing; for Type 2 diabetes, 20 variants, but they explain only 2 to 3 percent of familial clustering, and so on."

Goldstein's views make for a refreshing change from the hype of most stories about the new era of large-scale human genetics. In fact, it's been clear for quite some time that the yield from genome-wide association studies was unlikely to be anywhere near as impressive as the optimists had predicted - I noted back in April that the poor results of massive genome-wide scans for height-related genes (involving tens of thousands of subjects) were a particularly ominous sign for disease geneticists. Nonetheless, the media has tended to play up the minor success stories, no doubt with some encouragement from the scientists involved, rather than report the more depressing reality of the situation: despite hundreds of millions of dollars invested in genome-wide studies, we are not much closer to developing reliable predictive tests for most common diseases than we were two years ago.

Why has the genome-wide approach failed to meet expectations? As I noted in my long post yesterday, there are a large number of complicating factors that are hindering the success of the common variant approach. Probably the most important factor - and one that Goldstein highlights - is that most disease risk is probably due to large numbers of rare variants, rather than small numbers of common variants. Chip-based assays, which can only really assess variants with a reasonable frequency in the population (greater than 1%, even with newer technologies), simply can't find these rare variants.

Not all is lost, however. Although the genome-wide association approach has failed to find enough disease risk to have much predictive utility, the genes highlighted by this approach have provided new insights into the molecular pathways underlying common diseases. In addition, there is hope around the corner for researchers seeking rare disease-causing genes: the plummeting cost of DNA sequencing means that it won't be long before whole-genome sequencing (determining the code at every single one of the six billion positions in the genome) of large numbers of patients and controls becomes affordable.

Finally, there is the intriguing question of why common diseases are not caused by common genetic variants. Like any "why" question in biology, this is at heart an evolutionary problem, and a particularly interesting one at that. But rather than ramble on redundantly, I will simply point you to three erudite bloggers who beat me to the punch: John Hawks, Razib and Dienekes. All three are well worth a read.

I had a big problem with this article when I read it last night--with how it was written by Nicholas Wade from the NYTimes. It was hard to tell where Goldstein's opinions ended and Wade's (mis?)interpretation of Goldstein's opinions began. For example:

"He says he thinks that no significant genetic differences will be found between races because of his belief in the efficiency of natural selection. Just as selection turns out to have pruned away most disease-causing variants, it has also maximized human cognitive capacities because these are so critical to survival. [followed by quote from Goldstein on human intelligence]"

There are three sentences here: one that is attributed to Goldstein without direct quotation, one that is implied to be a continuation of Goldstein's thoughts (but I think probably wasn't, at least not directly), and one with a direct quotation from Goldstein.

Sentence 1. Well, I certainly hope Goldstein doesn't think that no significant genetic differences will be found between races... what about lactase? I'll chalk this up to a poor paraphrasing on Wade's part or Wade generalizing a very specific statement made by Goldstein.

Sentence 2. How exactly does "no significant genetic differences" between races become defined by the existence or lack thereof of "human cognitive capacities"? There is clearly the *possibility* for genetic differences to exist between races that aren't intelligence-related. At the very least, if we want to remain controversial, there *could* be genetic differences that enhance the athletic abilities of one "race" and not another. I can't tell here whether Goldstein talked about genetic differences between races and transitioned into talking about intelligence, which Wade then mashed together into halfway inaccurate statements... or (hopefully less likely) Goldstein has ignored all published literature thus far that has discovered signs of selection on everything from testes genes to our immune systems.

Also, I know Wade is probably feeling the modern reporter's duty to exaggerate science news to make bigger headlines, but I agree with Razib from Gnxp... Goldstein's pessimism towards the ability to use GWAS results to explain all of a person's disease risk isn't exactly controversial.

By autumnmist (not verified) on 16 Sep 2008

You're right, although I was using "headlines" in a more general sense and was trying to refer to articles that attempt to sensationalize things that aren't really that sensational. Actually, in this case, the headline isn't even that exaggerated--at least no more so than the content.

By autumnmist (not verified) on 16 Sep 2008

i agree with your general point. this might not accurately reflect what goldstein said to wade. genuinely out of context....

Given the context, I'm pretty sure that's supposed to read "He says he thinks that no significant genetic differences [related to cognitive ability] will be found between races".

By Jason Malloy (not verified) on 16 Sep 2008

The rather bleak picture painted by Dr. Goldstein of the success of GWAS, and further distorted by the abstraction of Nicholas Wade (who, interestingly, has in the past repeatedly communicated the exact opposite message from chosen GWAS studies), is largely unfounded. It is utterly naive to generalize about the power of GWAS, or lack thereof, from studies of neurodevelopmental/psychiatric disorders or Type 2 diabetes, where the phenotype characterization of the former is often unclear and the heritability measures for the latter fluctuate markedly and are vastly inaccurate.
While everyone in the field agrees that sequencing is needed to identify the rare and low-frequency variants that are likely to explain the genetic heritability that we are missing, the following few examples testify to the enormous success of GWAS and should be kept in mind in this context:
1) AMD: the first variant found using GWAS implicated the CFH gene; the variant, which encodes an amino acid change, explains 50% of all blindness attributed to this disorder
2) Type 1 diabetes: we now have an explanation for almost 2/3 of the genetic risk (heritability) for this common and devastating pediatric disorder - if no gene had been identified in this disease in the past, GWAS would have captured all of them in one experiment.
3) Inflammatory Bowel Disease: We now have an explanation for 25-30% of the heritability of IBD, all of which is captured by GWAS, and the pediatric age of onset of this disease is adding several other genes to the pool of 30 already picked up in adult studies so far.
4) Exfoliative glaucoma: the heritability of this disorder is also largely explained by one single variant picked up by GWAS.

As such, the biological foothold that GWAS has unveiled by its capturing of over 300 replicating loci to this date has advanced the genetic field on a log scale, and even beyond expectations.

By Hakon Hakonarson (not verified) on 17 Sep 2008

Coming in late to the conversation...

I think Jason is right that Wade should have inserted "related to cognitive ability" into his sentence - Goldstein is certainly well-aware of the substantial functional genetic differences between human populations.

As for why Goldstein singles out intelligence as an area where no significant genetically-derived differences exist between populations, I think razib hits the nail on the head: so long as he successfully steers the conversation away from this particular taboo, "people will just believe David Goldstein and look the other way and he can go on with his research bounded only by his curiosity". After all, look what happened to Bruce Lahn.