The genome-wide association study has been the technique du jour in human genetics for much of the last two years. It’s a pure brute force approach, surveying up to a million sites of common variation throughout the genomes of thousands of people at a time, some of whom suffer from a particular disease, and some of whom are healthy controls. The underlying principle is simple: if a genetic variant increases the risk of the disease in question, it will be more common in patients than in controls.
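That principle can be illustrated with a toy calculation. The snippet below (a minimal sketch with made-up allele counts, not data from any real study) runs the standard single-marker test: a chi-square comparison of allele counts between cases and controls at one variant site. Note how even a reasonably large frequency difference between cases and controls yields a chi-square statistic well short of the stringent genome-wide significance threshold (roughly p < 5 × 10⁻⁸, to correct for testing up to a million markers), which is why these studies need thousands of samples.

```python
def allele_chi2(case_risk, case_other, ctrl_risk, ctrl_other):
    """Pearson chi-square statistic (1 d.f.) for a 2x2 table of
    allele counts: rows = cases/controls, columns = risk/other allele."""
    a, b, c, d = case_risk, case_other, ctrl_risk, ctrl_other
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical example: 1000 cases and 1000 controls (2000 chromosomes each).
# Risk allele at 30% frequency in cases vs 25% in controls.
chi2 = allele_chi2(600, 1400, 500, 1500)
print(round(chi2, 2))  # ~12.54: "suggestive", but far below the ~29.7
                       # needed for genome-wide significance at 1 d.f.
```

In a real genome-wide scan this test (or a logistic-regression equivalent) is simply repeated at every one of the hundreds of thousands of genotyped markers, which is what makes the multiple-testing correction so brutal.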
The whole exercise is predicated on one major assumption: that common diseases, such as type 2 diabetes or breast cancer, are to a large extent caused by common genetic variants – variants with a frequency of 5% or greater in the general population.
This “common disease, common variant” (CDCV) hypothesis was one of the foundations of the HapMap project, a massive international collaboration that seeks to characterise patterns of common genetic variation in populations around the world. The HapMap project provided the catalogue of common variants required to perform genome-wide association studies; companies like Illumina and Affymetrix generated the chip-based technology needed to cost-effectively genotype hundreds of thousands of markers from a patient’s DNA sample. Researchers were already collecting DNA samples from thousands of well-characterised disease patients for genetic studies. So long as the CDCV hypothesis was broadly correct, genome-wide association studies held the promise of unlocking the genetic factors underlying common diseases, providing predictive markers that could be used to identify at-risk individuals for early intervention before the onset of symptoms.
Over the last eighteen months, hundreds of millions of dollars have been poured into massive genome-wide association studies, often incorporating patient samples and research institutes from around the world. These studies have certainly taught us much about the genetic architecture of human traits; but sadly, things have not turned out as neatly as the CDCV proponents hoped.
In the New York Times yesterday, Nick Wade profiles highly-regarded geneticist David Goldstein of Duke University, who provides the most sober assessment I have yet seen in the mainstream press about the outcomes of the genome-wide association study frenzy:
“There is absolutely no question,” he said, “that for the whole hope of personalized medicine, the news has been just about as bleak as it could be.”
Of the HapMap and other techniques developed to make sense of the human genome, Dr. Goldstein said, “Technically, it was a tour de force.” But in his view, this prodigious labor has produced just a handful of genes that account for very little of the overall genetic risk.
“After doing comprehensive studies for common diseases, we can explain only a few percent of the genetic component of most of these traits,” he said. “For schizophrenia and bipolar disorder, we get almost nothing; for Type 2 diabetes, 20 variants, but they explain only 2 to 3 percent of familial clustering, and so on.”
Goldstein’s views make for a refreshing change from the hype of most stories about the new era of large-scale human genetics. In fact, it’s been clear for quite some time that the yield from genome-wide association studies was unlikely to be anywhere near as impressive as the optimists had predicted – I noted back in April that the poor results of massive genome-wide scans for height-related genes (involving tens of thousands of subjects) were a particularly ominous sign for disease geneticists. Nonetheless, the media has tended to play up the minor success stories, no doubt with some encouragement from the scientists involved, rather than report the more depressing reality of the situation: despite hundreds of millions of dollars invested in genome-wide studies, we are not much closer to developing reliable predictive tests for most common diseases than we were two years ago.
Why has the genome-wide approach failed to meet expectations? As I noted in my long post yesterday, there are a large number of complicating factors that are hindering the success of the common variant approach. Probably the most important factor – and one that Goldstein highlights – is that most disease risk is probably due to large numbers of rare variants, rather than small numbers of common variants. Chip-based assays, which can only really assess variants with a reasonable frequency in the population (greater than 1%, even with newer technologies), simply can’t find these rare variants.
Not all is lost, however. Although the genome-wide association approach has failed to explain enough disease risk to have much predictive utility, the genes highlighted by this approach have provided new insights into the molecular pathways underlying common diseases. In addition, there is hope around the corner for researchers seeking rare disease-causing genes: the plummeting cost of DNA sequencing means that it won’t be long before whole-genome sequencing (determining the code at every single one of the six billion positions in the genome) of large numbers of patients and controls becomes affordable.
Finally, there is the intriguing question of why common diseases are not caused by common genetic variants. Like any “why” question in biology, this is at heart an evolutionary problem, and a particularly interesting one at that. But rather than ramble on redundantly, I will simply point you to three erudite bloggers who beat me to the punch: John Hawks, Razib and Dienekes. All three are well worth a read.