The promise and challenges of Big Genetics

By dgmacarthur on January 14, 2009.

Olivia Judson's blog has a guest post by Aaron Hirsh that got me thinking about a topic that will be familiar to most scientists: the transition of research towards Big Science. Big Science basically includes any project involving a large consortium of research groups working together on a tightly-defined problem, usually with a very specific goal in mind (e.g. sequence and analyse a genome, or build a big machine to smash particles together at high speed).

Hirsh only mentions genetics in passing, but this field - and particularly human genetics - is an area where the trend towards Big Science has been spectacularly visible. Large-scale collaborative efforts such as the sequencing of the human genome and the production of the HapMap catalogue of common genetic variation have provided incredibly useful tools for the genetics community. Such projects have typically sparked heated debate at their inception, with detractors claiming that the money could be better spent elsewhere - but as I noted last year, such criticism has typically lost its bite once the value of the resulting data became clear.

Big Genetics is now a familiar feature of the human genetics landscape. This year will see major data releases from several massive projects (e.g. the 1000 Genomes Project and the Cancer Genome Atlas) and disease genetics consortia such as the Wellcome Trust Case-Control Consortium, GIANT, and dozens of other disease-specific collaborations.

The genome-wide association study (GWAS) field provides an interesting case study in both the power and the challenges of Big Genetics. The relentless demand for ever-larger samples of disease patients - to fuel the search for variants with ever-smaller effect sizes - has provided a powerful incentive to create large-scale collaborations, but there are considerable barriers to achieving this. Researchers have spent years and sometimes decades laboriously collecting and cataloguing DNA samples from disease patients, and are thus understandably reluctant to exchange these samples for the relative anonymity of middle-authorship on a large-scale GWAS paper.

Nonetheless, it has become clear that cohorts that would have been considered "large" in human genetics as recently as 5 years ago - thousands of disease cases and matched healthy controls - are insufficient to detect the majority of the genetic variants contributing to complex disease risk. As a result, most researchers have pragmatically (albeit tentatively and conditionally) agreed to pool their resources with other groups. The results have been GWAS of astonishing size, with tens of thousands of genotyped individuals now becoming almost mundane in high-level publications (see here for some recent examples).
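To put rough numbers on why thousands of cases stop being enough, here is a minimal back-of-the-envelope sketch. It is not taken from any of the studies mentioned here. It assumes a simple allelic comparison (a two-proportion z-test on risk-allele frequencies), equal numbers of cases and controls, a significance threshold of 5e-8, 80% power, and an illustrative risk-allele frequency of 0.3 in controls. The function names and all parameter values are hypothetical choices made for illustration.

```python
import math
from statistics import NormalDist


def case_allele_freq(control_freq: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by a per-allele odds ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def cases_needed(control_freq: float, odds_ratio: float,
                 alpha: float = 5e-8, power: float = 0.8) -> int:
    """Approximate number of cases (with an equal number of controls).

    Uses the standard normal-approximation sample size formula for
    comparing two proportions, applied to allele counts. Each person
    carries two alleles, so the allele count is halved at the end.
    All defaults are assumptions for illustration only.
    """
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided threshold
    z_power = NormalDist().inv_cdf(power)
    variance = p0 * (1.0 - p0) + p1 * (1.0 - p1)
    alleles_per_group = (z_alpha + z_power) ** 2 * variance / (p1 - p0) ** 2
    return math.ceil(alleles_per_group / 2.0)


if __name__ == "__main__":
    # Hypothetical effect sizes: a fairly strong variant and a weak one.
    for odds_ratio in (1.5, 1.2, 1.1):
        n = cases_needed(control_freq=0.3, odds_ratio=odds_ratio)
        print(f"odds ratio {odds_ratio}: about {n} cases and {n} controls")
```

Under these assumptions, the number of cases required for an odds ratio of 1.1 comes out more than ten times larger than for an odds ratio of 1.5, which is the basic arithmetic pushing groups to pool their samples. Real power calculations also depend on the genetic model, the disease prevalence, and how well the genotyped markers tag the causal variant, none of which this sketch attempts to capture.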

The scale of these large GWAS has allowed them to identify and catalogue literally hundreds of genetic variants underlying variation in complex disease risk. The extremely small effect sizes of most of these variants mean that, in sum and for various reasons, they explain only a small fraction of the total genetic contribution to most complex traits. Even so, this catalogue has provided novel insight into the molecular basis of many common diseases - revealing, for instance, the previously unknown role of the autophagy pathway in Crohn's disease. In addition, the collections of high-quality, well-curated DNA samples assembled by these consortia will be incredibly useful in probing for other forms of genetic variation (such as rare variants) as platforms to assess these become available.

Thus while large-scale GWAS collaborations have sorely tested the diplomatic and political skills of many researchers, the outcomes of these studies and other large collaborative projects mean that Big Genetics is here to stay.

Will Big Genetics eventually swallow the entire field, as some critics of the Human Genome Project argued towards the end of the last millennium? I'd argue that this is unlikely, and that in fact the Big Genetics approach carries within it the seeds of its own constraint.

My reasoning is this. Firstly, the sheer size of these projects encourages the emergence of a public data-sharing mentality that now (thankfully) permeates most of the field: when no one group feels complete ownership of the resulting data, there are fewer barriers to the idea of dumping it all online for the benefit of the community as a whole. Secondly, the free release of data into the research community, like an influx of nutrients into an ecosystem, ultimately increases the number of niches available for researchers to occupy. Basically, Big Genetics generates far more data than its participants can ever hope to analyse themselves, and the hefty remainder is fodder for a plethora of small labs exploring small but important facets of the bigger picture.

The vast number of small-scale studies that have relied on the human genome reference sequence or the HapMap is an obvious testament to this process. We are also beginning to see small groups seize on the wealth of data from genome-wide association studies to drive both targeted genetic studies and functional and mechanistic analyses. The increasing hunger of high-impact journals for multi-disciplinary research will ensure that the drive for collaboration is always there, but groups won't need to be absorbed within these massive consortia in order to take advantage of their data output.

My guess is that - contrary to the critics of Big Genetics - the human genetics ecosystem will continue to fluctuate around an equilibrium point marking a fairly comfortable balance between Big and Small Genetics. However, the crucial term in the equation is the free release of data, which means that anything interfering with open data access is a threat to the research community as a whole. We therefore need to be wary both of an excessive focus on commercialisation within academia, and of well-meaning but excessive attempts to control the flow of data, like this.

so we need to be wary both of an excessive focus on commercialisation within academia

If you have time, could you elaborate on commercialisation in academic environments?
