Olivia Judson’s blog has a guest post by Aaron Hirsh that got me thinking about a topic that will be familiar to most scientists: the transition of research towards Big Science. Big Science basically includes any project involving a large consortium of research groups working together on a tightly-defined problem, usually with a very specific goal in mind (e.g. sequence and analyse a genome, or build a big machine to smash particles together at high speed).
Hirsh only mentions genetics in passing, but this field – and particularly human genetics – is an area where the trend towards Big Science has been spectacularly visible. Large-scale collaborative efforts such as the sequencing of the human genome and the production of the HapMap catalogue of common genetic variation have provided incredibly useful tools for the genetics community. Such projects have typically sparked heated debate at their inception, with detractors claiming that the money could be better spent elsewhere – but as I noted last year, such criticism has typically lost its bite once the value of the resulting data became clear.
Big Genetics is now a familiar feature of the human genetics landscape. This year will see major data releases from several massive projects (e.g. the 1000 Genomes Project and the Cancer Genome Atlas) and disease genetics consortia such as the Wellcome Trust Case-Control Consortium, GIANT, and dozens of other disease-specific collaborations.
The genome-wide association study (GWAS) field provides an interesting case study in both the power and the challenges of Big Genetics. The relentless demand for ever-larger samples of disease patients – to fuel the search for variants with ever-smaller effect sizes – has provided a powerful incentive to create large-scale collaborations, but there are considerable barriers to achieving this. Researchers have spent years and sometimes decades laboriously collecting and cataloguing DNA samples from disease patients, and are thus understandably reluctant to exchange these samples for the relative anonymity of middle-authorship on a large-scale GWAS paper.
Nonetheless, as it has become clear that cohorts that would have been considered “large” in the human genetics field even 5 years ago – thousands of disease cases and matched healthy controls – are insufficient to detect the majority of the genetic variants contributing to complex disease risk, most researchers have pragmatically (albeit tentatively, and conditionally) agreed to pool their resources with other groups. The results have been GWAS of astonishing size, with tens of thousands of genotyped individuals now becoming almost mundane in high-profile publications (see here for some recent examples).
The scale of these large GWAS has allowed them to identify and catalogue literally hundreds of genetic variants underlying variation in complex disease risk. Although the extremely small effect sizes of most of these variants have meant that in sum they explain only a small fraction of the total genetic contribution to most complex traits (for various reasons), this catalogue has provided novel insight into the molecular basis of many common diseases – revealing, for instance, the previously unknown role of the autophagy pathway in Crohn’s disease. In addition, the collections of high-quality, well-curated DNA samples assembled by these consortia will be incredibly useful in probing for other forms of genetic variation (such as rare variants) as platforms to assess these become available.
Thus while large-scale GWAS collaborations have sorely tested the diplomatic and political skills of many researchers, the outcomes of these studies and other large collaborative projects mean that Big Genetics is here to stay.
Will Big Genetics eventually swallow the entire field, as some critics of the Human Genome Project argued towards the end of the last millennium? I’d argue that this is unlikely, and that in fact the Big Genetics approach carries within it the seeds of its own constraint.
My reasoning is this: the sheer size of these projects encourages the emergence of a public data-sharing mentality that now (thankfully) permeates most of the field, because with no one group feeling complete ownership of the resulting data there are fewer barriers to the idea of dumping it all online for the benefit of the community as a whole. The free release of data into the research community, like an influx of nutrients into an ecosystem, ultimately results in the increased availability of niches for researchers to occupy. Basically, Big Genetics generates far more data than its participants can ever hope to analyse themselves, and the hefty remainder is fodder for a plethora of small labs exploring small but important facets of the bigger picture.
The vast number of small-scale studies that have relied on the human genome reference sequence or the HapMap is an obvious testament to this process. We are also beginning to see small groups seize on the wealth of data from genome-wide association studies to drive both targeted genetic studies and functional and mechanistic analyses. The increasing hunger of high-impact journals for multi-disciplinary research will ensure that the drive for collaboration is always there, but groups won’t need to be absorbed within these massive consortia in order to take advantage of their data output.
My guess is that – contrary to the Big critics – the human genetics ecosystem will continue to fluctuate around an equilibrium point marking a fairly comfortable balance between Big and Small Genetics. However, the crucial symbol in the equation is the free release of data, meaning anything that interferes with open data access is a threat to the research community as a whole – so we need to be wary both of an excessive focus on commercialisation within academia, and of well-meaning but excessive attempts to control the flow of data, like this.