To the barricades in defence of Big Genetics

Over at Gene Expression, p-ter has a post up defending the "big genetics" approach, noting that large-scale hypothesis-free genetics studies have consistently yielded important results for follow-up detailed fine-scale studies.

It's a sound argument. I've argued in the past that many of the fears expressed about Big Genetics are overblown:

Will Big Genetics eventually swallow the entire field, as some critics of the Human Genome Project argued towards the end of the last millennium? I'd argue that this is unlikely, and that in fact the Big Genetics approach carries within it the seeds of its own constraint.

My reasoning is this: firstly, the sheer size of these projects encourages the emergence of a public data-sharing mentality that now (thankfully) permeates most of the field, because, with no one group feeling complete ownership of the resulting data, there are fewer barriers to the idea of dumping it all online for the benefit of the community as a whole. The free release of data into the research community, like an influx of nutrients into an ecosystem, ultimately results in the increased availability of niches for researchers to exist in. Basically, Big Genetics generates far more data than its participants can ever hope to analyse themselves, and the hefty remainder is fodder for a plethora of small labs exploring small but important facets of the bigger picture.

The vast number of small-scale studies that have relied on the human genome reference sequence or the HapMap is an obvious testament to this process. We are also beginning to see small groups seize on the wealth of data from genome-wide association studies to drive both targeted genetic studies and functional and mechanistic analyses. The increasing hunger of high-impact journals for multi-disciplinary research will ensure that the drive for collaboration is always there, but groups won't need to be absorbed within these massive consortia in order to take advantage of their data output.

My guess is that - contrary to the Big critics - the human genetics ecosystem will continue to fluctuate around an equilibrium point marking a fairly comfortable balance between Big and Small Genetics. However, the crucial symbol in the equation is the free release of data, meaning anything that interferes with open data access is a threat to the research community as a whole - so we need to be wary both of an excessive focus on commercialisation within academia, and of well-meaning but excessive attempts to control the flow of data, like this.

There are a couple of fallacies thrown around as if they were facts when critics argue about this point. The major one is the notion that it was a general assumption amongst the proponents of large-scale science that this was the answer to diseases like cancer, heart disease, diabetes or genetic disorders. I can't remember anyone seriously suggesting at the time that there would be rapid cures coming out of the genome project.
The other fallacy is the ridiculously overinflated role now assigned to Craig Venter in pushing the genome project to completion. I think he's a smart guy with some good ideas, but come on: Celera's initial genome was a mess, it cost a fortune to get full access, and their gene prediction was a joke.

Seems to me that the main objection about FC's appointment isn't big vs small genetics: it's the fact that he's a *geneticist*. By analogy, Zerhouni's appointment was the triumph of imaging at the expense of other diagnostic forms, and his resignation was the MRI's fall from grace.

Really, is this what political/policy discourse has devolved into? Pah.

We are also beginning to see small groups seize on the wealth of data from genome-wide association studies to drive both targeted genetic studies and functional and mechanistic analyses.

Is there any decent evidence that the vast sums of time, money, and effort spent on Genome Wide Association studies are actually any better at driving targeted genetic studies and functional and mechanistic analyses--which are the things that are actually required to get at real biology--than the fast, free, and easy process of Wild Ass Guessing?

20 years of candidate gene association studies for complex diseases = less than a dozen reliable associations, and a whole lot of wild goose chases that wasted time and resources (how many knockout mice were made to study non-existent genetic associations, I wonder?).

2 years of genome-wide association studies = well over 400 replicated genetic variants associated with more than 70 complex traits and diseases, most of which are now the target of fairly intensive mechanistic follow-up.

QED.

It's also worth emphasising that wild-ass guessing was neither fast, nor free, nor necessarily easy. :-)

Is there any decent evidence that the vast sums of time, money, and effort spent on Genome Wide Association studies are actually any better at driving targeted genetic studies and functional and mechanistic analyses--which are the things that are actually required to get at real biology--than the fast, free, and easy process of Wild Ass Guessing?

yes. this is not up for debate in the field--Wild Ass Guessing was an abject failure, and GWAS work infinitely better (admittedly, it's not hard to do much better than abject failure). i echo daniel's comment; also, for a couple concrete examples, see my post linked above.

IIRC, at the December NIH/CDC meeting Francis Collins suggested that the way to get to the bottom of the missing heritability, the common disease common variant hypothesis, gene-gene and gene-environment interactions, etc. etc. is to run a population-wide, 20-year longitudinal study in which genome-wide data and detailed environmental and behavioral minutiae were tracked for 100,000 participants.

The follow-up commenters, starting with John Ioannidis, each upped the sample size by an order of magnitude, until someone suggested that the entire U.S. population be sequenced, which, it was then realized, would require universal health care.

At that point, the meeting ended.

Perhaps Celera's initial assembly was a mess (and I'm not trying to defend it; I just don't have a basis for comment), but Celera's formation caused an (IMHO) undeniable acceleration of the public effort.

The crisis is that when we look at things that are straightforwardly hereditary, for example height, we don't get much. We should. Why do we not?

We should be able to predict someone's height from their genome at least as accurately as by looking at their parents - and similarly for their IQ, the shape of their nose, and so on and so forth. What is the problem? Anything that is sufficiently straightforward for breeders to select for in rats, we should be able to predict from genes.