big genetics

In the comments to a previous post defending big genetics, Andro Hsu relates an anecdote that warrants repeating: IIRC, at the December NIH/CDC meeting Francis Collins suggested that the way to get to the bottom of the missing heritability, the common disease common variant hypothesis, gene-gene and gene-environment interactions, etc. etc. is to run a population-wide, 20-year longitudinal study in which genome-wide data and detailed environmental and behavioral minutiae were tracked for 100,000 participants. The follow-up commenters starting with John Ioannidis each upped the sample size by…
Over at Gene Expression, p-ter has a post up defending the "big genetics" approach, noting that large-scale hypothesis-free genetics studies have consistently yielded important results for follow-up detailed fine-scale studies. It's a sound argument. I've argued in the past that many of the fears expressed about Big Genetics are overblown: Will Big Genetics eventually swallow the entire field, as some critics of the Human Genome Project argued towards the end of the last millennium? I'd argue that this is unlikely, and that in fact the Big Genetics approach carries within it the seeds of…
David Dooling has a great post that starts with the conference blogging issue, and then leaps off in a different but related direction - the curious double standard in the data release policies applying to large genome sequencing centres compared to other genomic researchers. As David notes, the advent of second-generation sequencing technologies means that medium-size genomics labs can now generate more sequence in a month than the entire Human Genome Project was able to generate in a year; indeed, a single Illumina Genome Analyzer can now produce a high-quality entire human genome sequence…
[Added in edit in response to concerned emails: The original title was deliberately provocative, and contrary to the message in the text; I apologise for any misunderstanding. I've largely rewritten the post to make my point more clearly.] One of the curious and paradoxical effects of Big Genetics projects like the 1000 Genomes Project - which plans to generate low-coverage whole-genome sequences for ~1,500 people by the end of this year, providing a map of human genetic variation of unprecedented resolution - is that while they considerably accelerate research in the long term, they can…
Olivia Judson's blog has a guest post by Aaron Hirsh that got me thinking about a topic that will be familiar to most scientists: the transition of research towards Big Science. Big Science basically includes any project involving a large consortium of research groups working together on a tightly-defined problem, usually with a very specific goal in mind (e.g. sequence and analyse a genome, or build a big machine to smash particles together at high speed). Hirsh only mentions genetics in passing, but this field - and particularly human genetics - is an area where the trend towards Big…
This will probably only be of interest to population genetics aficionados, but I just noticed that the HapMap project has made its phase 3 data available through its browser (the data were previously available for download, but are much more accessible - especially to non-bioinformaticians - through the browser interface). The HapMap project is a massive international collaboration collecting information on common sites of genetic variation (called single nucleotide polymorphisms, or SNPs) in anonymised individuals from a variety of human populations. Phase 3 has data on about 1.5 million…
Nature News has a special feature on "big data" - a broad look at the demands of the brave new world of massively high-throughput data generation, and the solutions adopted by research institutes and corporations to deal with those demands. The image to the left (from an article in the feature by Boing Boing's Cory Doctorow) is a picture of the office door of Tony Cox, head of sequencing informatics at the Sanger Institute in Cambridge, UK. The 320 terabytes refers to the scale of the raw data being produced by the Sanger's next-generation sequencing machines as they chew through kilometres…