I’ll Take Genomes for $1000

Nature Genetics is asking:

What would you do if it became possible to sequence the equivalent of a full human genome for only $1,000?

George Church would repeat the Applera dataset for everyone on earth, sequencing every exon from every human being. Francis Collins would sequence people with diseases and old people. Stephen O’Brien would sequence the genomes of all 38 extant species of cats (big surprise) to study the evolution of that taxon and generate SNP markers. O’Brien would also sequence the genomes of the 100 most endangered mammals and every species of primate. Evan Eichler would sequence 1,000 humans from around the globe ($1 million), people with mental retardation and their parents (at least a couple hundred grand), the genomes of all mammals (because O’Brien wasn’t being ambitious enough), and 100 germ cells (eggs or sperm) and their donor (another hundred grand). Jonathan Pritchard would look for rare variants through resequencing, which is a more specific description of Eichler’s proposal, and sequence his three-year-old son’s genome.

The question leaves some issues unresolved. What does one get for $1000? Is that merely the cost to generate the sequence reads (sequences of a few hundred bases, known as traces), or does one get an assembled genome for $1000? Assembling future human genomes should be a snap because the new sequences can be built on top of a backbone consisting of the most current assembly. But assembling genomes from other species de novo is much more labor intensive. You may be able to get the genic information (i.e., most of the open reading frames and other functional sequences) from a poorly assembled genome, but you won’t have chromosomes represented by single scaffolds.

With that in mind, Eichler brings up an interesting technological goal:

I would be even more enthusiastic about technology that would allow >200,000 base pairs of contiguous sequence to be obtained directly from genomic DNA in a single pass…this would allow us to understand more complex regions of our genome such as segmental duplications, telomeres and centromeres as well as underlying individual variation.

Of course Eichler would be the one to bring up segmental duplications, but this technology is useful for more than just identifying structural polymorphism. If we could increase the length of DNA sequenced in each pass (think HUGE trace reads), assembling genomes would be so much easier. One of the difficulties of assembling genomes (especially large ones with lots of repetitive DNA) is figuring out whether two nearly identical sequences come from the exact same part of the genome or whether they’re paralogous (the result of a recent duplication event).
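Here is a toy illustration of why read length matters for repeats. It is only a sketch under loud assumptions: the genome string is made up, the reads are error-free, and "placing" a read just means exact string matching, which is not how real assemblers work. The names `placements` and `ambiguous_fraction` are hypothetical helpers invented for this example.

```python
def placements(genome, read):
    """Return every start position where the read matches the genome exactly."""
    hits = []
    start = genome.find(read)
    while start != -1:
        hits.append(start)
        # Search again one base further along so overlapping matches are found.
        start = genome.find(read, start + 1)
    return hits


def ambiguous_fraction(genome, read_len):
    """Fraction of error-free reads of a given length that match more than one place."""
    starts = range(len(genome) - read_len + 1)
    ambiguous = sum(
        1 for s in starts
        if len(placements(genome, genome[s:s + read_len])) > 1
    )
    return ambiguous / len(starts)


# Made-up toy genome: two identical 16-base copies of a "duplication",
# separated and flanked by unique sequence.
repeat = "ACGTTGCAAGGCTTAC"
genome = "TTTAGC" + repeat + "GGATCCAT" + repeat + "CCGATA"

short_reads = ambiguous_fraction(genome, 8)   # reads shorter than the repeat
long_reads = ambiguous_fraction(genome, 20)   # reads longer than the repeat

print(f"8-base reads with more than one placement:  {short_reads:.0%}")
print(f"20-base reads with more than one placement: {long_reads:.0%}")
```

Reads that fit entirely inside the duplicated stretch match both copies, so they cannot be placed. Reads long enough to span the copy and reach into unique flanking sequence can. Scale the 16-base repeat up to a segmental duplication and you get Eichler's wish for 200,000 contiguous bases.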

What would I sequence? Along the same lines as O’Brien and Eichler, Pritchard suggests sequencing the genomes of a bunch of species:

With cheap genome sequencing, one could take any interesting clade (e.g., the Hawaiian Drosophilids) and quickly determine the full complement of genomic differences among species, and from there head into comparative expression arrays and so forth. Cheap genome sequencing will lower the divide between ‘model’ and ‘non-model’ organisms.

The point about lowering the divide between model and non-model organisms is important, although overstated. Model organisms don’t just have genome sequences freely available; they also have lots of laboratory tools (transposon insertion lines, mutation lines, deletion lines, constructs for doing nifty genetic tricks) that have been developed thanks to decades of research. An organism cannot become a model by genome sequence alone.

But I’m more in favor of the approach proposed by Eichler: sequencing the genomes of many individuals from a single species. Eichler chooses humans, but this could very well be any species you fancy. I like this better than O’Brien’s proposal to develop SNPs for two reasons. First, studying polymorphism based on SNPs identified in a small panel and then genotyped in a larger population introduces ascertainment bias into your analysis. Second, if it’s so cheap to sequence a genome, why not just go all out and sequence the whole thing rather than genotype known SNPs?

Given that the human genome is on the large end of animal genomes, we could probably sequence multiple individuals from other species of interest for that $1000. So, if we could sequence a human-sized genome for $1000, I’d sequence the genomes of 100 individuals from my favorite Drosophila species for about $10,000. That’s the first thing I’d do. After that, I’d move on to another taxon, sequencing a bunch of species, and then focus on one species in which to study genome-wide patterns of polymorphism.
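The arithmetic behind that kind of estimate can be sketched in a few lines. The big assumption is that cost scales linearly with genome size, which the $1000 premise does not actually promise. The human genome size is a round approximation, and the 300 Mb fly genome is a hypothetical placeholder picked because it reproduces the roughly $10,000 figure. Substitute the real size of whatever species you fancy.

```python
# Assumption: sequencing cost scales linearly with genome size.
HUMAN_GENOME_MB = 3_000          # roughly 3 billion bases (approximate)
COST_PER_HUMAN_SIZED_GENOME = 1_000  # dollars, the premise of the question


def cost_per_genome(genome_mb):
    """Dollar cost of one genome of the given size under linear scaling."""
    return COST_PER_HUMAN_SIZED_GENOME * genome_mb / HUMAN_GENOME_MB


def cost_for_sample(genome_mb, n_individuals):
    """Dollar cost of sequencing n individuals with genomes of the given size."""
    return n_individuals * cost_per_genome(genome_mb)


# Hypothetical genome size for illustration only, not a measured value.
fly_genome_mb = 300

print(cost_per_genome(fly_genome_mb))       # dollars per fly genome
print(cost_for_sample(fly_genome_mb, 100))  # dollars for 100 flies
```

Under these assumptions, a genome one tenth the size of ours costs about $100, so 100 individuals come to about $10,000. A smaller genome would make the population sample cheaper still.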

(Via Genetics and Health.)


  1. #1 Mustafa Mond, FCD
    January 17, 2007

    Are you going to put up a post about the new blog name?

  2. #2 sparc
    January 17, 2007

    and on actgctgtagcat?

  3. #3 RPM
    January 17, 2007

    On the new name.

    Sparc, not sure what you’re saying there. Am I supposed to translate that: TAVA?

  4. #4 Mustafa Mond, FCD
    January 17, 2007

    and on actgctgtagcat?

    It’s a new exam for those wishing to attend graduate school. Reports are that it is a lot tougher to make the cut on actgctgtagcat than on the lsat or mcat.

  5. #5 RPM
    January 17, 2007

    now I get it. those are the letters from my banner. and they’re meaningless.

  6. #6 Mike the Mad Biologist
    January 17, 2007

    I would sequence thousands of E. coli genomes so we could do some kickass linkage disequilibrium studies (among many, many other things). But I’m crazy like dat.

  7. #7 Amit
    January 17, 2007

    While the amount of data cheap re-sequencing would generate gives me bioinformatics nightmares, it would ensure me lifetime employment as a programmer :) In any case, the geneticist in me would do similar things as George Church and Francis Collins state in the article.

    Will you be changing your url to reflect the new name?