I'll Take Genomes for $1000

By evolgen on January 17, 2007.

Nature Genetics is asking:

What would you do if it became possible to sequence the equivalent of a full human genome for only $1,000?

George Church would repeat the Applera dataset for everyone on earth, sequencing every exon from every human being. Francis Collins would sequence people with diseases and old people. Stephen O'Brien would sequence the genomes of all 38 extant species of cats (big surprise) to study the evolution of that taxon and generate SNP markers. O'Brien would also sequence the genomes of the 100 most endangered mammals and every species of primate. Evan Eichler would sequence 1,000 humans from around the globe ($1 million), people with mental retardation and their parents (at least a couple hundred grand), the genomes of all mammals (because O'Brien wasn't being ambitious enough), and 100 germ cells (eggs or sperm) and their donor (another hundred grand). Jonathan Pritchard would look for rare variants through resequencing -- which is a more specific description of Eichler's proposal -- and sequence his three-year-old son's genome.

The question leaves some issues unresolved. What does one get for $1000? Is that merely the cost to generate the sequence reads (sequences of a few hundred bases, known as traces), or does one get an assembled genome for $1000? Assembling future human genomes should be a snap because the new sequences can be built on top of a backbone consisting of the most current assembly. But assembling genomes from other species de novo is much more labor-intensive. You may be able to get the genic information (i.e., most of the open reading frames and other functional sequences) from a poorly assembled genome, but you won't have chromosomes represented by single scaffolds.

With that in mind, Eichler brings up an interesting technological goal:

I would be even more enthusiastic about technology that would allow >200,000 base pairs of contiguous sequence to be obtained directly from genomic DNA in a single pass...this would allow us to understand more complex regions of our genome such as segmental duplications, telomeres and centromeres as well as underlying individual variation.

Of course Eichler would be the one to bring up segmental duplications, but this technology is useful for more than just identifying structural polymorphism. If we could increase the length of DNA sequenced in each pass (think HUGE trace reads), assembling genomes would be so much easier. One of the difficulties of assembling genomes (especially large ones with lots of repetitive DNA) is figuring out whether two nearly identical sequences come from the exact same part of the genome or are paralogous (the result of a recent duplication event). A read long enough to span the repeat and reach into the unique sequence on either side settles the question.
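To make the read-length point concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the 38-base "genome" with a 7-base segment (GATTACA) copied into two places, and the helper name `unique_read_fraction`. It only asks what fraction of error-free reads of a given length occur exactly once in the sequence, which is a crude stand-in for "can be placed unambiguously". Real assemblers deal with sequencing errors, both strands, and far messier repeats.

```python
from collections import Counter


def unique_read_fraction(genome: str, read_len: int) -> float:
    """Fraction of all error-free reads of length read_len that occur exactly once."""
    # Every possible read start position along the (hypothetical) genome.
    reads = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(reads)
    unique = sum(1 for read in reads if counts[read] == 1)
    return unique / len(reads)


# Toy genome: three unique stretches separated by two copies of a 7-base repeat.
REPEAT = "GATTACA"
TOY_GENOME = "ACGTTGCA" + REPEAT + "CCGGTATC" + REPEAT + "TTAGGCAT"

if __name__ == "__main__":
    for read_len in (4, 9):
        frac = unique_read_fraction(TOY_GENOME, read_len)
        print(f"read length {read_len}: {frac:.2f} of reads are unique")
```

Reads of length 4 can fall entirely inside the repeat, so some of them match both copies. Reads of length 9 are longer than the repeat, always include flanking sequence, and in this toy case are all unique.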

What would I sequence? Along the same lines as O'Brien and Eichler, Pritchard suggests sequencing the genomes of a bunch of species:

With cheap genome sequencing, one could take any interesting clade (e.g., the Hawaiian Drosophilids) and quickly determine the full complement of genomic differences among species, and from there head into comparative expression arrays and so forth. Cheap genome sequencing will lower the divide between 'model' and 'non-model' organisms.

The point about lowering the divide between model and non-model organisms is important, although overstated. Model organisms don't just have genome sequences freely available; they also have lots of laboratory tools (transposon insertion lines, mutation lines, deletion lines, constructs for doing nifty genetic tricks) that have been developed thanks to decades of research. An organism cannot become a model by genome sequence alone.

But I'm more in favor of the approach proposed by Eichler: sequencing the genomes of many individuals from a single species. Eichler chooses humans, but this could very well be any species you fancy. I like this better than O'Brien's proposal to develop SNPs for two reasons. First, studying polymorphism based on SNPs identified in a small panel and then genotyped in a larger population introduces ascertainment bias into your analysis. Second, if it's so cheap to sequence a genome, why not just go all out and sequence the whole thing rather than genotype known SNPs?

Given that the human genome is on the large end of animal genomes, we could probably sequence multiple individuals from other species of interest for that $1000. So, if we could sequence a human-sized genome for $1000, I'd sequence the genomes of 100 individuals from my favorite Drosophila species for about $10,000. That's the first thing I'd do. After that, I'd move on to another taxon, sequencing a bunch of species, and then focus on one species in which to study genome-wide patterns of polymorphism.
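For anyone who wants to play with the arithmetic, here is a back-of-the-envelope sketch in Python. The genome sizes are my assumptions, not numbers from the Nature Genetics piece: roughly 3.2 billion bases for a human genome and roughly 180 million for a Drosophila melanogaster-sized genome. It also assumes cost scales linearly with the number of bases, ignoring coverage, library prep, and assembly. The function name `resequencing_cost` is made up for this example.

```python
HUMAN_GENOME_BP = 3.2e9             # assumed human genome size, in bases
COST_PER_HUMAN_EQUIVALENT = 1000.0  # the hypothetical $1,000 price tag


def resequencing_cost(genome_bp: float, n_individuals: int,
                      human_bp: float = HUMAN_GENOME_BP,
                      cost_per_human: float = COST_PER_HUMAN_EQUIVALENT) -> float:
    """Dollar cost of sequencing n_individuals genomes, scaled linearly by size."""
    human_equivalents = n_individuals * genome_bp / human_bp
    return human_equivalents * cost_per_human


if __name__ == "__main__":
    fly_genome_bp = 180e6  # assumed size of a D. melanogaster-like genome
    cost = resequencing_cost(fly_genome_bp, 100)
    print(f"100 fly genomes: about ${cost:,.0f}")
```

Under these assumptions 100 fly genomes come in below the round $10,000 figure above, which leaves headroom for species with larger genomes. Swap in whatever genome size fits your favorite species.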

(Via Genetics and Health.)

Are you going to put up a post about the new blog name?

and on actgctgtagcat?

On the new name.

Sparc, not sure what you're saying there. Am I supposed to translate that: TAVA?

and on actgctgtagcat?

It's a new exam for those wishing to attend graduate school. Reports are that it is a lot tougher to make the cut on actgctgtagcat than on the lsat or mcat.

now I get it. those are the letters from my banner. and they're meaningless.

I would sequence thousands of E. coli genomes so we could do some kickass linkage disequilibrium studies (among many, many other things). But I'm crazy like dat.

While the amount of data cheap re-sequencing would generate gives me bioinformatics nightmares, it would ensure me lifetime employment as a programmer :) In any case, the geneticist in me would do similar things as George Church and Francis Collins state in the article.

Will you be changing your url to reflect the new name?
