Basics: How do you sequence a genome?

By sporte on January 22, 2007.

About a week ago, I offered to answer questions about subjects that I've either worked with, studied or taught.

I haven't had many questions yet, but I can certainly answer the ones I've had so far. Today, I'll answer the first question:

How do you sequence a genome?

Before we get into the technical details, there are some other genomic questions that you might like answered.

How much does it cost to sequence a genome?

I remember when we were at the O'Reilly bioinformatics conference in 2002 and heard Lee Hood challenge the DNA sequencing community to lower the cost of sequencing a human genome to $1000. It was all pretty exciting!

We're not there yet. But, we're getting closer. I've heard secondhand, from one of our customers, that it costs about $10,000 to sequence an average-sized bacterial genome, once you've purchased your sequencers, bought your software, and built your lab. Just for a bit of perspective, an average bacterial genome is about 750 times smaller than the human genome.

I'll leave you to do the math, but I imagine it scales pretty well. Ten million for a human genome seems about right, especially considering the original version was estimated to cost about 3 billion dollars.
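For anyone who would rather not do the math by hand, here is a small sketch of it. It only uses the numbers quoted above, and it assumes that cost grows in direct proportion to genome size, which is my guess and not a measured fact. The function name is made up for illustration.

```python
# Figures quoted in the post; linear scaling with genome size is an assumption.
BACTERIAL_GENOME_COST_USD = 10_000            # secondhand estimate
HUMAN_TO_BACTERIAL_SIZE_RATIO = 750           # "about 750 times smaller"
ORIGINAL_HUMAN_GENOME_COST_USD = 3_000_000_000

def scaled_cost(cost_per_small_genome, size_ratio):
    """Estimate the cost of a larger genome, assuming cost scales linearly with size."""
    return cost_per_small_genome * size_ratio

human_estimate = scaled_cost(BACTERIAL_GENOME_COST_USD, HUMAN_TO_BACTERIAL_SIZE_RATIO)
fold_cheaper = ORIGINAL_HUMAN_GENOME_COST_USD / human_estimate

print(f"Estimated human genome cost: ${human_estimate:,}")
print(f"Roughly {fold_cheaper:.0f}x cheaper than the original estimate")
```

Under that assumption the estimate comes to $7.5 million, which is in the same ballpark as the ten million figure.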

What kind of infrastructure do you need to have?

You will need lots of robots for pipetting and preparing DNA, DNA sequencing instruments, computers, and software for tracking samples, evaluating sequence quality, and assembling the sequences at the end.

Some of the other types of equipment will depend on the methods that you're using. If you're using an older method, you'll need autoclaves and special incubators for growing bacteria. If you're using a newer method, like pyrosequencing, you need to have a special clean room where you can work with a lower risk of contamination.

Fine, so how do you go about doing it?

This used to be an easier question to answer. But now that pyrosequencing (from 454) has come along, this answer isn't as simple.

Still, I can divide the steps into three general parts, and then, since there are some nice movies and Flash® animations on the internet, I will send you out to go watch them.

Here are the steps:

1. Break the genome into lots of small pieces at random positions.
2. Determine the sequence of each small piece of DNA.
3. Use an assembly program to figure out which pieces fit together.
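To make the three steps a bit more concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the 18-base "genome" is invented, the reads are error-free and come from one strand only, and the greedy "merge the longest overlap" approach is a simple stand-in, not how a production assembly program works.

```python
import random

# A made-up 18-base "genome" chosen so that no 3-base stretch repeats.
TOY_GENOME = "ACGTTGCAAGGCTCATAG"

def shotgun_fragments(genome, read_length, n_reads, rng):
    """Steps 1 and 2: take error-free reads starting at random positions."""
    last_start = len(genome) - read_length
    starts = [rng.randrange(last_start + 1) for _ in range(n_reads)]
    return [genome[s:s + read_length] for s in starts]

def overlap(a, b, min_len):
    """Longest suffix of a that is also a prefix of b; 0 if shorter than min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_len=3):
    """Step 3: keep merging the two pieces with the longest overlap."""
    unique = sorted(set(reads))
    # Ignore reads that sit entirely inside another read.
    contigs = [r for r in unique if not any(r != o and r in o for o in unique)]
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left that overlaps; report separate contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        others = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        # Drop pieces that the merged contig now fully contains.
        contigs = [c for c in others if c not in merged] + [merged]
    return contigs

reads = shotgun_fragments(TOY_GENOME, read_length=8, n_reads=30, rng=random.Random(0))
for contig in greedy_assemble(reads):
    print(contig)
```

If the random reads happen to leave a gap, you get more than one contig back, which is also what happens with real genomes when coverage is too low.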

The last two steps are a lot like determining what was written in the Dead Sea Scrolls.

Stay tuned, there will be more.

And there is:
Part II: Sequencing strategies
Part III: Reads and chromats
Part IV: How many reads does it take?
Part V: checking out the library

I'm not sure if this is the right place to ask, but what is shotgun sequencing? I've always wondered how they're able to sequence AND differentiate different species...

This is a fine place to ask.

Shotgun sequencing is a strategy for determining a DNA sequence that involves breaking a DNA molecule into several smaller pieces, then determining the sequence of DNA in each piece, and last, using software to put the smaller pieces together into a longer piece.

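It's called "shotgun sequencing" because the pieces are generated at random, like the scatter from a shotgun blast, and the approach doesn't involve mapping them first.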

As far as differentiating between species, this is pretty easy to do. You know where you got your DNA sample, so you only need to distinguish between the DNA pieces that you're trying to sequence and DNA from the vector or from E. coli. That's pretty easy to do using standard sequence comparison programs like BLAST or cross_match.
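Here is a rough sketch of the screening idea. It is not how BLAST or cross_match work; it is a much cruder stand-in that flags any read sharing an exact 8-base stretch with a known contaminant sequence. The vector sequence, the reads, and the function names are all made up for illustration, and the sketch ignores the reverse strand and sequencing errors.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def screen_reads(reads, contaminant, k=8):
    """Split reads into (clean, flagged) by exact k-mer matches to the contaminant."""
    contaminant_kmers = kmers(contaminant, k)
    clean, flagged = [], []
    for read in reads:
        if kmers(read, k) & contaminant_kmers:
            flagged.append(read)
        else:
            clean.append(read)
    return clean, flagged

# Hypothetical vector fragment and reads, invented for this example.
VECTOR = "GGATCCTCTAGAGTCGACCTGCAGGCATGC"
example_reads = [
    "TTACGGATCCTCTAGAGTAA",  # contains a stretch of the vector
    "ACGTACGTTAGCTAGCTAAC",  # shares no 8-base stretch with the vector
]

clean, flagged = screen_reads(example_reads, VECTOR)
print("clean:", clean)
print("flagged:", flagged)
```

Real comparison programs tolerate mismatches and gaps and score their matches, which is why you would use them instead of something like this.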

I'll discuss shotgun sequencing in more detail in future posts on this subject.

Ten million for a human genome seems about right, especially considering the original version was estimated to cost about 3 billion dollars.

According to a current press release from Solexa:

Solexa expects its first-generation instrument, the 1G Genome Analyzer, to generate over a billion bases of DNA sequence per run and to enable human genome resequencing below $100,000 per sample, making it the first platform to reach this important milestone.

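Their 1G machine sequences about 1 billion base pairs per run. It is a chip-based, massively parallel sequencing-by-synthesis method that, like Sanger sequencing, relies on chain-terminating nucleotides, although here the terminators are reversible. The principle is depicted here: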
http://www.solexa.com/technology/sbs.html
and
http://www.solexa.com/technology/demo.html

Cool!

I'm not sure if this is the right place to ask, but what is shotgun sequencing? I've always wondered how they're able to sequence AND differentiate different species...

maybe this is a reference to metagenomics?

like in this paper:
http://www.sciencemag.org/cgi/content/abstract/304/5667/66

If you sequence DNA from a microbial community, there's a certain stretch of DNA that acts as sort of a tag for a bacterial species. The number of times you see the tag and all its variants acts as a count of the abundance of different species, and you can make a phylogeny from them.
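As a minimal sketch of the counting idea, assume each read carrying the tag has already been assigned a label for the tag variant it matches. The labels and counts here are invented, and assigning those labels is the hard part that this skips.

```python
from collections import Counter

def relative_abundance(tag_labels):
    """Turn a list of observed tag labels into fractions of the total."""
    counts = Counter(tag_labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Hypothetical observations: one label per read that carried the tag.
observed = ["tag_A"] * 6 + ["tag_B"] * 3 + ["tag_C"] * 1

abundance = relative_abundance(observed)
for label, fraction in sorted(abundance.items()):
    print(label, fraction)
```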

Sequences can still be assembled as normal; it's just difficult to know when you have a complete genome from a given species. In metagenomics, however, that isn't the goal. Instead, you want to look at which genes are present, which sequences are already in databases, which are novel, etc.

If there are only a couple of species, you can distinguish them by GC content or some other measure of base composition.

P-ter
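The GC content idea in the comment above can be sketched in a few lines. This assumes just two species with clearly different base composition; the reads and the 0.5 cutoff are invented for illustration, and short real reads would be much noisier than this.

```python
def gc_content(seq):
    """Fraction of bases in seq that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def split_by_gc(reads, threshold=0.5):
    """Split reads into (low_gc, high_gc) groups around an assumed threshold."""
    low = [r for r in reads if gc_content(r) < threshold]
    high = [r for r in reads if gc_content(r) >= threshold]
    return low, high

# Hypothetical reads from a two-species mixture.
mixed_reads = ["ATATTAGCAT", "GCGCCGGATC", "TTAAATCGAA", "CCGGGCGTAC"]

low_gc, high_gc = split_by_gc(mixed_reads)
print("low GC:", low_gc)
print("high GC:", high_gc)
```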

Good point about the possible metagenomics slant to that question. You're right, in those instances you are not sequencing genomes, you're taking a sample and looking to find out what's present in that sample. Usually, people identify bacteria by looking at the genes for ribosomal RNA, but GC content is helpful, too.
