
Discovering Biology in a Digital World

My thoughts on biology, teaching, life, and exploring the living world via the digital one. Only my opinions are represented by these postings; they do not represent the viewpoints of any funding agency or Geospiza, Inc.

Profile

Sandra Porter: I am a microbiologist and molecular biologist turned tenured biotech faculty turned bioinformatics scientist turned entrepreneur. My passion is developing instructional materials for 21st century biology (Digital World Biology).


Basics: How do you sequence a genome, part II

Category: Ask Dr. Science, Basics, Bioinformatics, Biotechnology, Genomics, Science education
Posted on: January 27, 2007 1:50 PM, by Sandra Porter

Considering how many genomes have been sequenced in the past decade, it seems amazing, in retrospect, that the first complete bacterial genome sequence was published only 12 years ago (1). Now, the Genome database at the NCBI lists 450 complete microbial genomes (procaryotes and archaea), 1476 genomes from eucaryotes, 2145 viruses, and genome sequences from 407 phage.

Much of the methodology used for sequencing DNA is designed to confront one big technical hurdle.

That is, we can only determine the sequence of small pieces of DNA at a time. This means that you must break a larger piece of DNA into smaller pieces, determine the sequence of each piece, and then put the sequences back together.
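To make the break-apart-and-reassemble idea concrete, here is a toy sketch in Python. It is only an illustration, not how real assemblers work: the 48-base sequence is made up, the function names `fragment`, `overlap`, and `assemble` are my own, and the greedy merging only succeeds because this little sequence has no repeats and the "reads" contain no errors.

```python
import random


def fragment(sequence, read_length, step):
    """Cut a sequence into overlapping reads (step < read_length gives overlap)."""
    if len(sequence) < read_length:
        raise ValueError("sequence is shorter than one read")
    last_start = len(sequence) - read_length
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the end of the sequence is covered
    return [sequence[s:s + read_length] for s in starts]


def overlap(a, b, min_length=3):
    """Length of the longest suffix of a that is also a prefix of b."""
    for n in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def assemble(reads, min_length=3):
    """Greedy toy assembler: keep merging the pair with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while len(contigs) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_length)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps any more; return the pieces we have
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    # Made-up 48-base sequence, used only for this demonstration.
    genome = "ATGGCGTACGTTAGCCTAAGGCTTCAGATCCGTGAACTCGGATTACCA"
    reads = fragment(genome, read_length=12, step=6)
    random.Random(0).shuffle(reads)  # the sequencer does not tell us the order
    contigs = assemble(reads)
    print(len(reads), "reads assembled into", len(contigs), "contig(s)")
    print("matches original:", contigs == [genome])
```

Real genomes contain repeated sequences and real reads contain errors, which is a large part of why assembly is much harder than this sketch suggests.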

[Figure: DNA_seq_challenge.gif]


Mapping vs. Shotgun
When people were sequencing smaller pieces of DNA in the 1980s, it was common to map the DNA first using restriction enzymes, so that you knew how the pieces fit together. At first, many insisted that this same strategy should be applied to genomes as well: genomes should be broken apart and each piece carefully mapped before sequencing began. On the other hand, there was Craig Venter, arguing that genome sequencing would be much quicker with a shotgun approach.

Thinking along the lines of a traditional laboratory, where labor is cheap and reagents are expensive, the mapping approach seemed pretty logical. Each piece of DNA would be carefully mapped, so you would know where it fit into a larger piece, and then sequenced. The downside of mapping first is the cost in time and labor. Currently, you can obtain sequences that are about 900 bases long using ABI instruments and chemistry. This means that to sequence a genome like that of E. coli, which is 4,638,858 bp in length (2), by mapping it first, you would need at least 6000 well-mapped fragments. The shotgun approach, where DNA is broken into many overlapping pieces, each piece is sequenced, and computer programs figure out how the pieces fit together, turned out to be much faster and less costly in terms of labor.
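The fragment estimate above is simple arithmetic, and a few lines of Python show where a number in that range comes from. This is a rough sketch under my own assumptions: the function name `fragments_needed` and the 100 bp overlap are hypothetical choices for illustration, and the fragments are assumed to tile the genome end to end at an even spacing.

```python
def fragments_needed(genome_length, fragment_length, overlap=0):
    """Minimum number of evenly spaced fragments needed to tile a genome.

    overlap is the number of bases each fragment shares with its neighbor.
    """
    if not 0 <= overlap < fragment_length:
        raise ValueError("overlap must be between 0 and fragment_length - 1")
    if genome_length <= fragment_length:
        return 1
    step = fragment_length - overlap           # new bases added by each fragment
    remaining = genome_length - fragment_length
    return 1 + (remaining + step - 1) // step  # integer ceiling division


E_COLI_BP = 4_638_858  # genome length quoted in the post (2)
READ_BP = 900          # approximate read length quoted in the post

print(fragments_needed(E_COLI_BP, READ_BP))               # no overlap at all
print(fragments_needed(E_COLI_BP, READ_BP, overlap=100))  # hypothetical 100 bp overlap
```

With no overlap the floor is a little over 5,000 fragments, and any overlap between neighbors pushes the count higher, which is consistent with treating 6000 as a crude lower estimate.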

Today, genome sequencing uses a combination of mapping and shotgun sequencing. Large pieces of DNA, on the order of 150,000 bp, are first cloned in BACs (Bacterial Artificial Chromosomes). The positions of the BACs are mapped, so it's known where they fit relative to each other and where they overlap. Then the sequence of each BAC is determined using a shotgun strategy.

I'll write more on the shotgun approach in the next post.

Read part I.
Part III: Reads and chromats
Part IV: How many reads does it take?
Part V: checking out the library

References:
1. Fraser CM, et al. 1995. "The minimal gene complement of Mycoplasma genitalium." Science, 270(5235):397-403.

2. Koonin, E. 1997. "Big Time for Small Genomes." Genome Research, 7:418-421.


Comments

1

This would mean that to sequence a genome, like that of E. coli, that's 4,638,858 bp in length (2), by mapping it first, you would need at least 6000 fragments that were well mapped.

That's a bit misleading. I don't think anyone was ever suggesting that each sequence read be mapped. A pure shotgun approach would involve no mapping of clones. A non-shotgun approach would map each clone, digest the clones, and sequence the fragments from each clone. Each read isn't mapped, but the clone from which they came is mapped.

Today, genome sequencing uses a combination of mapping and shot gun sequencing. Large pieces of DNA, on the order of 150,000 bp, are first cloned in BACs (Bacterial Artificial Chromosomes). The positions of the BACs are mapped, so it's known where they fit relative to each other and where they overlap. Then the sequence of each BAC is determined using a shotgun strategy.

The hybrid strategy maps a subset of clones, but the clones are not digested and sequenced. Instead, only paired end reads are generated for lots of clones of multiple sizes (from a few kb to large BACs). Some of the large clones are mapped to anchor the scaffolds.

I was actually planning to write about this topic soon (the queue of intended posts is getting too long...). I may pump out a complementary post to yours.

Posted by: RPM | January 27, 2007 2:31 PM

2

That's a bit misleading. I don't think anyone was ever suggesting that each sequence read be mapped.

I didn't say "each read." I wrote "each fragment" (although I guess I should have specified that the fragments were clones). My crude estimate is also probably an underestimate, since at the time when these issues were most strongly debated (the early to mid 1990s) the reads were much shorter (more like 300-500 bp). You would also want a large number of clones so that you could have clones that overlapped.

I think a complementary post is a good idea. I will be posting more on this subject as well.

Posted by: Sandra Porter | January 27, 2007 5:11 PM


© 2006-2009 Seed Media Group LLC. ScienceBlogs is a registered trademark of Seed Media Group. All rights reserved.