Discovering Biology in a Digital World

My thoughts on biology, teaching, life, and exploring the living world via the digital one. Only my opinions are represented by these postings, they do not represent the viewpoints of any funding agency or Geospiza, Inc.

Profile

Sandra Porter I am a microbiologist and molecular biologist turned tenured biotech faculty turned bioinformatics scientist turned entrepreneur. My passion is developing instructional materials for 21st century biology (Digital World Biology).

Sequencing a Genome, part V: checking out the library

Category: Ask Dr. Science, Basics, Bioinformatics, Biotechnology, Science education
Posted on: January 31, 2007 1:02 PM, by Sandra Porter

The general steps in genome sequencing were presented in the earlier installments (there are links at the bottom of the page), but they're worth repeating, since each of the earlier steps has a bearing on the outcome of those that come later.

These are:

  • Break the genome into lots of small pieces at random positions.
  • Determine the sequence of each small piece of DNA.
  • Use an assembly program to figure out which pieces fit together.

That first step, making a collection of DNA fragments (a library) with breakpoints at random positions, is critical to the success of the later steps. As you can see in the image below, if you're going to reconstruct a genome sequence from pieces of DNA, you want pieces that overlap at several different points. If you don't have clones that begin and end at random points, you can't put the genome back together again.

random_library.gif
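To put a rough number on the idea in the picture, here is a minimal Python sketch. It is not part of the original analysis: the function names, the genome length, and the fragment sizes are all invented for illustration. It places fragments at uniformly random start positions and counts how many separate groups of overlapping fragments (contigs) they form.

```python
import random

def random_fragments(genome_len, n_frags, frag_len, seed=0):
    """Return (start, end) intervals whose start positions are uniformly random.

    This is a toy model of a random library, not a model of sonication itself.
    """
    rng = random.Random(seed)
    starts = (rng.randrange(0, genome_len - frag_len + 1) for _ in range(n_frags))
    return [(s, s + frag_len) for s in starts]

def count_contigs(fragments, min_overlap=1):
    """Count groups of fragments chained together by overlaps of >= min_overlap bases."""
    if not fragments:
        return 0
    ordered = sorted(fragments)
    contigs = 1
    current_end = ordered[0][1]
    for start, end in ordered[1:]:
        if current_end - start >= min_overlap:
            # This fragment overlaps the growing contig, so extend it.
            current_end = max(current_end, end)
        else:
            # No usable overlap: a new contig starts here.
            contigs += 1
            current_end = end
    return contigs

if __name__ == "__main__":
    library = random_fragments(genome_len=100_000, n_frags=2_000, frag_len=500)
    print("contigs from a random library:", count_contigs(library, min_overlap=20))
```

Fragments that merely touch end to end, such as (0, 10) and (10, 20), share no bases and are counted as two contigs. That is the situation a non-random library tends to produce.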

A few years ago, we had the opportunity to see someone test this idea and look at what happens with different methods of library preparation. We were hired as consultants to oversee some contract genome sequencing work and evaluate the quality of the sequencing operation. At that time, the broken bits of DNA for libraries were often prepared through sonication (1). Sonication involves bombarding something with sound waves. When a solution of DNA in a tube is sonicated, the DNA breaks at random positions. You can control the average size of the DNA pieces by changing the sonication time, but it's kind of a crude technique. So perhaps it's not surprising that every few years someone tries another method.

Another method for breaking DNA into pieces is to digest it with restriction enzymes. If you limit the enzyme concentration or the digestion time, you can find conditions where the DNA gets cut at some sites and not at others. When only some of the potential cut sites are actually cut, we call this a "partial digest." It seemed likely to the sequencing company (this was their first contract) that the digestion sites would be random and that this method could be used for making a random library.
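A partial digest can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the procedure the sequencing company used. The recognition sequences are the standard published ones for DraI (TTT^AAA) and AseI (AT^TAAT), which the post itself does not spell out. The helper names and the single cut probability `p_cut` are hypothetical, and real digests do not cut every site with equal probability.

```python
import random

# Standard recognition sequences (not given in the post); the offset is
# the cut position measured from the start of the site.
DRAI = ("TTTAAA", 3)   # DraI cuts TTT^AAA
ASEI = ("ATTAAT", 2)   # AseI cuts AT^TAAT

def find_cut_positions(seq, enzyme):
    """Return every position in seq where the enzyme could cut."""
    site, offset = enzyme
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i + offset)
        i = seq.find(site, i + 1)
    return positions

def partial_digest(seq, cut_positions, p_cut, rng):
    """Cut each candidate site with probability p_cut; return (start, end) fragments."""
    cuts = [c for c in cut_positions if rng.random() < p_cut]
    bounds = [0] + cuts + [len(seq)]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]

if __name__ == "__main__":
    seq = "GGGTTTAAACCCATTAATGGG"
    sites = find_cut_positions(seq, DRAI)
    print(partial_digest(seq, sites, p_cut=0.5, rng=random.Random(0)))
```

However the cuts fall, every fragment end lands on a recognition site (or on an end of the molecule). The choice of which sites get cut may be random, but the possible breakpoints are fixed in advance by the sequence.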


What happens when you make a library by digesting DNA with restriction enzymes?

Two different libraries were prepared by using restriction digests: one with DraI and the other with AseI. DNA was isolated and sequenced from E. coli colonies that had been transformed with samples from each of the libraries. The chromatograms were loaded in the Finch® Server, processed through the standard analysis and assembly pipelines, and we looked at the results.

I'm going to present some of the results in a later post and for the moment concentrate on whether or not the fragments are random.

Did the libraries consist of random fragments?
Our past experience with partial restriction enzyme (RE) digests suggested that it might be difficult to control the extent of RE digestion, leading to bias in the start positions of reads relative to each other. On the basis of those past observations, we decided to test whether RE digest libraries represented random or non-random subclones. To test this idea, we assembled the different libraries and looked at the positions where the reads aligned to the contigs.
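One simple way to look for this kind of bias is to tally where reads start on a contig. The sketch below is an illustration only, not the Finch pipeline: the `(read_name, start, end)` tuple format and the function name are assumptions. It reports the fraction of reads that share a start position with at least one other read.

```python
from collections import Counter

def shared_start_fraction(alignments):
    """alignments: iterable of (read_name, start, end) positions on one contig.

    Returns (fraction of reads whose start position is shared with another
    read, Counter mapping start position -> number of reads starting there).
    """
    starts = Counter(start for _, start, _ in alignments)
    total = sum(starts.values())
    if total == 0:
        return 0.0, starts
    shared = sum(n for n in starts.values() if n > 1)
    return shared / total, starts

if __name__ == "__main__":
    reads = [("r1", 0, 500), ("r2", 0, 480), ("r3", 0, 510), ("r4", 300, 800)]
    fraction, starts = shared_start_fraction(reads)
    print(fraction, starts.most_common(3))
```

In a library with random breakpoints, only a small fraction of reads would be expected to begin at exactly the same base, so a large value here is a warning sign. A real analysis would also need to account for read orientation and for reads from both ends of a clone, which this sketch ignores.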


read_table.gif

Figure 3 (from ref. 2) shows an example report, from the Geospiza Finch Suite, that identifies where reads align to a contig sequence. You can see that many reads begin and end at the same positions in the contig. We also graphed some of these results with DrawMap (3) so you can see where different reads from the AseI library align to the contig.

align_graph.gif

Alignment between reads from the AseI library and one of the contigs (2).
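If you want a quick look at this kind of picture without a graphing tool, a text rendering is enough to show reads stacking up at the same positions. The sketch below is not DrawMap and makes no claim about how DrawMap works; the function name and the `(read_name, start, end)` tuple format are assumptions.

```python
def draw_reads(alignments, contig_len, width=60):
    """Return one text line per read, with '=' marking the aligned span.

    Positions are scaled from contig coordinates to `width` text columns
    using integer arithmetic.
    """
    lines = []
    for name, start, end in sorted(alignments, key=lambda a: (a[1], a[2])):
        left = start * width // contig_len
        right = max(left + 1, end * width // contig_len)  # draw at least one '='
        lines.append(" " * left + "=" * (right - left) + " " + name)
    return lines

if __name__ == "__main__":
    reads = [("r1", 0, 400), ("r2", 0, 400), ("r3", 400, 900), ("r4", 400, 900)]
    print("\n".join(draw_reads(reads, contig_len=1000)))
```

Reads that begin and end at the same positions print as identical bars stacked directly on top of each other, while reads from a random library print as a staircase of staggered bars.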

Conclusion:
The genomic libraries that were prepared through partial digestion with restriction enzymes consisted of non-random fragments. But why would a library of short non-random fragments be a bad thing?

The image below, and also the DrawMap graph above, show that it will be difficult to assemble or reconstruct a DNA sequence with non-random fragments. If the fragments do not overlap each other, you can't join them together and you must do additional work and spend additional time to determine how the pieces fit together in the genome puzzle.


non_random_lib.gif
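The same point can be checked in code: a junction can only be bridged if at least one read runs across it. Here is a minimal sketch, with a hypothetical function name and the same assumed `(read_name, start, end)` tuples, that lists the positions no read crosses.

```python
def unspanned_sites(alignments, sites):
    """Return the positions in `sites` that no read crosses.

    A read crosses a site only if it starts before the site and ends after it.
    """
    return [s for s in sites
            if not any(start < s < end for _, start, end in alignments)]

if __name__ == "__main__":
    # Fragments that all begin and end at the same two cut sites.
    digest_reads = [("a", 0, 100), ("b", 100, 250), ("c", 100, 250), ("d", 250, 400)]
    # Fragments with staggered ends.
    random_reads = [("a", 0, 120), ("b", 80, 260), ("c", 240, 400)]
    print(unspanned_sites(digest_reads, [100, 250]))
    print(unspanned_sites(random_reads, [100, 250]))
```

In the first set, every read stops at a cut site, so nothing links the sequence on one side of the site to the sequence on the other. Those are the places where the extra work of ordering the pieces would be needed.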

References:
1. Green, E. 2001. Strategies for the systematic sequencing of complex genomes. Nature Reviews Genetics 2:573-583.

2. Porter, S., Slagel, J., and T. Smith. 2004. Analysis of Genomic DNA Library Quality with the Finch®-Server. Geospiza, Inc. You can download the paper as a pdf document from here: http://www.geospiza.com/research/white-papers.htm
Look in the middle of the page.

3. Smith, T.M., Lee, M.K., Szabo, C.I., Jerome, N., McEuen, M., Taylor, M., Hood, L., and M.C. King. 1996. Complete Genomic Sequence and Analysis of 117 kb of Human DNA Containing the Gene BRCA1. Genome Res. 6:1029-49.

The earlier installments are here:
Part I: Introduction
Part II: Sequencing strategies
Part III: Reads and chromats
Part IV: How many reads does it take?


Copyright Geospiza, Inc.

         

Comments

1

You can see that many reads begin and end at the same positions in the contig.

I come across this a lot when blasting against trace files from the Drosophila sequencing project. But when I see the pattern, it's due to repetitive DNA. For example, if I blast a region containing a sequence that is present multiple times throughout the genome (either a duplicated region or a transposable element), I'll get traces that came from the region I'm searching with, along with traces from paralogous regions. The paralogous sequences will tend to have alignments that terminate at the exact same spot, because that's where the duplicated region ends and unique sequence begins.

Posted by: RPM | January 31, 2007 4:11 PM

2

You're describing a different kind of thing and a different kind of experiment. I will get to repetitive DNA later in this series, but in this case, we were looking at a genome that doesn't contain repetitive DNA, so we knew that wasn't the problem.

Also, I forgot to mention that we had the control experiments, where a library from this same organism was prepared by sonication. So, we knew the explanation for our results.

With what you're seeing, there are many different reasons why blast alignments will show reads beginning and ending at the same positions in contigs. Exons and introns are one reason, repeats are another. I will do more things with blast later on and go through some of these.

Posted by: Sandra Porter | January 31, 2007 4:32 PM

© 2006-2009 Seed Media Group LLC. ScienceBlogs is a registered trademark of Seed Media Group. All rights reserved.
