Sequencing a Genome, part V: checking out the library

By sporte on January 31, 2007.

The general steps in genome sequencing were presented in the earlier installments (there are links at the bottom of the page), but it's worth repeating them since each of the earlier steps has a bearing on the outcome of those that come later.

These are:

1. Break the genome into lots of small pieces at random positions.
2. Determine the sequence of each small piece of DNA.
3. Use an assembly program to figure out which pieces fit together.

That first step, making a collection of DNA fragments (a library) with breakpoints at random positions, is of critical importance to the success of later steps. As you can see in the image below, if you're going to reconstruct a genome sequence from pieces of DNA, you want pieces that overlap at several different points. If you don't have clones that begin and end at random points, you can't put the genome back together again.
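To make the idea of overlapping pieces concrete, here's a minimal Python sketch. It isn't code from the project; the function names, genome length, and fragment sizes are all made up for illustration. It picks fragment start points at random and counts how many fragments cover each position of a pretend genome.

```python
import random


def shear_randomly(genome_length, n_fragments, fragment_length, rng):
    """Return (start, end) pairs whose start positions are chosen uniformly at random.

    This is an idealized stand-in for random breakage; real sonication gives
    a spread of fragment sizes rather than one fixed length.
    """
    starts = [rng.randrange(genome_length - fragment_length + 1)
              for _ in range(n_fragments)]
    return [(start, start + fragment_length) for start in starts]


def coverage_depth(genome_length, fragments):
    """Count how many fragments cover each position (end coordinates are exclusive)."""
    depth = [0] * genome_length
    for start, end in fragments:
        for position in range(start, end):
            depth[position] += 1
    return depth


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    fragments = shear_randomly(10_000, 200, 500, rng)
    depth = coverage_depth(10_000, fragments)
    covered = sum(1 for d in depth if d > 0)
    print(f"{covered} of {len(depth)} positions are covered by at least one fragment")
```

Positions with a depth of two or more are the places where fragments overlap, and those overlaps are what an assembly program needs in order to join pieces together.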

We had the opportunity, a few years ago, to see someone test this idea and look at what happens with different methods of library preparation. We were hired as consultants to oversee some contract genome sequencing work and evaluate the quality of the sequencing operation. At that time, the broken bits of DNA for libraries were often prepared through sonication (1). Sonication involves bombarding something with sound waves. When a solution of DNA in a tube is sonicated, the DNA breaks at random positions. You can control the average size of the DNA pieces by changing the sonication time, but it's kind of a crude technique. So, perhaps it's not surprising that every few years or so, someone tries another method.

One other method for breaking DNA into pieces is to digest it with restriction enzymes. If you use restriction enzymes and limit the enzyme concentration or the digestion time, you can obtain conditions where the DNA gets cut at some sites and not at others. When only some of the potential cut sites are actually cut, we call this a "partial digest." It seemed likely to the sequencing company (this was their first contract) that the digestion sites would be random and that this method could be used for making a random library.
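A partial digest can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the short sequence is invented, each site is cut independently with the same probability, and the function names are hypothetical. The recognition site TTTAAA, cut in the middle, is the one for DraI.

```python
import random


def find_sites(sequence, recognition_site):
    """Return the start index of every occurrence of the recognition site."""
    positions = []
    index = sequence.find(recognition_site)
    while index != -1:
        positions.append(index)
        index = sequence.find(recognition_site, index + 1)
    return positions


def partial_digest(sequence, recognition_site, cut_offset, cut_probability, rng):
    """Cut each site independently with the given probability.

    Returns the resulting fragments as (start, end) pairs, end exclusive.
    cut_offset is where the enzyme cuts relative to the start of its site.
    """
    cuts = [site + cut_offset
            for site in find_sites(sequence, recognition_site)
            if rng.random() < cut_probability]
    boundaries = [0] + cuts + [len(sequence)]
    return list(zip(boundaries[:-1], boundaries[1:]))


if __name__ == "__main__":
    toy_sequence = "GGTTTAAACCTTTAAAGG"  # invented sequence with two TTTAAA sites
    for seed in range(3):
        print(partial_digest(toy_sequence, "TTTAAA", 3, 0.5, random.Random(seed)))
```

However the random choices fall, a fragment in this model can only begin or end at a recognition site or at the end of the molecule. Which sites get cut varies from molecule to molecule, but the set of possible breakpoints is fixed by the sequence.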

What happens when you make a library by digesting DNA with restriction enzymes?

Two different libraries were prepared by using restriction digests: one used DraI and the other, AseI. DNA was isolated and sequenced from E. coli colonies that had been transformed with samples from each of the libraries. The chromatograms were loaded into the Finch® Server, processed through the standard analysis and assembly pipelines, and we looked at the results.

I'm going to present some of the results in a later post and, for the moment, concentrate on whether or not the fragments are random.

Did the libraries consist of random fragments?
Our past experience with partial restriction enzyme (RE) digests suggested that it might be difficult to control the extent of RE digestion, leading to bias in the start positions of reads relative to each other. On the basis of those past observations, we decided to test whether RE digest libraries represented random or non-random subclones. To test this idea, we assembled the different libraries and looked at the positions where the reads aligned to the contigs.
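One simple way to put a number on this kind of bias is to ask what fraction of reads share their alignment start position with at least one other read. The Python sketch below is our own illustration of that idea, not the report the Finch software produces; the function name and the example positions are hypothetical.

```python
from collections import Counter


def shared_position_fraction(positions):
    """Fraction of reads whose position is also used by at least one other read."""
    if not positions:
        return 0.0
    counts = Counter(positions)
    shared = sum(n for n in counts.values() if n > 1)
    return shared / len(positions)


if __name__ == "__main__":
    # Invented read start positions on a contig, for illustration only.
    clustered_starts = [10, 10, 10, 50, 75]
    scattered_starts = [12, 48, 133, 207, 391]
    print(shared_position_fraction(clustered_starts))
    print(shared_position_fraction(scattered_starts))
```

In a library with random breakpoints, few reads should land on exactly the same start position unless coverage is very deep, so a high fraction is a hint that the breakpoints are not random. The same function can be applied to alignment end positions.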

Figure 3 (from ref. 2) shows an example report, from the Geospiza Finch Suite, that identifies where reads align to a contig sequence. You can see that many reads begin and end at the same positions in the contig. We also graphed some of these results with DrawMap (3) so you can see where different reads from the AseI library align to the contig.

Alignment between reads from the AseI library and one of the contigs (2).

Conclusion:
The genomic libraries that were prepared through partial digestion with restriction enzymes consisted of non-random fragments. But why would a library of short non-random fragments be a bad thing?

The image below, and also the DrawMap graph above, show that it will be difficult to assemble or reconstruct a DNA sequence with non-random fragments. If the fragments do not overlap each other, you can't join them together and you must do additional work and spend additional time to determine how the pieces fit together in the genome puzzle.
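The joining problem can be shown with a small Python sketch. It assumes we already know where each fragment sits in the genome, which a real assembly program does not; a real assembler has to find overlaps by comparing sequences. The function name, coordinates, and minimum overlap are hypothetical.

```python
def build_contigs(fragments, min_overlap=1):
    """Join fragments into contigs when neighbors overlap by at least min_overlap bases.

    fragments is a list of (start, end) pairs, end exclusive.
    """
    contigs = []
    for start, end in sorted(fragments):
        if contigs and contigs[-1][1] - start >= min_overlap:
            previous_start, previous_end = contigs[-1]
            contigs[-1] = (previous_start, max(previous_end, end))
        else:
            contigs.append((start, end))
    return contigs


if __name__ == "__main__":
    # Fragments that share breakpoints touch end to end but never overlap.
    abutting = [(0, 100), (100, 250), (250, 400)]
    # Fragments with scattered breakpoints overlap their neighbors.
    overlapping = [(0, 100), (60, 180), (150, 400)]
    print(build_contigs(abutting, min_overlap=20))
    print(build_contigs(overlapping, min_overlap=20))
```

Fragments that begin and end at the same breakpoints stay as separate contigs because nothing bridges the junctions between them, while the overlapping set collapses into a single contig.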

References:
1. Green, E. 2001. Strategies for the systematic sequencing of complex genomes. Nature Reviews Genetics 2:573-583.

2. Porter, S., Slagel, J., and T. Smith. 2004. Analysis of Genomic DNA Library Quality with the Finch®-Server. Geospiza, Inc. The paper can be downloaded as a PDF from http://www.geospiza.com/research/white-papers.htm (look in the middle of the page).

3. Smith, T.M., Lee, M.K., Szabo, C.I., Jerome, N., McEuen, M., Taylor, M., Hood, L., and M.C. King. 1996. Complete Genomic Sequence and Analysis of 117 kb of Human DNA Containing the Gene BRCA1. Genome Res. 6:1029-49.

The earlier installments are here:
Part I: Introduction
Part II: Sequencing strategies
Part III: Reads and chromats
Part IV: How many reads does it take?

You can see that many reads begin and end at the same positions in the contig.

I come across this a lot when blasting against trace files from the Drosophila sequencing project. But when I see the pattern, it's due to repetitive DNA. For example, if I blast a region containing a sequence that is present multiple times throughout the genome (either a duplicated region or a transposable element), I'll get traces that came from the region I'm searching with along with traces from paralogous regions. The paralogous sequences will tend to have alignments that terminate at the same exact spot because that's where the duplicated region ends and unique sequence begins.

You're describing a different kind of thing and a different kind of experiment. I will get to repetitive DNA later in this series, but in this case, we were looking at a genome that doesn't contain repetitive DNA, so we knew that wasn't the problem.

Also, I forgot to mention that we had the control experiments, where a library from this same organism was prepared by sonication. So, we knew the explanation for our results.

With what you're seeing, there are many different reasons why blast alignments will show reads beginning and ending at the same positions in contigs. Exons and introns are one reason, repeats are another. I will do more things with blast later on and go through some of these.
