Basics: How do you sequence a genome? part III, reads and chromats

Shotgun sequencing. Sounds like fun.

Speculations on the origin of the phrase
I think that this term came from shotgun cloning. In the early days of gene cloning, before cDNA, PCR, or electroporation, molecular biologists would break genomic DNA up into lots of smaller pieces, package the DNA in lambda phage, transduce E. coli, and hope for the best. Consistent with the shotgun metaphor, we even used to store our microfuge tubes in plastic bullet boxes that my boss found at the sporting goods store. (Apparently this practice was unique to Minnesota, though. When I moved out west for graduate school and asked where people bought bullet boxes, I got a lot of strange looks.)

The dog-eat-dog world of DNA sequencing
I forgot to mention in the last post (but RPM reminded me) that there were some very heated debates about which sequencing strategy (mapping vs. shotgun) should be adopted. My husband was a post-doc in a genome center during the mid-90s, so I was treated to many amusing tales over the years about the controversial issues in the sequencing community. I'm not going to take sides, but the story of sequencing the human genome is quite entertaining, and I really enjoyed reading about it in The Genome War: How Craig Venter Tried to Capture the Code of Life and Save the World by James Shreeve. The book is a quick read, quite a bit of fun, and presents the story from a viewpoint that's rarely heard. If you ever forget that scientists are as human and petty as anyone else, this is a book you should read.

And now, we return to our story:

So, what does shotgun sequencing a genome involve?

The basic steps, as I've mentioned before, are to:

  1. Break DNA up into fragments.
  2. Determine the sequence of nucleotides in each piece.
  3. Use an assembly program to put the pieces in order.
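
To make the three steps a little more concrete, here is a toy sketch in Python. It is only an illustration under generous assumptions: the "genome" is a short string, the reads are error-free and all the same length, and the assembler is a simple greedy overlap merger that I made up for this example. The function names (shotgun_fragments, overlap, greedy_assemble) are hypothetical, and real assembly programs are far more sophisticated than this.

```python
import random


def shotgun_fragments(genome, read_len, n_reads, seed=0):
    """Step 1 (toy version): pick fragments at random start positions."""
    rng = random.Random(seed)
    last_start = len(genome) - read_len
    return [
        genome[start:start + read_len]
        for start in (rng.randrange(last_start + 1) for _ in range(n_reads))
    ]


def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the longest such overlap is shorter than min_len.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_overlap=3):
    """Step 3 (toy version): repeatedly merge the best-overlapping pair.

    Assumes error-free reads and a sequence without long repeats.
    Returns a list of contigs; a single contig means everything merged.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more; leave separate contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [
            c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)
        ]
        contigs.append(merged)
    return contigs
```

Step 2, determining the sequence of each piece, is the part the sketch skips entirely: here the reads are simply copied out of the original string. In this toy version, three overlapping 10-base reads taken from a 20-base sequence are enough for greedy_assemble to return the original sequence as a single contig. With randomly placed reads you may end up with several contigs instead, which is one reason the number of reads matters.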

These steps sound simple enough, but each one has its own complications, and I've oversimplified this process quite a bit so that I could focus on the general principles. Unlike some more optimistic bloggers, I doubt that anyone is going to sequence a genome in their kitchen anytime soon. The first challenge would be growing the bacteria. While it may be pretty easy to make L broth in your kitchen and sterilize it with a pressure cooker, I really don't want to walk into the kitchen and be struck by the aroma of E. coli in broth. Uh, uh. Our days of storing smelly bacterial cultures in our home refrigerator have mostly passed.

How did bacteria get involved? I don't see them in your three steps.

Bacteria come in between steps 1 and 2. Just wait, you'll see.

Send in the clones
[Image: Escherichia coli]

In the first step of genomic sequencing, DNA is broken up into several smaller fragments that, ideally, overlap each other at random positions. The smaller fragments are cloned in E. coli (see, I told you E. coli would be involved). Template DNA is prepared from these bacterial colonies and used for sequencing.

One question you might have at this point is: how many clones do you need? Or perhaps a better question is: how many reads do you need? I'm going to discuss that question in the next post. Before I do that, though, I want to define the term "read."

What is a read?
In shotgun sequencing, sequences are obtained from each cloned fragment of DNA. Each nucleotide sequence is called a "read." The reads are used later to reconstruct the original sequence.

[Image: reads]

Reads are obtained from the data files produced by DNA sequencing instruments. These data files are known sometimes as "electropherograms," sometimes as "trace files," and sometimes as "chromatograms" or "chromats" (as we affectionately refer to them at work).

A chromatogram contains lots of experimental details and information about the run conditions, along with data that can be plotted and viewed as a trace or graph, outlining the signal strengths from each of the four bases. A read is the sequence of nucleotides obtained from the chromatogram file.
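
Here is a minimal sketch of how a read relates to the signal data. It assumes something a real chromatogram does not give you directly: one intensity value per base position for each of the four channels, already lined up at the peaks. The function name call_bases, the min_ratio parameter, and the rule for calling an "N" are all inventions for this example, not the method any real base-calling program uses.

```python
def call_bases(trace, min_ratio=2.0):
    """Turn four channels of peak intensities into a read.

    trace maps each base ("A", "C", "G", "T") to a list of signal
    strengths, one value per base position. At each position the
    strongest channel wins, unless it is less than min_ratio times
    the runner-up, in which case the call is the ambiguous base "N".
    """
    channels = "ACGT"
    n_positions = len(trace["A"])
    read = []
    for i in range(n_positions):
        # Rank the four channels by signal strength at this position.
        ranked = sorted(channels, key=lambda base: trace[base][i], reverse=True)
        top, second = trace[ranked[0]][i], trace[ranked[1]][i]
        read.append(ranked[0] if top >= min_ratio * second else "N")
    return "".join(read)
```

A position where one channel clearly dominates gets that base, and a position where two channels are nearly tied gets an "N". Real base callers also have to find the peaks, correct for spacing and dye mobility, and estimate how trustworthy each call is.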

The image below is from FinchTV® (a program that we make). The trace is the colorful graph. The read is the sequence of letters at the top of each row, and above the read are quality values. In this chromatogram, the quality values came from Applied Biosystems' KB® base-calling program.

[Image: FinchTV chromatogram view]
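
If you're wondering what a quality value means, here is a small sketch. It assumes the widely used Phred-style convention, in which a quality value Q corresponds to an estimated error probability of 10^(-Q/10), so Q20 means roughly a 1 in 100 chance that the base call is wrong. I'm assuming the quality values in your file follow that scale. The function names and the cutoff of 20 are just choices for this example.

```python
def error_probability(q):
    """Estimated chance that a base call is wrong, Phred-style scale."""
    return 10 ** (-q / 10)


def trim_low_quality(read, quals, min_q=20):
    """Strip bases from both ends of a read while quality is below min_q.

    read is a string of bases and quals is a list of quality values of
    the same length. Low-quality bases in the middle are left alone.
    """
    start, end = 0, len(read)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return read[start:end]
```

Trimming the ends this way reflects the fact that the beginning and end of a read tend to have the lowest quality values, which you can usually see in the chromatogram itself.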

How many reads do you need to put a genome together? Learn about this part in the next installment.

Read the other bits: part I, part II
Part V: checking out the library

Sandy,
Thanks for this series on how to sequence a genome. Although I've already told my students about your blog, I'll point them to it again. What a great teaching tool!

By Ying-Tsu Loh on 28 Jan 2007