Basics: How do you sequence a genome? part III, reads and chromats

By sporte on January 28, 2007.

Shotgun sequencing. Sounds like fun.

Speculations on the origin of the phrase
I think that this term came from shotgun cloning. In the early days of gene cloning, before cDNA, PCR, or electroporation, molecular biologists would break genomic DNA up into lots of smaller pieces, package the DNA in lambda phage, transduce E. coli, and hope for the best. Consistent with the shotgun metaphor, we even used to store our microfuge tubes in plastic bullet boxes that my boss found at the sporting goods store. (Apparently this practice was unique to Minnesota, though. When I moved out west for graduate school and asked where people bought bullet boxes, I got a lot of strange looks.)

The dog-eat-dog world of DNA sequencing
I forgot to mention in the last post (but RPM reminded me) that there were some very heated debates about which sequencing strategy (mapping vs. shotgun) should be adopted. My husband was a post-doc in a genome center during the mid-'90s, so I was treated to many amusing tales over the years about the controversial issues in the sequencing community. I'm not going to take sides, but the story of sequencing the human genome is quite entertaining and I really enjoyed reading about it in The Genome War: How Craig Venter Tried to Capture the Code of Life and Save the World by James Shreeve. The book is a quick read, quite a bit of fun, and presents the story from a viewpoint that's rarely heard. If you ever forget that scientists are as human and petty as anyone else, this is a book you should read.

And now, we return to our story:

So, what does shotgun sequencing a genome involve?

The basic steps, as I've mentioned before, are to:

Break DNA up into fragments.
Determine the sequence of nucleotides in each piece.
Use an assembly program to put the pieces in order.
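
To make step 3 a little more concrete, here is a toy sketch in Python of what "putting the pieces in order" can mean. It is only an illustration: the function names (overlap, greedy_assemble), the tiny made-up sequence, and the greedy merge strategy are my assumptions for this example, and real assembly programs are far more sophisticated (they have to cope with sequencing errors, repeats, and both DNA strands).

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    Returns 0 if the best match is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len=3):
    """Repeatedly merge the pair of fragments with the longest overlap.

    Toy approach: assumes error-free fragments from a single strand.
    Returns the list of sequences left when no more overlaps are found.
    """
    # Drop exact duplicates and fragments fully contained in another one.
    frags = list(dict.fromkeys(fragments))
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]

    while len(frags) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # nothing overlaps any more; leave the pieces separate
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# A made-up 20-base "genome" and three overlapping pieces, given out of order.
genome = "ATGCGTACGTTAGCCTAGGA"
reads = ["TAGCCTAGGA", "ATGCGTAC", "TACGTTAG"]
print(greedy_assemble(reads) == [genome])
```

The point is only that overlaps between pieces carry the ordering information; the pieces themselves can arrive in any order.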

These steps sound simple enough, but each one has its own complications, and I've oversimplified this process quite a bit so that I could focus on the general principles. Unlike some more optimistic bloggers, I doubt that anyone is going to sequence a genome in their kitchen anytime soon. The first challenge would be growing the bacteria. While it may be pretty easy to make L broth in your kitchen and sterilize it with a pressure cooker, I really don't want to walk into the kitchen and be struck by the aroma of E. coli in broth. Uh-uh. Our days of storing smelly bacterial cultures in our home refrigerator have mostly passed.

How did bacteria get involved? I don't see them in your three steps.

Bacteria come in between steps 1 and 2. Just wait, you'll see.

Send in the clones
In the first step of genomic sequencing, DNA is broken up into many smaller fragments that, ideally, overlap each other at random positions. The smaller fragments are cloned in E. coli (see, I told you E. coli would be involved). Template DNA is prepared from these bacterial colonies and used for sequencing.
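
If it helps to picture "fragments that overlap each other at random positions," here is a small simulation sketch in Python. Everything in it is made up for illustration: the function name shotgun_fragments, the fixed fragment length, and the example sequence are assumptions, and real fragments vary in size and come from physical shearing or enzymatic digestion, not a random number generator.

```python
import random


def shotgun_fragments(genome, n_fragments, frag_len, seed=0):
    """Pick `n_fragments` pieces of length `frag_len` at random start positions.

    Returns a list of (start, sequence) pairs sorted by start position.
    The fixed length and the seed are simplifications for this sketch.
    """
    if frag_len > len(genome):
        raise ValueError("fragment length exceeds genome length")
    rng = random.Random(seed)  # seeded so the example is repeatable
    last_start = len(genome) - frag_len
    starts = [rng.randrange(0, last_start + 1) for _ in range(n_fragments)]
    return [(s, genome[s:s + frag_len]) for s in sorted(starts)]


# A made-up 40-base sequence standing in for genomic DNA.
toy_genome = "ATGCGTACGTTAGCCTAGGATCCGATTACAGGCTTAAGCC"
for start, seq in shotgun_fragments(toy_genome, n_fragments=6, frag_len=12):
    # Indent each fragment by its start position so overlaps line up visually.
    print(" " * start + seq)
```

Printed this way, the fragments stack up under the positions they came from, and you can see where neighboring pieces overlap and where gaps are left uncovered.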

One question you might have at this point is: how many clones do you need? Or perhaps a better question is: how many reads do you need? I'm going to discuss that question in the next post. Before I do that, though, I want to define the term "read."

What is a read?
In shotgun sequencing, sequences are obtained from each cloned fragment of DNA. Each nucleotide sequence is called a "read." The reads are used later to reconstruct the original sequence.

Reads are obtained from the data files produced by DNA sequencing instruments. These data files are known sometimes as "electropherograms," sometimes as "trace files," and sometimes as "chromatograms" or "chromats" (as we affectionately refer to them at work).

A chromatogram contains lots of experimental details and information about the run conditions, along with data that can be plotted and viewed as a trace or graph, outlining the signal strengths from each of the four bases. A read is the sequence of nucleotides obtained from the chromatogram file.
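
As a rough illustration of how a read relates to the trace, here is a toy sketch in Python. It assumes the trace has already been reduced to four lists of signal intensities (one per base) and that the peak positions are already known; the function name call_bases and the numbers are invented for this example. Real base-calling programs do a great deal more (peak detection, spacing and mobility corrections, quality estimation), so treat this as a cartoon of the idea, not a description of any actual base caller.

```python
def call_bases(trace, peak_positions):
    """Call one base per peak by picking the channel with the strongest signal.

    `trace` maps each base ("A", "C", "G", "T") to a list of intensities.
    `peak_positions` lists the indices in those lists where peaks occur.
    """
    read = []
    for pos in peak_positions:
        strongest = max("ACGT", key=lambda base: trace[base][pos])
        read.append(strongest)
    return "".join(read)


# Invented signal intensities at four peak positions.
toy_trace = {
    "A": [900, 30, 20, 850],
    "C": [40, 700, 10, 60],
    "G": [10, 20, 950, 30],
    "T": [20, 50, 40, 25],
}
print(call_bases(toy_trace, peak_positions=[0, 1, 2, 3]))  # prints ACGA
```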

The image below is from FinchTV® (a program that we make). The trace is the colorful graph. The read is the sequence of letters at the top of each row, and above the read are quality values. In this chromatogram, the quality values came from Applied Biosystems' KB® base-calling program.
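
A quick aside on what a quality value means numerically. The sketch below assumes the widely used phred-style scale, where a quality value Q corresponds to an estimated error probability P through Q = -10 × log10(P). This post doesn't spell out the scale, so take that as my assumption here; the function names are made up for the example.

```python
import math


def error_probability(q):
    """Estimated chance that a base call is wrong, assuming a phred-style Q."""
    return 10 ** (-q / 10)


def quality_value(p):
    """Phred-style quality value for an estimated error probability p."""
    return -10 * math.log10(p)


for q in (10, 20, 30, 40):
    # Q20 works out to about 1 error in 100 calls, Q30 to about 1 in 1,000.
    print(f"Q{q}: about 1 error in {round(1 / error_probability(q)):,} base calls")
```

On this scale, every 10 points of quality means a tenfold drop in the estimated error rate.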

How many reads do you need to put a genome together? Learn about this part in the next installment.

Read the other bits: part I, part II
Part V: checking out the library

Sandy,
Thanks for this series on how to sequence a genome. Although I've already told my students about your blog, I'll point them to it again. What a great teaching tool!

Thanks Ying-Tsu! I hope I see you in Berkeley this June!
