Sequencing a Genome, part VI: Chimeras are not just funny-looking animals

To the ancient Greeks, a chimera was a kind of monster, with the body of a goat, the tail of a dragon, and a lion's head. To geneticists, a chimera can be an animal that's derived from two embryos, such as a transgenic mouse. Or if the organism is a plant, it can be a plant with a graft. We have a chimeric cherry tree in our back yard with branches from Rainier cherries, Bing cherries, and Van cherries. And you should see the chimeras that hang out at evolgen.

Naturally, the DNA cloning and sequencing world has its chimeras, too. There are two main kinds that I know of. Sometimes chimeras are created during cloning, when two fragments of DNA that are normally separated become joined by accident.

[Figure: chimeric clones]

Can you spot the chimera?
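If it helps to see the idea in code, here is a minimal sketch of how a chimeric clone differs from a normal one. Everything in it is made up for illustration: the toy genome is random, and the function names (`make_fragment`, `make_chimeric_clone`) and positions are my own assumptions, not part of any real cloning software.

```python
import random

def make_fragment(genome, start, length):
    """Return one contiguous piece of the genome."""
    return genome[start:start + length]

def make_chimeric_clone(genome, start_a, start_b, length):
    """Join two pieces that are normally separated, as in an accidental ligation."""
    return make_fragment(genome, start_a, length) + make_fragment(genome, start_b, length)

# Toy genome: 200 random bases (hypothetical data).
random.seed(0)
genome = "".join(random.choice("ACGT") for _ in range(200))

normal_clone = make_fragment(genome, 20, 40)              # one stretch of the genome
chimeric_clone = make_chimeric_clone(genome, 20, 150, 20)  # two distant stretches, joined

# A normal insert can be found in the genome as a single stretch.
print(normal_clone in genome)
# The two halves of the chimera come from positions 20 and 150.
print(genome.find(chimeric_clone[:20]), genome.find(chimeric_clone[20:]))
```

Both clones are 40 bases long and look equally respectable on their own. The only giveaway is that the two halves of the chimera belong to different parts of the genome.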

Sometimes chimeras are created during electrophoresis. In the early days of DNA sequencing instruments, when fluorescently-labeled DNA fragments were separated by size in polyacrylamide gels, chimeras could appear during the gel runs.

[Figure: sequencing gel image]

This image shows what the DNA in a gel would look like. Labeled DNA fragments from 8 different samples are lined up in 8 lanes, straight up and down. Occasionally, while the gel was running, the tracking software would shift lanes and start reading information from the next lane. These events produced chimeric reads, that is, reads with DNA sequences originating from two different samples. (What is a read?)

[Figure: chimeric read]

A chimeric read.
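Here is a rough sketch of what a lane-tracking slip does to a read. It assumes a much-simplified gel in which each lane is just a string of bases and position along the string stands in for distance down the gel. The `track_lane` function and the two sample sequences are hypothetical, invented for this example.

```python
def track_lane(lanes, start_lane, shift_at=None):
    """Read bases down one lane; optionally drift into the next lane at `shift_at`."""
    read = []
    lane = start_lane
    for position in range(len(lanes[start_lane])):
        if shift_at is not None and position == shift_at:
            lane += 1  # the tracker slips into the neighboring lane
        read.append(lanes[lane][position])
    return "".join(read)

lanes = [
    "ACGTACGTACGT",  # sample 1
    "TTGGCCAATTGG",  # sample 2
]

good_read = track_lane(lanes, 0)                  # stays in lane 0 the whole way
chimeric_read = track_lane(lanes, 0, shift_at=6)  # slips into lane 1 halfway down

print(good_read)      # ACGTACGTACGT
print(chimeric_read)  # ACGTACAATTGG
```

The first six bases of the chimeric read come from sample 1 and the last six from sample 2, which is exactly the kind of read my student was paid to catch.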

Believe it or not, one of my former students had a night job at a genome center that involved staring at computer screens and making corrections if the gel tracking software started to misbehave.

I'm sure you're wondering, about now, how this all relates to genome sequencing.

Yesterday, I wrote about our experiences analyzing some genomic libraries from phage (1). Some of the libraries had been created from DNA that was broken into random pieces by sonication. Other libraries were created from genomic DNA that was digested with restriction enzymes.

Remember, there are 3 main steps in sequencing genomes (and I suspect if you read a few of these posts you will never forget this):

  • Break the genome into lots of small pieces at random positions.
  • Determine the sequence of each small piece of DNA.
  • Use an assembly program to figure out which pieces fit together.
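The three steps above can be sketched as a toy program. This is only an illustration under big assumptions: the genome is a short random string, the "reads" are error-free, and the assembler is a simple greedy overlap merger that I wrote for this example (it is not how Phrap or any production assembler works). The names `shear`, `overlap`, and `assemble` are hypothetical.

```python
import random

def shear(genome, n_reads, read_len, rng):
    """Step 1: break the genome into pieces at random positions."""
    starts = [rng.randrange(len(genome) - read_len + 1) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]

def overlap(a, b, min_len):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def assemble(reads, min_len=10):
    """Step 3: greedily merge the two contigs that overlap the most."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates
    # Drop reads that sit entirely inside another read.
    contigs = [c for c in contigs if not any(c != d and c in d for d in contigs)]
    while True:
        best_len, best_a, best_b = 0, None, None
        for a in contigs:
            for b in contigs:
                if a != b:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_a, best_b = n, a, b
        if best_len == 0:
            return contigs  # nothing left that overlaps
        merged = best_a + best_b[best_len:]
        # Remove the two merged contigs and anything the new contig now contains.
        contigs = [c for c in contigs if c not in merged]
        contigs.append(merged)

rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(200))

# Step 2 is trivial here: each "read" is already the sequence of its piece.
reads = shear(genome, n_reads=60, read_len=40, rng=rng)
contigs = assemble(reads)
print(len(reads), "reads assembled into", len(contigs), "contig(s)")
```

Because the pieces start at random positions, neighboring reads overlap, and those overlaps are what let step 3 stitch the reads back together. Keep that in mind when we get to libraries where the breaks are not random.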

The results we got from trying to reconstruct genome sequences from the RE (restriction enzyme) libraries were so odd that we knew something strange must be going on. Yesterday, we established that the libraries weren't random, but we thought there must be other problems, too.

[Figure: random library]

This is what happens when you assemble DNA that's been broken at random positions.

Hunting for chimeras.

You can imagine that if the DNA sequences within a single read come from different locations, things can get pretty confusing, pretty fast. We thought that if some of the reads were chimeras, that might explain the bizarre results from our assemblies.

Fortunately, Phrap (2), the assembly program that we use in the Finch Suite, has an option to detect chimeras and problem reads. We put it to the test. We assembled different sets of sequences that had been sampled over time, and to each assembly, we added different numbers of new reads.
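To give a feel for what "detecting a chimera" means, here is a small sketch. Please note that this is not Phrap's method. Phrap works by comparing reads with each other and has no finished genome to check against, and I'm not describing its internals here. This toy version cheats by assuming we already have a reference sequence, and simply asks whether the two ends of a read land the right distance apart. The function name `looks_chimeric` and the `end_len` parameter are hypothetical.

```python
import random

def looks_chimeric(read, reference, end_len=15):
    """Return True if the two ends of `read` map to places in `reference`
    that are not the right distance apart, or None if an end can't be placed."""
    left = reference.find(read[:end_len])
    right = reference.find(read[-end_len:])
    if left == -1 or right == -1:
        return None  # a problem read of some other kind
    # In a normal read, the last `end_len` bases start this far along.
    expected_right = left + len(read) - end_len
    return right != expected_right

# Toy reference: 300 random bases (hypothetical data).
rng = random.Random(2)
reference = "".join(rng.choice("ACGT") for _ in range(300))

normal_read = reference[50:90]                           # one contiguous stretch
chimeric_read = reference[50:70] + reference[200:220]    # two distant stretches

print(looks_chimeric(normal_read, reference))
print(looks_chimeric(chimeric_read, reference))
```

The normal read has both ends where they belong. The chimeric read starts at position 50 but ends near position 220, which is much too far away for a 40-base read.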

[Figure: chimera graph]

What happens when we add RE reads to our assemblies?

The first half of the graph shows the results from assembling 3148 reads obtained from the sonicated DNA library. We started adding reads from the RE libraries at the position of the vertical bar. In total, we added 1187 reads from the AseI and DraI libraries. Phrap detected 67 chimeras in the last assembly (4.7% of the added reads) and comparatively few in the reads from the sonicated DNA library.

I'm sure glad we weren't sequencing manatees.

References:
1. Green, E. 2001. Strategies for the systematic sequencing of complex genomes. Nature Reviews Genetics 2:573-583.

2. Porter, S., Slagel, J., and Smith, T. 2004. Analysis of Genomic DNA Library Quality with the Finch®-Server. Geospiza, Inc. You can download the paper as a pdf document from here: http://www.geospiza.com/research/white-papers.htm (look in the middle of the page).

The earlier installments are here:
Part I: Introduction
Part II: Sequencing strategies
Part III: Reads and chromats
Part IV: How many reads does it take?
Part V: Checking out the library

Copyright Geospiza, Inc.
