What do you do when base-callers disagree?
Okay DNA sequencing community, I want your help with this one. One of these sequences was called by phred and the other by the ABI KB base calling program.
Which one should I believe?
Sometimes I open up files and do short experiments just because – well, I’m curious. And sometimes I immediately wish I hadn’t done that because what I opened looks like a larger can of worms than I really want to see.
These graphs show the quality of each base in a DNA sequence on the y axis and the position of that base on the x axis. For phred, a quality value corresponds to the probability that a base was called incorrectly. A quality value of 20 means a 1% chance that the base is wrong, 30 corresponds to a 0.1% chance that it’s been called incorrectly, and 40 means a one in 10,000 chance that the base has been misidentified. People accept values around 20, but want values around 40. (read more about phred)
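If you want to check those numbers yourself, the relationship is Q = -10 × log10(P), where P is the probability that the base call is wrong. Here is a minimal Python sketch of the conversion in both directions; the function names are my own, not part of phred or any other base caller.

```python
import math

def phred_to_error_prob(q):
    """Convert a phred quality value to the probability the base call is wrong."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p):
    """Convert an error probability (0 < p <= 1) back to a phred quality value."""
    return -10 * math.log10(p)

# The three quality values mentioned above
for q in (20, 30, 40):
    p = phred_to_error_prob(q)
    print(f"Q{q}: error probability {p:.4%}, about 1 in {round(1 / p):,}")
```

Because the scale is logarithmic, every 10 points of quality means a tenfold drop in the chance of a miscall.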
These graphs were generated from the same chromatogram file, but processed by different base calling programs.
I won’t tell you which graph was produced by which base caller, but the chromatogram was obtained in 2006 from an ABI 3730 DNA sequencing instrument.
In theory, these graphs should be identical, or at least very similar, but unfortunately, I’m not sure which of these graphs is the one that I should believe. One of the base callers is considerably more optimistic, quality-wise, than the other.
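One way to put a number on "more optimistic" is to summarize each set of quality values and compare them. The sketch below is only an illustration: the function names are hypothetical, the two short lists are made-up values and not the data behind my graphs, and it assumes both callers report quality values on the same phred-style scale.

```python
def summarize(quals, threshold=20):
    """Summarize a list of per-base quality values from one base caller."""
    n = len(quals)
    return {
        "mean_quality": sum(quals) / n,
        # Fraction of bases at or above the chosen quality threshold
        "fraction_at_or_above": sum(1 for q in quals if q >= threshold) / n,
        # Expected number of miscalled bases if the qualities are taken at face value
        "expected_errors": sum(10 ** (-q / 10) for q in quals),
    }

def mean_difference(quals_a, quals_b):
    """Mean per-position difference (a minus b); both lists must be the same length."""
    if len(quals_a) != len(quals_b):
        raise ValueError("base callers produced different numbers of bases")
    return sum(a - b for a, b in zip(quals_a, quals_b)) / len(quals_a)

# Made-up quality values, for illustration only
caller_a = [10, 20, 30, 40]
caller_b = [20, 30, 40, 50]

print(summarize(caller_a))
print(summarize(caller_b))
print(mean_difference(caller_a, caller_b))
```

Keep in mind that two base callers can call a different number of bases from the same chromatogram, so a position-by-position comparison like this only works when the calls line up. It also can't tell you which caller is right, only how far apart they are.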
So I’m asking you. What are you using for base callers these days? How are you checking the accuracy of your data?