The simple fact is this: some DNA sequences are more believable than others.
The problem is that many students and researchers never see any of the metrics that we use to evaluate whether a sequence is "good" or "bad."
All they see are the base calls and sequences: ATAGATAGACGAGTAG, without any supporting information to help them evaluate if the sequence is correct. If DNA sequencing and personalized genetic testing are to become commonplace, the practice of ignoring data quality is (in my opinion) simply unacceptable.
So, for a while anyway, I'm making a bunch of this data available online, and I'll describe how to work with it and what it means.
To see some DNA sequence data, with quality values:
1. Go to http://classroom1.bio-rad.ifinch.com
2. Log in with the user name: BR_guest
3. Enter the password: guest
When you get there, click the link to see the folders that I've set up.
This link takes you to a folder with student data from 2005. (Learn more about the project) Then, click the link to see a summary of information about the chromatograms.

Which values do you think correspond to good data?
Which values are associated with poor quality data?
Feel free to sort the data and play with it a bit. What fraction of the sequences would you say are "good"?
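If you'd rather poke at the numbers in code than in a table, here is a minimal sketch of one way to do it. It assumes the quality values are Phred-style scores, where a quality Q corresponds to an error probability of 10^(-Q/10), so Q20 means roughly a 1 in 100 chance that the base call is wrong. The function names, the Q20 cutoff, the minimum count of good bases, and the example reads are all made up for illustration; they are not taken from the site's summary report.

```python
def error_probability(quality):
    """Convert a Phred-style quality value to an error probability.

    Assumes the standard convention Q = -10 * log10(P).
    """
    return 10 ** (-quality / 10.0)


def count_high_quality_bases(qualities, threshold=20):
    """Count the bases whose quality value meets or exceeds the threshold."""
    return sum(1 for q in qualities if q >= threshold)


def fraction_good_reads(reads, threshold=20, min_bases=4):
    """Return the fraction of reads with at least min_bases good bases.

    `reads` maps a read name to its list of per-base quality values.
    Both cutoffs are arbitrary choices for this sketch.
    """
    if not reads:
        return 0.0
    good = sum(
        1
        for qualities in reads.values()
        if count_high_quality_bases(qualities, threshold) >= min_bases
    )
    return good / len(reads)


# Hypothetical reads with invented quality values (real reads are much longer).
example_reads = {
    "read_A": [35, 40, 42, 38, 30, 25],
    "read_B": [8, 10, 12, 9, 7, 11],
    "read_C": [22, 28, 15, 31, 19, 40],
}

if __name__ == "__main__":
    print(error_probability(20))
    print(fraction_good_reads(example_reads))
```

Where you set the cutoffs is a judgment call, which is really the point of the questions above: try a few thresholds and see how your answer to "what fraction is good?" changes.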
I am a microbiologist and molecular biologist turned tenured biotech faculty turned bioinformatics scientist turned entrepreneur. My passion is developing instructional materials for 21st century biology (



