Discovering Biology in a Digital World

My thoughts on biology, teaching, life, and exploring the living world via the digital one. These postings represent only my opinions; they do not represent the viewpoints of any funding agency or Geospiza, Inc.

Profile

Sandra Porter: I am a microbiologist and molecular biologist turned tenured biotech faculty turned bioinformatics scientist turned entrepreneur. My passion is developing instructional materials for 21st-century biology (Geospiza Education).


    Metagenomics, biomes, and dirt: separating good data from bad

    Category: Bioinformatics, Microbiology, Science education, biotechnology, classroom activities, environmental education, sequence analysis, teaching, web resources
    Posted on: October 27, 2007 7:00 PM, by Sandra Porter

    The simple fact is this: some DNA sequences are more believable than others.

    The problem is that many students and researchers never see any of the metrics that we use to evaluate whether a sequence is "good" or "bad."

    All they see are the base calls and sequences: ATAGATAGACGAGTAG, without any supporting information to help them evaluate if the sequence is correct. If DNA sequencing and personalized genetic testing are to become commonplace, the practice of ignoring data quality is (in my opinion) simply unacceptable.
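
As a rough sketch of what that supporting information can look like: base-calling software commonly reports a quality value for each base on the Phred scale, where Q = -10 * log10(P) and P is the estimated probability that the base call is wrong. The example below assumes Phred-style values. The quality numbers are made up for illustration (they are not from the student data), and the cutoff of 20 is just a widely used convention, not a rule from this data set.

```python
def error_probability(q):
    """Convert a Phred-scaled quality value into an estimated error probability."""
    return 10 ** (-q / 10)


def fraction_at_or_above(quals, threshold=20):
    """Return the fraction of bases whose quality value meets the threshold."""
    if not quals:
        return 0.0
    return sum(1 for q in quals if q >= threshold) / len(quals)


# The example sequence from the text, paired with hypothetical quality values
# (one per base). Real values would come from the chromatogram file.
bases = "ATAGATAGACGAGTAG"
quals = [8, 12, 25, 30, 42, 45, 50, 51, 48, 44, 40, 38, 33, 21, 15, 9]

for base, q in zip(bases, quals):
    print(f"{base}  Q{q:<3} estimated error probability {error_probability(q):.5f}")

print(f"Fraction of bases at Q20 or better: {fraction_at_or_above(quals):.2f}")
```

With these invented numbers, the low-quality bases sit at the two ends of the read and the middle looks trustworthy, which is a pattern you can look for when you open real chromatograms.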

    So, for a while anyway, I'm making a bunch of these data available online, and I'll describe how to work with them and what they mean.

    To see some DNA sequence data, with quality values:
    1. Go to http://classroom1.bio-rad.ifinch.com
    2. Log in with the user name: BR_guest
    3. Enter the password: guest

    When you get there, click the link to see the folders that I've set up.

    This link takes you to a folder with student data from 2005. (Learn more about the project) Then, click the link to see a summary of information about the chromatograms.


     


    [Figure: chromat_table.png (the chromatogram table)]

    When you get to the chromatogram table, you can see some information about the quality of each chromatogram. You can take a closer look at the data by clicking the FinchTV link to open the chromatogram in FinchTV. (FinchTV is freely available from Geospiza.)

     

    Which values do you think correspond to good data?

    Which values are associated with poor quality data?

    Feel free to sort the data and play with it a bit. What fraction of the sequences would you say are "good"?
