bioinformatics

Part II. What do mumps proteins do? And how do we find out? This is the second in a five-part series on an unexpected discovery of a paramyxovirus in mosquitoes, and a general method for finding interesting things. I. The back story from the genome record II. What do the mumps proteins do? And how do we find out? III. Serendipity strikes when we Blink. IV. Assembling the details of the case for a mosquito paramyxovirus V. A general method for finding interesting things in GenBank In Part I, we looked at the NCBI SeqViewer, and found a new way to check out a genome map, and learn more…
Part I. The back story from the genome record Together, these five posts describe the discovery of a novel paramyxovirus in the Aedes aegypti genome and a new method for finding interesting anomalies in GenBank. I. The back story from the genome record II. What do the mumps proteins do? And how do we find out? III. Serendipity strikes when we Blink. IV. Assembling the details of the case for a mosquito paramyxovirus V. A general method for finding interesting things in GenBank I began this series on mumps intending to write about immunology and how vaccines work to stimulate the immune…
A quick note for bioinformaticians in the audience: Neil Saunders has an excellent post on parsing (i.e. processing a file to retrieve specific sections of interest). Neil's hints are a useful introduction for beginners, but also provide some handy reminders for long-time programmers.
One of the things that drives me crazy on occasion is nomenclature. Well, maybe not just nomenclature, it's really the continual changes in the nomenclature, and the time it takes for those changes to ripple through various databases and get reconciled with other kinds of information. And the realization that sometimes this reconciliation may never happen. One of the projects that I've been working on during the past couple of years has involved developing educational materials that use bioinformatics tools to look at the isozymes that metabolize alcohol. As part of this project, I've been…
A few weeks ago, I wrote about a paper in Science(1) that I read on a connection between a mutation in the dopamine D2 receptor and the genetics of learning. Only, it turned out that when I looked at the gene map... the mutation mapped in a completely different gene. I presented the data here and wrote a bit about my surprise at finding this mistake and even greater surprise at seeing this same mistake perpetuated by others. Now, I have some updates to the story. The folks at the NCBI responded quickly and added annotations to both the DRD2 and the ANKK1 citations in the Gene database. Now…
This is the third part of a case study where we see what happens when high school students clone and sequence genomic plant DNA. In this last part, we use the results from an automated comparison program to determine if the students cloned any genes at all and, if so, which genes were cloned. (You can also read part I and part II.) Did they clone or not clone? That is the question. But first, we have to answer a different question about which parts of their reads are usable and which parts are not. (A read is the sequence of bases obtained from a chromatogram file.) How does our data get…
This is the second part of a three-part case study where we see what happens when high school students clone and sequence genomic plant DNA. In this part, we do a bit of forensics to see how well their sequencing worked and to see if we can find anything that could help them improve their results the next time they sequence. How well did the sequencing work? Anyone who sequences DNA needs to be aware of two kinds of problems that afflict their results: technical and biological. Technical problems are identified using quality values and the number of bases…
What happens when high school students clone and sequence genomic DNA? Background DNA sequencing is a wonderful tool for discovery and a great technique for getting students involved in molecular science. This fall, Bio-Rad will officially begin selling their DNA cloning and sequencing kit. Now, students across the country will have the tools in hand to begin their own projects cloning and sequencing plant genes. Of course, without bioinformatics there's no way to know what's been cloned or sequenced. This is where we come in. As part of an agreement with Bio-Rad, we adapted a version of…
One of the things I find fascinating about the Weather Channel is that after watching it for a while, you actually start to worry about that cold front moving through some other part of the country. You become quite paranoid about things that won't affect you. Well, I've got an even better way to drive yourself nuts about scary things that won't affect you: HealthMap.org. I'm kidding. HealthMap.org is actually really interesting--it gives you a visual representation of all of the disease outbreaks globally. You can change the dates, diseases, and locations you want to look at. It also…
or is it just an idea that's ahead of the curve? Last week, I was stunned to discover at least 31 papers in an NCBI Gene database entry that were in the entry for the wrong gene. I wrote about this here, here, here, and here. Now, an oversight like this is a little understandable. The titles of the entries do include the name of the wrong gene (DRD2 - the dopamine D2 receptor). And it was only four years ago that people figured out that the marker in the title of the articles mapped somewhere else. If computers were responsible for the annotation, well, this would be understandable.…
It's pretty common these days to pick up an issue of Science or Nature and see people ranting about GenBank (1). Many of the rants are triggered, at least in part, by a widespread misunderstanding of what GenBank is and how it works. Perhaps this can be solved through education, but I don't think that's likely. People from the NCBI can explain over and over again that some of the sequence databases in GenBank are meant to be an archival resource (2), and define the term "archive," but that's not going to help. Confusion about database content and oversight is widespread in this…
In its simplest sense, we imagine that learning occurs through a series of positive and negative rewards. Some actions lead to pleasure, others to pain, and it seems reasonable to expect that people will repeat the actions with pleasurable results and avoid those that ended in pain. Yet, we all know people who aren't deterred by the idea of punishment. We all know people who never seem to learn. Could there be a physical reason, hidden in their genes? In December 2007, Science published a study by Klein et al. (1) where they asked if a specific genotype at a location called "DRD2-TAQ-IA"…
Hey students: if you are looking for a summer internship in marine metagenomics and you can get your application together before June 16th, Jonathan Eisen posted information about an open position on his blog. It also looks like he's looking for post-docs (see the side bar on the right of this page.)
In part I, I wrote about my first semester of teaching on-line and talked about our challenges with technology. Blackboard had a database corruption event during finals week and I had all kinds of struggles with the Windows version of Microsoft Excel. Mike wrote and asked if I thought students should be working more with non-Microsoft software and what I thought the challenges would be in doing so. I can answer with a totally unqualified "it depends." First, I think knowing how to use a spreadsheet program is an advantage in many different kinds of fields and even in real-life, outside of…
I got my copy of "A short guide to the human genome" by Stewart Scherer today from Cold Spring Harbor Laboratory Press (2008, ISBN 978-087969791-4). Usually, I would wait until after I've read a book to write a review, but this book doesn't require that kind of study. As soon as I skimmed through it and read some of the questions and answers, I knew this would be the kind of quick reference that I would like to have sitting above my desk. Scherer has compiled a wonderful text that not only answers many of the kinds of questions that I can think to ask about the human genome, but the kinds of…
A potential link between lung cancer and human papilloma virus may make parents even more glad about vaccinating their children with Gardasil®. Not only are the children protected against viruses that commonly cause cervical cancer, they may be protected against some forms of lung cancer as well. The April 25th version of Nature News reports (1) that two viruses, HPV (Human papilloma virus) and measles virus, have been found in lung tumors. From Nature News: Samuel Ariad of the Soroka Medical Center in Beer Sheva, Israel, and his colleagues began by analyzing tumours taken from 65 lung…
Over 2600 genetic diseases have been found where a change in a single gene is linked to the disease. One of the questions we might ask is how those mutations change the shape and possibly the function of a protein. If the structures of the mutant and wild type (normal) proteins have been solved, NCBI has a program called VAST that can be used to align those structures. I have an example here where you can see how a single amino acid change makes influenza resistant to Tamiflu®. The 4-minute movie below shows how we can obtain those aligned structures from VAST and view them with Cn3D.…
One of my favorite web 2.0 technologies is the webinar. When you work at a company and not a University, with constant seminars, it gets a bit harder to hop on a bus and travel across town to learn about new things. Webinars are a good way to fill that gap. I grab my coffee cup, put on my headphones, and I get to listen to someone tell me about their work for an hour and show slides over the web. It's nice. Our company is even going to be involved in two webinars in the next two months. One of us is giving an Illumina webinar tomorrow on managing Next Generation Sequencing data. A…
In the class that I'm teaching, we found that several PCR products, amplified from the 16S ribosomal RNA genes from bacterial isolates, contain a mixed base in one or more positions. We picked samples where the mixed bases were located in high quality regions of the sequence (Q > 40), and determined that the mixed bases most likely come from different ribosomal RNA genes. Many species of bacteria have multiple copies of 16S ribosomal RNA genes and the copies can differ from each other within a single genome and between genomes. Now, in one of our last projects we are determining where…
I know some of you enjoy looking at data and seeing if you can figure out what's going on. For this Friday's puzzler, I'm going to send you to FinchTalk, our company blog, to take a look at lots of data from a resequencing experiment that was done to look for SNPs and count alleles. The graph is at the end of the post. The graph shows data from 4608 reads (sequenced from both strands, forward and reverse). And there are some interesting patterns. Can you figure them out?