bioinformatics

I agree with ScienceBlogling Sandra: many biologists need to know SQL. She has some very helpful links if you don't know much (or anything) about SQL.
Yep, I've become a videoblogger, at least sometimes. See the first video below. Be kind in the comments; this is a new thing for me. This video introduces the different BLAST programs and discusses word size, how blastn works, the blastn score, and the E value. The treatment is light and not too in-depth but, as I said, it's an introduction. A quick introduction to BLAST from Sandra Porter on Vimeo.
A long-standing debate in my field is whether or not biologists who work with computers need to learn how to program. I usually say "no." Let the programmers program, let the biologists interpret the results, and let everyone benefit from each other's expertise. Well, I've changed my mind in one respect. Most biologists need to work with some kind of database these days and I've discovered that it's really helpful to know something about SQL. Even a tiny bit of SQL, like "SELECT * from table", goes a long, long way. This revelation didn't happen overnight and when I decided a few…
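To show how far a query like "SELECT * from table" can go, here is a minimal sketch using Python's built-in sqlite3 module. The `samples` table, its columns, and the rows in it are made up for illustration; they are not taken from any database mentioned in the post.

```python
import sqlite3


def demo_queries():
    """Build a tiny in-memory database and run two simple SELECT queries."""
    conn = sqlite3.connect(":memory:")  # throwaway database, nothing is written to disk

    # Hypothetical table of sequencing samples (illustrative names and values only).
    conn.execute(
        "CREATE TABLE samples ("
        "id INTEGER PRIMARY KEY, organism TEXT, read_length INTEGER)"
    )
    conn.executemany(
        "INSERT INTO samples (organism, read_length) VALUES (?, ?)",
        [
            ("Escherichia coli", 712),
            ("Bacillus subtilis", 655),
            ("Escherichia coli", 498),
        ],
    )

    # The "tiny bit of SQL": fetch every column of every row.
    all_rows = conn.execute("SELECT * FROM samples").fetchall()

    # One small step further: filter with WHERE and sort with ORDER BY.
    coli_rows = conn.execute(
        "SELECT organism, read_length FROM samples "
        "WHERE organism = ? ORDER BY read_length",
        ("Escherichia coli",),
    ).fetchall()

    conn.close()
    return all_rows, coli_rows


if __name__ == "__main__":
    everything, coli_only = demo_queries()
    for row in everything:
        print(row)
    print(coli_only)
```

The first query returns the whole table; the second shows that adding a WHERE clause is often all it takes to pull out just the rows you care about.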
In which we're reminded that database searches are experiments, too. One of the trickiest things with bioinformatics experiments is repeating them. This challenge isn't related to the validity of the original results; the challenge is that, unless you made your own database and kept it in the same state, the database that you'll be using at a later time, sometimes even a day later, is a different database. And, if you query a different database, you may get a different result. The series that I'm currently posting is one that I started working on a couple of years ago. Originally, I was…
How do you go about researching a genetic disease? This multi-part series explores how digital resources can be used to learn about Huntington's disease. Reposted and updated from the original DigitalBio. A bit of background: Alice's Restaurant is a movie with an unforgettable song that mostly revolves around Arlo Guthrie hanging out with his friends. Somewhere in the movie, the conversation turns to Woody, and someone asks the question that no one wants to touch. Does Arlo's girlfriend know about Huntington's? ...dead silence... Now, I did see the movie quite a few years ago, so my…
Last week I posted an image with two molecules (below the fold), one protein and one nucleic acid, and asked you about the probability of finding similar molecules in different species. You gave me some interesting answers. DAG made me clarify my question by asking what I meant by "similarity." I was wondering whether I would be likely to find a statistically relevant match by doing a BLAST search and I hadn't really thought about the cutoff values. I decided to guess and say that the protein would be about 30% similar and the nucleic acid about 60%. Paul gave me some answers…
Earlier this week, I attended the International Human Microbiome Consortium Meeting (the human microbiome consists of the organisms that live on and in us). I'm not sure what to make of the whole microbiome initiative, but one thing is clear to me: this is being driven by the wrong group of scientists. Instead of being directed by biologists (medical primarily) who have devised a set of important questions, and want to use the power of high throughput genomics, including metagenomics, which sequences all the DNA in a specimen--bacteria, viruses, fungi, protozoa, and, yes, human (which raises…
If you like ham and bacon, you might be interested in this. GenomeWeb reports that researchers at the University of Barcelona have developed an assay that tests 46 SNPs and can be used to trace the origin of your pork dinner. According to GenomeWeb, the test identifies both the breed and origin of the animal. The university and the company said meat traceability is necessary to ensure consumer safety, particularly in cases of infectious disease outbreaks or accidental feed contamination. No more doubts about the home of your Jamón.
'Tis the holiday season and, according to ancient lore, the time when miraculous events are most likely to take place. One of those well-known and miraculous events of ancient days was the birth of a son to a young girl who, although she was married (okay, I'm not sure about this part of the story), was said to be a virgin, and the birth was said to be a miracle. Hmmm. How do you think the news would be received if that sort of thing happened today? Certainly, if the young girl were to produce a grilled cheese sandwich with a burn spot that vaguely resembled a woman in a robe, someone might be…
Which read(s): 1. contain either a SNP (a single nucleotide polymorphism) or a position where different members of a multi-gene family have a different base? C 2. doesn't have any DNA? B 3. is a PCR product? A, B, and C. All three reads were obtained by sequencing PCR products generated with the same set of primers. The quality plots that I refer to are here.
Since DNA diagnostics companies seem to be sprouting like mushrooms after the rain, it seemed like a good time to talk about how DNA testing companies decipher meaning from the tests they perform. Last week, I wrote about interpreting DNA sequence traces and the kind of work that a data analyst or bioinformatics technician does in a DNA diagnostics company. As you might imagine, looking at every single DNA sample by eye gets rather tiring. One of the things that informatics companies (like ours) do is try to help people analyze several samples at once so that they can scan fewer…
As many of you know, I'm a big fan of do-it-yourself biology. Digital biology, the field that I write about, is particularly well-suited to this kind of fun and exploration. Last week, I wrote some instructions for making a phylogenetic tree from mitochondrial genomes. This week, we'll continue our analysis. I wrote this activity, in part, because of this awful handout that my oldest daughter brought home last year. She presented me with an overly photocopied paper that showed several protein sequences from cytochrome C in several creatures. She said she was supposed to count the…
DNA sequence traces are often used in cases where we want to identify the source of the nucleic acid; where we want to detect drug-resistant variants of human immunodeficiency virus; or where we want to know which base is located at which position, especially where we might be able to diagnose a human disease or determine the best dose of a therapeutic drug. In the future, these assays will likely rely more on automation. Currently (at least outside of genome centers), many of these results are assessed by human technicians in clinical research labs, or DNA testing companies, who review these data by…
Students at Soldan International High School are participating in an amazing experiment and breaking ground where most science teachers fear to tread. Soldan students, along with hundreds of thousands of other people, are participating in National Geographic's Genographic Project. Through this project, students send in cheek swabs, DNA is isolated from the cheek cells, and genetic markers are used to look at ancestry. Genetic markers in the mitochondrial DNA are used to trace ancestry through the maternal line and markers on the Y chromosome can be used to learn about one's father.…
Last year I wrote about an experiment where I compared a human mitochondrial DNA sequence to primate sequences in GenBank. Since I wanted to know about the differences between humans, gorillas, and chimps, I used the Entrez query 'Great Apes' to limit my search to a set of sequences in the PopSet database that contained gorilla, bonobo, chimp, and human DNA. A week ago, I tried to repeat this experiment and... It didn't work. All I saw were human mitochondrial sequences. I know the other sequences match, but I didn't see them since there are so many human sequences that match…
Metagenomics is a field where people interrogate the living world by isolating and sequencing nucleic acids. Since all living things have DNA, and viruses have either DNA or RNA, we can identify who's around by looking at bits of their genome. Researchers are using this approach to find the culprit that's killing the honeybees. We're also trying to find out who else shares our bodies, and lives in our skin, in our stomachs, and other places where the sun doesn't shine. Craig Venter used metagenomics when he sailed around the world and sequenced DNA samples from the Sargasso Sea. In this…
The simple fact is this: some DNA sequences are more believable than others. The problem is that many students and researchers never see any of the metrics that we use for evaluating whether a sequence is "good" and whether a sequence is "bad." All they see are the base calls and sequences: ATAGATAGACGAGTAG, without any supporting information to help them evaluate if the sequence is correct. If DNA sequencing and personalized genetic testing are to become commonplace, the practice of ignoring data quality is (in my opinion) simply unacceptable. So, for a while anyway, I'm making a…
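The excerpt doesn't say which metrics are meant, so the following is only a sketch under the assumption that per-base Phred-style quality values are among them. A Phred quality Q relates to the estimated probability P that a base call is wrong by Q = -10 log10(P). The function names, the Q20 cutoff, and the example quality values are illustrative choices, not something taken from the post.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality value to the estimated base-call error probability."""
    return 10 ** (-q / 10)


def fraction_at_least(quals, min_q=20):
    """Fraction of bases whose quality is at or above min_q (Q20 is an assumed cutoff)."""
    if not quals:
        return 0.0
    return sum(1 for q in quals if q >= min_q) / len(quals)


if __name__ == "__main__":
    # Made-up quality values for the 16 bases of ATAGATAGACGAGTAG.
    bases = "ATAGATAGACGAGTAG"
    quals = [8, 12, 25, 33, 40, 42, 45, 45, 44, 41, 38, 30, 22, 15, 10, 6]

    for base, q in zip(bases, quals):
        print(base, q, phred_to_error_prob(q))

    print("fraction of bases at Q20 or better:", fraction_at_least(quals))
```

Two reads with identical base calls can look very different once numbers like these sit next to the letters, which is the point of showing the quality data along with the sequence.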
We have lots of DNA samples from bacteria that were isolated from dirt. Now it's time to do our own metagenomics project and figure out what they are. Our class project is on a much smaller scale than the honeybee metagenomics project that I wrote about yesterday, but we're using many of the same principles. The general process is this: 1. We sort the chromatogram data to identify good data and separate it from bad data. Informatics can help you determine if data is good, and measure how good it is, but it cannot turn bad data into good data. And, there's no point in wasting time with…
The next time you bite into a crisp juicy apple and the tart juices spill out around your tongue, remember the honeybee. Our fall harvest depends heavily on honeybees carrying pollen from plant to plant. Luscious fruits and vegetables wouldn't grace our table, were it not for the honeybees and other pollinators. Lately though, the buzz about our furry little helpers hasn't been good. Honeybees have been dying, victims of a new disease called "colony collapse disorder," with the US alone losing a large number of hives in recent years. Why? Researchers have speculated about everything…
Would you like to have some fun playing with chromatograms and helping our class identify bacteria in the dirt? This quarter, my bioinformatics class at Shoreline Community College will be working with chromatograms that were obtained by students at Johns Hopkins University, and graciously made available by Dr. Rebecca Pearlman. (See "Sequencing the campus at the Johns Hopkins University" for more background.) We are going to do a bit of metagenomics by using FinchTV and blastn to identify the soil bacteria that were sampled from different biomes and then use an SQL query that I…