Hunting for huntingtin, part II

In which we're reminded that database searches are experiments, too.

One of the trickiest things about bioinformatics experiments is repeating them. The challenge has nothing to do with the validity of the original results. The challenge is that, unless you made your own database and kept it in the same state, the database you use at a later time, sometimes even a day later, is a different database. And if you query a different database, you may get a different result.
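One way to treat a search like an experiment is to keep a small lab-notebook entry for each run. Here's a minimal sketch in Python, assuming you can get a list of hit identifiers out of your search results. The function names, the record fields, and the hit IDs are all invented for illustration; none of them come from NCBI or any BLAST tool.

```python
import hashlib


def make_search_record(program, database, query, hit_ids, run_label):
    """Bundle the details needed to notice later that a search has changed.

    run_label is free text (a year, a date, a notebook page number).
    """
    # Fingerprint the sorted hit list so two runs are easy to compare.
    joined = "\n".join(sorted(hit_ids))
    digest = hashlib.sha256(joined.encode("utf-8")).hexdigest()
    return {
        "program": program,
        "database": database,
        "query": query,
        "run_label": run_label,
        "n_hits": len(hit_ids),
        "hits_sha256": digest,
    }


def same_results(record_a, record_b):
    """True if both runs returned exactly the same set of hit IDs."""
    return record_a["hits_sha256"] == record_b["hits_sha256"]


# An earlier run that returned nothing, and a later run with hits.
# "hit_A" and "hit_B" are placeholder IDs, not real accession numbers.
earlier = make_search_record("blastp", "human proteins", "Q" * 15, [], "earlier run")
later = make_search_record("blastp", "human proteins", "Q" * 15,
                           ["hit_A", "hit_B"], "later run")

print(same_results(earlier, later))
```

If the two fingerprints differ, either the database changed or the search settings did, and the record tells you which settings to check first.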

The series that I'm currently posting is one that I started working on a couple of years ago. Originally, I was going to repost these stories as is, but it seemed best to add another twist and see if I could reproduce some of the results, or at least find out which results have changed. In the next few posts, you'll see the results of those experiments.

Playing catch-up with the latecomers
Hi, for those of you who've just joined us, we've gotten lost in some databases while hunting for information on huntingtin. If you'd like to catch up a bit and come back later, you might want to read Hunting for huntingtin (part I).

If not, here's a brief synopsis of the plot and what we've done so far:

  • learned about Woody Guthrie and Nancy Wexler
  • found a couple of reviews describing Huntington's disease
  • got the HD gene sequence and counted the number of CAGs
  • learned that CAG codes for glutamine and that glutamine can form hydrogen bonds
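If you'd like to try the CAG counting step yourself, it can be sketched in a few lines of Python. This is only an illustration: the sequence below is a made-up toy, not the HD gene, and the function names are my own. The sketch also ignores reading frame, so it counts any CAG it sees, whether or not that CAG would be read as a codon.

```python
import re


def count_cag(dna):
    """Count every non-overlapping occurrence of CAG in the sequence."""
    return dna.upper().count("CAG")


def longest_cag_run(dna):
    """Return the number of repeats in the longest uninterrupted CAG run."""
    runs = re.findall(r"(?:CAG)+", dna.upper())
    return max((len(run) // 3 for run in runs), default=0)


# Toy sequence: a run of 5 CAGs, some filler, then a run of 2 CAGs.
toy = "ATGGCG" + "CAG" * 5 + "CCGCCA" + "CAG" * 2 + "TAA"

print(count_cag(toy))        # all CAGs in the toy sequence
print(longest_cag_run(toy))  # repeats in the longest run
```

The longest uninterrupted run is usually the more interesting number, since the extra CAGs in HD sit together in one stretch.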

Then we got curious about those extra CAGs and wanted to know if they result from the disease or cause the disease. So we looked up huntingtin in the UCSC Genome Browser and saw that there are similar genes in mice, pigs, and zebrafish (plus a few other members of the animal kingdom that were not discussed).

Ah hah!

Since mice have a similar gene, and we know that the Jackson Lab is the place to go for all things mouse, we checked there. Sure enough, the Jackson mouse breeders have made mice with extra CAGs, and... the mice get the symptoms of HD.

So, you guessed it: the extra CAGs are the problem, not the result.

As the fearless leader of this expedition, I vote that we now look at those extra CAGs a little more closely.

Searching for the lost glutamines
You might remember that, in part I, I mentioned looking for 3-D structures with polyglutamine. I did find one structure with a polyglutamine sequence, but it looked like the crystallographers weren't able to resolve the part of the structure where the glutamines were supposed to be. Cn3D shows the missing glutamines in grey in the sequence window. The structure window shows this:

[Image: polyglu_protein.jpg]

Looking for other structures

Okay, so what can I do now? What would you do?

I decided to do a blastp search, since NCBI has this cool new feature where protein sequences that have a corresponding structure are linked to the structure record in the MMDB.

So I used blastp to search the human protein database with a sequence of 15 glutamines.
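For anyone who wants to build the same query, here's a minimal Python sketch that writes 15 glutamines (one-letter code Q) in FASTA format, ready to paste into a blastp query box. It also includes a small helper for checking whether a protein sequence contains a long glutamine stretch. The function names, the FASTA header, and the toy protein are assumptions made up for this example, and the sketch doesn't run the search itself.

```python
import re


def polyq_query(n=15, name="polyQ_query"):
    """Return a FASTA-formatted query made of n glutamines."""
    return f">{name}_{n}\n{'Q' * n}\n"


def find_polyq_tracts(protein, min_len=10):
    """Return (start, length) for each run of at least min_len glutamines.

    start is a 0-based position in the protein sequence.
    """
    pattern = re.compile(r"Q{%d,}" % min_len)
    return [(m.start(), len(m.group())) for m in pattern.finditer(protein.upper())]


# Made-up protein: 3 residues, a 15-glutamine tract, then a short tail.
toy_protein = "MKT" + "Q" * 15 + "PPLA" + "QQ" + "K"

print(polyq_query())
print(find_polyq_tracts(toy_protein))
```

The helper is handy for a sanity check afterwards: if a hit is supposed to be a perfect match to 15 glutamines, its sequence should contain a tract at least that long.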

What did I find?

In 2005, my search gave this result: No significant similarity found.

This year, I got results.

[Image: blastp results for the 15-glutamine query]

But they're strange.

I have some perfect matches to things that I've never heard of, like Vanderwaltozyma polyspora and Brugia malayi, and to some things that I have heard of, like Anopheles gambiae (some type of mosquito) and Chlamydomonas.

Where are the human proteins?

Right. I said these experiments are hard to repeat.

See ya next time. We'll try to muddle through the mystery and get back on track with the story.
