Open science, peer review and the flu

We had a great discussion in the comments yesterday after I published my neighbor-joining (NJ) trees from some of the flu sequences.

If I list all the wonderful pieces of advice that readers shared, I wouldn't have any time to do the searches, but there are a few that I want to mention before getting down to work and posting my BLAST results.

Here are some of the great suggestions and pieces of advice:

1. Do a BLAST search. Right! I can't believe I didn't do that first thing. I think the trees I got surprised me so much that all sense flew out of my brain.

2. Show us the multiple alignments. Okay. I'll post the files soon.

3. Make maximum likelihood trees. David Koppstein has done that. My only complaint is that I'm too near-sighted to identify the sequences in his trees, so while I can see that the CA sequences are clustering together, I can't really interpret the results.

4. Use FASTA. Hmmm. I don't know why this would be helpful. FASTA is more sensitive than BLAST, but if I want to find sequences that are over 95% identical, I don't see where I need more sensitivity. It would be nice to have the reasoning explained, 'cause I don't get it.

5. Use the nucleotide sequences. YES! YES!

6. Mike the Mad Biologist and Mike Dunford also suggested using LDHat. I'll have to do some searching for that one, I guess.

7. Gwen Aimes, and Victor Hanson-Smith, and Brian Foley from the Los Alamos National Lab (especially Brian Foley!) have given some great advice and shared their expertise on comparing viral sequences. And irayork (?) has had some good suggestions, too.
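
To make the "over 95% identical" cutoff from item 4 a little more concrete, here is a minimal Python sketch of one way to compute percent identity between two sequences that have already been aligned. The function name `percent_identity` and the toy sequences are made up for illustration and are not flu data. This sketch skips gap columns, which is only one of several conventions; BLAST reports its own identity figure over the alignment it finds, so the numbers won't necessarily match.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of compared (non-gap) columns where the two sequences match.

    Assumes both sequences come from the same alignment, so they have
    equal length and use '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy, made-up sequences (NOT real flu data): 20 bases, 1 mismatch.
seq1 = "ATGGCGTACGTTAGCATCGA"
seq2 = "ATGGCGTACGTTAGCATCGT"
print(percent_identity(seq1, seq2))  # prints 95.0
```

When two sequences are this close, a match is easy for any search tool to find, which is why extra sensitivity doesn't seem like it would buy much here.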

It's been really hard the past couple of days to focus on my other work, like grading homework and teaching my class, since I've really, really, wanted to get back to my computer and do more analyses. But now, I'm calmer and I can take a deep breath and look at the flu sequences in a more methodical and systematic fashion.

One of the things that I never liked about academic science was all the secrecy. It seemed to me that the people around me felt that you should keep everything secret and not tell anyone anything until you were absolutely sure you were right. The trouble is, that philosophy makes people really afraid to ever be wrong. And, so many times, we are wrong. Or maybe just not 100% right.

So, since I don't have a lab or tenure worries, I thought: why not do science in the open? I've heard people suggest that original research shouldn't be published in blogs, that it should only be published as peer-reviewed work. I don't buy that suggestion.

Crazy as it seems, I think this preliminary activity has gotten far better "peer review" than some of the papers I submitted to official publications. I wish all peer review were as helpful and transparent.

I'll post more data in a bit, but first I want to say "thanks!"

Hi,
Why aren't there any Mexican strain sequences yet in the NCBI database (http://www.ncbi.nlm.nih.gov/genomes/FLU/SwineFlu.html)? I just looked it up, and they already have genes from a lot of places, but not from Mexico. When can we expect to have some?
Best,
AnaG

By Ana Gerschenfeld (not verified) on 01 May 2009

Sandy,
Thank you (and other bloggers) for your sites. You are a great resource for my bioinformatics course.

Talk about up-to-date information! There's no way we can get this type of information from a textbook!

By Ying-Tsu Loh (not verified) on 01 May 2009

Bill: I did like the article, thanks for the link!

Ana: I don't know why the Mexican sequences aren't there. I would like to see them, too.

Ying-Tsu - thanks! I plan to post some more when I can fit it in. I hope I get to see you in Berkeley! We'll get to work with some Next Gen sequence data in my workshop.