Bug hunting is a BLAST

Last week I found a bug in the new NCBI BLAST interface.

Of course, I reported it to the NCBI help desk so it will probably get fixed sometime soon. But it occurred to me, especially after seeing people joke about whether computer science is really a science or not, that it might surprise people to learn how much of the scientific method goes into testing software and doing digital biology.


What happens when the scientific method isn't used?

I wrote earlier in January about applying scientific principles from the wet bench world to the world of computer work, and how appalled I was when it seemed that some crystallographers had left out some of the basic principles of doing good science - i.e. positive controls. The retraction of five crystallography papers left me wondering if, perhaps, the crystallographers had either missed learning the scientific method - or if they believed somehow that it didn't apply to computer work.

(I instructed the computer to run these calculations, so of course the computer is running the calculations correctly!)

Of course the scientific method does apply to computer experiments just as much as it applies to experiments at the wet bench. Software testing is a really good example.

I don't do very much testing, but every now and then, we all get called in to help find bugs and get them fixed before a new version of our software suite gets released to the public. During these times, I've found that having a scientific background and understanding the scientific method is invaluable.

I've even been able to apply the scientific method to testing and identifying bugs in other people's software, as this post will describe. Certainly, I'd rather not find bugs, but it is reassuring to know why programs are behaving a certain way.


Bug hunting is a BLAST

Earlier this summer, I had a strange experience in our Chautauqua course. We were using blastn to test some primer sequences and try to figure out if the primers would detect the correct sequences in the database. We had some really weird results and I couldn't figure out what was happening - at least not during our course. The college professors taking our course kept getting different results than I did, even though (so we thought) all the parameters were the same. After the course, I decided that the strange results were probably due to the new interface, and that I could solve the problem by logging out of my account before doing a search.

That was a nice hypothesis, but it was wrong. The explanation wasn't that simple.

Naturally, I found this out by accident while giving a BLAST workshop for beginners last week at the Fralin Biotechnology conference.

(I make all my best discoveries at the front of a classroom).

I decided to show the teachers attending the workshop the parameters that were getting used in a blast nucleotide search. Happily, we all clicked the Algorithm parameters link at the bottom of the NCBI BLAST web form.

Only we found a surprise.

Their web forms showed this:

[Image: screenshot of the Algorithm parameters section of the BLAST form (ff.gif)]

While mine showed this:

[Image: screenshot of the Algorithm parameters section of the BLAST form (safari.gif)]

It was puzzling to say the least.

We also looked at our BLAST results to see which parameters were used by BLAST. Were the parameters shown in the form the same as the parameters used by the program?

They were.

In science terms, we recognized an unexpected phenomenon and we observed it more than once.

But what was going on?

Since I use the scientific method, the next steps were to propose an explanation and to see if I could repeat the phenomenon and predict when it would occur.

At the workshop, everyone in the room was using a Windows computer except for me. So our first hypothesis was that the strange result occurred because I was using a Mac.

And sure enough, I could repeat the behavior. Macs and PCs showed different parameters.

But there was something else.

Remember, you can never compare experiments where you've changed multiple variables. Here was a good example. It wasn't just the platforms that were different. I was using Safari on the Mac and the teachers were using either Firefox or IE on the Windows computers.

The next step was to try to reproduce the experiment, but with fewer variables. Part of the scientific method also involves testing alternative explanations for phenomena. I decided to investigate using Safari and Firefox, side by side, on my Mac.

That was the answer. When I used Safari to access NCBI BLAST and clicked the radio button in front of "More dissimilar sequences," nothing happened.

When I used Firefox, two things were different. First, in Firefox, selecting the NR database automatically caused the Nucleotide collection database to be selected. Second, when I clicked the radio button in front of "More dissimilar sequences," the parameters for the Match/Mismatch scores changed.

Good science is always reproducible. These results were reproducible, too.
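
For anyone who likes to see the logic written down, here is a minimal Python sketch of the one-variable-at-a-time comparison. The run records, the field names (`os`, `browser`, `form`), and the labels "A" and "B" for the two versions of the form are all assumptions made for illustration. They loosely mirror the observations in this post and are not output from BLAST.

```python
from itertools import combinations

# Variables that could explain the difference (names are hypothetical).
VARIABLES = ("os", "browser")

# Illustrative records only: "A" and "B" stand for the two versions of the form.
runs = [
    {"os": "Windows", "browser": "Firefox", "form": "A"},
    {"os": "Windows", "browser": "IE", "form": "A"},
    {"os": "Mac", "browser": "Safari", "form": "B"},
    {"os": "Mac", "browser": "Firefox", "form": "A"},
]


def differing_variables(a, b, variables=VARIABLES):
    """List the variables whose values differ between two runs."""
    return [v for v in variables if a[v] != b[v]]


def controlled_comparisons(runs, variables=VARIABLES):
    """Return (variable, run_a, run_b) for pairs that differ in exactly one variable."""
    pairs = []
    for a, b in combinations(runs, 2):
        diff = differing_variables(a, b, variables)
        if len(diff) == 1:
            pairs.append((diff[0], a, b))
    return pairs


# Keep only the controlled pairs where the outcome actually changed.
informative = [
    (var, a, b)
    for var, a, b in controlled_comparisons(runs)
    if a["form"] != b["form"]
]

for var, a, b in informative:
    print(f"Changing only {var}: {a[var]} -> {b[var]} changes the form")
```

In this made-up data set, the Mac/Safari versus Windows comparison is thrown out because two variables changed at once. Only the Safari versus Firefox pair on the Mac isolates the browser and also shows a different form.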

I still don't know which of the two behaviors is correct. But, now at least I know that there's a problem.

And I'm reminded, as usual, that you should not take anything for granted.

Go ahead. Click that Algorithm parameters link at the bottom of the BLAST form. Make sure that you know what experimental conditions (i.e. parameters) you're using when you run BLAST.

Those values are just as important as the conditions that you use for doing PCR.
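
One low-tech way to keep track of those conditions is to write down the parameters for each search and compare them. The sketch below is only an illustration: the parameter names and values are hypothetical placeholders typed in by hand, not values read from the BLAST form or from a BLAST report.

```python
def diff_parameters(mine, theirs):
    """Return {name: (my_value, their_value)} for every parameter that differs."""
    names = sorted(set(mine) | set(theirs))
    return {
        name: (mine.get(name), theirs.get(name))
        for name in names
        if mine.get(name) != theirs.get(name)
    }


# Hypothetical parameter records for two searches (illustrative values only).
run_a = {"program": "blastn", "match_mismatch": "1,-2", "word_size": 11}
run_b = {"program": "blastn", "match_mismatch": "2,-3", "word_size": 11}

for name, (a, b) in diff_parameters(run_a, run_b).items():
    print(f"{name}: {a} vs {b}")
```

A parameter that is missing from one record shows up as `None` on that side, so a setting that one person forgot to write down is flagged too.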

POSTSCRIPT: And just like so many things in science, just when you think you know the answer, you find out that there were a few more details that you missed. I realized tonight that the problem only occurs with Safari 3, and not with Safari 2. It's the penalty for trying beta-version software, I guess.
