What if Garrison Keillor did bioinformatics?

What happens when a biologist tries to talk with the IT group? Needless to say, they don't speak the same language.

Reposted from the archives.

Imagine this. You've been sequencing DNA for a few years now, perhaps ESTs, or something else, and storing files on your local network. Your system administrator makes backup files for you and all is well.

But one day you learn about interesting results that other people are getting by assembling sequence data themselves and you decide to try it, too.

Watch out! You are about to descend into bioinformatics hell.

Soon you learn that the assembly program only runs on computer systems with "NIX" as part of the name. (Doesn't nix mean bad?) And you have to set up the environment and use the bash shell and tell the computer to "make" or compile the program. (What?? Bash the shell? Won't it break? How do I tell the computer to "make" a program? Isn't that what programmers do?)

Plus, this strange assembly program, named after some fancy kind of coffee, has other complicated requirements and demands that all files entering the system be given an incomprehensible name to comply with sequencing procedures from the last decade.

Suddenly files are appearing everywhere!

You beg someone to do something with the computer and rename your files. Meanwhile, the backup files with the original names, which were referenced in the experimental procedure and linked to experimental data, languish on the system, forgotten. A few months later, no one knows why those files are there. Your new files with their new names are backed up. More new files enter the system and quickly acquire two sets of names. More months pass, the server is loaded down with files, and no one knows why.

Your department head, frustrated with the slow network, hires an expert to analyze the system and determine if you need a Linux cluster. Oops, it turns out that many files contain the same information. Naturally, the older files with the names that you had written down somewhere are deleted. Now all the information connecting the files to the original experiments is lost.
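Spotting files that hold the same information is easy to script. The costly mistake was deleting copies without first checking which names the notebooks referenced. Below is a minimal Python sketch (the function names are hypothetical) that groups files by SHA-256 content hash and only *reports* the duplicates, so someone can decide which names to keep before anything is deleted. It assumes "the same information" means byte-for-byte identical files. The same sequence saved in two different formats would not be caught.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: str) -> list[list[str]]:
    """Group regular files under `root` whose contents are byte-for-byte identical.

    Nothing is deleted: the groups are returned so a person can decide which
    name (for example, the one written in the lab notebook) should survive.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and symlinks; symlinks are aliases, not extra copies.
        if path.is_file() and not path.is_symlink():
            groups[sha256_of(path)].append(str(path))
    # Keep only hashes shared by more than one file.
    return [paths for paths in groups.values() if len(paths) > 1]


if __name__ == "__main__":
    for group in find_duplicates("."):
        print("Same contents:", ", ".join(group))
```

Reporting instead of deleting is deliberate. Each group could be reviewed against the experimental records before anything goes away.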

Your lab director says to quit fooling around and hires someone to move all of your data into a database. But the next few weeks find you ranting at your computer. Why? You don't know how to use SQL and you have important research to do, dammit! The last thing you want to do is fight with your computer to get it to tell you something you don't already know. And, you start to wonder, what exactly is in those tables? And why tables?

And how are you going to get your data back and do something useful with it?

Perhaps, you decide, it's time to hire a programmer.

The first person you interview is very enthusiastic. He can program in more languages than a UN interpreter can speak. And the languages have strange exotic names that sound like beautiful women (Perl and Ruby), snakes (Python), and Amazonian tribes (YAML). Confused already, you ask what he's done. It turns out that he's written games and designed something sticky or gooey for a web (you think) and knows lots about cold fusion and soap.

Okay, you think. I'm fine with people using soap. In fact, I wouldn't have guessed that he uses it, but really I prefer it.

Still, you're a little worried about that gooey (GUI) stuff around your computer and puzzled by the remark about cold fusion (especially since it was a fraud), but you smile and nod, not wanting to betray your ignorance.

Time to switch to your domain.

"Do you know anything about biology?" you ask.

The candidate smiles. Oh yes! He took biology in high school and read "Genome", too!

So, you hire him, pay him twice the salary of any of your post-docs, and have him start with something simple. You ask him to write a program to translate both DNA strands into open reading frames. You're met with a blank stare. Is there a problem, you ask?

What's an open reading frame? is the reply.

To quote Garrison Keillor, "Wouldn't this be a great time for a slice of rhubarb pie?"
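For anyone who, like the candidate, is wondering: an open reading frame (ORF) is a run of codons that starts at a start codon (ATG) and continues to a stop codon (TAA, TAG, or TGA). "Both strands" means checking three frames on the given strand and three more on its reverse complement. Here is a minimal Python sketch. It assumes the standard genetic code and defines an ORF as ATG-to-stop, ignoring any ORF with no stop before the end of the sequence. The 30-residue cutoff and the function names are hypothetical choices for illustration.

```python
# Standard genetic code; codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(dna: str, min_aa: int = 30) -> list[tuple[str, int, str]]:
    """Return (strand, frame, protein) for each ATG-to-stop ORF of at least min_aa residues.

    Frames on the '-' strand are counted from the start of the reverse complement.
    Codons containing anything other than A/C/G/T translate to 'X'.
    """
    dna = dna.upper()
    orfs = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            # Translate every complete codon in this frame.
            protein = "".join(
                CODON_TABLE.get(seq[i:i + 3], "X")
                for i in range(frame, len(seq) - 2, 3)
            )
            start = None
            for pos, aa in enumerate(protein):
                if aa == "M" and start is None:
                    start = pos  # first Met after the previous stop opens an ORF
                elif aa == "*" and start is not None:
                    if pos - start >= min_aa:
                        orfs.append((strand, frame, protein[start:pos]))
                    start = None  # the stop codon closes the ORF
    return orfs
```

For example, `find_orfs("ATGAAATTTTAA", min_aa=3)` finds one ORF, `("+", 0, "MKF")`. Real sequences would need a realistic cutoff, and possibly alternative start codons or genetic codes, which this sketch does not handle.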


You definitely need to ask those kinds of questions during the interview process. If the candidate isn't interested enough to read up on the basic subjects of computational biology, then they probably aren't going to be very good. I got hired to redesign a database for a small company doing agricultural research in my area. I got the job because I was the only one who had taken the time to research how soil samples were taken and what kind of data was gathered. It drives me crazy when programmers do that to their clients. Whenever I work for somebody, if I don't know something, I find out. And I always try to be as informed as possible before the interview.

Anyway, rant aside, you should always make sure your programmer is knowledgeable in the field you are asking them to work in. Next time you interview someone, grill them on the details. Ask about BLAST queries, open reading frames, and so on. Even if nobody gets them all right, you will at least know who has the most knowledge of your area.

That's funny, and true, I'm sure. I am a programmer and I just spent a week helping a physicist struggle with the built-in software that came with a new optical mass spectrometer and an SEM. It is pretty awful.

However, your comment about cold fusion is completely wrong. The cold fusion effect was replicated at high signal to noise ratios by researchers at the Naval Air Warfare Center Weapons Division at China Lake, Amoco, SRI, Texas A&M, Los Alamos, Mitsubishi Res. Center, BARC Bombay, Tsinghua U. and over a hundred other world-class laboratories. Hundreds of positive, peer-reviewed papers on cold fusion were subsequently published in long-established, mainstream journals. The authors include two Nobel laureates, the retired Chairman of the Indian AEC, the commissioner of the French AEC, and many other distinguished scientists.

You can find over 500 full text reprints of scientific papers from all of the institutions listed above, and many others, at our web site, http://lenr-canr.org/

- Jed Rothwell
Librarian
LENR-CANR.org

The comment about filenames is completely on target. Embedding information in filename conventions might seem convenient, but it leads to great inflexibility. Unfortunately, even current applications often enforce strong naming conventions on their data, rather than the more flexible approach of configuration files.

One tip: next time instead of copying-and-renaming the files consider using symbolic links. These are in effect aliases to your files, and keep down the number of ugly duplicates to keep track of. It isn't a foolproof solution, but it can save some headaches.
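The symbolic-link tip can be sketched in Python as well. The version below is minimal and makes some assumptions. The mapping from original names to the names the assembly program demands is something you would supply yourself. The function name and the `name_map.tsv` manifest are hypothetical. And it only helps if the program follows symlinks when it reads its input, as most programs do. The originals keep the names written in the lab notebook, and the manifest records which alias points to which file, in the spirit of the configuration-file approach mentioned above.

```python
import os
from pathlib import Path


def link_with_required_names(name_map: dict[str, str], link_dir: str) -> list[Path]:
    """Create symlinks in link_dir, named as the pipeline requires, pointing at the originals.

    name_map maps each original file path to the name the assembly program demands.
    Original files are never copied, moved, or renamed. Each new link is also
    appended to link_dir/name_map.tsv so the alias-to-original mapping is kept.
    """
    out = Path(link_dir)
    out.mkdir(parents=True, exist_ok=True)
    created = []
    with (out / "name_map.tsv").open("a", encoding="utf-8") as manifest:
        for original, required_name in name_map.items():
            link = out / required_name
            if link.is_symlink() or link.exists():
                continue  # don't clobber a link or file that is already there
            target = Path(original).resolve()
            os.symlink(target, link)
            manifest.write(f"{target}\t{required_name}\n")
            created.append(link)
    return created
```

As the comment says, this isn't foolproof. If an original file is moved or deleted, its link breaks. But there is only one real copy of each file to back up and keep track of.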

Did your candidate mention that Ruby is often On Rails (will someone save her?) That Python programmers IDLE most of the time? That Perl programmers use huge tables of hash?

If you find this all confusing, just BLAST your way over to the local ice cream shop and have a nice cool phrap.

This is freaking hilarious and brings back too many memories. I completely agree with Keith on the filename bit, having been guilty of exactly that in the past.

Would phred give you any company on that phrap?

It's really discouraging that bioinformatics software takes one of two forms: really, really inadequate GUIs for copying and pasting sequences, and command-line programs that do not abide by the conventions in unix and are therefore an incredible pain to use. Of course, doing it right, as always, requires more work than just hacking things up really quickly.

Then there's BioBike, which is an utterly fascinating idea.

Fred: I would guess that a lot of the commercial bioinformatics software has progressed beyond that point, ours certainly has.

But it's not free, so I would guess that some people in academia don't know about it.

We do have one free program with a really nice GUI that we distribute: FinchTV. You can use it to view chromatograms and edit the sequences.

Sandra, Finch does look like a nice program (unfortunately I am not at my own computer and cannot try it right now). Yet in the molecular biology lab I work in, everyone cranks away with a combination of the PubMed webtools and GeneWiz or MacVector, all of which are extremely crude.

And the bioinformaticians at my institution crank away with the crude command line tools written by people who didn't understand unix.

I remember MacVector! I didn't know it was still around. I've never heard of GeneWiz, but there are lots of programs out there.

I don't really think the UNIX command line tools are so bad. I like UNIX.

But, I don't think UNIX is always necessary. If I have a choice between stringing together a bunch of UNIX commands or doing my work with a program that has a well-designed graphical interface (GUI) that's fun to use, I'll take the GUI stuff any day.