Do biologists need to learn programming?

I get asked this question often enough, and now that it's come up again, I might as well answer it once and for all and get it over with.

First, I want to change the question. Of course they don't need to learn programming.

A better question is: would it benefit biologists to learn programming?

My flip, half-serious answer is yes, if they want to change careers.

You see, programming is really seductive when you've been a wet-bench biologist. It's like heroin. (At least I suspect it is. I know about programming; I don't have any experience with the other beyond reading William Burroughs in college.) Quick results, instant gratification, release that dopamine, baby, I got my answer!

I know that might sound strange, but biology experiments can take months to complete, and they're not like sequencing DNA, which is really more manufacturing than research. With DNA sequencing, you always get data. With other kinds of experiments, you can spend months on things that turn out to be a complete waste of time. Many experiments just don't work and, even worse, you never understand why, until maybe a year or two later when someone else gets it to work and all of a sudden you know what the problem was. By then, of course, it's too late. Sometimes it's a lot easier to do a bit of programming and work with other people's hard-won data than to do the lab work yourself.

I once spent a month trying to kill some BHK cells.

So, I know.

Anyway, I'm not a programmer, and I've never taken any classes in programming, but when I moved to a software company I thought I should learn some Perl, since it seemed like the thing to do. I spent lots of my free time learning Perl on my own and writing scripts and pipelines so that I could analyze weblog data for my NSF grant and track usage. This was more accounting than science really, but I'm conscientious and thought it would be good to know.

Then programs like Webalizer, StatCounter, and Google Analytics came along and made that project kind of moot.

I'm convinced that much of bioinformatics will go the same way, and that a lot of programming activities are really done better by real programmers.

So, why do I say that biologists should learn something about programming?

For a few reasons.

First, I think learning something about programming is good because it helps demystify software. Computer programs are written by people. It's good to know that and it's good to have an appreciation for the kinds of mistakes that get made and that you should test for them. Yes, you need to do controls even when you're working with computers.
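To make the idea of "controls" concrete, here is a minimal sketch in Python. It is only an illustration: the function `reverse_complement` and the checks in `run_controls` are hypothetical names made up for this example, not part of any particular tool. The idea is to run a piece of code on inputs where the right answer is already known by hand, including one input that should fail.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    # Walk the sequence backwards and swap each base for its partner.
    return "".join(pairs[base] for base in reversed(seq.upper()))


def run_controls():
    """Check the function against answers worked out by hand (hypothetical checks)."""
    # Positive control: a short sequence whose answer is easy to verify by eye.
    assert reverse_complement("ATGC") == "GCAT"
    # A palindromic site (EcoRI, GAATTC) should come back unchanged.
    assert reverse_complement("GAATTC") == "GAATTC"
    # Negative control: an invalid base should raise an error, not pass silently.
    try:
        reverse_complement("ATXG")
    except KeyError:
        return True
    raise AssertionError("invalid base was accepted")


if __name__ == "__main__":
    run_controls()
    print("all controls passed")
```

The negative control is arguably the more important one: a program that quietly accepts bad input can be as misleading as an experiment with no blank.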

Second, programmers speak a different language and it's possible that you may end up working with some. In their world a regular expression is not "que pasa!" or "garden variety" or "howya doin'?" Programmers talk about associating things with objects in databases (I even heard this from the cashier at Safeway yesterday as part of an explanation for why my phone number wouldn't work) and transactional models and joins and floating points in different ways than most people. Well, perhaps normal people don't talk about floating points. But you have to face it: software engineers are not likely to learn your language, so you might as well try to understand a bit of theirs.

Third, it's fun.

Fourth, right now, I'm learning SQL and having lots of fun asking new kinds of questions of databases and using computer power to do more science. I enjoy it. This is the kind of programming that makes sense for biologists to do. Maybe I'll even post some results.

I agree completely about the heroin. I took a summer course in Pascal at the end of my first year in a Biology PhD program, and then had to make a conscious decision not to switch departments.

The adrenaline rush isn't the only thing that makes it addictive. As with gambling (also addictive), there's the uncertainty of the immediate payoff: "Just let me tweak this line.... Will it compile? Noooo! Let me try this... Will it compile? YES!"

I agree completely. I learned just enough Visual Basic to enhance Excel spreadsheets and MS Access forms. For a VERY brief time I toyed with the idea of becoming a programmer and spending my time making really neat visualization programs. In the long run, though, I like doing science a lot more than I enjoy programming.

As a professional software developer I would argue that _everybody_ should learn the basics of biology! :-)
Having at least some background in genetics, evolution, etc. seems to me like something very valuable.

By Dimitry D'hondt (not verified) on 22 Aug 2007

Yes. The C language and the Linux OS.

Here's one reason you might resist that pull: When you get professional about it, eventually you need to design and implement something too large to be handled by one person alone (at least in a reasonable timeframe). About that time they promote you (or at least promote your responsibilities), put a team under you, and that instant gratification becomes a thing of the past. You're back to waiting months to try to coax the solution you envision into happening. And when you get it, half the time the customers who made very exacting demands you have fulfilled exactly tell you that's not really what they wanted after all.

Basically, you're a biologist again, only you're manipulating much more complex organisms, and sterilizing the petri dish and starting over is considered a crime in most regions.

I'm a chemist, and I use LabVIEW and Java and C/C++ all the time to collect and process data. I agree that programming is addictive. Using a computer to run an experiment, or control an instrument, gives you a feeling of power that is probably unwarranted, all things considered. And once you get sort of good at it, it is difficult to not spend a week writing software to take data that you could take in 15 minutes by hand.

By Dave Eaton (not verified) on 23 Aug 2007

In the lab where I worked, we used homemade software scripts within a commercial data acquisition package, and in order to do that you had to know programming (knowing C already was best). Some of us in the lab didn't know how to do this and wound up relying on those who did. Having to rely on someone else just to look at the data you collected is never a good feeling, or a good strategy, in science. If you know programming, you can customize your data acquisition and analysis software the way you want, and that's a great boon to your own work.

When a Computer Science person suggests that somebody "needs to learn programming", they may just be expressing their vicarious pain at seeing such nice people approach computing problems in really ugly and inefficient and horrible ways.

On the other hand, programmers tend to feel happy watching people using spreadsheets, because it means they are easily and naturally doing the Right Things:

- sorting their data (anybody ever sort their arrays in Fortran/C? Didn't think so)
- working with multiple kinds of data in parallel, and not particularly worrying about it.
- only looking at relevant data, not crawling repeatedly through data that has been discovered to be unnecessary.
- development focussed on data, not code

Oh, and note none of the above feature in those 'ing Numerical Recipes books?

:-)

Ronan

By Ronan Cunniffe (not verified) on 24 Aug 2007

I am a biologist who also does programming. I totally agree about programming being addictive. I do it as a hobby, myself; I have no aspirations as a professional programmer.

(That having been said, I am taking a course in bioinformatics algorithms this next term, with the express intent of understanding what my tools do. I cannot stress enough the importance of this. One need not program everything one uses from scratch, or even do much programming at all, but knowing what the programs that one uses do is often critical to getting good results.)

However, I must take exception to the idea that sequencing genes is a routine thing, from which one can always expect instant results. I have spent the last eight months trying to sequence the same gene from about half a dozen organisms. And no, while I have been sequencing all the while, I still do not have any data. It is not just me, either; I inherited the project from two post-docs who had each in turn given up on it (the first after having published -- twice -- on an earlier iteration of the same project). I am still at it because it is a challenge, and every incremental success is a victory.

There are other challenges as well. An earlier project involved teasing out small numbers of individual organisms from a mixed culture for sequencing. Believe me, there is an incredible thrill in getting a sequence from only four cells. Another involved screening through countless sequences, trying to find the one that was not bacterial. I never did succeed with that one (the project was terminated for other reasons before I got results), but I learned a lot in trying.

Opisthokont: It seems to me that if you've really:

"spent the last eight months trying to sequence the same gene from about half a dozen organisms. And no, while I have been sequencing all the while, I still do not have any data."

there's something wrong. I can't say where from such a limited description, but if you like, you can send more info and maybe we can help figure it out.