Digital biologists, bioinformaticists, and computational biologists: more thoughts on the question of names

"What's in a name? that which we call a rose
By any other name would smell as sweet"

- Juliet, from Romeo and Juliet by William Shakespeare

I realized from the comments on my previous post and from Mike's post that more explanations were in order.

It seems we have two topics: why do we need a new name at all, and why don't the current names (biologist, computational biologist, bioinformatician, etc.) work? What really distinguishes a digital biologist from a regular, garden variety biologist? Why isn't a digital biologist a computational biologist?

So, I brought along two "show and tell" items today, a picture and a job posting, to help me explain.

First, the job posting. I saw this today on LinkedIn and thought it fit rather well.

Let's parse the job post

Computational Biologist Washington Univ.
The Genome Analysis and Informatics Technology (GAIT) Center at Washington University School of Medicine in St. Louis is seeking Computational Biologists. ... The GAIT center seeks to hire individuals to analyze DNA sequencing data in collaboration with biologists and physicians [the emphasis is mine].

Note - the employer makes a distinction between computational biologists and biologists. He expects the computational biologist to collaborate with biologists. He does not expect the computational biologist to be one.

The ideal candidate has a degree in computer science, biomedical engineering, or biology, ...

As far as knowledge goes, the candidate most likely has a degree in computer science or engineering. Biologists aren't excluded since biology is listed, too. Last.

...is fluent in one or more of the following languages: C, C++, Java, Python, or Ruby,

In other words, the candidate needs to be able to program but doesn't seem to need any special abilities related to biology.

Don't they expect a computational biologist to know any biology?

Well, towards the end, we get this:

...and has taken at least one undergraduate level Genetics class.

Okay - so all you need to be a computational biologist is a computer science degree and one genetics class.

Maybe a graph will help

Now, for our second show and tell item, I drew a Venn diagram.

[Image: Venn diagram, "What is a digital biologist"]
Fig. 1. The circles in my graph merely show where activities overlap. They are NOT drawn to scale.

It occurred to me, from our discussion the other day, that another important difference between digital biologists and regular biologists is the source of the biological data. Biologists are data producers and digital biologists are mostly data consumers.

Artificial data, oh my!

I'm specifying that biologists, digital and otherwise, work with biological data because computational biologists, statisticians, and bioinformaticists sometimes work with artificial data. I know, if you're a biologist, the idea of artificial data is really weird and a bit suspicious. It certainly surprised me to learn that people made artificial data. But mathematical biologists and statisticians like these sorts of things. They find it helpful to have data sets that really are random, like a random collection of DNA sequences, or a set that follows a Poisson or normal distribution. The most efficient way to get these data sets is to make them.
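For anyone wondering what "making" a data set looks like in practice, here is a minimal sketch in Python. It is only an illustration under some assumptions: the function names (random_dna, poisson_sample, make_artificial_data) and the default parameters are made up for this example and don't come from any particular tool. It uses only the standard library, so the Poisson counts are drawn with Knuth's simple multiplication method, which is fine for small means.

```python
import math
import random


def random_dna(length, rng):
    """Return a DNA string with each base drawn uniformly from A, C, G, T."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def poisson_sample(mean, rng):
    """Draw one Poisson-distributed count (Knuth's method, small means only)."""
    threshold = math.exp(-mean)
    count = 0
    product = rng.random()
    # Keep multiplying uniform random numbers until the product drops
    # below exp(-mean); the number of extra multiplications is the count.
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def make_artificial_data(n, seq_length=50, mean_count=4.0,
                         mu=0.0, sigma=1.0, seed=0):
    """Build three hypothetical artificial data sets of size n.

    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the "data" can be regenerated
    return {
        "sequences": [random_dna(seq_length, rng) for _ in range(n)],
        "counts": [poisson_sample(mean_count, rng) for _ in range(n)],
        "measurements": [rng.gauss(mu, sigma) for _ in range(n)],
    }


if __name__ == "__main__":
    data = make_artificial_data(5)
    for seq in data["sequences"]:
        print(seq)
    print(data["counts"])
    print(data["measurements"])
```

The fixed seed is the design choice worth noticing: the numbers are random in the statistical sense, but anyone can regenerate exactly the same set, which you can't do with a wet lab experiment.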

Anyway, unlike mathematical biologists and statisticians, biologists and digital biologists are more likely to use data that come from wet lab experiments.

Hey buddy, where'd you get that data?

The next factor is "where did the data come from?" I'm well aware that many biologists outsource some of their wet lab data collection to core laboratories. But for the most part, biologists get data from wet lab experiments or from activities where they go out and collect samples.

Digital biologists, on the other hand, get most, if not all, of their data from others: either public databases or collaborators.

I think the data source is an important difference between us. I often have to explain to school groups who want to tour Geospiza that the company doesn't have a lab. Our work environment looks more like "The Office", with blue cubicles, than a student's vision of a high tech science lab.

And this brings me to the last point - why do we need a name at all?

Having a name is like wearing a name tag. You don't wear a name tag for yourself, you wear it to help others. Just like we have mathematical biologists, computational biologists, evolutionary biologists, and so on, I think having a name helps clarify what makes us different.

Digital biologists need a name because there are students and prospective scientists who don't know about the things we do. They don't know that we have all this really interesting data, already out there, in public databases that we can study. They don't know that we can study the data with existing software tools. And it's important for us to think of the next generation and how we get them interested in learning how to find data, evaluate data, and use it to understand biology.


I'm still rather puzzled by this way of thinking about the labels. I agree - people think about the experimental work as "doing the biology", which can be quite irrelevant.

Consider a team in which person A does benchwork, person B writes software for interpreting the results, and then person A does the thinking, interpretation and reporting. Clearly person A is the biologist, and person B is contributing only a technological component (software).

Now consider a team in which person A does benchwork, person B writes software for interpreting the results, and then person B does the thinking, interpretation and reporting. Clearly, now person A is "a technician" in the lab. What is person B, if not *the* biologist? It would be ludicrous to say the team has a bench technician and an informaticist, and no biologist, if the end result is a publication advancing our knowledge of biology.

In other words, being a biologist is a function of what you do, not how you do it. Some people develop wetlab assays and use them to learn something about a biological system. Others build software tools to learn something about a biological system. Neither is useful without the ability to interpret the results - and that high level interpretation is what defines the biologist.

A further complication is that within the realm of bioinformatics, you often have people who are specialized towards the database and/or web programming sides of things. These people may spend their entire time sat at a computer but only ever working with biological data. So are these people bioinformaticians or web developers or database administrators?

Some people prefer one job title over another in order for their CV/resume to have a certain look. I have known others who simply think of themselves more in one category than another.

Good article!

Thanks Grant!

I liked your post. The only difference I would add is between using software and writing programs. Digital biologists are more likely to use software or web services than to write their own.

Data analyst is a description that almost works, except that data analysts aren't always scientists and they don't always work on biology.

The only difference I would add is between using software and writing programs. Digital biologists are more likely to use software or web services than to write their own.

I wrote that difference myself; it's the key difference to me too, e.g.: "Digital biologist / bioinformatics analyst: Biologists who conduct bioinformatics analyses full-time, but don't develop software (I prefer the latter term)". Must have got you lost in my rambling on after that :-) I don't think I'm as bad as Orac but, then again, maybe I am...

Data analyst is a description that almost works, except that data analysts aren't always scientists and they don't always work on biology.

That's why I used bioinformatics analyst, not data analyst. It's a good point that data analysts aren't always scientists, though. (There are some biologists who think of some [even most] bioinformaticists as "not biologists" along similar lines, which is a small part of the reason I prefer the 'computational biology' title: to remind people I am a biologist, and a biologist first at that.)
