So What Exactly Is Bioinformatics?

In response to this question asked of us by our Seed Overlords (the readers), Steinn says that he would do bioinformatics. As a biologist, I'm really unclear as to what bioinformatics actually is, other than a word you put into your grants to get funding. Let me add that I'm the PI on a federally funded bioinformatics grant, so I'm supposed to be an expert in this area.

As I see it, bioinformatics usually means one of three things:

  1. The generation of large (massive, actually) datasets.
  2. The analysis of large datasets, and the development of computational tools to handle them.
  3. The storage of large datasets and the creation of accessible databases.

It seems to me that with bioinformatics we are moving away from "-ology"-oriented approaches (i.e., intellectual disciplines). While this sounds exciting, synergistic, and groundbreaking, in reality it can lead to a lot of bad science because the technology is driving the intellectual development (or lack thereof). One example is genomics, where I think a lot of shoddy analyses have been performed. I swear to the Intelligent Designer, if I see one more comparison of two to four genomes where every nonsynonymous change (i.e., a DNA change that alters the encoded amino acid) is assumed to be under positive selection, I'm going to get really Mad.
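To make the synonymous/nonsynonymous distinction concrete, here is a minimal Python sketch that classifies a single codon change using the standard genetic code. The helper name `classify_change` and the choice of the standard (nuclear) code are assumptions for illustration. Note that the classification is pure bookkeeping: labeling a change nonsynonymous says nothing by itself about whether selection acted on it.

```python
from itertools import product

# Standard genetic code, written compactly: codons are enumerated in
# TCAG order (first, second, third position), with '*' marking stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_change(before: str, after: str) -> str:
    """Label a codon change as synonymous or nonsynonymous.

    Hypothetical helper for illustration: it only asks whether the encoded
    amino acid (or stop) differs; it makes no claim about selection.
    """
    before, after = before.upper(), after.upper()
    if before == after:
        return "unchanged"
    if CODON_TABLE[before] == CODON_TABLE[after]:
        return "synonymous"
    return "nonsynonymous"


if __name__ == "__main__":
    # GAG -> GAA: both codons encode glutamate.
    print(classify_change("GAG", "GAA"))  # synonymous
    # GAG -> GTG: glutamate to valine (the sickle-cell hemoglobin change).
    print(classify_change("GAG", "GTG"))  # nonsynonymous
```

Counting how many changes fall into each class is only the starting point; inferring positive selection typically requires comparing rates per site (for example, dN/dS) or bringing in polymorphism data.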

Another instance of this is the microarray thing. While it seems to have settled down a bit, a lot of claims were being made for microarrays that just weren't appropriate (reproducibility, for example). And I still haven't heard a good statistical treatment of the multiple comparisons issue: if you're comparing gene expression between two organisms across 5,000 genes, some differences are expected simply by random chance (at p < 0.05, roughly 250 genes). You can't Bonferroni-correct this (p < 0.05/5000, or 0.00001. Oh yeah, that's gonna work.) And while I'm a big fan of log likelihood ratios, pulling a significant difference of two log units out of thin air is a little arbitrary.
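The arithmetic behind the multiple-comparisons complaint is easy to check. The sketch below assumes 5,000 genes, a per-test threshold of 0.05, independent tests, and that no gene truly differs, so every p-value is drawn uniformly from [0, 1]. The gene count and threshold come from the text; the simulation, the seed, and the names are illustrative assumptions.

```python
import random

N_GENES = 5000  # number of genes compared, as in the text
ALPHA = 0.05    # per-test significance threshold

# Under the null hypothesis (no real differences), p-values are uniform on [0, 1],
# so about ALPHA * N_GENES genes look "significant" by chance alone.
expected_false_positives = N_GENES * ALPHA

# Bonferroni: divide the threshold by the number of tests.
bonferroni_threshold = ALPHA / N_GENES

# Probability of at least one false positive across all genes
# (family-wise error rate), assuming the tests are independent.
fwer_uncorrected = 1 - (1 - ALPHA) ** N_GENES
fwer_bonferroni = 1 - (1 - bonferroni_threshold) ** N_GENES


def simulate_null_hits(seed: int = 1) -> tuple[int, int]:
    """Draw one null p-value per gene and count hits at each threshold."""
    rng = random.Random(seed)
    p_values = [rng.random() for _ in range(N_GENES)]
    raw_hits = sum(p < ALPHA for p in p_values)
    bonferroni_hits = sum(p < bonferroni_threshold for p in p_values)
    return raw_hits, bonferroni_hits


if __name__ == "__main__":
    print(f"expected chance hits at p < {ALPHA}: {expected_false_positives:.0f}")
    print(f"Bonferroni per-gene threshold: {bonferroni_threshold:.0e}")
    print(f"P(at least one false positive), uncorrected: {fwer_uncorrected:.4f}")
    print(f"P(at least one false positive), Bonferroni: {fwer_bonferroni:.4f}")
    print("simulated hits (raw, Bonferroni):", simulate_null_hits())
```

Both horns of the dilemma show up directly: without correction, a few hundred genes look significant by chance, while the Bonferroni threshold keeps the chance of even one false positive near 5% but is so strict that real but modest differences can easily be missed.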

I don't want to overplay hypothesis-driven science, because sometimes you need to do non-hypothesis-driven science, but bioinformatics often seems to be large amounts of data and funding desperately seeking a hypothesis (although if anyone wants to throw some funding my way, I'll be more than happy to come up with a hypothesis for you...). So, I think bioinformatics is both a useful tool and a useless buzzword.

I'm just not sure it's an intellectual discipline.

Hearty agreement: the people with the computational skills who know enough biology to be helpful are really great and deserve appropriate credit. But, like most technologies, attempts to make it a separate discipline result from a technique being oversold, as molecular biology was in the 1970s. Bioinformatics is a great tool, a great hypothesis-generator, but it must be followed up by real biology. I'm thinking that Ira Pastan demonstrated a good example in the "early days", but I'm blanking on the ref right now.

It's a bit surprising how often science presented as a new paradigm, even a revolution, in reviews or popular-science books written for the general public is seen by scientists as bad or cranky science, or exaggerated at best.

Mike, I think you're missing the difference between bioinformatics and computational biology. Bioinformaticians develop analytical tools to analyze biological data. Computational biologists use those tools to analyze data. Your beef is with the statistically incompetent computational folks, not the guys writing the programs. In fact, there is a general frustration among bioinformaticians that the users of their programs don't know what the hell they're doing. They just plug in their data and use the default parameters.

As for my opinion, I think microarrays and comparative genomics are great exploratory tools, but when it comes to testing hypotheses, you need other data (polymorphism, in situ hybridization, etc.).

RPM,

I don't disagree, but I think there's a larger problem too that I didn't address in my post. From a funding perspective, I think a lot of the bioinformatics/computational biology funding has been very poorly spent, and because many of these projects are so expensive (particularly on the data collection and tool development sides), that adds up to a lot of money.

There's a second issue, which is that it's not entirely clear to me that 'bioinformatics' (into which computational biology is usually lumped) is all that fundamentally different from, well, what we have been doing for the last several decades. Yes, the data are more numerous and more complete, but are we really answering new questions?

I realize this is a strawman, but from the '30,000 ft. view', I think 'bioinformatics' is being oversold because many of the biological justifications for the experiments are not that good.

"Bioinformatics" is simply the answer to the question, "so what the hell are you going to do with all that data?"

By mike schmidt on 15 Jun 2006

What the Mikes said.

The concept of BI is more a 'method' to digest reams of data (everyone is a 'bioinformatician' every time they check a friggin' seq.). But it can qualify in the experimental arena by virtue of allowing those vast reams to be compared side by side in just a matter of seconds. Stuff turns up. Protein structure coordinates keep piling in, from 100s to 1000s of 'em now, but nothin' illuminates the predictions like a hearty signal bursting out of those plates-n-plaques and confocals. That is, all the comparisons won't matter without buttoning it down via the nuts-n-bolts in-vitro/in-vivo action. Gotta separate the kernel from the noise and artifacts the old-fashioned way.

There are things that might get lost in the mix. For instance, in the rush to optimize sequences (a strategy that informatics has hastened), we seem to have overlooked the 'pause' in translation. Are pauses just cumbersome speedbumps, or are they integral to alignment/splicing steps? Can we continue to optimize vast stretches of sequence without paying them heed? Not sure BI can do much besides provide statistical clues.

Words that get you money in grants are good, no?

My perspective on bioinfo is the application of modern mathematical methods and computational techniques to make sense of biological data.
That was the sense in which I meant to use the word; it was intended to be suitably vague and broad to imply "whatever I thought was fun and interesting".

Ah well, if it doesn't really exist then there is always quantum computing...

I disagree, RPM. Historically, "computational biologist" is the term for the original computational scientists who developed tools like ClustalW or BLAST. Only later did the term "bioinformatician" appear, as more and more reference data became available and more programming positions were needed to handle them. I consider a computational biologist a species of scientist, while a bioinformatician is closer to a software engineer.