Computational Work without Experimental Validation

As I've pointed out before, the big division in biology is currently between computational groups and wet labs. Michael White agrees with me. Here's his take on the current state of computational work in cell biology:

The result is that you get different groups coming up with all sorts of new analyses of the same genomic data . . . but never really making any serious progress towards improving our understanding of the biological process in question. The worst part is that, over time, the researchers doing this kind of work start talking as if we are making progress in our understanding, even though we haven't really tested that understanding. You start getting an echo chamber resonating with these guys who are citing each other for validation more than they are citing the people who actually study the relevant genes in the lab.

Michael goes on to say that wet lab people, like him, ignore the echo chamber. Instead of using their complementary tools to improve our knowledge, the two groups are needlessly diverging from each other. This can be remedied by labs that employ both computational and experimental techniques (yes, there are a few), or by computational labs and wet labs working together to solve problems.

Contributing to the echo chamber is the fact that many journals are purely computational or entirely experimental. One journal that reports research from both sides of the divide is Nucleic Acids Research (and they're open access!). Check out what they require from computational articles:

Manuscripts will be considered only if they describe new algorithms that are a substantial improvement over current applications and have direct biological relevance. The performance of such algorithms must be compared with current methods and, unless special circumstances prevail, predictions must be experimentally verified. The sensitivity and selectivity of predictions must be indicated. Small improvements or modifications of existing algorithms will not be considered. Manuscripts must be written so as to be understandable to biologists. The extensive use of equations should be avoided in the main text and any heavy mathematics should be presented as supplementary material. All source code must be freely available upon request.

Setting a standard of experimental verification for a computational article sounds exactly like what I was advocating. As you can guess, I support this standard.


Right on! I agree with you about the journals. But there's more at stake than just what's being published. See my comments over at Sandwalk.

By Larry Moran on 14 Nov 2006

I have to agree with you, RPM. I would love to explore functional properties of loci showing evidence of selection (through various statistical & computational methods) in humans and other organisms.

This is an old argument. The first gene function to be predicted computationally (around 1982?) caused a big uproar because other people felt that the prediction wasn't enough for publication and that the function had to be proven as well. (I think they were also a bit mad because they felt that a computational prediction without biochemistry was a form of "cheating".)

Experimental verification is certainly the gold standard and worth striving for. Achieving it, though, isn't always easy: you can only test the validity of an algorithm if you can get hold of real data. Sometimes the data don't exist, and sometimes the people who have the data won't share them.

In an ideal world, collaborations would be much easier, and biologists and computational scientists could work together to devise good tests for new algorithms.

In the real world, though, many algorithms are devised long before they can ever be tested experimentally. Many of the phylogeny algorithms that you enjoy (e.g. maximum parsimony, maximum likelihood, neighbor joining) were developed and used for some years before the predictions they made were ever compared to real data from time-course experiments. I don't think there's anything wrong with that, but it's good to know which programs have been tested and how, and which haven't been tested; sometimes that distinction is difficult to make.
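Since neighbor joining comes up in the comment above, here is a minimal, self-contained sketch of that algorithm in Python. It is not anyone's published implementation, and the taxon names and distance matrix are made-up toy values; it is only meant to show how such a method turns pairwise distances into a tree whose predictions could then, in principle, be checked against real data.

```python
# A minimal sketch of the neighbor-joining algorithm (Saitou & Nei style),
# written from scratch with no dependencies. The taxa and distances below
# are illustrative toy values, not data from any real study.
from itertools import combinations


def neighbor_joining(names, dist):
    """Build an unrooted tree (as a Newick string) from a distance matrix.

    names: list of taxon labels
    dist:  dict mapping frozenset({a, b}) -> pairwise distance (mutated in place)
    """
    nodes = list(names)
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence r(i): sum of distances from i to every other node
        r = {i: sum(dist[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Q-criterion: pick the pair minimizing (n - 2)*d(i,j) - r(i) - r(j)
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * dist[frozenset(p)] - r[p[0]] - r[p[1]])
        d_ij = dist[frozenset((i, j))]
        # Branch lengths from i and j to the new internal node
        li = 0.5 * d_ij + (r[i] - r[j]) / (2 * (n - 2))
        lj = d_ij - li
        new = f"({i}:{li:.4f},{j}:{lj:.4f})"
        # Distances from the new node to every remaining node
        for k in nodes:
            if k in (i, j):
                continue
            dist[frozenset((new, k))] = 0.5 * (dist[frozenset((i, k))]
                                               + dist[frozenset((j, k))] - d_ij)
        nodes = [k for k in nodes if k not in (i, j)] + [new]
    a, b = nodes
    return f"({a},{b}:{dist[frozenset((a, b))]:.4f});"


# Toy example: four hypothetical taxa with an additive distance matrix
taxa = ["A", "B", "C", "D"]
d = {frozenset(p): v for p, v in {
    ("A", "B"): 5, ("A", "C"): 9, ("A", "D"): 9,
    ("B", "C"): 10, ("B", "D"): 10, ("C", "D"): 8}.items()}
print(neighbor_joining(taxa, d))
```

On this toy matrix the sketch prints ((A:2.0000,B:3.0000),(C:4.0000,D:4.0000):3.0000); which is the tree implied by those distances. A real test of the kind the comment describes would instead compare the inferred topology and branch lengths against an independently known history.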