When computers first entered the mainstream, it was common to hear them getting blamed for everything. Did you miss a bank statement? That darned computer! Miss a phone call? Again, the computer!
The latest issue of Science had a new twist on this old story. Now, instead of a researcher taking responsibility for doing sloppy science, we’re back to blaming the computer. Never mind that the lab was using homemade software that they “inherited from someone” and apparently didn’t test. The five retracted papers were the fault of the software, not the scientists who forgot to include positive controls!
To quote the Science news article:
In September, Swiss researchers published a paper in Nature that cast serious doubt on a protein structure Chang’s group had described in a 2001 Science paper. When he investigated, Chang was horrified to discover that a homemade data-analysis program had flipped two columns of data, inverting the electron-density map from which his team had derived the final protein structure. Unfortunately, his group had used the program to analyze data for other proteins. As a result, on page 1875, Chang and his colleagues retract three Science papers and report that two papers in other journals also contain erroneous structures.
The most influential of Chang’s retracted publications, other researchers say, was the 2001 Science paper, which described the structure of a protein called MsbA, isolated from the bacterium Escherichia coli. MsbA belongs to a huge and ancient family of molecules that use energy from adenosine triphosphate to transport molecules across cell membranes. These so-called ABC transporters perform many essential biological duties and are of great clinical interest because of their roles in drug resistance. Some pump antibiotics out of bacterial cells, for example; others clear chemotherapy drugs from cancer cells. Chang’s MsbA structure was the first molecular portrait of an entire ABC transporter, and many researchers saw it as a major contribution toward figuring out how these crucial proteins do their jobs. That paper alone has been cited by 364 publications, according to Google Scholar.
Ironically, another former postdoc in Rees’s lab, Kaspar Locher, exposed the mistake. In the 14 September issue of Nature, Locher, now at the Swiss Federal Institute of Technology in Zurich, described the structure of an ABC transporter called Sav1866 from Staphylococcus aureus. The structure was dramatically–and unexpectedly–different from that of MsbA. After pulling up Sav1866 and Chang’s MsbA from S. typhimurium on a computer screen, Locher says he realized in minutes that the MsbA structure was inverted. Interpreting the “hand” of a molecule is always a challenge for crystallographers, Locher notes, and many mistakes can lead to an incorrect mirror-image structure. Getting the wrong hand is “in the category of monumental blunders,” Locher says.
On reading the Nature paper, Chang quickly traced the mix-up back to the analysis program, which he says he inherited from another lab.
Chang’s publication record is impressive, but I’m stunned that 364 publications cited his 2001 paper and no one tested the software with data sets from positive controls!
Computer programs aren’t magic. They are written by human beings and contain all the same kinds of errors in logic and mistakes that humans make, plus some other interesting problems that are unique to computers, like running out of memory and kernel panics.
We can’t just throw all of our scientific training out of the window because there’s a computer involved. If you’re going to use software, you have to use controls and good experimental design, just like when you’re doing a wet-bench type of experiment.
Back when I was a young, impressionable graduate student, I never ceased to be amazed by the one female post-doc in our lab. I think she left academic science once she started having children, like so many of the other women I knew, but she was an incredibly careful researcher, and famous, in our lab, for her obsession with controls. In fact, I don’t think she was ever satisfied unless an experiment included more controls than experimental samples. Where are these people in computational biology and bioinformatics?
Controls are a cornerstone of biological research
To those of you who don’t do wet-bench biology, controls are fundamental to this type of work. Since we can’t possibly know or predict all of the variables, we do the next best thing. We use controls. Since many procedures have multiple steps, we often include both positive and negative controls.
A positive control is a sample that should exhibit predictable behavior. Often, it’s a sample that we’ve used before. For example, in a PCR experiment, a positive control would be a sample that has worked before and produced a DNA fragment of a specific size. In a DNA sequencing experiment, it would be a sample that we’ve sequenced before, with success. We use positive controls to help troubleshoot experiments and identify points of failure. If a positive control fails to behave as we expect, we know that there is a problem with the entire experiment.
A negative control is a sample that is identical to our experimental sample, with the exception that it’s missing the thing we want to test. In PCR, a negative control might be missing the template DNA. If we saw DNA appear in the negative control, after the PCR, we would suspect a problem with contamination. If we were testing the effect of an antibiotic, the negative control sample would be a bacterial culture, grown to the same density as the test culture, in the same media, and under identical conditions, but without the antibiotic.
In commercial software testing, and in bioinformatics and computational biology work, we use control samples as well. Whenever I try a new program or method, I use data sets that have a predictable behavior, or include positive controls, that is, data that I know should work a certain way. I do these kinds of things partly because it’s part of my job to find bugs and partly so that I can be confident that the algorithms or programs are behaving the way that they’re supposed to.
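To make the idea concrete, here is a minimal sketch in Python of what positive and negative controls can look like for a piece of analysis code. Everything in it is hypothetical: `read_columns` and `slope` are toy stand-ins I made up for illustration, not the crystallography program from the Science story, and the control data sets are invented so that the right answer is known in advance.

```python
import csv
import io


def read_columns(text, x_name="x", y_name="y"):
    """Hypothetical stand-in for an inherited routine: parse two named columns."""
    reader = csv.DictReader(io.StringIO(text))
    return [(float(row[x_name]), float(row[y_name])) for row in reader]


def slope(points):
    """Least-squares slope of y on x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Positive control (invented data): y is exactly 2x, so the slope must be 2.
# If the two columns were ever swapped, the slope would come out as 0.5.
POSITIVE_CONTROL = "x,y\n1,2\n2,4\n3,6\n"

# Negative control (invented data): y does not depend on x, so the slope must be 0.
NEGATIVE_CONTROL = "x,y\n1,5\n2,5\n3,5\n"


def run_controls():
    """Run both controls and return the two slopes for inspection."""
    positive = slope(read_columns(POSITIVE_CONTROL))
    negative = slope(read_columns(NEGATIVE_CONTROL))
    return positive, negative


if __name__ == "__main__":
    positive, negative = run_controls()
    assert positive == 2.0, "positive control failed: are the columns swapped?"
    assert negative == 0.0, "negative control failed: signal where none should be"
    print("controls passed")
```

The positive control is deliberately asymmetric: swapping x and y changes the answer, so a flipped-column bug cannot hide. A control data set where both columns hold the same values would pass either way and tell you nothing.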
Somehow, I think, we have to impress on young researchers that testing software is at least as important as being able to use it.
Greg Miller. 2006. “A Scientist’s Nightmare: Software Problem Leads to Five Retractions.” Science 314:1856–1857.