I’m a scientist and my research is supported by NIH, i.e., by American taxpayers. More importantly, the science I do is for anyone to use. I claim no proprietary rights. That’s what science is all about. We make our computer code publicly available, not just on request but posted on the internet, and it is usable code: commented and documented. We ask the scientists in our program to do the same with the reagents they develop. Reagents are things like genetic probes or antibodies directed against the specific targets mentioned in the articles they publish. There is a list of the reagents on the internet, along with instructions on how to get them if you are another researcher. Since giving you the link would also reveal the identity of one of the reveres, you’ll just have to trust me that this is true. It is. And I mention it because I am in full agreement with a piece in The Guardian [UK] by Darrell Ince, a professor of computing at The Open University in the UK (hat tip Slashdot):
One of the spinoffs from the emails and documents that were leaked from the Climate Research Unit at the University of East Anglia is the light that was shone on the role of program code in climate research. There is a particularly revealing set of “README” documents that were produced by a programmer at UEA apparently known as “Harry”. The documents indicate someone struggling with undocumented, baroque code and missing data – this, in something which forms part of one of the three major climate databases used by researchers throughout the world.
Many climate scientists have refused to publish their computer programs. I suggest that this is both unscientific behaviour and, equally importantly, ignores a major problem: that scientific software has got a poor reputation for error. (Darrell Ince, The Guardian)
I do not have a moment’s doubt about the basic science of climate change. There are too many convergent lines of evidence and some really convincing science to back it up. But like a lot of science — including a great deal of molecular biology and sophisticated engineering and much else — it depends on complex computer code that can’t be checked or verified because it isn’t made available to other scientists.
One of the things we know about software, even critical software that runs important medical devices like radiation therapy machines, is that it is frequently in error. Checks of commercially produced software have found a high rate of error and inconsistency. Imagine what you’d find in software produced by academic researchers who aren’t software engineers. The trouble is that you can’t find it, because the code often isn’t made available.
I feel the same way about data. As an epidemiologist, I know there are some problems related to subject privacy in our data sets, but they can be overcome. Many of my colleagues object to releasing their data sets for a different reason: it has usually taken them years and a great deal of money to collect the data, and they don’t want someone else scarfing it up without lifting a finger and using it to scoop them. My colleagues (and I) want first crack at it. The same is true for sequence data in virology and other disciplines. I’m sympathetic because I’m in the same boat, but I think this can be dealt with, too. One way would be to grant a grace period before requiring release, so the scientist who collected the data gets to use it first. Once the research is published, the data must be made available, preferably as online supplementary material accompanying the article in which they are used. Another solution would be to require crediting the data collector via authorship or data origination credit, credit that would count for academic or professional purposes like promotion and tenure.
Whatever the solution, the principle should be that scientific data, like other information, wants to be free, and it has an even greater claim to be free because science is an open process. Science can’t be open if the tools that generate the data, and the data themselves, are not accessible for confirmation or verification. I agree with Ince:
So, if you are publishing research articles that use computer programs, if you want to claim that you are engaging in science, the programs are in your possession and you will not release them then I would not regard you as a scientist; I would also regard any papers based on the software as null and void.
We now use only open source statistical software suites like R, because they can be checked, improved and corrected by a large community of users and by our scientific colleagues around the world. We make our own code available, too.
Because we like to consider ourselves scientists.