Is There a Black Box in Your Research Methodology?

I want to point out an interesting opinion piece about the threat of black boxes and the role of OpenSource software in math.

A key part of the message:

Increasingly, proprietary software and the algorithms used are an essential part of mathematical proofs. To quote J. Neubüser, "with this situation two of the most basic rules of conduct in mathematics are violated: In mathematics information is passed on free of charge and everything is laid open for checking."

In other words, the difference between using proprietary software and OpenSource software for mathematical research is that for the former the actual method of calculation is something you are not necessarily allowed to know, let alone report as part of your research, while for the latter, not only can you know this, but you can even participate in producing or modifying it.

I would like to suggest that this argument extends to other areas of science that use algorithms (mainly formulas and their implementations) for data management, data mining, data summary, and statistical representation and modeling. If you can't detail the methods, if you have a "black box" at any stage of the development of your work, then you are not really doing it right.

You need to ask yourself: Is the commercial software that you use transparent or is there a part of the process that is hidden and proprietary? I don't mean "do they show you the formula they use somewhere in the manual ..." Even a formula designed for calculation (as opposed to a more theoretical formula) for a statistic is not a representation of what actually happens in the machine. Floating point calculations in the computer are not done the same way as you would do them on paper, for instance. How are they done, exactly? Never mind the random number generators.... What a mess that can be.
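To make the floating point issue concrete, here is a minimal sketch in Python. It is not taken from any particular package; the function names and the sample data are made up for illustration. It computes the same sample variance two ways: with the one-pass "computational" formula often printed in manuals, and with a two-pass method that subtracts the mean first. On paper the two are algebraically identical. In double precision they need not agree.

```python
def variance_textbook(xs):
    """One-pass 'computational' formula: (sum(x^2) - (sum x)^2 / n) / (n - 1)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_sq = sum(x * x for x in xs)
    # Subtracting two huge, nearly equal numbers loses the digits that matter.
    return (sum_sq - sum_x * sum_x / n) / (n - 1)


def variance_two_pass(xs):
    """Two-pass method: compute the mean first, then sum squared deviations."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


# Hypothetical data: small spread around a large offset.
# The exact sample variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

textbook = variance_textbook(data)
two_pass = variance_two_pass(data)

print("textbook formula:", textbook)
print("two-pass method: ", two_pass)

# Even simple decimal arithmetic is not what you would do on paper.
print("0.1 + 0.2 == 0.3 ?", 0.1 + 0.2 == 0.3)
```

Both functions implement "the formula for variance," yet only the second one recovers 30 for this data. Unless you can read the code, you cannot tell which of these (or which other variant) your software actually runs.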

This applies to statistical software, mathematical modeling software, spreadsheets, and possibly even graphical software such as CAD programs and 3D modeling software.

The only way to get rid of the black box is to make the code totally open. One good way to get that result for yourself as a scientist or engineer is to use OpenSource software.

More like this

The sad thing about this is that some people still do statistics in Excel, despite the fact that version after version was shown to miscalculate basic statistics. (I'm not sure about the last two versions, but I don't know what would prompt a fix for a well-known problem after all this time.)

And just in case you haven't been paying attention, there's the recent past to recall: the recent Excel bug (fortunately a display-only bug, but close!).