Seed Media Group

Greg Laden's Blog

Evolution, Life Sciences, Science Education, Human Evolution, and Stuff


Is There a Black Box in your Research Methodology?

Category: OpenSource
Posted on: November 20, 2007 9:20 AM, by Greg Laden

I want to point out an interesting opinion piece about the threat of black boxes and the role of OpenSource software in math.

A key part of the message:

Increasingly, proprietary software and the algorithms used are an essential part of mathematical proofs. To quote J. Neubüser, "with this situation two of the most basic rules of conduct in mathematics are violated: In mathematics information is passed on free of charge and everything is laid open for checking."

In other words, the difference between using proprietary software and OpenSource software for mathematical research is that for the former the actual method of calculation is something you are not necessarily allowed to know, let alone report as part of your research, while for the latter, not only can you know this, but you can even participate in producing or modifying it.

I would like to suggest that this argument extends to other areas of science that use algorithms (formulas and their implementation, mainly) for data management, data mining, data summary, and statistical representation and modeling. If you can't detail the methods, if you have a "black box" at any stage of the development of your work, then you are not really doing it right.

You need to ask yourself: Is the commercial software that you use transparent, or is there a part of the process that is hidden and proprietary? I don't mean "do they show you the formula they use somewhere in the manual." Even a formula designed for calculation (as opposed to a more theoretical formula) for a statistic is not a representation of what actually happens in the machine. Floating point calculations in the computer are not done the same way as you would do them on paper, for instance: every intermediate result is rounded to a fixed number of binary digits, so the order and form of the operations can change the answer. How are they done, exactly, in the package you rely on? And never mind the random number generators. What a mess that can be.
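To make the floating point remark concrete, here is a small sketch in Python. The language, the data, and the function names `variance_textbook` and `variance_two_pass` are my own choices for illustration, not anything taken from the opinion piece or from any particular statistics package. It computes the same sample variance two ways: with the one-pass formula you often see printed in a manual, and with a two-pass version that subtracts the mean first. On paper they are identical. In double precision they are not.

```python
import math


def variance_textbook(xs):
    """One-pass 'calculator' formula: (sum(x^2) - (sum(x))^2 / n) / (n - 1).

    Algebraically correct, but it subtracts two huge, nearly equal numbers,
    so rounding error in the squares can swamp the true answer.
    """
    n = len(xs)
    total = sum(xs)
    total_sq = sum(x * x for x in xs)
    return (total_sq - total * total / n) / (n - 1)


def variance_two_pass(xs):
    """Two-pass formula: compute the mean first, then sum squared deviations."""
    n = len(xs)
    mean = math.fsum(xs) / n
    return math.fsum((x - mean) ** 2 for x in xs) / (n - 1)


# Hypothetical data: the values 4, 7, 13, 16 shifted by one billion.
# Shifting does not change the variance, which is exactly 30.
data = [1e9 + 4.0, 1e9 + 7.0, 1e9 + 13.0, 1e9 + 16.0]

print("textbook formula:", variance_textbook(data))
print("two-pass formula:", variance_two_pass(data))

# Even plain addition depends on how it is carried out:
print("naive sum of ten 0.1s:", sum([0.1] * 10))
print("math.fsum of ten 0.1s:", math.fsum([0.1] * 10))
```

The two-pass version prints 30.0 for this data, while the textbook formula does not, even though both are "the formula for the variance." With open code you can read which of these (or which other variant) your software actually runs. With a black box you can only guess.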

This applies to statistical software, mathematical modeling software, spreadsheets, and possibly even graphical software such as CAD programs and 3D modeling software.

The only way to get rid of the black box is to make the code totally open. One good way to get that result for yourself as a scientist or engineer is to use OpenSource software.


Comments

The sad thing about this is that some people still do statistics in Excel, despite the fact that version after version was shown to miscalculate basic statistics. (I'm not sure about the last two versions, but I'm also not sure what would prompt a fix for a well-known problem after all this time.)

Posted by: IanR | November 20, 2007 11:41 AM

Gnumeric! It's cool.

Posted by: greg laden | November 20, 2007 12:31 PM

Since we're onto that, I'm going to say arrrrrr! (as in www.r-project.org)

Posted by: KMSL | November 20, 2007 8:12 PM

r rocks.

Posted by: Greg Laden | November 20, 2007 10:39 PM

And just in case you haven't been paying attention, recall the recent Excel bug (fortunately a display-only bug, but close!).

Posted by: Moss | December 1, 2007 8:25 PM
