Janet Stemwedel brings her expertise in science and science ethics to bear on the contents of the emails stolen from the University of East Anglia. As with all of her work, the whole thing is worth reading, but I want to pick up on this claim:
If you don’t thoroughly document your code, no one but you will have a clear understanding of what it’s supposed to do. Actually, if you don’t thoroughly document your code, you yourself, at a later moment in time, might not have a clear understanding of what it’s supposed to do. (Then there’s the question of whether, when executed, it actually does what it’s supposed to do, but as far as I can tell, that’s not a central issue in the discussions of ClimateGate.)
Sadly, no. First, the criticism of the code focuses exclusively on the comments, not on what the code actually does. As Tim Lambert observes, the supposedly damning comments about artificial corrections generally attach to code which is itself commented out. So no harm, no foul.
But even Janet’s premise here is false. It’s certainly the case that young programmers are told to comment copiously, just as young scientists are taught that there’s no human element to the scientific process. These are the little white lies we tell in hopes of papering over the messy truth of how these fields actually operate.
In reality, “real programmers don’t comment code. If it was hard to write, it should be hard to understand.” Or as new Scibling Andrew Gelman (a statistician and statistical programmer) puts it: Don’t comment code (“I’d heard this before, but good advice is typically worth repeating”).
That link goes to a programmer in language processing, who insists that commenting code is futile. “Professional coders don’t comment their own code much,” he explains, “and never trust the comments of others they find in code. Instead, we try to learn to read code and write more readable code.” The author continues:
The reason to be very suspicious of code comments is that they can lie. The code is what’s executed, so it can’t lie. …
I don’t mean little white lies, I mean big lies that’ll mess up your code if you believe them. I mean comments like “verifies the integrity of the object before returning”, when it really doesn’t. …
Another common reason is that the code author didn’t actually understand what the code was doing, so wrote comments that were wrong. …
Most Comments Considered Useless
The worst offenses in the useless category are things that simply repeat what the code says. …
Eliminate, don’t Comment Out, Dead Code
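To make those complaints concrete, here is a minimal Python sketch of the two failure modes described above: a comment that lies about what the code does, and a comment that merely restates it. Every name in it (Record, load_record, load_record_checked, count_items) and the toy checksum are hypothetical, invented for illustration; none of it comes from the quoted post or from the East Anglia code.

```python
from dataclasses import dataclass


@dataclass
class Record:
    payload: bytes
    checksum: int


def load_record(payload: bytes, checksum: int) -> Record:
    # Verifies the integrity of the object before returning.
    # ^ A lying comment (hypothetical): nothing below ever looks at the checksum.
    return Record(payload, checksum)


def load_record_checked(payload: bytes, checksum: int) -> Record:
    # Here the code itself says what happens, so no comment can contradict it.
    # The checksum is a toy one (sum of bytes modulo 256), assumed for illustration.
    if sum(payload) % 256 != checksum:
        raise ValueError("checksum mismatch")
    return Record(payload, checksum)


def count_items(items: list) -> int:
    count = 0
    for _ in items:
        count += 1  # add one to count  <- useless: it only repeats the code
    return count
```

Calling load_record(b"abc", 0) happily returns a record with a bad checksum, while load_record_checked(b"abc", 0) raises ValueError. A reader who trusted the comment on the first function would never know the difference; a reader who read the code would.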
This isn’t to say comments have no value. In my dissertation work, I’d leave comments in my code reminding me when I’d done something non-idiomatic in my programming language, or where I kept trying to optimize a given code block in the same way, always to have it fail. “Don’t do that stupid thing you always try to do here” is not canonically perfect code documentation, but it works.
The file from which all this material was drawn is a log kept by a programmer brought in to sort out software written years earlier by someone else. It was crufty and odd and confusing, and his comments reflect that confusion. I’d wager dollars to donuts that the source code of everything from Linux to Windows to Word and Photoshop is filled with equally frustrated comments from people trying to understand why feature X causes feature Y to crash. It’s a natural part of programming, and the idea of well-commented code is a fiction, like the idea that science is an enterprise where data are shared freely and widely in a world of internal harmony devoid of personality clashes and grudges. It’s a story we tell to students to get them to see the big picture, and something they need to unlearn when they need to do science (or programming) right.