clearclimatecode.org

If you believe in climate-related software being open, or even if you only believe in the ultimate triumph of Python over Fortran (personally I'm a Perlista when not rooting for embedded C, though I have respectable colleagues who adhere to the Python heresy, and who may convert me in time), then go visit:

http://clearclimatecode.org/

where you can find a guide to the project history and some interesting results and their google code. This is all a free effort by Nick Barnes and David Jones and others at Ravenbrook, but they welcome others to join. So far they are concentrating on the instrumental series (gosh, how topical!), but Nick has ultimate ambitions to do palaeo stuff too. Maybe they will write a climate model one day (though that is a much much larger task). It looks like the UKMO has finally swung round to the idea of an open analysis (well maybe; this can perhaps be read slightly differently. We'll see what they release. I presume "We intend that as soon as possible we will also publish the specific computer code that aggregates the individual station temperatures into the global land temperature record." means that they feel the need to scrub the code for embarrassing comments first, otherwise what possible reason could there be for delay?).

I wonder if this is a good place to comment on programming languages? One of the reasons I left science was that I really didn't want to spend the rest of my life writing Fortran. It does have some advantages - it is harder to do some of the really really stupid things that are fairly easy to do in C - but that isn't a reason to be writing new code in it. It is a legacy problem - so much old code is in Fortran - but sometime someone has to gird up their loins and replace it. And retrain all the scientists who write Fortran.

Thanks very much for this, W. A couple of notes:

- it's already not just Ravenbrook, and we hope to get people on-board from all over. The code is hosted at Google Code; anyone can play. Already our STEP2 code was written by Paul Ollis, who I don't think I've ever met.

- the project is not anti-sceptic. I would love some serious sceptics to join the project, read some code, write some code, and generally help out.

- we are not wedded to Python, except for the CCC-GISTEMP project which is already underway. We like Python very much, we think it's especially good for writing clear code (which I think few people would claim for certain line-noise languages, eh, W?), but project members are well-versed in a wide range of other languages. If, for instance, future projects require more horse-power than a Python interpreter can offer us, we will certainly consider alternatives (although I'd be tempted to roll up my sleeves and write a really good Python compiler: it's too long since I wrote a compiler).

So please, come one, come all, come and fix climate science software.

By Nick Barnes (not verified) on 05 Dec 2009 #permalink

Re the met office: in my experience, delays in publishing source code are usually caused by legal hurdles. Someone has to go through it all and check that it was all actually written in-house; none of it is licensed from anyone else, etc. See Open Solaris, for a big example which seemed to take ages from the starting gun to the code drop.

By Nick Barnes (not verified) on 05 Dec 2009 #permalink

We like Python very much, we think it's especially good for writing clear code

I do too, and I hope you didn't take my comment regarding Python vs. the computational expense required to do realistic modeling with a GCM as an indictment of your choice to use it.

I think it's a very good choice for any of the projects you might take on *other* than a model meant for production use. It actually would be a good choice for writing a model meant to educate people more generally as to how such things are written.

If, for instance, future projects require more horse-power than a Python interpreter can offer us, we will certainly consider alternatives (although I'd be tempted to roll up my sleeves and write a really good Python compiler: it's too long since I wrote a compiler).

Well, I'm sure you know there is a just-in-time compiler project for Python.

Ahh, here it is: Psyco.

The project doesn't look very active, though.

Yes, lots of people have done compilers of various sorts for Python - in fact some people such as Siek at UC Boulder use Python for teaching compilers - and Psyco is a pretty cool example. In fact, Paul Ollis's original code for STEP2 of CCC-GISTEMP used Psyco if it was present. I ripped out the code which invoked Psyco because it was "contrary to the project goals" (i.e. it wasn't really, really, clear), and because it was premature optimization.
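The "used Psyco if it was present" arrangement is usually written as a guarded import. What follows is only a minimal sketch of that general pattern, not the code that was removed from CCC-GISTEMP: the function `mean_anomaly` is a made-up stand-in, and since Psyco only ever supported Python 2, on a current interpreter the import simply fails and the plain Python runs.

```python
# Optional-accelerator pattern: use Psyco when it is installed,
# otherwise run exactly the same code without it.
try:
    import psyco  # Python 2 era specialising compiler; usually absent today
except ImportError:
    psyco = None


def mean_anomaly(temps, baseline):
    """Hypothetical example function: mean of (temperature - baseline)."""
    return sum(t - baseline for t in temps) / len(temps)


if psyco is not None:
    # Ask Psyco to compile just this one function.
    psyco.bind(mean_anomaly)
```

The result is the same either way, which is also why the extra branch can be argued to add noise without adding clarity.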

But none of the compilers I have seen for Python have really seemed industrial-strength to me. Maybe it's just my compiler itch.

By Nick Barnes (not verified) on 05 Dec 2009 #permalink

Yes, lots of people have done compilers of various sorts for Python - in fact some people such as Siek at UC Boulder use Python for teaching compilers

Python as the implementation language (which makes a lot of sense), or writing a Python compiler (which given the complexity would not seem to make so much sense in an introductory compiler course)?

But none of the compilers I have seen for Python have really seemed industrial-strength to me. Maybe it's just my compiler itch.

Well, there are some inherent language issues which would make writing an efficient static compiler impossible.
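To make the "inherent language issues" concrete, here is a small illustrative sketch (all names are invented for the example). A static compiler cannot know ahead of time which operation `a + b` denotes, nor which function a global name will refer to when it is eventually called.

```python
# Two things a static compiler cannot pin down in advance:
# the meaning of `+`, and the target of a call through a global name.

def add(a, b):
    return a + b  # integer add, string or list concatenation, or a user __add__


class Celsius:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        return Celsius(self.value + other.value)


def scale(x):
    return x * 2


def apply_scale(x):
    return scale(x)  # `scale` is looked up by name on every call


results = [
    add(1, 2),
    add("a", "b"),
    add([1], [2]),
    add(Celsius(1.5), Celsius(2.0)).value,
]

before = apply_scale(10)
scale = lambda x: x * 3  # rebinding the name changes what apply_scale does
after = apply_scale(10)
```

A compiler has to either keep the run-time lookups or guess and guard against being wrong, which is roughly what just-in-time approaches do.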

Out of curiosity, what's your compiler writing experience? I co-authored the first commercially supported Pascal compiler (for PDP-11's) in 1975 and the company I co-founded provided compilers with front-ends for Pascal, Modula-2, C++ and C and back-ends for about a half-dozen architectures for about twelve years.

I got out of the compiler racket about 10 years ago after getting tired of doing contract work for Rational.

Well, my recent experience is mainly with Java and Ruby (with a little perl on the side). If you want well-organized, reusable, readable code, both of them have some advantages on that front (java with the eclipse IDE I've found particularly nice). Both have reasonably large built-in and third-party libraries that are easy to adopt (although not always intuitively consistent). Both can run quite speedily enough for most purposes after some initialization overhead.

When you're constrained by irreducible numerical computational limits then fortran may still be a good choice, but in my experience a good algorithm has far more impact than choice of language on speed, and being able to write efficient algorithms easily can make a huge difference...
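A toy illustration of the algorithm-over-language point, with invented function names: both functions below answer the same question (does a list of station identifiers contain a repeat?), but one compares every pair and the other remembers what it has already seen.

```python
def has_duplicate_quadratic(ids):
    """Compare every pair of entries: O(n^2) comparisons."""
    for i in range(len(ids)):
        for j in range(i + 1, len(ids)):
            if ids[i] == ids[j]:
                return True
    return False


def has_duplicate_linear(ids):
    """Remember seen entries in a set: O(n) expected time."""
    seen = set()
    for item in ids:
        if item in seen:
            return True
        seen.add(item)
    return False
```

For 100,000 distinct identifiers the first version makes about five billion comparisons and the second about 100,000 set lookups; no change of language closes a gap of that size.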

William,

Just to show how ecumenical I am, let me say that I enjoyed this post.

I am also a perl programmer, and wondered for many years about the mysterious heresy that is python. Years later, having seen so much %*(*^# that perl lets you say, I've kind of wished I'd taken the other path. I'm sure it's still easy enough to learn.

May I ask, do you think there is conceivably work in the future available for computer programmers in climate science who may lack the formal training in experimental sciences?

[As Nick says, maybe. My own personal view developed towards the idea that there *should be* far more computing support for science. Alas, almost everyone can write *some* code and this sustains the illusion that everyone can write what code is needed. Also, scientists are cheaper than decent software engineers, and most bits of science are short of money, and run on the kind of grants that won't look favourably on long-term development. Misc caveats: I believe that parts of the core of the UKMO model are maintained by real programmers (though not, I suspect, by Real Programmers) and, for example, the "oxygen isotope" add-on was done by someone ex-industry -W]

By Alex Harvey (not verified) on 05 Dec 2009 #permalink

I am also a perl programmer

WTF does this mean? I'm a machine language programmer, an assembly language programmer, a programmer in many high-level languages (who has written many of the same), yet, I'd never say ...

"I'm a perl programmer" ....

What a self-deprecating thing to say ...

Dhogaza, thanks for the encouragement. Actually, I also know a number of languages, although maybe not quite as many as you. I'm probably too young to have needed many of the ancient ones you've mentioned here. At any rate, should I kill myself, now, do you think, or is there hope?

By Alex Harvey (not verified) on 05 Dec 2009 #permalink

Python with NumPy and SciPy is nice for scientific-oriented computing, with occasional aid from C or R.

Otherwise, especially if you are in the JVM, look for Clojure - http://clojure.org/ - all the Java libraries plus easier concurrency, high level, interactive.

Python as the implementation language, or writing a Python compiler?

Writing Python compilers. Surprised me too, but the course notes are online, and pretty good.

inherent language issues

Nothing I haven't seen before, and in spades. At least the meta-object protocol is pretty fixed.

what's your compiler writing experience?

Ten years on-and-off at Harlequin, writing a Standard ML development environment (including a compiler, of course), and helping out occasionally with their Lisp and Dylan products (which included several compilers).

Aside from that, numerous little implementation projects, mostly for Lisp-like languages. I used to hang out on comp.compilers and comp.arch, which is where I first encountered John Mashey. Oh, and one of our clients asks for advice sometimes on his home-grown implementation of a Dylan-like language.

About half of my paid work is in Common Lisp at the moment, which gives me quite a good handle on the challenges and benefits of dynamic languages.

But I'm not really going to write a Python compiler. Life's too short.

By Nick Barnes (not verified) on 05 Dec 2009 #permalink

Alex @ 7 asks:
May I ask, do you think there is conceivably work in the future available for computer programmers in climate science who may lack the formal training in experimental sciences?

That depends what sort of future you want. There is certainly room for more "support staff", broadly. I found out this week that "technicians" at my local university - many of them programmers in fact - can earn more than I do (as senior member of a small software consultancy), at least in a slow year.

But if you want to be an actual scientist, I don't think there's any way in except the traditional route with a science PhD.

By Nick Barnes (not verified) on 05 Dec 2009 #permalink

Hi Nick, thanks for your response, and my congratulations for what you are doing over there at http://clearclimatecode.org/. I hope this project takes off in the same way that Linux did. I think we are so much closer to seeing public trust restored to climate science. As for me, I would love to work at a university amongst the support staff if it gave me the opportunity to actually work with climate scientists. What sort of skills do these technicians generally possess? I suppose, if I win the lottery next Tuesday, I'll certainly choose the Ph.D route! Otherwise, though, to continue with my mortgage repayments, I may have to remain as a lowly IT professional... :)

By Alex Harvey (not verified) on 06 Dec 2009 #permalink

Writing Python compilers. Surprised me too, but the course notes are online, and pretty good.

Nice, maybe I'll take a look ...

But I'm not really going to write a Python compiler. Life's too short.

That's sort of how I've felt about writing any compiler the last decade :) I get paid a lot more to work on much simpler stuff these days ... though I did help write the first implementation of referential integrity operators for PostgreSQL not that many years ago ... that was a close call, though, almost got sucked back into language stuff :)

Thanks Ian. I was aware of OpenTemp; in fact John Van Vliet is on our project mailing list. He has a different emphasis. I don't know why he's gone quiet; I might ask.

By NickBarnes (not verified) on 06 Dec 2009 #permalink

I don't know why he's gone quiet; I might ask.

It might be related to his rude reception at CA and WUWT.

I think he honestly bought into the "independent audit" meme and that his reconstruction work, showing that using only "good stations" from Watts surface stations project showed essentially the same warming as the full set when using GISTEMP methodology, would be welcomed.

Just the opposite.

Then again, maybe he just got busy in his professional and/or personal life.

re opentemp -- is it just me or does it seem funny & silly to complain about Fortran, and then claim to be "open source" project using a proprietary Microsoft language (C#)?

[You appear to be missing the point. You can publish open source in a proprietary language -W]

And I don't quite understand the need for clearclimatecode to have this corporate/consultancy connection - I mean it doesn't inspire me to do any work on it if it's all for the greater glory of some British consulting firm.

[It has the connection because they work there. Or at least the founders do; they now have "outsiders" too. They have been very clear that they welcome help, including from sceptics. It's really unclear to me why you're being so grumpy about it -W]

I don't quite understand the need for clearclimatecode to have this corporate/consultancy connection
What corporate/consultancy connection? The fact that it was the idea of a director of a small company, and was originally hosted at that company, and the initial work was done by people at that company? Why is that some sort of problem?
I mean it doesn't inspire me to do any work on it if it's all for the greater glory of some British consulting firm.
What on earth are you talking about? Here are the goals of the CCC project:

  • 1. To produce clear climate science software;
  • 2. To encourage the production of clear climate science software;
  • 3. To increase public confidence in climate science results.

As you can see, it is not "all for the greater glory of some British consulting firm".

I expect that every person and company who contributes to the CCC project will have their own reasons for doing so. These reasons might include the three project goals, or they might be that an individual likes the smell of FORTRAN, or maybe they like working for projects with triplet-letter abbreviations, or maybe they are Python evangelists, or maybe they are conspiracy theorists who want to show that global warming is all a big scheme cooked up between James Hansen and Al Gore.

I don't care.

Google is helping the CCC project by hosting the source code at Google Code, and the mailing list at Google Groups. They are certainly doing this as part of their sinister and not-very-secret plot to Take Over The World. Do you have a problem with that?

I don't care.

You might want to help the CCC project because you want to annoy Steve McIntyre, or because the sky is pink on your planet and the word for pink in your language is C'C'C.

I don't care.

But I suspect that you won't help, because you have already decided that you're not going to help, because you'd rather sit on the side-lines and carp.

W: please edit this if appropriate. I've had more than enough of this kind of ridiculous criticism, and may have lost my cool.

[Oh no, this is one of my best comments for a while, I'm keeping it whole and untouched :-) -W]

By Nick Barnes (not verified) on 06 Dec 2009 #permalink

And retrain all the scientists who write Fortran.

Why not make some camps where they could be interned and make the job so much quicker and more effective?

I don't think that just because a large number of scientists have written bad code in fortran that fortran should be blamed. I used to use perl a great deal but since fortran 90/95 I basically have no need for perl. I just wrote some modules to do dynamic arrays and hash based arrays (granted, that is a little kludgy and not as powerful as perl, but does what I need).

Fortran's native array/matrix manipulation and interrogation features are bloody awesome. You can write very nice, succinct and clear code with Fortran 90. I wouldn't choose C/C++ in a pink fit**

** This does not apply if I was writing a GUI.

[Without backing down at all (I still think Fortran needs to be abandoned) I think you've mistaken the real point I'm trying to make, which is probably no surprise as I made it rather poorly. You can, as they say, write Fortran in any language. The problem isn't really specific to the choice of language; you can write quite well in Fortran. It is rather more about all that goes with it, and the training in computer science / software engineering that is missing. The contrast between science software processes and the software industry is very striking -W]

"retrain all the scientists who write Fortran"

This is perhaps the most amusing comment - why would anyone assume we were /trained/ in the first place?

I suppose changing to C or C+ or whatever these newfangled things are called might slow me down a little for a month or two. I wrote some Fortran in Java a few years ago, I spent longer than usual reading web pages for the syntax but I didn't let it interfere with the basic structure :-)

(I do know one person who attended an F90 course, many years ago.)

[I knew I shouldn't have implied that anyone had been trained :-(. But I rather fear that the science community's approach to software engineering bears an uncanny resemblance to the septics' views of auditing science: they really don't know what they are on about but feel quite condfident they can bluff their way through successfully -W]

I don't think people "feel quite condfident they can bluff their way through successfully" (or even confident :-)) - they just get on with it and do the best they can, picking up bits and pieces all over the place. The interweb makes this a pretty viable strategy IMO and IME, even if it isn't optimal.

It's important to recognise that pure coding is just one part of the job of a typical scientist; we generally have precious little formal training in data analysis, professional writing or public speaking either. Oh yes, management. I almost forgot that. University staff have teaching/lecturing too. We can hardly be trained (and maintained) at a high level in all these things - there simply isn't the time, and given the range of work, there is not the reinforcement through repetition. Neither is there a reward structure for anything better than "good enough". Perhaps this latter point is the real key...

The serious GCM groups do have proper software engineers involved, but that only covers computational efficiency (and perhaps reproducibility) not really correctness in the broader scientific sense.

It is rather more about all that goes with it, and the training in computer science / software engineering that is missing. The contrast between science software processes and the software industry is very striking

If you mean that scientists don't pretend to have software that actually works in order to scam money out of desperate clients, then yes, there is a great deal of difference.

I'm a scientist. I use revision control software. I write unit tests. I write object oriented fortran code with operator overloading. Most scientists I know seem to think this is over the top and probably unnecessary.
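For readers who have not seen one, a unit test is a few lines that pin down what a small piece of code is supposed to do. The sketch below uses Python's standard `unittest` module rather than Fortran, and `monthly_anomalies` is a hypothetical helper invented for the example, not anything from a real analysis.

```python
import unittest


def monthly_anomalies(temps, baseline):
    """Hypothetical helper: subtract a baseline value from each reading."""
    if not temps:
        raise ValueError("no readings supplied")
    return [t - baseline for t in temps]


class MonthlyAnomaliesTest(unittest.TestCase):
    def test_subtracts_baseline(self):
        self.assertEqual(monthly_anomalies([10.0, 12.0], 10.0), [0.0, 2.0])

    def test_rejects_empty_input(self):
        with self.assertRaises(ValueError):
            monthly_anomalies([], 10.0)


# Run the two tests and keep the outcome for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(MonthlyAnomaliesTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Kept under revision control next to the code, tests like these are rerun after every change, so a regression shows up immediately rather than in a published figure.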

I did computer science undergraduate courses that (at the time) I thought stupid, dull and of no relevance to me. I didn't start actually programming until I had a problem to solve. And only when I made enough mistakes did I learn some real programming techniques. Maybe the point is that people who are attracted to science might not be all that receptive to formal programming lectures, if they were they might be studying computer science?

I reckon programming is like statistics, boring and dull until you need it, and then it is pure gold.

I should also say, if you are trying to get scientists to write "better" code then making them use C++ would be spectacularly counter-productive. So much prior knowledge is required to avoid making an awful hash of it and is likely to make scientists even less receptive to software engineering principles.

>that you won't help, because you have already decided that you're not going to help, because you'd rather sit on the side-lines and carp.

well no, I won't help because although you claim to be doing climate modelling; you're not.

[Ha. I was going to complain bitterly at you for saying this, but alas you're right - the about page does indeed say modelling. Ah well - hopefully they'll correct that asap -W]

and I don't really see the point in redoing in GISTEMP in Python, as if that will make things clearer or better than Fortran. Sure, Fortran is old and boring, but unless you're doing object-oriented climate modelling in C++ or Python, or porting it to run on GPUs or something, I don't really see the point in moving a temp reconstruction from Fortran to Python. Other than if it's just some lame geek exercise or to publicize your company or whatever.

I should also say, if you are trying to get scientists to write "better" code then making them use C++ would be spectacularly counter-productive. So much prior knowledge is required to avoid making an awful hash of it and is likely to make scientists even less receptive to software engineering principles.

C++ flat out sucks. That's why we have Java and C# among other things.

There was no rigor in the design of the language, and once it became popular, wildly divergent proposals that weren't well thought out were added to the language willy-nilly.

If the choice for a language for writing modeling software comes down to FORTRAN or C++, stick to FORTRAN. Or hide much of C++ from the implementors.

well no, I won't help because although you claim to be doing climate modelling; you're not. and I don't really see the point in redoing in GISTEMP in Python

For starters, since all they're claiming to do at the moment is to rewrite GISTEMP, your claim that *they* claim to be doing climate modeling is false.

More bluntly, a lie.

, as if that will make things clearer or better than Fortran. Sure, Fortran is old and boring, but unless you're doing object-oriented climate modelling in C++ or Python, or porting it to run on GPUs or something, I don't really see the point in moving a temp reconstruction from Fortran to Python. Other than if it's just some lame geek exercise or to publicize your company or whatever.

Python's more readable than FORTRAN.

Making the code more accessible to more software professionals will, if nothing else, lead to more qualified people being able to state informed opinions like "McIntyre and Watts are full of shit".

That's useful.

also, not to sound too bitch (too late I suppose) it does say:

"To promote Ravenbrook's software consultancy services"

which seems a bit of a shameless plug for an open-source project (at least any I've been on the past 10 years or so). But good luck with your efforts, I guess I'll stick to my port of CCSM to OpenCL; maybe turn it into another climateprediction.net....

errr dhogaza:

"Clear Climate Code is an open project created by Ravenbrook; we aim to write and maintain software for climate modelling and analysis"

and re: your laughable comments about C++ - pretty much speak for yourself -- perhaps you can't write anything readable or maintainable in C++, but don't speak for those of us who can & do! :-)

Carl C @ 29:
also, not to sound too bitch (too late I suppose) it does say:
"To promote Ravenbrook's software consultancy services"
which seems a bit of a shameless plug for an open-source project

Where are you reading that this is a goal of the project? It is not, as I made clear when I started the public project last year. However, it's not impossible that there is some cut-and-paste from our internal project goal page on the website. Please show me the page which says that this is a goal of the project, so that I can correct it.

The goals page says that this is one of Ravenbrook's goals for the project. Not the project goal, but one of (in fact, the least of) Ravenbrook's corporate reasons for starting the project. If you think that companies start, run, host, and support pro bono projects without any idea analogous to this one, then think again. Why does Sun do Java? Why do you think that Google runs Google Code?

Ravenbrook is quite deliberately more open about this, and about many other things, than any other company I know. We would even publish all of our internal accounts if we could do that without upsetting our clients. There's some very deep irony in this conversation.

By Nick Barnes (not verified) on 07 Dec 2009 #permalink

This is a great Python-based course aimed at providing practical software skills to scientists and others: http://softwarecarpentry.wordpress.com/

And as someone who has used and taught a pile of languages - Python is what I'd recommend to a scientist looking to learn a programming language - or upgrade from Fortran so the cool kids don't laugh at you.

Carl C:
well no, I won't help because although you claim to be doing climate modelling; you're not.
[Ha. I was going to complain bitterly at you for saying this, but alas you're right - the about page does indeed say modelling. Ah well - hopefully they'll correct that asap -W]

Yes, thanks for pointing this out. It expressed our ambition, and to be fair it did say "and analysis". Fixed.

Seriously, thank you for this. Beyond the importance to the project, I have a personal commitment to truth and fair-dealing, and it's important to me.

and I don't really see the point in redoing in GISTEMP in Python, as if that will make things clearer or better than Fortran.
Recoding in Python, per se, does not make the code much clearer. It's a means to the end, not the end in itself. Isn't this clear on the project site? For a trivial example, I suggest you compare the STEP0 directory from GISTEMP with our step0.py. This is just the beginning, of course: in phase 1 our code is still constrained to produce identical intermediate files. In phase 2 (soon) we drop that constraint.
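A constraint like "produce identical intermediate files" can be checked mechanically. The sketch below is only one possible way to do it and is not taken from the project; the function names are invented, and the caller would supply the path of a file written by the original code and the path of the file written by the rewrite.

```python
import hashlib


def file_digest(path):
    """SHA-256 of a file's bytes, read in chunks to cope with large files."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_output(reference_path, candidate_path):
    """True when the two files are byte-for-byte identical."""
    return file_digest(reference_path) == file_digest(candidate_path)
```

Dropping the constraint later means replacing an exact check like this with a comparison of the final results to within some stated tolerance.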

Other than if it's just some lame geek exercise or to publicize your company or whatever.
Please can you stop with this. It should be clear to anyone with an open mind that this isn't what we're about. Have you been unaware of all the complaining about the GISTEMP code, and the way in which it was used to cast doubt on climate science results?
If we were running a soup kitchen, would you complain if the delivery van had a Ravenbrook bumper sticker?

By Nick Barnes (not verified) on 07 Dec 2009 #permalink

Right, I've edited the goals and about pages. I thought they were already crystal clear, but comments here have proved me wrong.

Next I will be accused of hiding my real motivations. 3... 2... 1....

By Nick Barnes (not verified) on 07 Dec 2009 #permalink

re: your laughable comments about C++ - pretty much speak for yourself -- perhaps you can't write anything readable or maintainable in C++, but don't speak for those of us who can & do! :-)

My old software company wrote and sold the first non-Cfront commercially supported C++ compiler, back in the 1980s. C++ sucks.

Ravenbrook is quite deliberately more open about this, and about many other things, than any other company I know. We would even publish all of our internal accounts if we could do that without upsetting our clients. There's some very deep irony in this conversation.

Don't worry, the grown-ups appreciate what you and Ravenbrook are doing. Are you really getting much flak about this from other people, too?

>>>C++ sucks.

Haha, OK, so dhogaza has a "jumped the shark" moment in his zeal to defend CCC. Obviously with such a brilliant coder as him, who can say "C++ sucks" you'll go far! I've had 60 teraflops of climate stuff running continuously, and on over a million different machines on other projects over the years thanks to C++.

As I said I wish you guys luck, maybe I read the stuff on your site a little too seriously, and I've been burnt by attention-grabbing British consulting companies in the past claiming to do climate work (perhaps your intentions are honorable). But I'm glad you've changed your site to reflect that you're not doing climate modelling and your publicity goals for your company are more known.

I was just more into climate modelling and after seeing you put top billing on it, with ne'er a climate model in sight, I guess I got a little bitchy. But that's what blogs are for, eh, after all my old projects & old colleagues have been bitched about quite a lot on blogs for little things (well far more minor than emails with scary statements like "hide the decline" etc ;-).

If I might suggest a strategy. I do occasional translations, and what I have found is that it goes much faster and better if I generate a computer translation and then work from the original and the computer translation. Could such a thing be done from code, with warning flags where there are usual kludges?

One of the things you are going to run into is places where the FORTRAN does things in a seemingly mysterious way known to FORTRAN programmers, but not to the younger generation.

If I might suggest a strategy. I do occasional translations, and what I have found is that it goes much faster and better if I generate a computer translation and then work from the original and the computer translation. Could such a thing be done from code, with warning flags where there are usual kludges?

Well, the point is to restructure the code to make it more readable and robust, not to simply translate it into Python. This requires a reasonably deep understanding as to what the original code is doing, and machine translation won't help with this part. Actually translating loops and assignment statements, etc, is the trivial part of the problem.

Also, a quick poke in Google leads me to believe that such a translator doesn't exist.

Perhaps if one wanted to translate something as large as, say, a GCM into Python, putting in the effort to write such a translator would be worth it. It's not a trivial project, you essentially would need to write a Python code generator for a FORTRAN parser.

also to get this on a less inflammatory note, here's my views on Fortran. I'm not the "death to Fortran" person I used to be, although I haven't written a Fortran program "from scratch" in 20 years probably.

1) there's a lot of it out there, and if scientists are used to it and it's verified as working correctly, then I don't have much problem with that as long as it's error-free, in working/running condition etc

2) as long as there are Fortran compilers with good optimizations & libraries (Intel, LAPACK, BLAS etc) and features such as OpenMPI and parallelization, I think it will be around for awhile and useful, probably much more useful scientifically (thanks to said libraries) than Python.

3) if a scientist is writing bad/spaghetti code in Fortran he'll do so in C, Perl, Python etc anyway

4) as long as a Fortran compiler can handle C-style pass-by-reference arrays, call external C/C++ functions either linked in or via a shared object/DLL, then I'm fine with it as I just do my stuff in C++ and don't really have to look at the Fortran except maybe for putting in a few calls to my C++ calls etc. that's how we did graphics on CPDN, postprocessing etc

Carl, for climate models in Python, talk to Michael Tobis. Really, he's your go-to guy for that. If I remember correctly, Steve Easterbrook is on the case too.

By Nick Barnes (not verified) on 07 Dec 2009 #permalink

Nice find.

I don't know very much about GISS, but my experience so far with climate programming is that the gain from rapid prototyping is not worth the loss from grinding through 82 million observations in an interpreted environment.

I'm back in C++ land, but at least I'm generating my headers and test stubs. Maybe they'll have a place in spatial interpolation experiments. We'll see...

By Jonathan Fischoff (not verified) on 07 Dec 2009 #permalink

dh, the aim of the project is to port a LOT of code, which means a translator could be very useful.

The goal would be to make something that would at least flag the structural problems.

@Eli: GISTEMP isn't a lot of code. It's a few thousand lines. Admittedly "all of climate science" is a lot of code, and that is our goal. But we're not going to do that personally. Yes, perhaps a serious effort at converting everyone to a clearer language, like Python, would involve writing some sort of translation / assistance / analysis tools.

But I'm way too old to fall into the trap of "right, the first thing we need, is a new tool".

Small steps. We'll get GISTEMP done, then we'll take over the world.

@David Not a problem, my point was that if and when this becomes a larger effort a translator would help. It would also help FORTRAN addicts detox. BTW, promise that Python is going to last more than a week. Eli got talked into learning Pascal once, which is pretty dead at this point.

The UKMO do/did employ real CS people, and some of their early MF stuff (CDB for example) was really well written. More latterly they've even been known to use SSADM (see MIDAS). This despite the fact that, certainly during the MIDAS development time, their pay rates were well down on equivalents in industry.

Some other IT development projects (HORACE for example) were as good as many industry ones I've worked on. The long-winded nature meant that the hardware got a little out of date (IIRC) but it was good enough for the task, and it was certainly no worse than something Accenture etc. could knock out.

Their FORTRAN training at Beaufort Park left a little to be desired though, but it was certainly not aimed at the IT people.

Re current practices. Steve Easterbrook has some very nice things to say about some of their change control systems. I can't find the link at the moment, but it's from a thread on RG's site.

Also, one DB that was looked at by the UKMO for MIDAS was one being built by NOAA in C...I forget the DB system though. So there's more out there than may immediately meet the eye.

As a meteorologist who works with mesoscale models, as a faculty member who directs the undergraduate meteorology program, and as a parent of an Electrical Engineering major, I cannot say enough BAD THINGS about Python. My son struggled with Python and got to the point where he actually decided against a CS degree because of it. Once he took a mandatory assembly language course and finally a 1 credit hour Fortran course, his comment was "Why didn't they teach us this first!" This comment is echoed by my met majors, who are required to take a course in computational techniques in meteorology. Considering the abysmal performance of Python, its drain bramaged structure and huge floating bugs (sin(2pi) is not 0), how anyone can suggest using Python for anything more than generating a web page is beyond me.

By mommycalled (not verified) on 08 Dec 2009 #permalink

mommycalled, your "huge floating bug" in Python is definitely not a bug and has little to do with Python - it's a consequence of using a finite set of fixed-width binary representations to approximate reals. It has important consequences if you do any programming with floating point and should be carefully explained in any first year computing course.
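For anyone who wants to see it for themselves, here is a minimal sketch using only the standard library. The helper name is_effectively_zero and the tolerance 1e-12 are illustrative choices, not anything from the thread. The stored value of math.pi is only the nearest double to pi, so sin of twice that value comes out as a tiny nonzero number, and the usual remedy is to compare against a tolerance rather than test for exact equality. Fortran or C with IEEE doubles would behave the same way.

```python
import math

# math.pi is the nearest double to pi, not pi itself, so 2 * math.pi is
# slightly off from the true 2*pi and the sine is tiny but not exactly zero.
residual = math.sin(2 * math.pi)


def is_effectively_zero(value, tol=1e-12):
    """Compare against zero with an absolute tolerance.

    tol is an arbitrary illustrative choice; pick one suited to your data.
    """
    return math.isclose(value, 0.0, abs_tol=tol)


print(residual)                        # a very small nonzero number
print(residual == 0.0)                 # exact comparison is the wrong test
print(is_effectively_zero(residual))   # tolerance comparison is the right one

# The same effect with plain decimal fractions, none of which is exactly
# representable in binary floating point:
print(0.1 + 0.2 == 0.3)                # False
```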

Honestly, if you don't understand this, you know very little about computing. Nothing wrong with that - but if you really need to make decisions about computing you really should chat to the computing faculty at your institution - and you may find the conversation goes better if you don't start by advocating the teaching of Fortran.

NikFromNYC: OpenTemp is excellent, and John Van Vliet is on our CCC mailing list. OpenTemp is making an open surface temperature record, and so are we. But we have different goals and different approaches. We are writing code specifically to reproduce the GISTEMP record, by reimplementing the GISTEMP algorithms with a particular emphasis on clarity, thus revealing the algorithms to the casual reader.

One valuable result of our approach is that we are finding bugs in GISTEMP (in the sense: places where the code does not implement the semantics immediately intended by the writer), which then get fixed.

We expect that in phase 2, there will be another valuable result: as we clarify the detailed algorithms of GISTEMP, we will find places in which the details of those algorithms may not be the best choice for implementing the high-level design (for instance, we think the way in which sub-box records are normalized for combination into box records, when they have little overlap, is a little odd). GISS may or may not choose to fix some of these.

Then, on a third level, there is the question of whether the high-level design is the best way to compute a global surface temperature record. This is something that's hard to judge when that design is so obscured.

mommycalled: If you want to argue against Python, talk to the Software Carpentry people. There's actually quite a lot of work out there on the differences between languages in clarity, expressiveness, and other measures. I'm curious about what sort of non-CS course your son did which had compulsory assembly language and Fortran units. EEE perhaps? Different strokes for different folks, but I haven't found anyone yet who prefers GISTEMP STEP0 to our step0.py. As for sin(2pi), possibly you're unaware of how floating-point arithmetic works?

By Nick Barnes (not verified) on 09 Dec 2009 #permalink

W - if you've never tasted the bliss of software development on an embedded DSP board, you have not fully lived life.

Many of them have my favorite operating system - none at all!

And imagine an in-circuit emulator which can freeze the hardware (running real-time) as though some logic analyzer from above stops all the universe in its tracks as soon as some untoward bus access happens!

(I need to go smoke)

@Louie: I'm not sure why you mention embedded DSP, but we have street cred in that area too. Earlier this year I was writing communication protocol code for a water meter running on an 8-bit processor with 2K ROM and about 128 bytes RAM (and I can hold 4 'scope probes on a 0.8 mm part in one hand, and still press run/stop with my other hand). Nick has had his head in wire wrap trying to bring up keyboard and display code on a prototype ARM board, and so on and so forth.

(Can't speak for W, but I'm sure he has a war story or two)

[Mine are all secret, alas. The-company-that-can-not-be-named isn't very keen on its employees blogging - definitely a downside -W]

Earlier this year I was writing communication protocol code for a water meter running on an 8-bit processor with 2K ROM

What fun! A friend and I wrote a calculator once (a real one). After first writing a simulator for it. National Semiconductor said we were the first people to go to production on the first try using that particular chipset.

It was a huge change-of-pace from the compiler writing work I was mostly doing in those days. Fun, fun, fun!