NASA’s GISS has an on-line graphing system that lets you see a graph of temperatures for a particular weather station (for example, here is Sydney airport). Steve McIntyre decided to run a script that asked the GISS system to produce graphs of each and every station in the data set (thousands of weather stations). Because his script was firing off requests as fast as it could, it made the system difficult for anyone else to use, so the GISS webmaster blocked his access. When he asked why, the webmaster explained:
Although you did not provide any further details about your problem, I will assume that you are the person on the cable.rogers.com network who has been running a robot for the past several hours trying to scrape GISTEMP station data and who has made over 16000 (!) requests to the data.giss.nasa.gov website.
Please note that the robots.txt file on that website includes a list of directories which any legitimate web robot is forbidden from trying to index. That list of off-limits directories includes the /work/ and /cgi-bin/ directories.
McIntyre responded with:
I have been attempting to collate station data for scientific purposes. I have not been running a robot but have been running a program in R that collects station data.
See, it was a script collating station data, not a web robot scraping station data. In the ensuing discussion, McIntyre steadfastly refused to accept that the two are exactly the same thing and displayed his usual paranoia. Eventually GISS agreed to unblock him if he would show some consideration to other users by running his bot late at night or on weekends. McIntyre’s victory speech:
This little episode does illustrate the power of blogs to bring sunlight onto bureaucracies and to force them to do things that they don’t want to do.
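For what it’s worth, checking robots.txt before scraping is trivial: Python ships a parser for it in the standard library. Here is a minimal sketch using the directories the GISS webmaster named (the specific CGI path in the example is made up for illustration):

```python
import urllib.robotparser

# A robots.txt like the one the GISS webmaster described: any legitimate
# robot is forbidden from the /work/ and /cgi-bin/ directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /work/
Disallow: /cgi-bin/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

def allowed(path):
    """Return True if a generic robot may fetch this path on the site."""
    return rp.can_fetch("*", "https://data.giss.nasa.gov" + path)

# A hypothetical station-data CGI script is off-limits; ordinary pages are not.
print(allowed("/cgi-bin/some_station_script"))  # False
print(allowed("/gistemp/"))                     # True
```

A script that called something like `allowed()` before each request, and slept between requests, would be a scraper nobody bothers to block.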