Steve McIntyre's DOS attack on GISS

NASA's GISS has an on-line graphing system that lets you see a graph of temperatures for a particular weather station (for example, here is Sydney airport). Steve McIntyre decided to run a script that asked the GISS system to produce graphs of each and every station in the data set (thousands of weather stations). Since his script was requesting these as fast as it could, it made it difficult for anyone else to use the system, so the GISS webmaster blocked his access. When he asked why, the webmaster explained:

Although you did not provide any further details about your problem, I will assume that you are the person on the cable.rogers.com network who has been running a robot for the past several hours trying to scrape GISTEMP station data and who has made over 16000 (!) requests to the data.giss.nasa.gov website.

Please note that the robots.txt file on that website includes a list of directories which any legitimate web robot is forbidden from trying to index. That list of off-limits directories includes the /work/ and /cgi-bin/ directories.
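
(For reference: the post doesn't reproduce the GISS robots.txt itself, but a disallow list of the kind the webmaster describes would look roughly like this, using the two directory names he mentions; well-behaved robots fetch this file first and skip anything under a Disallow line.)

    User-agent: *
    Disallow: /work/
    Disallow: /cgi-bin/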

McIntyre responded with:

I have been attempting to collate station data for scientific purposes. I have not been running a robot but have been running a program in R that collects station data.

See, it was a script collating station data, not a web robot scraping station data. In the ensuing discussion, McIntyre steadfastly refused to accept that the two are exactly the same thing and displayed his usual paranoia. Eventually GISS agreed to unblock him if he would show some consideration to other users by running his bot late at night or on weekends. McIntyre's victory speech:

This little episode does illustrate the power of blogs to bring sunlight onto bureaucracies and to force them to do things that they don't want to do.

Lambert, I think before you attempt to slander McIntyre that you should apply a little due diligence.

Check to see if the GISS webmaster has ever published a paper with Michael Mann.

As I understand it, McIntyre wasn't asking the system to produce "graphs" of each station but was rather trying to obtain the underlying data so he could generate *his own* graphs and perform his own statistical analysis. GISS could have made the data available as a single gzipped file that would require ONE file request. Since they chose to only make the data available in tiny chunks, they shouldn't be surprised that people who want *all* the chunks are making lots of individual page requests.

(I agree it's a little embarrassing that Steve didn't know enough to add a short sleep between requests and didn't know the geek terminology the webmaster was using to describe the problem, but everybody has to learn sometime - you'll notice he mostly figured it out later if you read the rest of the comment thread.)
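
(For the curious, the "short sleep" fix really is trivial in R. Here is a minimal sketch of a polite per-station download loop; the URL path and station IDs are invented for illustration and are not the actual GISS query strings.)

    ## Hypothetical sketch only: placeholder URL pattern and station IDs.
    station_ids <- c("50194608000", "50194610000")
    for (id in station_ids) {
      url  <- paste("http://data.giss.nasa.gov/example-path/", id, ".txt", sep = "")
      dest <- paste(id, ".txt", sep = "")
      try(download.file(url, destfile = dest, quiet = TRUE))  # skip failures and keep going
      Sys.sleep(2)  # pause so requests aren't fired back-to-back
    }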

Thom

How is Mr. Lambert slandering McIntyre? Following the link and reading McIntyre's description...it turns out that Mr. Lambert's description is correct. He was running a script that accessed a dynamic web site where each access generated many processes on the web server's infrastructure, further bogging it down. And McIntyre didn't seem to do anything to lessen the load...such as limiting it to a few accesses a minute. If this happened on any web server I was looking after, I'd block him as well.

It was a robot. It may not have been a web crawler, but was a robot.

And just what does "Check to see if the GISS webmaster has ever published a paper with Michael Mann"? What has this to do with McIntyre writing a web robot that almost brought down a web server?

By Murray Rennie (not verified) on 18 May 2007 #permalink

Robots usually do crawl sites, not automatically download files. He even admits he could see how somebody might consider it one though, so I'm rather confused how that's paranoid....

I did not regard my R program as a "web robot" in the sense that this is usually used as I was not indexing their files, but I realize that reasonable people can disagree. There is no explicit notice on the webpages.

In any case, the script may have created a DoS situation of sorts, but it's not what I would refer to as a DoS attack. Or at least I wouldn't think he was attempting to make the server unavailable. He was just not aware the server couldn't handle his requests fast enough.

Certainly, if they allow that sort of stuff on their site, downloading, and don't have anything in place to limit number of requests in X time, I wouldn't see how it's anyone's "fault", except perhaps theirs....

By Robert S. (not verified) on 18 May 2007 #permalink

I did not regard my R program as a "web robot" in the sense that this is usually used as I was not indexing their files, but I realize that reasonable people can disagree.

No, reasonable people cannot disagree. Any program that traverses a site is a spider, a robot, and if it doesn't obey robots.txt, is reasonably blocked.

This is the guy who is proving thousands of climate researchers wrong?

Pffft.

McIntyre seems to have trouble understanding plain English, since the webmaster explained in no uncertain terms what he needed to do:

"If you are indeed the person who has been running that particular web robot, and if you do need access to some large amount of the GISTEMP station data for a scientific purpose, then you should contact the GISTEMP research group to explain your needs. E-mail addresses for the GISTEMP research group are located at the bottom of the page at http://data.giss.nasa.gov/gistemp/"

Most people would simply have done as suggested.

And most people (adults, anyway) certainly would not have made a public issue of it.

That's behavior one expects from two-year-olds.

Murray- Thom was being sarcastic.

Wikipedia says:

A robot is a mechanical or virtual, artificial agent.

So that would include McIntyre's program. Not that this is at all relevant. What matters is that any twit should realise that bombarding any but the biggest web servers with 16000 requests in a short time is in danger of causing a DoS attack, intended or otherwise.

In any case, the script may have created a DoS situation of sorts, but it's not what I would refer to as a DoS attack. Or at least I wouldn't think he was attempting to make the server unavailable. He was just not aware the server couldn't handle his requests fast enough.

Ignorance, or in this case plain stupidity, is no excuse for bad behaviour.

Certainly, if they allow that sort of stuff on their site, downloading, and don't have anything in place to limit number of requests in X time, I wouldn't see how it's anyone's "fault", except perhaps theirs....

ARE YOU BLIND? THERE'S A ROBOTS.TXT FILE!!!

Most webmasters expect robots to behave and obey such system information. It's a bit like putting up a sign saying "DON'T GO IN THERE" ... and then McIntyre goes in there and is amazed the webmaster is annoyed.

Which brings me to my last point. If McIntyre wanted the data, why didn't he just ask them ... it ain't hard to send an email you know.

By Meyrick Kirby (not verified) on 18 May 2007 #permalink

"If McIntyre wanted the data, why didn't just ask them ... it ain't hard to send an email you know."

That would have been way too easy -- and would have left him with nothing to complain about (no vast climate-scientist conspiracies, etc, etc.)

If I had been the webmaster, I would have told him to take his script and shove it up his a-hole.

I was not being sarcastic. I was serious and I am now investigating to see if anyone named "guthrie" has ever published with Michael Mann, to include poems in obscure trade press.

Will publish the results of my audit on Michael Mann's collaborators on McIntyre's blog. We can never be too suspicious.

You'd better start with my livejournal then:
(cue bit of shameless self promotion even though there's nothing much there that you'd find interesting)
http://calcinations.livejournal.com

As part of my job I ask agencies for large files. I never, ever, ever, ever (did I say ever?) access these data through a website.

I determine who is the appropriate contact and we arrange an FTP - all agencies have a guest/visitor FTP for these requests.

Either Stevie Mac is an idiot or he's pulling another dimwitted trick to trump up the response to prove some point.

Oh, wait: those two things are not mutually exclusive.

Best,

D

Did I mention ever?

Clown show over there at CA, complete with rubber chickens and honking horns.

D

Robots usually do crawl sites, not automatically download files.

Robots do what you program them to do. From Wikipedia:

Internet bots, also known as web robots or simply bots, are software applications that run automated tasks over the internet. Typically, bots perform tasks that are both simple and structurally repetitive, at a much higher rate than would be possible for a human editor alone.

Check out #6, under "Malicious Purposes":

data mining programs and cyber-surveillance bots used to collect information on a web site or company;

http://en.wikipedia.org/wiki/Web_robot

While it may be true that McIntyre didn't have a clue, either, that doesn't mean his bot didn't fit the definition that he didn't understand.

"So? What's wrong with being sexy?" - Nigel Tufnel.

#6. My program was downloading station data. The total amount of data involved is less than 20 MB. I was running my download program at night. When a problem arose with downloading, I asked the webmaster about it politely, sending an email to him at about 11.30 PM. Rather to my surprise I got an answer a few minutes later that night. I identified myself and stated that the purpose of retrieving the data was scientific. When the webmaster referred me to the GISS research group, I immediately asked them for particulars on how I might access their data, as the webmaster suggested. Later that day, the GISS research group asked the webmaster to remove the block and asked me to download in the evenings which was fine by me. I expressed mild satisfaction at this being resolved expeditiously.

By Steve McIntyre (not verified) on 18 May 2007 #permalink

#14. I had no "malicious purpose" in downloading the data. I appreciate the efforts of webmasters to operate sites. I thought that the downloading that I was carrying out would be well within the capacity of the site. Again, I was not attempting to cause any problems for them; I immediately identified myself when service was interrupted and was quite willing to accept other methods of downloading the data.

By Steve McIntyre (not verified) on 18 May 2007 #permalink

I expressed mild satisfaction at this being resolved expeditiously.

Having read the thread over at CA, a more accurate description is that you whined like a baby and demonstrated your usual cluelessness...

Guthrie, you are making me even more suspicious....McIntyre, could you please release the robots onto Guthrie's livejournal site?

I believe you might find something there that could get you another invite to brief politicians in Washington D.C. and your picture on the front page of the Wall Street Journal.

Here is the very first response in that thread:

Steve McIntyre says:
May 17th, 2007 at 7:00 am

A CA reader had already downloaded the data, spreading his download over 36 hours. So I'll get the data one way or another. However, I expected NASA GISS to remove the block once they got notice that the data was not being accessed by a web robot but for scientific research. The block is also much wider than the data set already downloaded.

Isn't the language interesting: my attempt to get data is described as "scraping" data. Perhaps that's a term of art in websites, but it seems an odd choice of word. Also to describe this activity as a "blatant" violation of their robots.txt policy. Imagine that - me "blatantly" violating a robots.txt policy. Sorrrrr-eeee. Will I have to write "I promise to be nice to Gavin Schmidt" 100 times on the blackboard.

"Certainly, if they allow that sort of stuff on their site, downloading, and don't have anything in place to limit number of requests in X time, I wouldn't see how it's anyone's "fault", except perhaps theirs...."

I TOLD the judge it was that damn yuppie's fault for leaving the car window rolled down.

By Ian Gould (not verified) on 18 May 2007 #permalink

Thom,

It's suspiciously like you can't refute McIntyre on the facts so you're reduced to character assassination. Oh wait, that's exactly what you're doing. Preaching to the choir is lots of fun, isn't it? Much easier than substantive commentary.

Yawn.

By Hank Roberts (not verified) on 18 May 2007 #permalink

The total amount of data involved is less than 20 MB.

The amount of data and the number of requests are different things. If the request was for a single 20 MB file, the webserver would have no problems serving it up. If instead you were requesting lots of little chunks, the load is much higher. Every request, however small, has overhead on the server.

Situations where there are lots of little requests are known to webmasters as "death by a thousand cuts". For future reference, give consideration to throttling your scripts, or as others have suggested, directly contacting the folk holding the original dataset.

By Jacques Chester (not verified) on 18 May 2007 #permalink
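
(To make that point concrete: the same ~20 MB delivered as a single archive is one request over one connection, which is why the "single gzipped file" suggested earlier in the thread would have been painless for the server. A rough R sketch, assuming a hypothetical archive that GISS did not actually offer at the time:)

    ## Hypothetical sketch only: this archive URL is invented for illustration.
    archive_url <- "http://data.giss.nasa.gov/example-path/gistemp_stations.tar.gz"
    download.file(archive_url, destfile = "gistemp_stations.tar.gz", mode = "wb")
    untar("gistemp_stations.tar.gz", exdir = "gistemp_stations")  # one request, thousands of station files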

Thom:
Where's Tim Curry when you need him? RELEASE THE BOTS!
(overenthusiastic agreement can be mistaken for sincerity by some).

Steve McIntyre: "Imagine that - me "blatantly" violating a robots.txt policy. Sorrrrr-eeee. Will I have to write "I promise to be nice to Gavin Schmidt" 100 times on the blackboard."

Do I detect some juvenile anger issues here?

Next thing you know he will be threatening to take his marbles home.

Re: #16
I never said you had malicious purposes; I was telling you where on the page you could find something that matched exactly what you had done, so that you could see that what you wrote was, in fact, a robot and was therefore required to follow the rules in the robots.txt file.

Welcome to the Internet, Mr McIntyre. There are rules. You just learned one of them, badly. The proper response to an unintentional breach of etiquette is "I'm sorry, I didn't know. I'll adjust accordingly for the future" and not "well sor-ree Mr Angrypants but I didn't build a 'robot' because I don't know what a 'robot' is and if you can't handle me doing something I'm not supposed to do then it's a problem with you being a weenie and not me doing something wrong!"

I expressed mild satisfaction at this being resolved expeditiously.

Dhogaza is right. This whole episode is sickening. What Steve McIntyre did was clearly bad behaviour by internet standards. And what happens? Does he get punished? Oh no! GISS bend over backwards to help him, and all the while he mouths off about them behind their backs on CA.

By Meyrick Kirby (not verified) on 20 May 2007 #permalink

It appears McIntyre didn't think of "robot" in the same sense as the webmaster; in general the file is used by spiders and various search engines to stop the indexing of pages. He wasn't indexing pages. But if the directories are being hit just as they would be by an indexer, and it has the same effect, it doesn't matter what it is, only what it did. As the webmaster said:

It is an automated process scraping content from the website, and if that isn't what a web robot does, then it's close enough.

So yes, reasonable people could disagree, the webmaster even said if it's not, it's the same.

I agree this could have been handled a lot better. However, it seems as if, as usual, everyone is getting in a huff about nitpicky nothing details. That was mostly a discussion where both sides had to figure out what the other side meant. They didn't have the file set up properly; they were blocking automated retrieval of robots.txt itself anyway. And the fact is in the end, that they restored access and that they let him keep using the same program.

Part of the problem stems from the contact info you're saying he should have used -- he couldn't get to it; he was blocked!

As the webmaster said,

Good point. That was foolish of me to suggest checking a page on which access had been turned off.

It's also clear they got tired of talking to him. Good thing none of you that hate him is the webmaster (or maybe one of you is....) Hmmmm....

Anyway......

I realized that what is and isn't a "web robot" is a semantic discussion. However robotstxt.org defines "web robot" as follows:

Web Robots are programs that traverse the Web automatically. Some people call them Web Wanderers, Crawlers, or Spiders

Within this definition, my downloading was not being done by a "web robot". (I understand the issues, I'm just making a semantic point.)

and

In retrospect, I would have put a sleep instruction - BTW how do you do that? - but it didn't occur to me that my downloading requests would be material to them.

You're making a big deal out of nothing.

Robots usually do crawl sites, not automatically download files.

Someone needs to study up on what it means to crawl a website. You download the webpages (usually html pages) when doing that, so you are most certainly downloading files. It's not like you get to use the webserver to parse the text for you.

Robert S. said: "You're making a big deal out of nothing."

Actually, the one who made a big deal out of nothing was McIntyre.

The vast majority of people would have quietly resolved the issue with NASA GISS and never made a public issue of it.

That was mostly a discussion where both sides had to figure out what the other side meant.

Yes! A discussion that took less than 24 hours!

Now, why exactly did McIntyre bitch and whine and moan so much on his website, then?

The fact is in the end, that they restored access and that they let him keep using the same program.

Yes, they did! It took them less than 24 hours!

Now, why exactly did McIntyre bitch and whine and moan so much on his website, then?

Kristian: Well, of course by definition you have to get a certain number of files (the HTML ones) to build a map of a site (etc.); I'm talking about non-HTML data files not getting downloaded. Crawling/spidering doesn't usually involve automatically getting anything but those pages that get indexed, to some depth (and keyword analysis, etc.), but as we can see from discussions, there's some wiggle room on definitions, depending on the context. That's why we have people that hate Windows that argue with each other about Linux distribution version numbers. :)

JB: I agree.

dhogaza: It's a blog. We post our thoughts on blogs. You know like "Here's what I did, here's what they did, here's what I thought about it." So it's a narrative. He even said that he had no problem with being blocked for too many requests, but once they knew it was a person running a script rather than a totally automated non interactive program, that he thought the access should have been restored. Not told to go to email pages he was blocked from.
But whining?

I have no particular objection to the webmaster blocking access until he was assured that the inquiry was legitimate or even that the webmaster referred the matter to his bosses. I also have no objection to how long GISS took to remove the block. If all climate data access issues were resolved this quickly, it would be great.

------------------------------

Now, aside from not knowing how to talk 'webmaster' or interact in a certain style, not knowing what his program was going to do, and not putting in a delay, on his part, it's not like GISS had the data ready nor that it was easy to get it, nor that they had all the site set up completely correctly.

Also, both sides were a bit curt (a factor in email, no doubt) at points, about different issues. To them: "I know how to use the GHCN data. I'm not interested in "tips" on how to use it." And later from them: "The block has been lifted as far as I know. As far as I'm concerned, this is the end of our correspondence."

Night
S. "I'm blocked"
W. "Yeah, you hit us with 16K hits, whatever you're doing, you went into directories you weren't supposed to. If you want GISTEMP data, email these folks here: [list]"
S. "Ah, well I'm running a program called R that collects station data.

Morning

S. "Uh, I can't get to your contact page since I'm blocked."
W. "Oh, yeah, email, I let you get that page. Whatever the program was, you were grabbing content from the site."

W. "Contact the group, I don't know, but here's thereis site. Maybe Dr. R can tell you how to use the GHCN data instead"
S. "I know how to, just wanted to get the GISS stuff also."

S. "I couldn't find station data GISS uses, like GHCN has. I ran a script to get it all automatically and got blocked. Can I find someplace else to get it or get unblocked?"

R. "GHCN is better for individual stations. We have intermediate steps and regional data. I don't know why you'd want these notes, but tell me what and maybe why and I'll try. We have tools up there, maybe not such a good idea to have them up, but they're really for our use."
S. "Ah, can I get the programs you use on the GHCN so I can replicate it? I'd just like to see how the two vary and the data the way anyone can access it on the web, but all at once."
R. "Dr Hansen said use your program to get the data, but do at night or weekend. What we do with GHCN data lis listed on our site."
S. "Thank you, I'll get the data slowly at night. I'm still interested in possibly getting the code you use on GHCN data."
R. "You have no block, we're done talking."

By Robert S. (not verified) on 21 May 2007 #permalink

Robert S.

You will note that the part you quote -- "I have no particular objection..." -- was added well after the initial exchange -- and after Tim and others posted about it.

In fact, McIntyre took his merry time to add that part so unless one read all the comments, it looked like he was still waiting to have the issue resolved for some time after it had already been resolved.

I too would have to characterize McIntyre's initial post -- particularly taken with his own first comment (noted by Lee above) -- as whining about nothing.

After McIntyre was spanked (on his own blog and elsewhere) for the way he acted, he back-pedaled, undoubtedly because he realized how foolish it made him look.

Someone doth protest too much to support one's ideology, methinks.

Best,

D

Robert S.:

But whining?

I have no particular objection to the webmaster blocking access until he was assured that the inquiry was legitimate or even that the webmaster referred the matter to his bosses. I also have no objection to how long GISS took to remove the block. If all climate data access issues were resolved this quickly, it would be great.

I notice you've cherry picked your quote. Sorry to bust your bubble, but here is another quote from McIntyre:

Imagine that - me "blatantly" violating a robots.txt policy. Sorrrrr-eeee. Will I have to write "I promise to be nice to Gavin Schmidt" 100 times on the blackboard.

Explain to us how that isn't whining?

By Meyrick Kirby (not verified) on 21 May 2007 #permalink

He even said that he had no problem with being blocked for too many requests, but once they knew it was a person running a script rather than a totally automated non interactive program, that he thought the access should have been restored.

Well, no, that's not what he said.

He said that after he identified himself, they should've unblocked him. "Hey, I'm Steve McIntyre, unblock me!" was more the tone.

Whining, in other words.

The webmaster probably didn't know Steve McIntyre from Steve McQueen; obviously he wasn't impressed.

Perhaps McIntyre's peeved about that, too ...

JB: I'm not trying to say this interaction didn't cause him to tone down. Or not. I don't know the guy nor have I met him nor do I converse with him, except for what I may write at his blog or any other one. I just think it's no big deal; it's his blog after all, not a story in the New York Times or a peer-reviewed journal. What do you expect from a statistician, who's not a web guy, on a blog? In any case, knowing how some webmasters are (and how narrowly many tech guys define things) I'm not surprised it got out of hand. So he was pissed off and then calmed down. Big deal.

Dano: I don't have an ideology. I'm trying to be neutral. And I did say this: "not knowing how to talk 'webmaster' or interact in a certain style, not knowing what his program was going to do, and not putting in a delay, on his part..." So he's not blameless. You guys all seem to be pretty confrontational is all I'm saying. That, and you obviously don't like him, what he does, or how he does it. Me, I wonder why they didn't have the data up there on FTP (or not have it at all). And if they were so bent out of shape, why they let him go back to getting it the same way he had been. Eh.

Meyrick: I didn't cherry pick it; more I picked what summed up his point as he was making it. If that was backpedaling or clarification on his part, I don't know. I don't even really care. I write boring things all day, I just like to write here and elsewhere for fun. If you think what he said at the end was total BS, fine. I don't disagree with your opinion if that's what you think. And yeah, that did start out pretty whiny. "Imagine that - me "blatantly" violating a robots.txt policy. Sorrrrr-eeee. Will I have to write "I promise to be nice to Gavin Schmidt" 100 times on the blackboard." But I prefer to categorize it as satire. (Wasn't Gavin the one in the debate that one time?) Nyah nyah.

dhogaza: He said it at some point in time. Whatever, big deal. "Perhaps McIntyre's peeved about that, too ..." No doubt! Perhaps that's the effect they were trying for. Or maybe they don't know him. I doubt it except probably the webmaster, yeah.

Anyone ever think perhaps he's trying to be funny? Or to get you all riled up? Or both? I mean, look at this poke at Mann: "If all climate data access issues were resolved this quickly, it would be great." If he was that worried about all this sort of thing, he wouldn't have left everything up or written it in the first place. Or he wouldn't keep doing things like this....

Oh, and by the way, we use scripts all the time where I am and we never call them robots. We think of robots like the way Google sends out crawlers to map keywords and populate databases without humans having to be actively involved in that process. (Well of course except for writing and tweaking and monitoring the programs.) We don't even call what we do agents. The scripts automate a repetitive task and gather information we can look at later. Hardly a web robot. But again, just semantics.

You guys (and/or gals, as the case may be) -- have a great day!

By Robert S. (not verified) on 22 May 2007 #permalink

Anyone ever think perhaps he's trying to be funny?

Hmmm, hadn't thought about that. Now that you mention it, though, Climate Audit is a joke ...

Hmmm, hadn't thought about that. Now that you mention it, though, Climate Audit is a joke ...

Yes, but I doubt it's deliberate.

By Meyrick Kirby (not verified) on 23 May 2007 #permalink

You never know with polarized and focused individuals.

By Robert S. (not verified) on 23 May 2007 #permalink

So yes, reasonable people could disagree, the webmaster even said if it's not, it's the same.

Problem is, the people who run the internets or work therein on a daily basis will *all* be on the side of the webmaster. So, he's wrong. And rude about being wrong.

Really, it'd be like some guy with an undergraduate math degree saying that he's right in the face of the expertise of virtually every scientist in a given field... very much like that.

By Carleton Wu (not verified) on 23 May 2007 #permalink

...very much like that.

The chatter of the Cheer Squad is very much like that too.

Best,

D

Reading this site it is clear that Deltoid is "Fair and Balanced". Just like Fox.

I'm sure some CA cheer squadders have cute puppies and contribute to charities and help old ladies cross the street.

How's that for balanced, and is that the best you can do?

Best,

D

We don't have to be fair or balanced, Gladys, unless we feel like it. Only thing we have to be is quicker, stronger, better than our competitors, and that seems to be getting easier all the time.

By couched potato (not verified) on 24 May 2007 #permalink

Please, Dano... Just look at the headline for this thread. I have been attacked here by people just because I seemed reasonable. Lambert once cited a study about gun deaths that he stated changed his mind. In the thread discussing the actual study that Lambert pointed us at, the authors of the study admitted that the type of analysis they did was flawed and didn't prove anything (they acknowledged this because the whole point of the study was only to prove that another study was flawed as well). We know to what extents Lambert will go to find errors in the work of people he disagrees with, but he continues to cite Phil Jones. You yourself seem to come here only to pat yourself on the back and state that anyone with any intelligence already agrees with you.

No need to reply that I haven't gotten your approval, but if Tim wants to act like he has some sort of moral high ground or that childishness is only to be found on other blogs, he needs to get a new editor and headline writer.

By oconnellc (not verified) on 24 May 2007 #permalink

couched potato, thanks for making your reasons for posting here clear. When you get bored with that, will you decide to change your motivation and start looking for truth in the world around you?

By oconnellc (not verified) on 24 May 2007 #permalink

Right at the same time as Fox, ClimateAudit, Dicky Lindzen and the CEI do, yes indeedy.

By couched potato (not verified) on 24 May 2007 #permalink

couched potato, I'm not sure I understand. You consider Fox to be biased, so to counter that, you are going to be biased? Wouldn't the counter to something that is biased be something that is actually fair and balanced? To me, your inability to see this tends to make you less credible. I have only seen two posts from you, both on this thread, so I will have to see some other post from you on some actual topic to see if you have this failing of logic in other things as well.

By oconnellc (not verified) on 24 May 2007 #permalink

oconnellc, my headline is accurate. Steve McI did make a DOS attack on GISS. It was inadvertent, but it was still a DOS attack by his bot.

Tim, that is kind of weak. Are you really contending that Mc made a deliberate, malevolent decision to try to make GISS servers unavailable? It seems obvious that Mc sure made a lot of requests of that server. But saying that it was a DOS attack implies that that was the intentional result of his actions. The fact that he was eventually allowed to continue doing what he was doing makes that assertion seem kind of silly. He made a mistake born out of ignorance. It seems that accusing him of that would suit your typical purposes. But you decide to accuse him of a DOS attack and put it in the headline, so that anyone reading gets the impression that he sat around trying to figure out how to bring that server down. That is childish and as bad as what you typically accuse him of.

And, I'm guessing that you have some proof that a DOS actually happened. Are there server logs that show that the server was unavailable? For example, if Mc was inundating the server with requests so that it was unavailable, most of his own requests would have failed as well. I didn't see that anywhere. There are many reasons why a server admin would be able to detect that sort of traffic and stop it, and many of them have nothing to do with server load. Look at the attempts he had made. 16000 over several hours! Are you really trying to say that Mc figured that was the kind of load required to bring down a public web server? And I notice that you include the comment from the webmaster chastising Mc for not looking at the "robots.txt" file... But you didn't include this comment: "Good point. That was foolish of me to suggest checking a page on which access had been turned off."

You are a computer scientist. So am I. You used specialized knowledge to try to mischaracterise what happened. Mc made a mistake from ignorance. He stated that he should have put a 'pause' in his script and then asked how to do that. What you did was worse. I'm not defending Mc. But this is your blog and you started this. You can act any way you want. But if you don't act like an adult, don't be surprised when people notice. When Mc doesn't act like an adult, you sure notice. Who notices when you don't?

By oconnellc (not verified) on 25 May 2007 #permalink

Tim:

It was inadvertent...

oconnellc:

Tim, that is kind of weak. Are you really contending that Mc made a deliberate, malevolent decision to try to make GISS servers unavailable?

Dhogaza

sigh...

Sorry I haven't replied, but I have been on vacation the last 10 days. dhogaza, what is your point with your last post? Tim says that someone did something inadvertently and then called it an attack? Have you somehow found a definition of 'attack' that includes inadvertent activity? I went to dictionary.com and found this definition:

the beginning or initiating of any action; onset

Are you contending that you can use this definition of the word "attack" and thereby make yourself feel better? And we still aren't sure what kind of attack it was. Tim called it a DOS, but it doesn't sound like anyone was denied service. If Mc had written his program to wait 1 minute between requests, would it still have been called a DOS attack? Someone made the comment that 16000 requests over several hours was a lot. I bet $50 to your $1 that I could install Java and Tomcat (free software) on your computer at home and it would take less than several hours to service 16000 requests (someone here actually used the word 'bombard' to describe that kind of traffic). Will you be comfortable when Tim writes a headline about the DOS attack you set up with your browser when you made that reply? I'm willing to bet that you used a web browser to make that reply, and I'm sure that a web browser fits the definition of "mechanical or virtual, artificial agent".

I can read. Tim said it was inadvertent, then he called it an attack. Did you really not understand the point I was making? If you didn't, I'm sorry I made this post attacking you. Maybe you could do me a favor though? Could you count the number of replies in addition to Tim's original (including your own) that attacked Mc for not using that inaccessible robots.txt file?

By oconnellc (not verified) on 03 Jun 2007 #permalink

Could you count the number of replies in addition to Tim's original (including your own) that attacked Mc for not using that inaccessible robots.txt file?

I'm sending data to a firm tomorrow morning. I'm using FTP. As does the rest of the non-dipsh*t population on the planet.

Best,

D

Dano, not sure how new you are to the internet. I've been writing software since 95. There are a lot of data exchange protocols. FTP is actually typically used for large file transfers because of the overhead associated with the protocol itself. You may have heard of something called "MIME". It allows people to attach files to emails and transfer data that way. There are many ways to exchange data based on the http protocol. SOAP is one of them. In addition, you can just include any old content you want in an http response body. If you read the original post by Tim, you can see that the folks at NASA have set up a publicly available way to get their data and it appears to also be based on http. That appears to be the mechanism that Mc was using. In fact, when questioned about the correct way to get their data, the folks at NASA told Mc to continue to use that mechanism. I'm not sure who the dipsh*ts are, but it would appear that you think the folks at NASA are the dipsh*ts.

Since you weren't able to do the counting, that number was four. Yes, four people decided to make themselves look like (choose your word: nitwits, dipsh*ts, idiots, dimwits) and chastise Mc for not using the file that was not available to him. That appears consistent with your most recent comment ignoring the fact that NASA had set up ahead of time an http based mechanism to retrieve the data, and when asked, they told Mc to continue to use the http mechanism.

Once again, the point was not about data protocols, but instead about which blogger and set of fans should be referred to as the 'cheer squad'. It seems obvious, but I'm guessing you will get it wrong.

By oconnellc (not verified) on 04 Jun 2007 #permalink

I addressed this way up at 12. It is standard to ask someone for something & they ask you how you want it, and you tell them. This stunt was just trying to game a predetermined result to get the cheer squad to clap. F'n joke & your hand-waving is just more hand movement.

Best,

D

Dano, it is very much NOT standard for someone with data to wait for ad hoc requests for that data, ask how the requester would like to have the data prepared and then prepare it that way. I am beginning to believe that you do work for the government... It is standard to prepare data, make it available, and then provide directions for retrieving that data without having to bother a human to get it. That seems to be what NASA did here. Are you not reading the same blog that I am reading? My goodness, doesn't the fact that after Mc contacted them and asked how he should get the data, they said he should continue to get it the way he already was enter into your interpretation of these events?

You seem determined to view all human interaction through your filter as a functionary asking another functionary for a dataset or a report. This was someone accessing a public API to retrieve thousands of individual files. Do you really think the folks at NASA want an email from everyone who wants that data? Is there so much tax money to spend where you work that no thoughts of efficiency ever come into play?

It seems that the only way to make a point here is to set it on the ground and light it on fire. The regular commenters here seem to love to call everyone else biased, call names and point out how unfair everyone else is. But no one seems to notice little things like attacking Mc for not using a file that he didn't have access to. Dano, whatever point you are trying to make, you aren't helping yourself. But, I'm sure your next response will include something about how those who agree with you (about what, I'm still not sure) have now moved on...

By oconnellc (not verified) on 04 Jun 2007 #permalink

I decided to go back from the top and re-read the original post. Tim said that the webmaster blocked Mc because his requests were making the site difficult to access. But then, when he quotes the webmaster, the webmaster states that he blocked Mc because he was accessing an illegal directory. It doesn't say anything about making the server inaccessible. Then, when we get a little more detail, we find out that the webmaster had failed to provide information about which directories were available or not. This is of course after several people also jump on Mc for not reading said inaccessible file. Dano now tells us that rather than using that publicly available API for getting the data, Mc should have sent Hansen an email and asked Hansen to generate the thousands of files that Mc wanted and then put them on an ftp server for him. NASA obviously doesn't know how things work for public servants, because they told Mc to continue getting the data using his scripts. This makes Mc an idiot.

Of course, we are supposed to believe that it was difficult to access the server. But the only time Mc had any difficulty was after his access had been blocked. I guess it was just other people who had difficulty accessing the system...

You know, it would have been easy enough for Tim to write an entry about how Mc is arrogant and makes ignorant mistakes. But for some reason, that wouldn't have been enough. And the cheer squad here would have missed a few chances to call Mc some names (of course, they screwed up this time and looked a little silly themselves). Tim, where can I get my pom-poms for the next round?

By oconnellc (not verified) on 04 Jun 2007 #permalink

Thank you, oconnellc, for your try at argumentation by death of a thousand dull cuts. I can also call it 'ants making crumbs into a picnic'.

Nonetheless, we can see by your well-presented recitation of the fact pattern that when dealing with government data it is best (thus standard) to ask for it, as many many people do every day - the many many people who do this kind of work for a living (as opposed to those who have a blog). There are many reasons for this, but the most common is typical SNAFUs like this one.

Hence my point: people who study and analyze data (and share it) just call, because it's the best way. If Stevie Mac actually did this kind of work, he'd have e-m'd or picked up the phone.

But since his name is toxic, he can't do that, and he ran into the SNAFU that one would expect if they used and analyzed Uni or gummint data.

Best,

D

Dano, I'm not sure what you are talking about. Only someone working in the public sector thinks that it is a good idea to have someone who does nothing but write emails and wait for data to be sent and someone else who does nothing but respond to emails from people looking for data...

Take a look at the Nature policy on data availability (there are others, but Nature seems like a good place to start): http://www.nature.com/authors/editorial_policies/availability.html
<<
Therefore, a condition of publication in a Nature journal is that authors are required to make materials, data and associated protocols available in a publicly accessible database (as detailed in the sections below on this page) or, where one does not exist, to readers promptly on request.
>>
and then, if we go down a bit further to the section on "Other supporting data"
<<
Such material must be hosted on an accredited independent site (URL and accession numbers to be provided by the author), or sent to the Nature journal at submission, either uploaded via the journal's online submission service, or if the files are too large or in an unsuitable format for this purpose, on CD/DVD (five copies). Such material cannot solely be hosted on an author's personal or institutional web site.
>>

You may have noticed that the first choice there is to just make the data publicly available. Only where this is not possible should you require people to have to request data from you. My goodness, what do you do for a living that you think it is a good idea to have to have someone dealing with these sorts of requests? You really think having to call or email is better than just visiting a web address and getting the files that you want?

I did some more googling and found this interesting site that talks about making data publicly accessible: http://globalchange.gov/policies/diwg/dmwg-grants.html

I tried to find a site that discussed how the best way to share data and information was to require that gummint agencies keep someone on staff to handle ad-hoc requests for data, but I couldn't find any. Maybe some of the Uni's or gummint's that you work with have specific policies about this that you would care to share with us. It seems that the rest of the world seems to feel that having data be freely available to experts and non-experts is kind of an important thing. It also seems that just having this data be available on public archives with the metadata necessary to understand and use it is also important. Is this some sort of bureaucratic job security trick that you accidentally shared with the rest of us? Does anyone else feel that what Dano is proposing is really a good idea?

What really amazes me is that you can look at this particular fact pattern and think that the mistake was made by the person looking for the data. It never occurred to you that maybe NASA could have also just put all of that data together in a single file and made it available. Instead, your solution is that NASA employ someone to deal with the same situation over and over every time someone wants that data, thus making things a little more difficult for people who want the data and ensuring one more unnecessary FTE. I'm at least starting to understand a little better how you think. You might not actually be part of the cheer squad, you might be saying things because that is just the way you work...

By oconnellc (not verified) on 04 Jun 2007 #permalink

BTW, I was a systems analyst for a large bank for many years, incl writing the requirements for the data transfer systems for my dept from and to vendors.

Anyway, the point is that the private sector has the resources to get it right, whereas the public sector may or may not.

Those that transfer data from and to those in Unis and bad ol' gummint know this and, to avoid time wastage, find alternate means, because the system may or may not work the way you wish.

Or you may be a researcher who doesn't know how to write such a program (because, say, you are in the field looking at bugs and never learned this marginal skill) nor do you know anyone who does. Nor do you wish to, as it is a marginal skill.

Best,

D

Dano, so your point is that Mc is an idiot, dimwitted and a dipsh*t because he didn't realize the inherent incompetence of the government agency he was trying to get data from? Even though they had a public API for getting the exact data he was looking for, he should have assumed that it didn't work (since it was a government site, and not the private sector) and just called or emailed. Amazing.

I have looked and I can't find a single reference that leads me to believe that what you propose is actually the right way to do things. That globalchange.gov site I pointed you at stated that the right way to do things was to make the data and metadata publicly available so that non-experts could make use of it. And that was written 10 years ago. There are many public databases available for housing the sorts of data we are talking about, so that researchers don't need to learn this marginal skill. Nature, for example, considers it a requirement for publication that you make that data publicly and independently available (unless you have a really good reason not to). In general, this is all directed right at the uni and gummint crowd, since in general the private sector people aren't publishing their results, but keeping it private so they can make money from it. But in your mind, the best way is still a phone call or email. And you go out of your way to call people names for not following what certainly appears to be the standard way of doing things. I can't believe you have the gall to refer to CA as the cheer squad...

By oconnellc (not verified) on 05 Jun 2007 #permalink

oconnellc: A public API? You simply do not know what you are talking about. They have a CGI script that produces graphs of station data. You can use that script to get the data, but it is wrong to call this a "public API", and it certainly was never intended to be used the way McIntyre used it. Which is why his bot caused problems for the server, as was clear from the incomplete files he mentioned in his post.

I appreciate your efforts to hand-wave away the fact that Stevie Mac doesn't do research for a living thus isn't familiar with getting data.

Or hand-waving away the fact that This Is A Conspiracy Against All Amateur Auditors Getting To The Truth Of The Great Conspiracy To Take Away Our Energy.

I share data often. I start by e-ming the webmaster of the agency and asking the best way to get/give data. Takes 10 seconds, the reply usu. comes in a day or two, takes me 15 seconds to read.

Less time than it takes to read your apologia for incompetence, BTW.

Best,

D

You are right. It wasn't a public API. But not because it was a CGI script (seriously? Is that why you think it wasn't a public API?). Because they never intended it for public consumption. Mc should have known that after all, since all he had to do was look at the robots.txt file... Oops. No one other than that server admin was able to look at the robots.txt file. I didn't see you mention that it was not accessible in your original post, btw.

You have to be kidding to try to keep defending this. There are about a zillion things that NASA could do to keep the outside world from accessing that script, from firewall settings to OS security to web server security on down. In addition, most modern web servers allow you to configure how many processes at a time can be spawned by something like a CGI executable (or servlet or whatever if CGI isn't good enough for you) to prevent things like this from happening in the first place. They did NONE of them. Instead, you saw a chance to call Mc a poo-poo head and you took it. If that is your goal, fine. Just stop trying to act like your goal is something else.

Rah rah, sis boom bah!

By oconnellc (not verified) on 05 Jun 2007 #permalink

This is almost funny. I have never meant to defend Mc from accusations of making an ignorant mistake. You keep missing the point. Tim started this out by cherry picking comments from the Mc blog and then the rest of the cheer squad here decided to pile on. Now, the squad here is up in arms doing anything they can to talk about anything other than the fact that Tim isn't above a little name calling when he thinks he can get a dig in. Despite all the evidence to the contrary, Dano continues to assert that the best way to share data is on an ad-hoc basis. After all, that is the way he does it!

And Eli, nice. You are the first person to actually make a comment I agree with. Mc was not the most patient person when he couldn't get what he wanted as soon as he wanted it. Post about his patience all day. This noise about emails being the best and DOS attacks and robots.txt is just foolish. Eli, are you willing to comment on the amount of maturity found on this site as well as the amount on CA?

By oconnellc (not verified) on 05 Jun 2007 #permalink

I think the bigger point than him acting "badly but not horribly" is that what seems to have happened here is that he was repeatedly hitting directories that his program, whatever you call it, shouldn't have been in. They blocked him because they didn't know what it was doing. Unless they're running the web server on a 300 baud modem on an Atari 400, 8000 hits an hour on a CGI doesn't seem excessive, nor very DoSsy at all.

Oh, and for those of us that use the interface into the NCDC/NOAA stuff: you can make as many requests as you want; while it's getting the data the server itself is just calculating away, and that should be the bottleneck, since you have to wait for the data to come back.

I could be wrong about that. If so, and grabbing that data automatically with a script did bring them down, then for God's sake, NASA can't run a system that can handle it? NASA? You mean the folks that build space shuttles don't have a web server that can handle that traffic? NASA? Come on.

And if the thing is slow (as it pretty much always is) then the server is mismatched for the load (or not tweaked, etc.) and it's always going to be slow. Wonder what they do after a TV show increases their traffic 50% or 500%. Hmmmm.

I'm sure the fact that it was McIntyre didn't help him much -- even if the webmaster didn't know who he was, once it got escalated....

Anyway, whatever you call it, you can call even an inadvertent DoS a 'DoS attack'. That's rather misleading, as usually an attack is on purpose. But '...inadvertent DoS attack...' doesn't sound as sharp, does it?

Look at it this way; the two "camps" hate each other. And if you didn't, what would you write about mainly? How would people hang around to be entertained and generate traffic?

Hey, oconnellc, I think the robots.txt was only unavailable after he got blocked. But the inaccessible thing was about the page with the email address, I believe.

Regardless of whether his work is good or not, or whether what he does is helpful or not, he still didn't have to be arrogant about it. But he's not the only one online that acts like that, so whatever.

By Robert S. (not verified) on 05 Jun 2007 #permalink

Resistance is futile. I have been assimilated. I now believe oconnellc's argument. It has been through force of repetition.

I now believe, thanks to the misdirections by oconnellc, that it is not important to know the protocol for retrieving data. You should just do whatever you feel like. It's so 60s, man.

I also no longer believe Stevie Mac doesn't do research for a living thus isn't familiar with getting data. I now agree that This Is A Conspiracy Against All Amateur Auditors Getting To The Truth Of The Great Conspiracy To Take Away Our Energy.

The force of repetition has assimilated me. Hockey stick.

Best,

D

Oh, and seriously, I may not always agree with Robert S. but I enjoy the nice structure of his arguments.

Best,

D

You have to be kidding to try to keep defending this. There are about a zillion things that NASA could do to keep the outside world from accessing that script, from firewall settings to OS security to web server security on down.

Well, for starters, they did the one thing that would reasonably be expected of them, and the one thing the relevant law expects them to do: They put up a robots.txt file.

After that, they did the obvious and standard thing for keeping people from abusing the script: When they found someone abusing the script, they blocked him. They were extremely polite about it, too, it seems to me.

The people in this thread claiming that it shouldn't matter that he flooded NASA, because NASA should have been able to write something that could handle flooding... this is truly baffling to me. It is as if someone drove a car through the front door of a shop, and there were people claiming that if the shop hadn't wanted cars driving through their front wall, they could have locked the door. No doubt NASA could have written this tool in such a way as to support a high and persistent volume of traffic, but that was apparently simply outside the intended scope of the tool. This is no doubt why they put up a robots.txt in the first place!

But not because it was a CGI script (seriously? Is that why you think it wasn't a public API?).

That's what I would say except under certain extremely simple circumstances, yes.

The very fact you are asking this question indicates you don't actually know what the word "API" means. It's like you're saying "but not because it was a banana (seriously? is that why you think it wasn't an apple?)".