Help Louisville... and think about your data

By dsalo on August 5, 2009.

Yesterday the city of Louisville suffered a freak thunderstorm that dumped half a foot of rain in an hour and a quarter. Their library has been devastated, to the tune of a million-plus dollars in damage.

As a proud member of The Library Society of the World (and I have the Cod of Ethics to prove it!), I ask anyone who is able to throw a few bucks their way. I trust Steve Lawson to do as he says he'll do.

The library's data center and systems office were on its ground floor. If you watch Greg Schwartz's Twitterstream you can keep up with the recovery efforts. For my purposes, though, I want you to think very hard for a moment about your data, keeping the Louisville Free Public Library's experience in mind.

Where are your data? Do you have them on your hard drive? Are they on a departmental or campus server? Where is that? What natural or manmade disasters is it vulnerable to?
Geographically-dispersed backups, do you have them?
If you're relying on a third-party service, has it promised you anything about reliability, or the ability to get your data back out?

These are basic, basic questions, folks. If you can't answer them appropriately for your most important data, call in an expert yesterday to get the problem fixed. (Nota bene: graduate students are not experts.)

MPOW's offsite backups are stored a set number of miles outside the DC metro area. Seems like university libraries could do something cooperative, if they aren't already, so that one in the Midwest could host data from one in the East and vice versa. Or maybe they are doing that and I heard it somewhere?

I know that some libraries use LOCKSS to replicate e.g. electronic theses and dissertations, but I don't know of any similar efforts for non-unique, less obviously vital data.

Geography is everything. We got a four inch rainstorm over a pretty large area in central Minnesota last week, and it had no significant effects other than making all the lakes and swamps a little wetter.

I understand the ORs in one of the area hospitals, which were built in the basement, were also wiped out.

Store your data in your own cloud of PCs or servers, with replication to a second remote cloud. The software that automates this is a snap to use, and presents your data online using standard HTTP:

http://www.caringo.com/downloadCAStor.html

The license for the first 4 TB is free.

Ob disclaimer: yeah, I have an interest in this. ;-)

Russell: Which OSes does it work with?

It ships with Debian rolled into the package, and is designed to boot over the network or from USB. Once it boots, the OS and application live entirely in RAM. There is no system volume, nor any system to maintain. All the disks are used to store data. You can specify a replication factor for data; when a volume or server is lost, more replicas are made of any data that has fallen below that number.

And it really is easy. If you have a few old PCs sitting around gathering dust, you can make your own little storage cloud lickety-split. The fun experiment then is to pull the plug on the PCs one by one, and observe that all your data remains, until the last one standing, or until you exhaust total disk space. If you have ethernet between buildings, you can put half in one building and half in another, and define them as subclusters so that every piece of data is stored in both buildings, and your data survives even if one building is completely destroyed. There's also remote replication if you don't have a large local net. But the subcluster mechanism is neat, because it is so completely automatic. Applications using the data wouldn't even know about the loss of one subcluster.
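The pull-the-plug experiment described above can be imitated with a toy simulation. This is only a sketch of the general replication-factor idea, not the vendor's implementation: the function names (place_replicas, fail_node, surviving), the random placement, and the five-PC setup are all assumptions made for illustration, and disk capacity is ignored entirely.

```python
import random

def place_replicas(nodes, objects, factor, rng):
    """Store each object on `factor` distinct nodes chosen at random."""
    return {obj: set(rng.sample(sorted(nodes), factor)) for obj in objects}

def fail_node(placement, nodes, dead, factor, rng):
    """Unplug one node, then re-replicate objects that fell below `factor`."""
    remaining = nodes - {dead}
    for holders in placement.values():
        holders.discard(dead)
        if not holders:
            continue  # every replica is gone, so this object is lost
        candidates = sorted(remaining - holders)
        missing = min(factor - len(holders), len(candidates))
        if missing > 0:
            holders.update(rng.sample(candidates, missing))
    return remaining

def surviving(placement):
    """Count objects that still have at least one replica."""
    return sum(1 for holders in placement.values() if holders)

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    nodes = {f"pc{i}" for i in range(1, 6)}  # hypothetical machine names
    placement = place_replicas(nodes, range(20), factor=2, rng=rng)
    for dead in sorted(nodes):
        nodes = fail_node(placement, nodes, dead, 2, rng)
        print(dead, "unplugged;", surviving(placement), "of 20 objects readable")
```

Because replicas are rebuilt after every failure, each object stays readable in this model until the last machine is unplugged. A real cluster would also have to account for free disk space and for failures that arrive faster than re-replication can finish.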

There's also a utility to make a virtual drive on your laptop that is stored in your cloud over the internet. So it's easy to get to everything remotely.

BTW, don't let me abuse your blog with commercial advertising. I'm an engineer, not a salesperson. The problem you brought up just happens to be the business we're in. ;-)

Thanks for spreading the word about LSW passing the hat, Dorothea. Glad you think I am trustworthy, but I understand if people don't want to PayPal a stranger. Here is the address and phone for making a direct contribution to the library:

The Library Foundation
Attn: Flood
301 York St.
Louisville, KY 40203
(502) 574-1709

Slightly off-topic, I'm afraid, but, can you explain the origin of the Cod of Ethics? Clearly a famous typo, but what was its provenance?

Someone was comparing the American Library Association and Association for Computing Machinery Codes of Ethics on FriendFeed. A typo happened in one of the comments. The rest was history. :)

There's a huge number of options available both free and for cost (one of which I'm the founder of). Before you start looking at specifics, you've got to figure out some really important basics. These sound really simple, almost trite, but if you don't really have a solid answer to them it's likely that you're either going to get fleeced, waste a lot of time, or end up losing data.

1) What's the budget?

A) Nothing: You're going to want something open source, and you'll need some already-existing offsite computing resources. Perhaps you have multiple offices, or in this case a group of regional libraries that work together and mutually host each other's data?

B) There is money now: As above, but you can also set up the offsite resources, as long as you already have a location at which you can do so. And you can look into self-hosted professional software solutions.

C) We have money now, and will have an ongoing budget: Any of the above, and you can look at professional offsite backup services.

2) Do you have technical resources?

A) Nope, no techs available: You probably want to look into a professional solution then. Trying to do anything else risks a whole lot of frustration, and the result probably won't be very reliable.

B) Yup, plenty of people with deep computer skills: Then you can probably put together an in-house solution using open source tools that'll work nearly as well as, and for a whole lot cheaper than, a professional solution.

If you want to try setting up a low-cost in-house solution, this looks very promising, if still rough around the edges: http://allmydata.org/trac/tahoe

If you want to look at professional solutions, I'd (naturally) encourage you to find the contact page on the URL in my name, and get in touch. We're a small company with several years of history, and though I obviously can't promise that we're the perfect solution to everything, I can promise that you'll be treated honestly and fairly. Even if that means telling you to go somewhere else.
