Help Louisville... and think about your data

Yesterday the city of Louisville suffered a freak thunderstorm that dumped half a foot of rain in an hour and a quarter. Their library has been devastated, to the tune of a million-plus dollars in damage.

As a proud member of The Library Society of the World (and I have the Cod of Ethics to prove it!), I ask anyone who is able to throw a few bucks their way. I trust Steve Lawson to do as he says he'll do.

The library's data center and systems office were on its ground floor. If you watch Greg Schwartz's Twitterstream you can keep up with the recovery efforts. For my purposes, though, I want you to think very hard for a moment about your data, keeping the Louisville Free Public Library's experience in mind.

  • Where are your data? Do you have them on your hard drive? Are they on a departmental or campus server? Where is that? What natural or manmade disasters is it vulnerable to?
  • Geographically-dispersed backups, do you have them?
  • If you're relying on a third-party service, has it promised you anything about reliability, or the ability to get your data back out?

These are basic, basic questions, folks. If you can't answer them appropriately for your most important data, call in an expert yesterday to get the problem fixed. (Nota bene: graduate students are not experts.)

MPOW's offsite backups are stored a set number of miles outside the DC metro area. It seems like university libraries could do something cooperative, if they aren't already, so that one in the Midwest could host data for one in the East and vice versa. Or maybe they are doing that and I heard it somewhere?

I know that some libraries use LOCKSS to replicate e.g. electronic theses and dissertations, but I don't know of any similar efforts for non-unique, less obviously vital data.

Geography is everything. We got a four inch rainstorm over a pretty large area in central Minnesota last week, and it had no significant effects other than making all the lakes and swamps a little wetter.

I understand the ORs in one of the area hospitals, which were built in the basement, were also wiped out.

Store your data in your own cloud of PCs or servers, with replication to a second remote cloud. The software that automates this is a snap to use, and presents your data online using standard HTTP:

http://www.caringo.com/downloadCAStor.html

The license for the first 4 TB is free.

Ob disclaimer: yeah, I have an interest in this. ;-)

It ships with Debian rolled into the package, and is designed to boot over the network or from USB. Once it boots, the OS and application live entirely in RAM. There is no system volume, nor any system to maintain. All the disks are used to store data. You can specify a replication factor for your data; when a volume or server is lost, more replicas are made of any data that has fallen below that number.
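To make the replication-factor idea concrete, here is a toy model in Python. It is only a sketch of the general technique, not the product's actual code or API: the names place_replicas and fail_node are made up for illustration, node choice is random, and disk capacity is ignored entirely.

```python
import random

def place_replicas(objects, nodes, factor, rng):
    """Assign each object to `factor` distinct nodes chosen at random."""
    return {obj: set(rng.sample(sorted(nodes), factor)) for obj in objects}

def fail_node(placement, nodes, dead, factor, rng):
    """Drop node `dead`, then top up any object that fell below `factor`."""
    survivors = nodes - {dead}
    for holders in placement.values():
        holders.discard(dead)
        missing = factor - len(holders)
        # A new copy can only be made if at least one replica survives.
        if holders and missing > 0:
            candidates = sorted(survivors - holders)
            holders.update(rng.sample(candidates, min(missing, len(candidates))))
    return survivors

rng = random.Random(0)
nodes = {"pc1", "pc2", "pc3", "pc4"}
placement = place_replicas(["a", "b", "c"], nodes, 2, rng)

# Pull the plug on the PCs one by one.
for dead in ["pc1", "pc2", "pc3"]:
    nodes = fail_node(placement, nodes, dead, 2, rng)
```

After the loop, every object in this toy model still has a copy on the last surviving PC, because each single failure was repaired before the next one happened.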

And it really is easy. If you have a few old PCs sitting around gathering dust, you can make your own little storage cloud lickety-split. The fun experiment then is to pull the plug on the PCs one by one, and observe that all your data remains, until the last one standing, or until you exhaust total disk space. If you have ethernet between buildings, you can put half in one building and half in another, and define them as subclusters so that every piece of data is stored in both buildings, and your data survives even if one building is completely destroyed. There's also remote replication if you don't have a large local net. But the subcluster mechanism is neat, because it is so completely automatic. Applications using the data wouldn't even know about the loss of one subcluster.
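The subcluster idea can be sketched the same way. This is a hypothetical illustration in Python, not how the product actually chooses nodes: the building names, the hash-based pick_node helper, and the place and surviving_copies functions are all assumptions made for the example. The only point is that every object gets one copy per building, so losing a whole building still leaves a copy.

```python
import hashlib

def pick_node(obj_id, nodes):
    """Pick a node for obj_id deterministically from a stable hash."""
    digest = hashlib.sha256(obj_id.encode("utf-8")).digest()
    ordered = sorted(nodes)
    return ordered[int.from_bytes(digest[:8], "big") % len(ordered)]

def place(obj_id, subclusters):
    """Store one copy of obj_id in every subcluster (building)."""
    return {name: pick_node(obj_id, nodes) for name, nodes in subclusters.items()}

def surviving_copies(placement, destroyed):
    """Copies that remain after one whole subcluster is lost."""
    return {name: node for name, node in placement.items() if name != destroyed}

subclusters = {
    "building_a": ["a1", "a2", "a3"],
    "building_b": ["b1", "b2"],
}
placement = place("thesis-0001.pdf", subclusters)
left = surviving_copies(placement, "building_a")
```

Here left always holds exactly one copy, on a node in building_b, whichever object ID you feed in.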

There's also a utility to make a virtual drive on your laptop that is stored in your cloud over the internet. So it's easy to get to everything remotely.

BTW, don't let me abuse your blog with commercial advertising. I'm an engineer, not a salesperson. The problem you brought up just happens to be the business we're in. ;-)

Thanks for spreading the word about LSW passing the hat, Dorothea. Glad you think I am trustworthy, but I understand if people don't want to PayPal a stranger. Here is the address and phone for making a direct contribution to the library:

The Library Foundation
Attn: Flood
301 York St.
Louisville, KY 40203
(502) 574-1709

Slightly off-topic, I'm afraid, but can you explain the origin of the Cod of Ethics? Clearly a famous typo, but what was its provenance?

Someone was comparing the American Library Association and Association for Computing Machinery Codes of Ethics on FriendFeed. A typo happened in one of the comments. The rest was history. :)

There are a huge number of options available, both free and paid (one of which I'm the founder of). Before you start looking at specifics, you've got to figure out some really important basics. These sound really simple, almost trite, but if you don't have a solid answer to them, it's likely that you're going to get fleeced, waste a lot of time, or end up losing data.

1) What's the budget?

A) Nothing: You're going to want something open source, and you'll need some already-existing offsite computing resources. Perhaps you have multiple offices, or in this case a group of regional libraries that work together and mutually host each other's data?

B) There is money now: As above, but you can also set up the offsite resources, as long as you already have a location at which you can do so. And you can look into self-hosted professional software solutions.

C) We have money now, and will have an ongoing budget: Any of the above, and you can look at professional offsite backup services.

2) Do you have technical resources?

A) Nope, no techs available: You probably want to look into a professional solution then. Trying to do anything else risks a whole lot of frustration, and the result probably won't be very reliable.

B) Yup, plenty of people with deep computer skills: Then you can probably put together an in-house solution using open source tools that'll work nearly as well as, and for a whole lot cheaper than, a professional solution.
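If it helps to see the two questions combined, here is a rough sketch in Python. The option labels and the affordable and shortlist functions are invented for the example, and the way the two answers are combined is my own reading of the lists above: in particular, treating "professional solution" as either self-hosted professional software or a hosted service is an assumption.

```python
from enum import Enum

class Budget(Enum):
    NONE = 0
    ONE_TIME = 1
    ONGOING = 2

OPEN_SOURCE = "open source tools on existing offsite machines"
OWN_OFFSITE = "set up your own offsite machines"
SELF_HOSTED = "self-hosted professional software"
HOSTED_SERVICE = "professional offsite backup service"

def affordable(budget):
    """Question 1: which options the budget allows."""
    options = [OPEN_SOURCE]
    if budget.value >= Budget.ONE_TIME.value:
        options += [OWN_OFFSITE, SELF_HOSTED]
    if budget is Budget.ONGOING:
        options.append(HOSTED_SERVICE)
    return options

def shortlist(budget, has_techs):
    """Question 2: without in-house techs, keep only professional options."""
    options = affordable(budget)
    if has_techs:
        return options
    return [o for o in options if o in (SELF_HOSTED, HOSTED_SERVICE)]

no_money_no_techs = shortlist(Budget.NONE, has_techs=False)
ongoing_no_techs = shortlist(Budget.ONGOING, has_techs=False)
```

An empty shortlist, as in the no-money, no-techs case, means the two answers conflict and something has to give before any tool will help.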

If you want to try setting up a low-cost in-house solution, this looks very promising, if still rough around the edges: http://allmydata.org/trac/tahoe

If you want to look at professional solutions, I'd (naturally) encourage you to find the contact page on the URL in my name, and get in touch. We're a small company with several years of history, and though I obviously can't promise that we're the perfect solution to everything, I can promise that you'll be treated honestly and fairly. Even if that means telling you to go somewhere else.