Attn all researchers: Back up your computers. NOW.


Do not read this post.

Hook your computers up to your external hard-drives, back up your data, then come back to read this post.

... Okay, I'm assuming the only people reading right now are people with Apple-->Leopard-->Time Machine who already have things constantly backed up, or non-scientists who want to know what the hell I'm talking about.

Two scientists (a couple who work together) at my Uni had their laptop stolen from their car.

Their work laptop.

One they had used for years.

... None of their data was backed up. I shit you not.

I feel so bad for that couple. But for Christ's sake...

I am neurotic about backing things up. And then I also have this on all my computers. I also don't leave my laptop in the front of my car, ever. I don't even leave my iPod in the front of my car.

For Christ's sake...

I hope they get the computer back, and I hope the rest of us can learn from their mistake.

Saw something similar here a couple of years ago, flyers stapled to phone poles by a PhD student who apparently had all his thesis-in-progress and related materials on his laptop, not backed up.

How horrifying ... but how stupid ...

Backing up is like contraception: you don't always know when it's going to matter, but assuming that you'll never need it is

a) Stupid,
b) In the face of sense,
c) In the face of all sorts of experience,
d) Head-smashingly common,
e) All of the above

I worked in IT for 14 years, and you would not believe how many people leave their laptops in their vehicles. We, in a company of 900 computer users, would have to replace 3-5 laptops a year.

Yes, lots of things can happen to your computer. Not just theft, either; digital stuff has this tendency to work until the day it doesn't. I once permanently lost a desktop computer to the voltage spike that came at the start of a power outage (fortunately, although the motherboard was toast, the hard drive was intact; it could as easily have been the other way around), and other catastrophes (floods, fires, getting run over by a truck, etc.) are certainly possible.

And it's not the financial cost of replacing it that gets you, even if you are a grad student who would have to decide between replacing the computer and paying your next three months of rent. It's the loss of personal and/or business data: e-mails, financial data, photos, your thesis, your novel, your plan to form a start-up company which (with a suitable combination of luck and skill) will make you a multi-gazillionaire, etc. Files which, if you don't have backups, will take days to years to replace or reconstruct, if you can replace or reconstruct them at all.

I use three different external drives to back up my photos - this story astounds me, even though I've heard similar things in the past.

1) One of the vice presidents at my school sent out notes saying she'd lost her thumb-drive, WITH THE ONLY ELECTRONIC COPY OF HER WORK-IN-PROGRESS DISSERTATION ON IT

2) One of my fellow stat grad students lost the drafts of his dissertation (early 80s) in a fire in his apartment

Did these two work in a vacuum, or were they simply paranoid about storing data and having someone else see it?

In the "trivially easy if you don't need to back up much" department:

Just upload your damn dissertation to Google if nothing else.

If that doesn't work for you, consider something like this:
I run a unison server on my home server. $DAUGHTER keeps drafts of her dissertation, publications, etc. (the stuff she really can't afford to lose) in folders on her laptop and on her office desktop which are synchronized to a folder on her account on my server. Net result: she has three copies of everything important (plus backups) and as a side benefit they're all synchronized so it doesn't matter which system she's using.

And she doesn't have to remember.
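
For anyone curious what that kind of synchronization boils down to, here is a minimal sketch in Python. It is not unison and not how unison works internally; it is a naive stand-in that copies whichever side is missing or older. The function names (`relative_files`, `two_way_sync`) are made up for illustration, and it deliberately ignores deletions and conflicting edits, which a real sync tool tracks.

```python
import shutil
from pathlib import Path


def relative_files(root: Path) -> set:
    """Return every file under root (sub-folders included) as a relative path."""
    return {p.relative_to(root) for p in root.rglob("*") if p.is_file()}


def two_way_sync(left: Path, right: Path) -> list:
    """Copy each file that is missing or older on one side from the other side.

    Assumption: the newer modification time wins. Deletions and conflicting
    edits are NOT handled. Returns a log of the copies that were made.
    """
    log = []
    for rel in sorted(relative_files(left) | relative_files(right)):
        a, b = left / rel, right / rel
        if not b.exists():
            src, dst = a, b
        elif not a.exists():
            src, dst = b, a
        elif a.stat().st_mtime > b.stat().st_mtime:
            src, dst = a, b
        elif b.stat().st_mtime > a.stat().st_mtime:
            src, dst = b, a
        else:
            continue  # same timestamp on both sides: assume nothing changed
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also carries over the modification time
        log.append(f"{src} -> {dst}")
    return log
```

Calling something like `two_way_sync(Path("laptop/dissertation"), Path("server/dissertation"))` from a scheduled job is what makes the "doesn't have to remember" part work; the paths here are placeholders.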

As a comp bio, I load everything directly from our servers. My computer is for pet projects where I don't need to back things up all the time.

But all my big things are done on a server and version controlled. I think more scientists should consider it...

My primary backup protocol involves frequently transferring data between my laptop and desktop. So if I lose something, it's only very rarely something I miss significantly.

This turned a hard drive crash I had last year from a devastating situation into a merely annoying one. I lost some stuff, but nothing terribly important.

I recently ordered four 2TB USB hard drives for our lab, at $99 each.

There's just no excuse.

Even better, learn to use version control systems and put everything in git/svn. This way you can:

1) Synchronize data between several computers.
2) Have distributed backups.
3) Keep the full history of everything.

Seriously, try it!

Oh christ, git. That's just a mindfuck. Just use Time Machine or DropBox or something similar.

Well, it's pretty easy once you know how it works inside.

Dropbox has a not-very-nice side effect - if you accidentally delete a file on one computer, DropBox will dutifully delete it on other computers.

And Time Machine is not available on Linux/Windows.

With Dropbox, you can recover old versions of files (which includes restoring deleted files). There's a "show deleted files" button at the top of the file lists on the web interface, and if you click the down arrow that appears when you hover over a file, there's a link that will let you restore previous versions. They only keep 30 days of history for free accounts, though, so you will have to notice the problem relatively quickly. (I'm pretty sure that if you pay for extra storage, they keep changes forever.)

If you're at a university, make use of the storage they provide with your user account! Presumably it's very well backed up. At mine I have 10 GB of space, and the filesystem has a .oldfiles directory from which I can pull a copy of any file that I've accidentally deleted or screwed up.

The equivalent of Time Machine for Linux is called rsnapshot. I used to use it on my mac before Time Machine existed.
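
The trick behind rsnapshot-style backups is that every snapshot looks like a full copy, but files that did not change are hard links to the previous snapshot, so they cost no extra space. Below is a rough Python sketch of that idea only. It is not rsnapshot's code (rsnapshot drives rsync), the `take_snapshot` name is hypothetical, and it assumes snapshot names sort chronologically (dates, for example) and that the snapshot folder lives on a filesystem that supports hard links.

```python
import filecmp
import os
import shutil
from pathlib import Path


def take_snapshot(source: Path, snapshots: Path, name: str) -> Path:
    """Create snapshots/name as a complete-looking copy of source.

    Files identical to the most recent earlier snapshot are hard-linked
    instead of copied. Symlinks and file permissions get no special care.
    """
    snapshots.mkdir(parents=True, exist_ok=True)
    earlier = sorted(p for p in snapshots.iterdir() if p.is_dir())
    previous = earlier[-1] if earlier else None
    target = snapshots / name
    target.mkdir()  # fails loudly if this snapshot name already exists
    for src in sorted(source.rglob("*")):
        rel = src.relative_to(source)
        dst = target / rel
        if src.is_dir():
            dst.mkdir(parents=True, exist_ok=True)
            continue
        dst.parent.mkdir(parents=True, exist_ok=True)
        old = (previous / rel) if previous else None
        if old and old.is_file() and filecmp.cmp(src, old, shallow=False):
            os.link(old, dst)  # unchanged: share the bytes with the old snapshot
        else:
            shutil.copy2(src, dst)  # new or changed: store a fresh copy
    return target
```

Because older snapshots are never rewritten, an accidental edit or deletion in the source only affects snapshots taken afterwards.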

I second the suggestion about version control. If you're a researcher (or anyone who stores a lot of important files and changes them often), it's well worth your time to learn how it works. It's not that hard.

How timely, as I just upgraded my backup system to 2 terabytes!

It consists of three hard drives: two are mirror-images of each other via a RAID array, and one is in a hot-swap bay. Every few weeks, I yank the one in the bay, head to a family member's house, swap that one for the third hard drive, which they've been holding onto for me, and slap that one back into the computer. It's quick, easy, and impervious to fire or flood.

HJ Hornbeck

Simple way to back up reasonably small files: email them to yourself. I do this with my grades databases and drafts of works in progress.

I wrote a few articles on my blog on back-ups for similar reasons. It's appalling how many people have valuable data on their machines, but they don't back it up properly, or at all.

One point to add to your post (I haven't read the other comments, sorry): you want an off-site backup, really.

Time Machine, good as it is, isn't the best for that. Time Machine has a couple of other weaknesses I've mentioned on my blog. (See link on my name if it matters to you; excuse the pimping but it is an important issue as ERV points out. I have to admit I never got around to the final post in that series!)

I recall a night near the end of my honours year, when, pissed as a newt at 3am in a pub, I realised that the entirety of my thesis (draft, data, etc.) was on a dodgy old USB in my pocket and nowhere else.

I made half a dozen backups the next morning...

I usually keep two back ups of everything academic related. However, they are all physically on site. So if my apartment were ever broken into/caught on fire I would be screwed.

Thanks for the cautionary tale, I really hope those two manage to pull a happy ending out of this somehow.

I use carbonite. It takes the lazy factor out of things

That totally, totally sucks. But yeah, I can't believe they left the single computer with everything on it in their car! Our campus was also hit with a string of work laptop thefts in our actual buildings. Thieves managed to break into one academic's office and several unlocked PhD offices... including mine. Yup, the computer lock didn't deter them either; they pulled mine right out of my laptop and took off with it. Scary thing was, I was only away from my desk for 20 minutes; I'd popped over to the lab around the corner.

I don't think you can back up enough. That same week my laptop was stolen my back up external harddrive ate a virus from a shared computer and died. Luckily, I'd been emailing my supervisor parts of my thesis and made hard copies for him of everything. If your dept has its own server, this also makes it easy to back up files. And Dropbox sounds like a brilliant idea!

Backing up as we speak (doubly reminded of the need by a friend whose cat knocked a cup of tea all over her Macbook...)

Heh, congratulations. While everything that would really, truly be missed is already backed up remotely you've inspired me to get out my new external hard disk and back up a few good-sized encrypted file containers I don't particularly want stored remotely as well as the contents of the who-knows-when-it'll-fail reused D drive on my old desktop tower.

+1 for Dropbox. I've been using it for a while now and it has saved me from my own stupidity more than once.

Learn from your developer comrades. Use version control. Use a central repository on shared hosting*. Update your research, then commit it to the repo. Rinse, repeat.

Seriously. I am physically incapable of losing my work these days, and I owe it to version control.

* Dreamhost, for example, will host SVN repos as well as any web content you need for like $10/mo - and I'm almost certain that your uni would be happy to set something up.

@Tom H:

What's a mind fuck about Git? Right click -> commit to your local version -> push to the central repo.

I agree that if you're not coding, you don't really need the local repo (so SVN's a cleaner option), but if you want to be entirely unable to fuck yourself, you need versioning.

With an external HD or hard drive you can still lose your data if a storm comes up. I've backed up my data online for a long time now. My hard drive crashed on me terribly, but I stayed calm because my data was safe and secure. It took no effort to retrieve my stuff with just a click. I am so thankful, as it has saved me money all this time. Long live Safecopy.

I am not a scientist (IANAS) but I am wondering why some of the data can never be replicated again.

My wife (who is involved in grants for researchers) wondered this as well:
Research is a bit different than business. It's publish or perish. If they haven't been publishing their findings over the years so their work can be peer-reviewed there is no way they are going to get further funding. Sure you can conceal proprietary information but saying 'we're this close' without the data to back it up won't get you the big bucks. I know private labs are different than a university environment but I would think they'd have to make regular progress reports with detailed results to continue drawing a paycheck.

We are in Canada, so we don't know the difference between how research in the US and here works, but they must have some of their data published.

These are all good suggestions and it is certainly a concern in my profession (monster) with the bonus concern of maintaining privilege.

We encrypt and back up (including all programs and operating systems) on a 2T master external twice a day and for added fun we keep a set of encrypted passport externals in a little Faraday cage we dump a back-up on once a week.

Everybody with a laptop uses 18g thumbs which hold their forms and working file data, and unless they are working, it stays in their pocket or gets duped to the master, where it gets encrypted and copied per the protocol described above.

I guess we get the overkill award but we can't store and encrypt online until we get a go ahead from the bar association.

I regularly receive and send data files via email from a couple of my intrepid tableteers, which I strip of metadata before they go in and out of the system.

It is all cheap and once you are in the habit it is easy.

Hell, thanks to a defunct microwave with everything but the ground yanked out of the cord we are covered for atomic detonations and solar flares.

Laptops are notorious for crashing right after you've finished your thesis, so I definitely wouldn't store a lifetime's worth of research on one. This has happened to more than one friend. Dropbox is a good solution to this (but not for massive amounts of data.) It is how I kept my thesis on many computers at once. It is super easy and you don't even have to think about it. Poor people.

Although institute and company servers do keep backups, I wouldn't trust that for exceptionally important things: keep your own backup somewhere off site. The IT department sometimes fails.

No excuses. You can use DVD, CD, external hard drive, or USB sticks to back-up your data.
You can have a (free) online back-up account @ Dropbox, Mozy, SugarSync, Ubuntu One, ...
You can subscribe for a (free) online subversion, CVS, or git account when you need version management.
I hope they get their laptop back, but in this day and age you have no excuses for not backing up your data.
I learned it the hard way too, by losing some data when my hard drive failed. Taught me to take more backups.

Wow, there are people who still work locally. Amazing.

Do scientists not use SVN or similar?

Okay, everybody has written about svn already. Sorry. Didn't read the comments before posting mine.

#7 "as a comp bio - I load everything directly from our servers. My computer is for pet projects where I don't need to backup things all the time. But all my big things are done on a server and version controlled."

That's the only right way to do things.

My data is backed up at least three ways: on a local backup drive which runs weekly, the RAID system in the basement of my building, and the departmental RAID system which I believe also connects to an online service. Then there is the local copy. Civilization would probably have to collapse before I lost more than a few hours' work.
This must have been a systematic failure, and frankly if they can't get this data back they've wasted a lot of taxpayer money and I don't feel sorry for them.

Even better than an external hard drive, use a program like Crashplan+ to back up an unlimited amount online for about $3/month.

Even if you have a fire your data is safe and you won't be able to forget to back up since it happens in real time. I think every scientist (and probably many other people) should be forced to use such a system by their employers.

My wife is writing her dissertation. She makes daily, weekly and monthly backups. About this time last year, her hard drive crashed, with no way to retrieve the data (she probably would have finished a major revision the same day it crashed). "No problem," we thought, "we've got all those backups." Except it turned out that the daily and weekly backups only saved the first layer of files; anything that was in a sub-folder didn't get saved. And the monthly back-up? Well, it will only restore the hard-drive that created it. Won't work with any other hard-drive. We had to go back to when she was swapping files from my laptop to hers. Some of you may recall that I guest-blogged for Ed last month. She finally caught up to where she had been the week before I started guest-blogging (in fact, my posting schedule got thrown off by helping her copy-edit).

It's not enough to back-up. You need to make sure the data is retrievable.
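
A check along these lines would have caught the sub-folder problem. This is only a sketch, assuming the backup is an ordinary folder tree you can read (not a tape or a proprietary image); `sha256_of` and `verify_backup` are names invented for the example. It walks the source recursively, so files in sub-folders count, and compares checksums rather than trusting that a file with the right name has the right contents.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source: Path, backup: Path) -> dict:
    """Compare every file under source with its counterpart under backup.

    Returns relative paths that are missing from the backup and paths
    whose contents differ. Two empty lists mean the backup matches.
    """
    report = {"missing": [], "different": []}
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        copy = backup / rel
        if not copy.is_file():
            report["missing"].append(rel.as_posix())
        elif sha256_of(src) != sha256_of(copy):
            report["different"].append(rel.as_posix())
    return report
```

Run it right after a backup finishes. A file edited between the backup and the check will show up under "different", which is expected rather than alarming.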

Isn't there a requirement to record data in a lab notebook that is maintained on site? The article states they don't know the serial number. If they were using a university computer, this data would be available (and likely the ability to backup). Why are they recording all data on a personal laptop?

@28: We are in Canada, so we don't know the difference between how research in the US and here works, but they must have some of their data published.

Well, IAAS, a physicist, in the US and I can tell you that standards for how data is handled vary WIDELY from PI to PI even within a university department. And the data you publish is often only the tip of the iceberg. Not only that, I've known advisers who have their students publish AFTER the dissertation, and of course you don't need to have been published at all to write a dissertation.

Much of the raw data I collected for my Ph.D. work was image stacks from a confocal microscope, which of course can't be put into a lab notebook and can't be published in a paper (are you going to read a 100GB paper?). The data I publish is simply plots of things I've abstracted from those image stacks, with fitting parameters, and it would be impossible to reconstitute my raw data from it. Other kinds of data I would collect were IR spectra. And of course you can plot a spectrum and publish it but you can't easily convert a plot back into the numerical data.

For my master's I was doing Monte Carlo simulations, and of course my raw data was huge output files of atomic positions and such. What I published was abstracted from that.

There's huge amounts of data that never see the light of day and maybe only one or two people EVER look at it.

Of course people should be backing up their data, and PIs should have some sensible way of organizing it. And while it's fine to have data on a personal laptop, it needs to be in other places where it can be backed up and found by people who might need to look at it.

I use Carbonite, Jungle Disk ('cause it works on Windows and Linux), and Dropbox. Those are my three cloud-based backup destinations. Ubuntu One would be fine except it's not encrypted.

At home I use an external hard drive with auto-backup software (for the primary hard drive), just in case. I also have a second external hard drive for my work laptop (which my company backs up auto-magically anyway), and I routinely store my currently-in-use critical files on a flash drive, just to be handy. And sure.

I'm paranoid, maybe, (go ahead and say it) but I've lost stuff in the past. I hate losing my work, so it's "No Single Point of Failure". Ever.

There now. I feel better. :)

Even if it's not a laptop, back it up. Several years ago we realised that our server, whose main purpose was data backup, had about a terabyte of data that wasn't backed up elsewhere. Midway through making a mirror of our server we had a power failure, resulting in corruption of our server. Long story short, $18K later we recovered the data...

...backup, and save money. An external HDD, NAS, or even a dedicated server is a heck of a lot cheaper than data recovery. And if your computer's stolen, no recovery is possible.

It's not enough to back-up. You need to make sure the data is retrievable.

THIS. The State of Alaska learned this the hard way when they found out their regular backup tapes were B L A N K. Every IT department should be REQUIRED to do a test retrieval of backups once every 3 months. Anything less is just asking for trouble.

Personally, I use two 500 GB portable drives, and leave one at work and one at home. I'm mostly backing up images, so as soon as I download a set off the camera's SD card, I copy it to the local external drive, and then rotate drives every few weeks. Oh, I also never delete the SD cards. Why bother? With 4 GB SD cards as cheap as they are, it just makes more sense to NOT erase them and keep them as a third, emergency backup mechanism.

