I hear talk about “the cloud” as the solution to research data curation. Data will waft softly up into “the cloud,” and “the cloud” will look after it and give it back on demand, and there will be unicorns and rainbows and rainbow-colored unicorns, and... well, you get the idea.
I think this is bosh. Balderdash. Bunkum. But I also think it’s worth unpacking why this is a popular and recurring idea, because there’s the germ of a service design in there.
“The cloud” means a lot of things to a lot of people, but for the sake of argument, let’s call it “third-party data-storage services” such as Amazon’s S3. S3 is not a solution for data curation. The service-level agreement amounts to “we can lose any of your data any time, and your only recourse might be a refund of what you paid us.” For unique, irreplaceable data, this is beyond unacceptable. Think it can’t happen? It already has.
As part of a well-managed storage and backup system, S3 might do. Might. But do you really want to design around its limitations?
However. Look up at the sky, if you’re lucky enough to be near a window. I’m guessing you see either no clouds at all, or a lot of them. More than one, at any rate. How many skies contain just one cloud?
Cost questions aside, what is it that “the cloud” promises that people want? Could those of us interested in data build that?
“The cloud” promises to make data storage secure, safe, and above all easy. Yes, I think we can do this, and I think we should. Fedora, iRODS, pick your poison; but big disk, taken care of invisibly behind the scenes, with lots and lots of ways to get data in and out?
We can do this. We should.