The Book of Trogool


Unconnected incidents are making me ponder questions of sustainability. I don’t have any answers, but I can at least unburden myself of some frustrations!

I learned from a colleague that arXiv is looking for a new funding model, as Cornell is wearying of picking up the entire tab. Various options are on the table, and I’m not competent to opine on their feasibility. I’m more interested in the larger question: how are we, we libraries and we researchers, organizing to shoulder the burden of electronic archives, especially open-access ones?

Historically, the answer has been “not effectively.” I can name scads of dead digital projects without having to think hard, and I daresay you can too. This is no longer an acceptable answer, if indeed it ever was. I’m just a little bemused and worried about the models that seem to be emerging (again, especially for open-access archives). If Cornell can’t underwrite arXiv, arguably the most successful preprint archive ever, what does that mean for disciplinary repositories generally? (See also the move of OAIster from the University of Michigan to OCLC.) What does it mean for library support of open access, to data as well as documents?

There’s more in those questions than I can unpack in a single post, so suffice it to say I think librarianship’s stance on this question is a bellwether. Are we tethered to the past or working for the future? Are we really memory organizations, or are we only memory organizations for print? Will we pay for human access to knowledge, or only institutional access?

So that was one thing. Here’s another.

In the course of my regular work, I had occasion to look for a long-term home for an item originating outside my institution. This, you see, is one peril of running an institutional repository; the mission is strictly constrained to materials originating (in some fashion) inside the institution. No matter how amazing that item was, I can’t make an easy case for accepting it, and I may not be able to make any case at all.

So, all right, there may be an appropriate disciplinary repository somewhere. I went looking (ROAR, OAD, and the Goog) and found two possibilities. One restricted depositors by geographic origin; I sent it back to my correspondent, as I didn’t know the origin of the ultimate requester.

The other... well, the other is the reason I’m being cagey around identifying the requester and the discipline. The other appears to have been hacked up by two people in their spare time. The two people got a couple of publications out of the attempt, and then they abandoned the repository; it hasn’t seen any action in well over a year.

I have no words for this irresponsibility that are printable. The word “repository” gets kicked around a lot (it’s not my favorite word either), but responsibility and sustainability are, I believe, two concepts commonly associated with it. Whomping up a repository on a lark and then leaving it to die is a betrayal of trust. I don’t approve. Worst of all, this so-called repository is essentially cybersquatting; no one else will take another stab at making a home for materials in this discipline while the repository is still (however marginally) extant.

This is nothing I haven’t said before:


When even scholars wanting to do the right thing and hand off their work to a responsible party cannot find anywhere to go, when enabling digital communication and the preservation of its results is an altruistic act in libraries instead of the bedrock of our mission, when worthy digital projects die because we in libraries do not notice and reach out to them, when we ourselves can’t see our way clear to sustaining digital materials, we have a serious systemic problem.


  1. #1 Gray Gaffer
    August 6, 2009

    As I have commented elsewhere, we are currently in a Dark Age, by the definition that it leaves no decipherable records. But this post adds another dimension to the problem: not only is our technology leaving documents behind, connecting the documents with those who need them is also part of the problem. A document may not be readable because the technology for doing so has been obsoleted. It is also unreadable if it cannot be found.

    A big danger today would be an uncontrolled proliferation of repositories. The example given is a prime one.

    As far as I can see, the solution has to be to maintain hard copy repositories of everything, in parallel with digitized, searchable, and collatable versions of the collections. Acid-free, archival-quality hard copy, or controlled-environment rooms. Redundantly replicated across different civilizations. With reader tutorials, from “See Spot Run” on, so our far descendants have a fighting chance of recovering their forebears’ store of knowledge and technologies.

    So one task might be: how to make technology-independent or agnostic repositories with archival lifetimes, other than paper? Also cheaper and smaller? And obvious as to function?

    All of which comes down to funding. Somebody has to pay for it. And it should not be from annual political budget committees. We are talking the lifeblood of our culture here.

  2. #2 Dorothea Salo
    August 6, 2009

    I hate to use the word “impossible,” but I’m afraid your vision is. The volume is utterly unsustainable to reproduce in print, especially in multiple places.

    Moreover — and it troubles me how many people do not realize this — some digital artifacts CANNOT be acceptably reproduced in print. I was talking to a bioinformaticist this afternoon about three-dimensional representations of microscopic images of living tissue. This is simply not printable!

    Print is not the solution to digital preservation. It cannot be.

  3. #3 Joseph Thibault
    August 13, 2009

    I’ve been working with course designers at a college recently helping in their effort to make an online program for sustainability (of the environmental kind) and we basically had a similar conversation:

    That is, there are too many options for them to host their OER, but none seems to be independent or necessarily in it for the long run (they’re all created for a single purpose or a single location, or fail to identify their means of staying ‘in business’ indefinitely).

    The world got it right (somewhat) with the seed bank (aka the doomsday vault), so it’s not impossible, I agree.

    I’m curious though what the optimal repository for information (digital) would look like? How could we sustain it? Who would manage it, develop it, continue to innovate with it? The implications on advancing access to educational content (OER) could be huge…but at the moment this vital information is fragmented, and like you say, unsustainable.

    In a sense, the Internet (capital I) is the world’s repository of information, but it’s as fickle as the world’s population (ever increasing, ever creating new and retiring old).

  4. #4 Dorothea Salo
    August 13, 2009

    Great comment; thank you!

    I don’t know that there is or can be one single optimal repository. Just building and maintaining appropriate user interfaces for wildly diverse types of content makes my brain hurt. (It’s a productive sort of hurt, and it’s informing a talk I’ll be giving in October at Access 2009, but pain is pain!)

    The problem of Keeping Digital Stuff parcels out into a lot of parts. In all honesty, just making people consider questions of sustainability AT ALL when new-project stars are in their eyes would be a tremendous step forward. It’s starting to happen, but it’s awfully slow.

  5. #5 charlie
    August 13, 2009

    In an ideal world where all academic articles were published under CC licenses encoded with rich metadata, and everyone in a field is on the same P2P network with a 1 terabyte drive to store and share documents, institutional and other monolithic repositories would no longer be necessary and would be seen as a very archaic method of document preservation (and distribution).

    I chose the “1 terabyte drive” amount because anyone can buy one now for under $100, and the P2P technology already exists. It’s only the controlling, possessive mindset of providing online resources that are free to access but not yet free to be shared that constrains us; otherwise, we could already have distributed scholarly archives that would greatly surpass what any one institution or organization could achieve on its own.

  6. #6 Dorothea Salo
    August 14, 2009

    And when someone’s 1-TB drive fails?

    Or someone changes institutions, or retires?

    Or someone’s data are incomprehensible?

    Or someone remembers that someone else had data, but she isn’t sure who or where?

    “Rich metadata” doesn’t come out of a vacuum or a magic wand, either.

    If Book of Trogool accomplishes nothing else, I hope it complicates the notion that this problem is as simple as networked disk.