The Book of Trogool

Tidbits, 20 November 2009

Have some Friday tidbits!

  • An important biology dataset is losing NSF funding and may fold. Nor (as the article explains) is it the only one. It is impossible to overstate the desperate gravity of the data-sustainability question. Academic libraries, if we are not the white knights here (and we certainly have been in the past; witness arXiv), who is?
  • On a similar theme, Yahoo pulls the plug on GeoCities. O ye researchers relying on consumer-grade web services, or new startups, have an exit strategy! Consumer-grade services die when they lose money. Jason Scott may not come charging to your rescue.
  • H1N1 science depends on a public database of flu immunity data. “As the researchers acknowledge in their paper, the work couldn’t have taken place if it weren’t for extensive data sharing within the community of flu virus researchers.” Data sharing makes possible better, faster science.
  • Data and the journal article. First: if you are saving your data as PDF, stop it. Second: as I suggested to Chris on FriendFeed, there’s a serious structural issue with expecting journal publishers to cope with appropriate data archiving. By the time a researcher chooses a journal to publish in, all the decisions about data gathering and representation have already been made, and they may well have been made badly. The poor journal publisher can’t go back in time and fix bad decisions! In our not-yet-standardized data age, early data interventions have to happen close to the researcher, which to me means they need to happen at the institution where the research happens.
  • The need for clear data licenses. I haven’t talked about data licensing here, partly because the current state of intellectual-property law makes me sick at heart, but there’s no question that it’s an important piece of the data puzzle.
  • Peer-to-peer technology used for the forces of good: BioTorrents. Datasets vary in size; for the large ones, network bandwidth becomes a sharing problem. Torrenting won’t entirely solve the problem, but it certainly increases the size range within which datasets are portable.
  • Fascinating data project of the week: National Center for Ecological Analysis and Synthesis. What caught my attention is that, as I read the project description, it takes public data sharing for granted. NCEAS researchers are not generating data; they are mining existing data. I’m inordinately curious about the disciplinary culture that makes this feasible: what price scooping?

Whew. I have a lot more, but it’s Friday.