What is the impact of discovery tools on researcher self-archiving behavior?

This is the question I was asking myself while reading this fairly straightforward paper on open access in high-energy physics (hat tip to Garret McMahon).

It's impossible to be in my particular professional specialty and not know about the trajectory of self-archiving in high-energy physics, but I learned a smallish detail from that paper that rather intrigues me: the existence of SPIRES, a disciplinary search tool that covers both the published literature and gray literature such as preprints on arXiv.

This strikes me as a rare thing. We have disciplinary gray-lit search tools such as RePEc in economics, and we have no end of disciplinary published-lit search tools (despite the considerable expense of securing access to them), but tools that do both? Within a given discipline? I'm not a reference librarian, so discipline-specific search tools aren't my specialty at all, but I can't think of anything else on the SPIRES model. There's WorldCat and Google Scholar, of course, but neither of them is discipline-specific. EBSCO is known to index some library blogs for its library-science databases, but they don't touch DList or E-LIS as far as I'm aware. Law might have some interesting things going on, given the novel importance of blawgs, but I don't know of anything firsthand.

SPIRES makes me wonder, it really does. Imagine you're a high-energy physicist (take that in either sense or both!). You search SPIRES; you know all your colleagues do, too. You have two ways to get your work in SPIRES so that it's in front of their eyes: pop a preprint on arXiv, or go through the slow process of peer-reviewed publishing, a process that you don't believe will change your paper much.

This is not the narrative that one typically sees regarding high-energy physics and self-archiving. It's usually seen as a continuation of a print-culture norm of circulating preprints individually by mail. Still… I wonder.

What is the relevance of this little idyll to research data? This: If data are not indexed where researchers expect to search for disciplinary materials useful to them, will data be used? Taken seriously? Cleaned up and placed online in the first place, even? "Discoverability" of data, in the broad sense of "availability to web search," may not be enough. Discoverability through discipline-appropriate channels, alongside other trusted materials, may well be the key.

Or so it seems to me.



I am not a physicist (though sometimes I regret that), but I do have a lot of friends who are physicists (that is, at minimum have undergraduate degrees in physics; only a handful are practicing physicists, though several got PhDs before jumping ship).

The impression I've gotten from them is that arXiv is special. In particular, I've gotten this impression because they talk about arXiv without anyone having asked them. It's part of the culture deep down, even at the undergraduate level and even some years ago. It's part of the standard workflow for how physics is done.

So I don't think arXiv would need a special indexing tool, integrated with discovery for more traditional sources, to be used by physicists. But I also think it is odd in this regard and that having a discovery tool that spans both -- especially if its layout makes preprints (or similar) and peer-reviewed articles display on equal footing -- would in general dramatically improve both use and cultural currency for non-published sources.

I think there are pros and cons to this.

Yes, I agree; arXiv is part of the culture!

My question is more "how did it get to be that way?" I'm wondering whether the existence of SPIRES made a difference.

ADS (the NASA Astrophysics Data System) is another literature search archive, covering essentially all publishing in astronomy.

But the literature in SPIRES or ADS is, at best, only weakly coupled to the actual data, so no real data discovery can take place. The second most cited paper of 2008/2009, Komatsu et al. 2008, cited over 1600 times in the past year or so, contains no actual data, nor do SPIRES or ADS link to the WMAP data.

This latter circumstance is likely because these archives make no curation effort to re-establish the data-literature links broken by authors and journals. One example in astronomy where such curation does take place is HST (the Hubble Space Telescope).

You say, "If data are not indexed where researchers expect to search for disciplinary materials useful to them, will data be used?" I agree that this is important for within-discipline reuse, but the Australian National Data Service is also focussed on reuse between disciplines. For this use case, just targeting disciplinary search engines won't be enough. That is why our initial discovery service is building richly interlinked web pages for spidering by web search engines, so that the data shows up in the places where researchers look for everything, including disciplinary materials. The trick is going to be seeing whether our pages rank highly enough for people to find them. Check back with us early next year to see!

Good point. Cross-disciplinary discovery is a whole 'nother can of worms.

I suppose I'm just wondering what will break the logjam of data citation/data reuse/credit for data in tenure hearings/researchers valuing data as it deserves. (There are echoes of the same problem in OA as well, to be sure; it's a great day when an OA journal gets indexed by one of the major disciplinary indexes!) If we could try to tease out what the contribution of SPIRES was to arXiv's success, we might know whether it's worth the effort to get datasets indexed.