Electronic Publication and the Narrowing of Science and Scholarship by James A. Evans, ironically behind the paywall, has a lot of people scratching their heads – it sounds so counter-intuitive, and it runs contrary to the findings of other, similar research.
A commentary at the Chronicle of Higher Education is here, also ironically behind the paywall.
Here is the press release and here is the abstract:
Online journals promise to serve more information to more dispersed audiences and are more efficiently searched and recalled. But because they are used differently than print–scientists and scholars tend to search electronically and follow hyperlinks rather than browse or peruse–electronically available journals may portend an ironic change for science. Using a database of 34 million articles, their citations (1945 to 2005), and online availability (1998 to 2005), I show that as more journal issues came online, the articles referenced tended to be more recent, fewer journals and articles were cited, and more of those citations were to fewer journals and articles. The forced browsing of print archives may have stretched scientists and scholars to anchor findings deeply into past and present scholarship. Searching online is more efficient and following hyperlinks quickly puts researchers in touch with prevailing opinion, but this may accelerate consensus and narrow the range of findings and ideas built upon.
For now, let’s see what others say:
* It’s hard to say much based on a newspaper summary and a press release. But at first glance, Evans’ results conflict with the many studies showing that OA articles are cited significantly more often than non-OA articles. These studies differ from one another on how to explain the correlation between OA and increased citation counts, but they agree on the correlation. However, there may be ways to reconcile the two sets of results. For example, authors may cite fewer articles when they have more to choose from, but they may still cite OA articles relatively more often than TA articles. Or the average number of citations per article may decline with the growth of the total number of articles accessible to authors, but OA articles might bring the average up, and TA articles might bring it down. Or the multiplication of ejournals may be narrowing the scope of the average paper, and therefore shortening the average reference list, but citations may be growing overall and the citations of OA articles may be growing faster than the citations of TA articles. (On the other side, the Economist said that “the same effect applied whether or not a journal had to be paid for” – though without specifying exactly which effect.)
* Evans’ results also appear to conflict with a recent study by Arthur Eger, Database statistics applied to investigate the effects of electronic information services on publication of academic research – a comparative study covering Austria, Germany and Switzerland, GMS Medizin – Bibliothek – Information, June 26, 2008. Eger found that “a larger content offering coincides with a dramatic increase in Full Text Article requests, and an increase in Full Text Article requests, after about 2 years, coincides with increased article publication.” If Evans is right that “less is sampled”, then the two studies are definitely incompatible. But if we look only at Evans’ conclusions about citations, the two studies may be compatible. Evans is saying that access to more literature reduces the number of different sources one cites, and Eger is saying that it increases (“dramatically” increases) the number of articles one requests or samples. Researchers may be viewing more articles but citing fewer. Are they using their enhanced access to browse neighboring topics? Are they exploring serendipitous discoveries, only some of which turn out to be citable? Does their wider reading help them zero in on citable research?
Brandon Keim asked (and commenters are answering):
What do you think, scientist and scholar Wired Science readers, especially those whose careers have spanned the jump from paper to screen? What have you gained — or lost — from the internet’s rise?
In other words, it is not the additional online access that is causing the change in citation behavior but the tools that accompany the online access — tools that allow readers to link to related articles, rank by relevance, times cited, etc. It is these tools that signal to the reader what is important and should be read. The result of these signals is to create herding behavior among scientists, or what Evans describes as consensus building.
A highly-efficient publication system can come with unanticipated consequences — the loss of serendipity. In an earlier blog post, we discuss how the Internet is changing reading behavior in general, reducing the depth of inquiry. In another blog, we discuss how signaling can help readers save time.
Evans brings up a few possibilities to explain his data. First, that the better search capabilities online have led to a streamlining of the research process, that authors of papers are better able to eliminate unrelated material, that searching online rather than browsing print “facilitates avoidance of older and less relevant literature.” The online environment better enables consensus: “If online researchers can more easily find prevailing opinion, they are more likely to follow it, leading to more citations referencing fewer articles.” The danger here, as Evans points out, is that if consensus is so easily reached and so heavily reinforced, “Findings and ideas that do not become consensus quickly will be forgotten quickly.” And that’s worrisome – we need the outliers, the iconoclasts, those willing to challenge dogma. There’s also a great wealth in the past literature that may end up being ignored, forcing researchers to repeat experiments already done, to reinvent the wheel out of ignorance of papers more than a few years old. I know from experience on the book publishing side of things that getting people to read the classic literature of a field is difficult at best. The keenest scientific minds that I know are all well-versed in the histories of their fields, going back well into the 19th century in some cases. But for most of us, it’s hard to find the time to dig that deeply, and reading a review of a review of a review is easier and more efficient in the moment. But it’s less efficient in the big picture, as not knowing what’s already been proposed and examined can mean years of redundant work.
The greater availability of research papers in recent years thanks to electronic publication (and open access) should broaden, not narrow, the range of papers that we read and ultimately cite in our own publications. But looking at my own behavior when reading papers or writing a publication, and thinking about many discussions we had on related topics, these findings make perfect sense.
Today’s technology allows us to make the distribution of scientific papers in electronic form very efficient, and thanks to this technology we have new business models (author-pays) and an ever-increasing number of journals. Access to research articles is now easier, cheaper and open to a broader audience than it ever was before. This is of course a wonderful development, but it unfortunately creates a new problem: information overload and how to filter out the relevant information.
Twenty years ago the typical researcher would use personal or institutional journal subscriptions to regularly follow the important papers in his field. Index Medicus and Current Contents were used to find additional articles, but they were cumbersome to use. Today few researchers regularly read printed journals. Most papers are found by searching online databases and by subscribing to tables of contents via email or RSS. There are many clever tools to facilitate this, but most people are probably overwhelmed by the information and stick to some very specific research interests and high-profile journals.
In any case, the study highlights two complementary strategies in information retrieval: finding relevant papers by targeted searches versus staying informed on a broad range of topics by systematic browsing. In our Google-driven era, we may have a tendency to forget the importance of good old-fashioned ‘table-of-contents skimming’ to stimulate cross-disciplinary thinking, widen our horizons and cultivate scientific curiosity.
Perhaps it is a peculiarity of printed media to provide “poor indexing” and therefore enforce broad exposure to unrelated areas of research. On the other hand, some web technologies already help us browse through vast amounts of online publications (for example, an RSS aggregator helps me to generate a daily literature survey; this can be further combined, for example here at FriendFeed, with other community-centered feeds; other aggregators highlight information by automatic clustering: Postgenomic and Scintilla). However, these tools remain imperfect and, in our reflection on the future of scientific publishing, we will need to find the right balance between the two strategies above and think of how the increasing efficiency of search engines can be complemented by means that provide continuous exposure to diversity.
Bill Hooker does the most detailed analysis of the paper so far (so click and read the whole thing, graphs and all):
What this suggests to me is that the driving force in Evans’ suggested “narrow[ing of] the range of findings and ideas built upon” is not online access per se but in fact commercial access, with its attendant question of who can afford to read what. Evans’ own data indicate that if the online access in question is free of charge, the apparent narrowing effect is significantly reduced or even reversed. Moreover, the commercially available corpus is and has always been much larger than the freely available body of knowledge (for instance, DOAJ currently lists around 3500 journals, approximately 10-15% of the total number of scholarly journals). This indicates that if all of the online access that went into Evans’ model had been free all along, the anti-narrowing effect of Open Access would be considerably amplified.
In fact, the comparison between print and online access is barely even possible when considering Open Access information. The same considerations of cost — who can afford to read what — apply to commercial print and online publications, but free online information has essentially no print ancestor or equivalent. Few if any scholarly journals were ever free in print, so there’s a huge difference between conversion from commercial print to commercial online on the one hand, and from commercial print to Open Access on the other.
Indeed, I would suggest that if the entire body of scholarly literature were Openly available, so that every researcher could read everything they could find and programmers were free to build search algorithms over a comprehensive database to help the researchers do that finding, then in fact the opposite effect would obtain. Perhaps it’s true that the more commercial online access you have, the less widely a researcher’s literature search net is cast, but as I mentioned above I see no reason to attribute that more to the mode of access than to its cost.
Perhaps with greater accessibility, people have quit citing old papers which they used to cite just because everyone always cites those papers without even reading them. Those who have the least access tend to cite very old stuff, textbooks, popsci articles, e.g., these guys. Those who have good access can both browse and search and find what is truly relevant to their work. They cite only stuff that they have actually read and found useful. Perhaps people are just getting more honest.