Filtering Isn’t the Problem

Via Twitter, Daniel Lemire has a mini-manifesto advocating “social media” alternatives for academic publishing, citing “disastrous consequences” of the “filter-then-publish” model used by traditional journals. As with most such things, though, I’m not convinced that social-media-style publication really fixes the problems he identifies. For example, one of his points is:

The conventional system is legible: you can count and measure a scientist’s production. The incentive is to produce more of what the elite wants. In a publish-then-filter system nobody cares about quantity: only the impact matters. And impact can mean different things to different people. It allows for more diversity in how people produce and consume science. Thus, if you think we would be better if we stopped counting research papers, then you should reject conventional peer review and favor the publish-then-filter system.

The problem here is that countability isn’t the real core of the issue. You can count blog posts or arxiv preprints just as easily as you count journal articles and conference presentations. And if you shifted to a model where blog posts and arxiv preprints were the primary mode of publication, you would very quickly find that the system shifts to simple counting of blog posts or arxiv preprints.

The fundamental problem here isn’t whether you filter pre- or post-publication. The fundamental problem that people have with the whole business of academic science is the existence of a system requiring some sort of quasi-objective comparison of scientists.

The problem is that the people who hire and promote scientists need a way to make those decisions, and in a world where there are vast differences between disciplines and even sub-disciplines, counting papers ends up being the default. Shifting to a social-media-type publication system doesn’t change the need for a way to compare scientists within an applicant pool or tenure cohort. And when you need to start making decisions about hiring and promotion using social-media-type publications as your primary channel, the people making those decisions will drift toward simple publication counting for all the same reasons that people drift that way now.

“But it’s impact that matters!” you repeat. Sure, but how do you measure impact? Page views? Citations? Those aren’t anything new– there are “impact factor” measurements for journals, and services like the Web of Science database or the Astrophysics Data System allow you to quickly and easily determine the number of citations of people’s journal articles. That’s part of the information that I had to supply for my tenure case, for example. If you like single numbers, there are even things like the h-index.
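For anyone who hasn't run into it, the h-index is the largest number h such that a researcher has h papers with at least h citations each. A minimal sketch of the calculation follows; the function name and the citation counts are made up for illustration and are not drawn from any real database or person.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts, invented purely for illustration.
steady = [12, 10, 9, 8, 7, 6]   # six solidly cited papers
one_hit = [500, 3, 2, 1, 0, 0]  # one blockbuster and little else

print(h_index(steady))   # 6
print(h_index(one_hit))  # 2
```

Note how the single number flattens very different records: the researcher with one 500-citation paper scores lower than the one with six modestly cited papers, which is exactly the sort of imperfection at issue here.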

“But those are imperfect measures!” Yes, and? I haven’t seen any suggestions of measures of impact in more social-media style systems that would be any better. Things like pageviews and links and retweets are essentially random– if a big-name blogger links or retweets a link, that can hugely skew those measures. Citation numbers aren’t any more useful in the electronic world than the paper world.

The paradigm publish-then-filter system in my corner of academia is the physics arxiv. It does some very minimal filtering to weed out the obvious cranks, but other than that, it’s subject to the same vagaries of chance as traditional publications. Back at the Science in the 21st Century workshop a couple of years ago, Paul Ginsparg’s talk included a couple of really interesting graphs. The first showed the number of submissions as a function of time, with a gigantic spike at 4pm. The reason for this is that they send a daily email listing new submissions, in the order in which they were submitted, with the “day” starting at 4pm. As a result, people posting papers make a point (sometimes involving automatic pinging of the arxiv servers) of getting their papers in as close to the start of the new “day” as possible, so they’re near the top of that email.

The second graph showed a measure of the number of citations papers received as a function of their position on that list, showing that papers that appeared near the top of their daily email received significantly more citations than papers that appeared farther down the list. So, this system is every bit as game-able as traditional publishing, just in a different manner– if your citation total depends on the timing of your submission, is that really any better than having your access to a premier journal depend on the whims of a referee in the field?

(Assume, for the sake of argument, that we’re talking run-of-the-mill refereeing, not tremendously unethical behavior like spiking papers from rival labs, or using the referee process to delay publication of a result until you can replicate it. That sort of thing is a completely different issue.)

The problem people have with the system isn’t the timing of the filtering, it’s the need for the comparison. People get outraged about the vagaries of traditional academic publishing because it doesn’t seem right for careers to hang in the balance based on such an imperfect system. Non-traditional publishing looks like a huge improvement now, because the stakes are very low in that corner of the world. If it were to become the main channel of publication, then you would find careers hanging in the balance based on the slightly different imperfections of the new system, and people would become outraged over those.

And the thing is, in a world with limited resources, you will always need to make these kinds of decisions. You can’t hire everyone who applies, so you need some standard for evaluation that everybody in the department can more or less accept. That will always be a problem, no matter what method you use to sort publications.

Lemire’s other points seem to me to suffer from similar problems, or even to contradict each other. For example, his first point includes:

While you will eventually get your work published, you may have to drastically alter it to make it pass peer review. A common theme is that you will need to make it look more complicated. In a paper I published a few years ago, I had to use R*-trees, not because I needed them, but because other authors had done so. When I privately asked them why they had used R*-trees, while it was easy to check experimentally that they did not help, the answer was “it was the only way to get our paper in a major conference”. So my work has been made more complicated for the sole purpose of impressing the reviewers: “look, I know about R*-trees too!”

But his point 4 is a complaint about the problem of the Minimum Publishable Unit:

You know how you succeed in science these days? Take a few ideas, then try every small variation of these ideas and make a research paper out of each one of them. Each paper will look good and be written quickly, but your body of work will be highly redundant. Instead of working toward deep contributions, we encourage people to repeat themselves more and more and collect many shallow contributions.

These seem to run counter to one another. A common variant of the “you need to make your work look complicated” complaint in my corner of academia is that you have to pack so much in to get a paper in Science or Nature that it becomes all but incomprehensible. But that’s the opposite of the Minimum Publishable Unit problem, which is stretching what is essentially a single measurement over several papers, usually in lesser journals.

Also, it’s not at all clear to me how social media style publications make the MPU problem any better. Quite the contrary– the lack of a referee-imposed threshold and the ease of publication would seem to encourage more incomplete publication, rather than less. That’s one of the maddening things about science blogs– too often, a blog post is not a complete working-out of an idea, but only a partial discussion. There are whole blogs that I’ve more or less stopped reading because they tend to be all teasers for more interesting posts to follow, only the more interesting posts never quite get here. This seems to me to map directly onto the MPU problem in academic publishing, so I don’t think that shifting to a social-media publication model will make any real difference in this area.

And again, while there are good things published via social media, they mostly come from people for whom the stakes are low. To the extent that there are deep and thoughtful things published via new channels, they seem to mostly come from people who have the kind of jobs that are designed to do that– either tenured academics, or people in think-tank sorts of situations where their whole job is to turn out deep and thoughtful analyses. They have the time and the security to publish that sort of thing, and to publish it anywhere they like– their job doesn’t depend on getting publications in time to demonstrate the qualities that will get them through a job search or a tenure review.

If we move to a world where blogs and preprints and that sort of thing are the basis for hiring and promotion decisions, I don’t think we’d find a huge increase in the fraction of deep thoughts that get published. We might even see a net increase in the number of half-assed results thrown out there for the sake of a publication. But that wouldn’t reflect a weakness of the medium, it would reflect the fact that people in early-career jobs need to produce something to get and keep jobs.

The problem isn’t the filtering, it’s the high-stakes evaluation.

Comments

  1. #1 Mike the Mad Biologist
    May 19, 2011

    “…it would reflect the fact that people in early-career jobs need to produce something to get and keep jobs.

    The problem isn’t the filtering, it’s the high-stakes evaluation.”

    Exactly, from a career perspective, publication, especially in glamour mags, is currency, a way of keeping score.

  2. #2 Jeffrey Toney
    May 19, 2011

    Good article.

    “Too often, a blog post is not a complete working-out of an idea, but only a partial discussion. There are whole blogs that I’ve more or less stopped reading because they tend to be all teasers for more interesting posts to follow, only the more interesting posts never quite get here.”

    I agree, and am guilty myself sometimes of quickly posting something that fascinates me to share with my readers. The allure of social media is the immediacy and the ability to inform and inspire; the risk is a relative lack of depth and time for reflection. The best bloggers are able to do both (I’m still learning.)

    With regard to:

    “The problem is that the people who hire and promote scientists need a way to make those decisions, and in a world where there are vast differences between disciplines and even sub-disciplines, counting papers ends up being the default.”

    As an academic Dean of a college of the sciences and mathematics, I certainly do not count papers per se – I evaluate each paper on its merits of scholarship, and yes, using other considerations such as impact factor. I do not include blogs or other non peer-reviewed works. Similar to many Universities, we evaluate faculty based upon excellence in teaching, scholarship and service. Blogs and other social media fit, I believe, well in the category of service in recognition of their important role in community outreach and public education – if well written and done responsibly. I hope that this addresses some of your questions.

  3. #3 Rosie Redfield
    May 22, 2011

    I want the research papers I see to have been pre-filtered; I don’t have time to do all the filtering myself.
