The Book of Trogool

OA publishers: just use HTML!

I was reading the latest issue of the Journal of Digital Information today, and I found myself wishing I could turn the Readability bookmarklet loose on half its PDF-only articles.

I’m sorry, authors. I know you tried, but those PDFs are terrible-looking. Times New Roman, really? (The one in Arial is the worst, though.) Could we discuss your line-height and why it’s not tall enough? Line-length, and why it’s too long?

Sniff at me for an ex-typesetter if you like (I am an ex-typesetter, as it happens), but the on-the-ground reality is that I didn’t read as much of those articles as I’d have read if they were, you know, readable. As for JoDI, their lack of a consistent look damages their brand and their credibility among their readers. Like it or not, centuries of print journals have created certain expectations for the quality of typesetting in a PDF.

So what’s a shoestring open-access journal that can’t afford professional typesetting to do? Believe you me, this is a common and vexing dilemma. It’s not as though authors will lift a finger to make a publisher’s production or branding job easier, as JoDI trenchantly demonstrates.

My answer: If you’re not going to put effort into typesetting, chuck PDF. HTML is where it’s at for you. Embrace the Web and its pitifully low standards for typography.

This is, of course, easier to say than to do. It does still take more technical savvy to produce decent HTML than to produce a bad PDF from the most typical manuscript formats. Making a print CSS stylesheet for your journal (which is also a good idea, to avoid grumbling from the print-dependent) is also eggheady. If your subject area is math-heavy, you have an entire new suite of problems.
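To show roughly how little is involved, here is a minimal sketch in Python that wraps an article in an HTML page with one embedded stylesheet covering both screen and print. Everything in it is an assumption for illustration: the `render_article` function, the class names being hidden, and the measurements (a 34em line length, a 1.5 line-height, 11pt print type) are placeholders I made up, not anyone's house style.

```python
from html import escape

# Hypothetical house style; every value here is an illustrative assumption.
STYLESHEET = """
body { max-width: 34em; margin: 2em auto; line-height: 1.5;
       font-family: Georgia, serif; }
@media print {
  /* On paper, let the page margins set the measure and hide site chrome. */
  body { max-width: none; margin: 0; font-size: 11pt; }
  nav, .site-header, .site-footer { display: none; }
  /* Print the target of each link, since paper cannot be clicked. */
  a[href]::after { content: " (" attr(href) ")"; font-size: 90%; }
}
"""


def render_article(title, paragraphs):
    """Return a complete HTML page for one article.

    `title` and each item of `paragraphs` are plain text; they are
    escaped so stray `<` or `&` characters cannot break the markup.
    """
    body = "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n<head>\n<meta charset="utf-8">\n'
        f"<title>{escape(title)}</title>\n"
        f"<style>{STYLESHEET}</style>\n"
        "</head>\n<body>\n"
        f"<h1>{escape(title)}</h1>\n{body}\n"
        "</body>\n</html>\n"
    )
```

In a real journal the stylesheet would more likely live in a single shared file that every article links to, rather than being embedded in each page; it is inlined here only to keep the sketch self-contained.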

On the whole, though, it’s much easier to produce good HTML than good PDF. Moreover, bad PDFs are essentially irredeemable; there’s nearly no way (and definitely no easy way) to reflow, re-typeset, or otherwise reformat them. If you go the HTML route, as your skills improve you will (trust me!) learn to fix your bad HTML, and if your content-management system is any good, you’ll be able to go back and fix your old articles in a decently automated fashion.
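As a rough idea of what "decently automated" might look like without a content-management system at all, here is a sketch that walks a folder of old articles and strips inline `style` attributes so that a shared stylesheet governs their look. The function names and the assumption that your archive is a directory of `.html` files are mine, purely for illustration. A regular expression is a blunt instrument for HTML (it would also strip a `style="..."` that appears inside a quoted code sample, for instance), so treat this as a starting point and run it on a copy.

```python
import re
from pathlib import Path

# Matches an inline style attribute, double- or single-quoted.
INLINE_STYLE = re.compile(
    r'\s+style\s*=\s*("[^"]*"|\'[^\']*\')', re.IGNORECASE
)


def strip_inline_styles(html):
    """Remove inline style attributes, leaving the rest of the markup alone."""
    return INLINE_STYLE.sub("", html)


def fix_archive(root):
    """Clean every .html file under `root`; return the paths that changed."""
    changed = []
    for path in sorted(Path(root).rglob("*.html")):
        original = path.read_text(encoding="utf-8")
        cleaned = strip_inline_styles(original)
        if cleaned != original:
            # Only rewrite files that actually needed fixing.
            path.write_text(cleaned, encoding="utf-8")
            changed.append(path)
    return changed
```

Calling `fix_archive("articles")` on a hypothetical `articles` folder would rewrite only the files that contained inline styles and report which ones they were. There is no equivalent one-liner for a folder of badly typeset PDFs.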

As you rebrand your journal and its look and feel, which you eventually will unless and until the journal dies, you get a bonus: automatic rebranding of your old articles! They never have to look out-of-date, as old-school PDFs often do.

For those of you who have hopes of sending your journal to PubMed Central, there’s an even more compelling reason to stick with HTML: PMC demands NLM XML, which you have no hope of producing straight from PDF. (From your typesetting format, perhaps, but you have to know what you’re doing.) The skills you will learn from making HTML will transfer. PDF, not so much.

I admit that part of my reason for writing this is that I am hopelessly in love with the Readability bookmarklet and wish I could use it in more contexts. (I can’t read Emerald or Informaworld HTML without it.) Still, my advice is heartfelt and I believe it’s good.

I don’t even have to use the Readability bookmarklet to read the Code4Lib Journal. Just sayin’.