When I wrote this post, I left out a whole second “trigger” because of time and energy.
That trigger–which once again had me wondering whether my humanities background (rhetoric major, math minor) leaves me simply unable to cope with the true Scientific Mind–concerned the format used for publication.
Or, to put it another way, the widespread and vehemently expressed view that PDF sucks (to use a polite version).
What I saw, in several conversations, was a seeming demand from text-miners that everything be in HTML (or, better, XML) so that it’s easy to mine, along with a complete disdain for layout and typography as irrelevant. (I can only imagine Donald Knuth’s response to the concept that typography and layout don’t matter…)
Why some of us humanists use PDF
Because we care about typography. Because we care about the presentation of what we’ve written. Because PDF–and, of portable formats, only PDF–can assure us that the typefaces and layouts we’ve chosen will be rendered properly for the reader.
And because it’s easy–pretty much automatic on the Mac, and not difficult on the PC (there’s a free Office download to define a PDF printer; I use Acrobat because it produces much smaller PDF files and because it can combine many PDFs into a single file, but for 95% of users, the free download’s good enough).
Getting from there to HTML
So you want HTML? Make it easy. Actually, for Word 2007, it isn’t bad: Save as Web page (filtered), and you get not-too-ugly HTML. (Since .docx is actually an XML package, it probably should be better than it is.) But you have to tune an HTML-version stylesheet if you really want to do both well–one that only uses “easy” typefaces, for example. It won’t be elegant HTML, but it will work.
But, even here, what’s in it for me? Can you demonstrate that I’ll get more money, more fame, or even significantly more readers by taking those small steps?
“It makes it easier for me to plunder your text for my own purposes” is not, I hate to say, a terribly convincing reason. It might be for you, but it isn’t for me.
Still…after years of doing only PDF for my own peculiar ejournal, I started doing Word’s filtered HTML for most essays, because it did seem to serve some subset of readers–and it didn’t add substantially to the production task. But whenever I read one of the HTML versions, I wince a little: It’s just not as good as the PDF.
Going beyond HTML
But, you know, I think you want more than HTML. I think you want semantics–XML or better.
Provision of good-quality HTML from a regular writing-and-layout stream is at least plausible, with no real extra effort on the part of the writers and editors.
Provision of semantics, though–that’s a huge additional effort, and I don’t believe it’s one that’s readily automatable for non-trivial instances.
Which magnifies the question: What’s in it for me?
I’m honestly interested in the answers. “Some neato research down the line that will earn someone else grants and tenure” may not be a wonderful answer. Just sayin’.
Update, June 25, 2009:
Based on one comment (not here–ah, the multifarious conversational channels!) I should stress that, when I say “What’s in it for me?” I’m not suggesting that there are no reasons to use HTML. Of course there are. (Hmm. I’m writing this in HTML, because it suits blogging–and, unlike WordPress’ editor, this editor is pretty much raw HTML, other than automatic paragraph breaks.)
I’m suggesting that there are also legitimate reasons to use PDF.
Really, “what’s in it for me?” (a phrase I rarely use) has more to do with demands for HTML–not for readability, but for text-mining–and pressures to do more than HTML. And the constant “PDF sucks!” refrain.
As noted above, I do provide HTML versions of (most) Cites & Insights essays (except for a small number that just don’t work well that way and one “print bonus” feature that appears sometimes)–because some people asked me nicely to do so as an alternative for those who really want to read online, and because it had been a while since anyone had demanded that my free publication be revamped to suit their own preferences.
(Yes, I do mean demanding, in at least one case with fairly strong language. My standard response, after the unmailed two-word/seven-letter one, was that there are lots of other things to read on the web…)