Writing Sentences

Taking advantage of a new Amazon feature, Steven Johnson does some literary data-mining:

The two stats that I found totally fascinating were “Average Words Per Sentence” and “% Complex Words,” the latter defined as words with three or more syllables — words like “ameliorate”, “protoplasm” or “motherf***er.” I’ve always thought that sentence length is a hugely determining factor in a reader’s perception of a given work’s complexity, and I spent quite a bit of time in my twenties actively teaching myself to write shorter sentences. So this kind of material is fascinating to me, partially because it lets me see something statistically that I’ve thought a great deal about intuitively as a writer, and partially because I can compare my own stats to other writers’ and see how I fare. (Perhaps there’s a literary Rotisserie league lurking somewhere on those Text Stats pages.)

So I spent a few hours last week plugging in the numbers for my books, as well as a few other authors that I assembled in an entirely unscientific fashion: Malcolm Gladwell, Steven Pinker, Seth Godin, Christopher Hitchens — and then, just to see how far I’d come, I threw in my intellectual (and, sadly, stylistic) heroes from my early twenties, the post-structuralist legends Michel Foucault and Fredric Jameson. I compiled stats for 3-4 books for each author, except Gladwell, who has written two, and then plotted them on a scatter chart, with the y axis representing % complex words and the x axis representing words per sentence.
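The two measures Johnson plotted are simple ratios: total words divided by total sentences, and the share of words with three or more syllables. Neither post says how Amazon actually splits sentences or counts syllables, so the sketch below is only an approximation under assumed rules. It treats `.`, `!` and `?` as sentence breaks and estimates syllables by counting vowel groups (dropping a silent final "e"). The function names `count_syllables` and `text_stats` are hypothetical, chosen for illustration.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate (an assumed heuristic, not Amazon's method)."""
    word = word.lower()
    # Each run of vowels (treating 'y' as a vowel) counts as one syllable.
    count = len(re.findall(r"[aeiouy]+", word))
    # Drop a likely silent trailing 'e' ("rate"), but keep "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def text_stats(text: str) -> tuple[float, float]:
    """Return (average words per sentence, % complex words) for a passage."""
    # Assumption: sentences end at '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0, 0.0
    # "Complex" words, per the definition quoted above: three or more syllables.
    complex_words = [w for w in words if count_syllables(w) >= 3]
    words_per_sentence = len(words) / len(sentences)
    pct_complex = 100.0 * len(complex_words) / len(words)
    return words_per_sentence, pct_complex


if __name__ == "__main__":
    wps, pct = text_stats("The cat sat. Ameliorate the protoplasm!")
    print(f"{wps:.1f} words per sentence, {pct:.1f}% complex words")
```

Run on a whole book, each call yields one point on a chart like Johnson's: words per sentence on the x axis, % complex words on the y axis. A vowel-group heuristic will miscount some words, so figures from this sketch won't exactly match Amazon's.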

What did Johnson find? The results weren’t exactly shocking. Foucault and Jameson write long sentences (somewhere between 35 and 55 words per sentence); Gladwell and Godin write short ones (between 15 and 20 words per sentence). But there was one surprising result:

Each author’s books are closer to his other books than they are to the other authors’ books. In other words, each of us has a certain sweet spot of complexity that we come back to book after book.

That sounds about right. If I were a literature grad student, I’d be curious to see how this analytic technique could be applied to novelists. For example, I’ve been on a Philip Roth binge lately, and it’s striking how rhetorically consistent he is across all of his novels. (The themes, of course, are consistent, too.) My hunch is that the same could be said for most novelists, once you allow for an outlier or two. Everybody has that odd early novel, or that one foray into post-modernism, but when you look at writers like Updike or McEwan or Proust or Woolf you generally find a striking syntactic regularity.


  1. speedwell
    October 22, 2007

    So what you’re saying is, you’ve found that, amazingly, writers write like themselves.

    Just kidding, lol… cool data!

  2. Gareth
    October 23, 2007

    Plugging in Foucault may be largely pointless, as you would be tracking the translator’s word choice and sentence structure perhaps more than the author’s. Conventions about sentence length and punctuation differ from language to language, and the number of sentences in an English translation of Foucault’s work is almost certainly greater than in the original French, with the degree of rephrasing depending on the translator.