I’ve been taking a look at William Dembski’s paper, “[Information as a Measure of Variation][imv]”. It was recommended to me as a paper demonstrating Dembski’s skill as a mathematician that isn’t aimed at evolution-bashing. I’m not going to go into too much detail about it; it’s just not that good. If this is the best work he’s done as a mathematician, well, that’s pretty sad.

The main problems with the paper are:

1. He either doesn’t understand or misrepresents some of the fundamentals of the field he’s allegedly discussing;

2. He presents many of the central ideas of the paper (that is, describing information content of an event in an event sequence in terms of how it affects the probabilities of events that occur after it) as if they were novel when they really are not (in fact, this idea dates back to Shannon’s [self-information][self-info]); and

3. Much of the paper is very unclear, even obfuscatory.

The last two are typical of Dembski’s writing, and not particularly interesting. I’m going to focus on the first, with three egregious cases of misrepresentation.

### Misrepresentation One: IT = Shannon IT

The misrepresentations start from quite literally the first line of the paper. The first two paragraphs of the body of the paper are:

>Ordinarily, information refers to the meaning or semantic content of a message. Getting a handle on the meaning of a message, however, has proven difficult mathematically. Thus, when mathematicians speak of information, they are concerned not so much with the meaning of a message as with the vehicle by which the message is transmitted from sender to receiver.
>
>The most common vehicle for transmitting messages is the character string. The mathematical theory of information is largely about quantifying the complexity of such strings, characterizing their statistical properties when they are sent across a noisy communication channel (noise being represented as a stochastic process that disrupts the strings in statistically well-defined ways), preserving the strings despite the presence of noise (i.e., the theory of error-correcting codes), compressing the strings to improve efficiency, and transforming the strings into other strings to maintain their security (i.e., cryptography).

This is wrong, and it’s deliberately wrong.

The reason it’s wrong? As I described in my [introduction to information theory][intro-it] (IT), there are two main branches of IT: Shannon and Kolmogorov-Chaitin. Dembski is definitely aware of both; he references the work of Chaitin in papers written *before* this one. But in his description of information theory here, he focuses exclusively on Shannon theory, and presents it as if it were the entirety of mathematical IT.

Why would he do that? Well, because it makes it easier to make his argument about why it makes sense to view information in terms of a series of events. Later in the paper, he’s going to argue that information entropy is the *wrong* measure for information; but that argument can *only* make sense in Shannon theory. And even for the Shannon IT that he uses, the way that he characterizes it is sloppy.

### Misrepresentation Two: Information Content in Poker Hands

Moving on, here’s another really dreadful passage:

>Consider, for instance, the following individuation of poker hands: RF (a royal flush) and ¬RF (all other poker hands). To learn that something other than a royal flush was dealt (i.e., possibility ¬RF) is clearly to acquire less information than to learn that a royal flush was dealt (i.e., possibility RF). A royal flush is highly specific. We have acquired a lot of information when we learn that a royal flush was dealt. On the other hand, we have acquired hardly any information when we learn that something other than a royal flush was dealt. Most poker hands are not royal flushes, and we expect to be dealt them only rarely. Nevertheless, if our measure of information is simply an enumeration of eliminated possibilities, the same numerical value must be assigned in both instances since, in each instance, a single possibility is eliminated.

Looking at this, it’s hard to tell if he just really *doesn’t get it*, or if he’s trying to slip something past the readers. It’s sufficiently messed up that it’s hard to determine exactly what he means. I can see two very different readings.

1. The two possibilities are, as he says, RF and ¬RF. That is, we’re going to be told whether or not a hand is a royal flush. We are *not* going to be told what the cards in the hand are: we are simply going to get a yes/no answer to the question “Is the hand a royal flush?” If that’s the case, then his claim is completely wrong: being told “Yes, it’s an RF” gives you enough information to determine that the hand is one of four sets of cards (the RF in each of the four suits); being told “No, it’s not an RF” is only enough information to determine that the hand is one of C(52, 5) - 4 = 2,598,956 possible sets of cards. And *in either case*, the information content is determined *not solely by the statement that the cards are, or are not, a royal flush*. The information content of those statements (per Kolmogorov-Chaitin, which is the kind of IT applicable to analyzing statements of this sort) is based on the statement *plus* the rules of poker that define the meaning of a royal flush.

2. We are told exactly what cards are in the hand. In that case, we know whether or not it’s an RF because we know what cards are there. So whether you’ve got an RF or not, *you have a precise description of exactly one poker hand*. No matter what variant of IT you’re using, and no matter whether you’ve got a royal flush or not, the amount of information that you have *is exactly the same*: it’s the identity of the 5 cards in your hand.
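To put rough numbers on the two readings, here’s a minimal sketch. It uses Shannon [self-information][self-info] rather than the Kolmogorov-Chaitin framing, and it assumes a uniformly random deal of 5 cards from a standard 52-card deck with the order of the cards ignored. The variable and function names are mine, not anything from the paper.

```python
import math

# Assumption: every 5-card hand from a 52-card deck is equally likely,
# and the order of the cards within a hand does not matter.
total_hands = math.comb(52, 5)   # number of distinct 5-card hands
royal_flushes = 4                # one royal flush per suit

p_rf = royal_flushes / total_hands
p_not_rf = 1 - p_rf


def self_information(p: float) -> float:
    """Shannon self-information, in bits, of an outcome with probability p."""
    return -math.log2(p)


# Reading 1: we only learn the yes/no answer.
bits_rf = self_information(p_rf)
bits_not_rf = self_information(p_not_rf)

# Reading 2: we learn the exact hand, so every hand has probability
# 1 / total_hands, royal flush or not.
bits_exact_hand = self_information(1 / total_hands)

print(f"hands: {total_hands}, non-RF hands: {total_hands - royal_flushes}")
print(f"'it is an RF':     {bits_rf:.2f} bits")
print(f"'it is not an RF': {bits_not_rf:.2e} bits")
print(f"exact hand:        {bits_exact_hand:.2f} bits")
```

Under those assumptions, “it’s an RF” is worth about 19.3 bits, “it’s not an RF” is worth a tiny fraction of a bit, and the exact identity of the hand is worth about 21.3 bits whether or not it happens to be a royal flush. In neither reading do the two cases get the same value for the reason Dembski gives.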

### Misrepresentation Three: Explaining Entropy as a Measure

One more example of how misleading the paper is: the beginning of section two tries to explain why information theorists use entropy. He presents an explanation of information associated with an event; then he explains entropy *in terms of* that presentation of information as events; and then he presents the IT notion of entropy as something *mysterious*:

>Nonetheless, this notion is almost entirely passed over in favor of a different notion, called entropy. Entropy, rather than being associated with a particular event, is associated with a partition of events for a given reference class of possibilities Ω….

This is followed by a rather obfuscatory presentation of the equation for the Shannon IT entropy of an event sequence. So far, this part *could* be taken as just typical of Dembski’s dreadfully obscure writing style. But the next step he takes is where he’s deliberately being misleading. He asks why entropy, rather than event probability, is the preferred measure for information content.

But he shifts the goalposts. Up until now, he’s been talking about mathematicians and information theorists. Now suddenly, the question isn’t about why entropy is the preferred measure among *information theorists*; it’s why it’s the preferred measure among *communication theorists* (who are a different group of people); and the *answer* that he quotes is about *communication engineers*.

So: again, he’s deliberately being misleading. He’s trying to convince you that you should think of information content in terms of probability. But instead of making that argument, he presents the definitions in a very strange way, and uses subtle unjustified changes to allow him to present the *weakest possible explanation* for why his preferred measure is not widely accepted.
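For reference, here’s how the two measures relate in standard Shannon theory. This is a sketch in my own notation, not the paper’s: I’m assuming a partition of Ω into events A_1, …, A_n with probabilities P(A_i), and logarithms base 2.

```latex
I(A_i) = -\log_2 P(A_i)
\qquad
H(A_1, \dots, A_n)
  = -\sum_{i=1}^{n} P(A_i) \log_2 P(A_i)
  = \sum_{i=1}^{n} P(A_i)\, I(A_i)
```

In other words, entropy is just the expected value of the per-event information over the partition. The two aren’t rival measures; one is the average of the other.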

### Conclusion

Dembski’s a damned lousy mathematician. Even when he isn’t trying to mislead people about evolution, he’s sloppy, obfuscatory, and prone to arguments based on strawmen. If this is an example of his best work as a mathematician, then someone really screwed up in giving him a PhD.

[imv]: http://www.iscid.org/pcid/2005/4/2/dembski_information_variation.php

[intro-it]: http://scienceblogs.com/goodmath/2006/06/an_introduction_to_information.php

[self-info]: http://en.wikipedia.org/wiki/Self-information