Good Math, Bad Math

I’ve been taking a look at William Dembski’s paper, “[Information as a Measure of Variation][imv]”. It was recommended to me as a paper that demonstrates Dembski’s skill as a mathematician and isn’t aimed at evolution-bashing. I’m not going to go into too much detail about it; it’s just not that good. If this is the best work he’s done as a mathematician, well, that’s pretty sad.

The main problems with the paper are:

1. He either doesn’t understand or misrepresents some of the fundamentals of the field he’s allegedly discussing;
2. He presents many of the central ideas of the paper (that is, describing information content of an event in an event sequence in terms of how it affects the probabilities of events that occur after it) as if they were novel when they really are not (in fact, this idea dates back to Shannon’s [self-information][self-info]); and
3. Much of the paper is very unclear, even obfuscatory.

The last two are typical of Dembski’s writing, and not particularly interesting. I’m going to focus on three egregious cases of misrepresentation.

### Misrepresentation One: IT = Shannon IT

The misrepresentations start from quite literally the first line of the paper. The first two paragraphs of the body of the paper are:

>Ordinarily, information refers to the meaning or semantic content of a message.
>Getting a handle on the meaning of a message, however, has proven difficult
>mathematically. Thus, when mathematicians speak of information, they are
>concerned not so much with the meaning of a message as with the vehicle by
>which the message is transmitted from sender to receiver.
>
>The most common vehicle for transmitting messages is the character string.
>The mathematical theory of information is largely about quantifying the
>complexity of such strings, characterizing their statistical properties when they
>are sent across a noisy communication channel (noise being represented as a
>stochastic process that disrupts the strings in statistically well-defined
>ways), preserving the strings despite the presence of noise (i.e., the theory
>of error-correcting codes), compressing the strings to improve efficiency, and
>transforming the strings into other strings to maintain their security (i.e.,
>cryptography).

This is wrong, and it’s deliberately wrong.

The reason it’s wrong? As I described in my [introduction to information theory][intro-it] (IT), there are two main branches of IT: Shannon and Kolmogorov-Chaitin. Dembski is definitely aware of both; he references the work of Chaitin in papers written *before* this one. But in his description of information theory here, he focuses exclusively on Shannon theory, and presents it as if it were the entirety of mathematical IT.

Why would he do that? Well, because it makes it easier to make his argument about why it makes sense to view information in terms of a series of events. Later in the paper, he’s going to argue that information entropy is the *wrong* measure for information; but that argument can *only* make sense in Shannon theory. And even for the Shannon IT that he uses, the way that he characterizes it is sloppy.
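To make the difference between the two branches concrete, here is a rough sketch. It rests on two assumptions that are mine, not the paper’s: the Shannon side is estimated from symbol frequencies alone, and zlib’s compressed size stands in for Kolmogorov-Chaitin complexity (which is uncomputable, so a compressor can only give a crude upper bound). The function names `order0_entropy_bits` and `compressed_bits` are made up for illustration.

```python
import math
import random
import zlib
from collections import Counter

def order0_entropy_bits(data: bytes) -> float:
    """Total Shannon entropy of `data` in bits, estimated from symbol
    frequencies alone (the ordering of symbols is ignored)."""
    n = len(data)
    counts = Counter(data)
    per_symbol = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return per_symbol * n

def compressed_bits(data: bytes) -> int:
    """Size of the zlib-compressed data in bits: a crude, computable
    upper-bound stand-in for Kolmogorov-Chaitin complexity."""
    return 8 * len(zlib.compress(data, 9))

rng = random.Random(0)  # fixed seed so the run is repeatable
periodic = b"01" * 5000                                  # highly regular string
noisy = bytes(rng.choice(b"01") for _ in range(10000))   # coin-flip string

for name, s in [("periodic", periodic), ("noisy", noisy)]:
    print(name, round(order0_entropy_bits(s)), compressed_bits(s))
```

Both strings have nearly the same frequency-based entropy, since each is about half `0` and half `1`, but the periodic one has a far shorter description. That gap is the kind of thing K-C theory measures and a purely frequency-based Shannon estimate does not.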

### Misrepresentation Two: Information Content in Poker Hands

Moving on, here’s another really dreadful passage:

>Consider, for instance, the following individuation of poker hands: RF (a
>royal flush) and ¬RF (all other poker hands). To learn that something other
>than a royal flush was dealt (i.e., possibility ¬RF) is clearly to acquire
>less information than to learn that a royal flush was dealt (i.e., possibility
>RF). A royal flush is highly specific. We have acquired a lot of information
>when we learn that a royal flush was dealt. On the other hand, we have acquired
>hardly any information when we learn that something other than a royal flush
>was dealt. Most poker hands are not royal flushes, and we expect to be dealt
>them only rarely. Nevertheless, if our measure of information is simply an
>enumeration of eliminated possibilities, the same numerical value must be
>assigned in both instances since, in each instance, a single possibility is
>eliminated.

Looking at this, it’s hard to tell if he just really *doesn’t get it*, or if he’s trying to slip something past the readers. It’s sufficiently messed up that it’s hard to determine exactly what he means. I can see two very different readings.

1. The two possibilities are, as he says, RF and ¬RF. That is, we’re going to be told whether or not a hand is a royal flush. We are *not* going to be told what the cards in the hand are: we are simply going to get a yes/no answer to the question “Is the hand a royal flush?” If that’s the case, then this is completely wrong: being told “Yes, it’s an RF” gives you enough information to determine that the hand is one of four sets of cards (the RF in each of the four suits); being told “No, it’s not an RF” is only enough information to determine that the hand is one of 52! – 4 possible sets of cards. And *in either case*, the information content is determined *not solely by the statement that the cards are, or are not, a royal flush*. The information content of those statements (per Kolmogorov-Chaitin, which is the kind of IT applicable to analyzing statements of this sort) is based on the statement *plus* the rules of poker that define the meaning of a royal flush.

2. We are told exactly what cards are in the hand. In that case, we know whether or not it’s an RF because we know what cards are there. So whether you’ve got an RF or not, *you have a precise description of exactly one poker hand*. No matter what variant of IT you’re using, and no matter whether you’ve got a royal flush or not, the amount of information that you have *is exactly the same*: it’s the identity of the 5 cards in your hand.
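To put rough numbers on the two readings, here is a small sketch. It assumes every five-card hand is equally likely and counts hands as unordered combinations, so there are C(52,5) = 2,598,960 of them (the count pointed out in comment #1 below, rather than 52!). The function name `self_information_bits` is made up for illustration.

```python
import math

TOTAL_HANDS = math.comb(52, 5)   # unordered five-card hands: 2,598,960
ROYAL_FLUSHES = 4                # one royal flush per suit

def self_information_bits(matching_hands: int, total_hands: int = TOTAL_HANDS) -> float:
    """Shannon self-information, in bits, of learning that the dealt hand
    is one of `matching_hands` equally likely hands out of `total_hands`."""
    return -math.log2(matching_hands / total_hands)

# Reading 1: we only get a yes/no answer to "is it a royal flush?"
bits_rf = self_information_bits(ROYAL_FLUSHES)
bits_not_rf = self_information_bits(TOTAL_HANDS - ROYAL_FLUSHES)

# Reading 2: we are told the exact five cards, royal flush or not.
bits_exact_hand = self_information_bits(1)

print(f"yes, RF:    {bits_rf:.2f} bits, {ROYAL_FLUSHES} hands remain")
print(f"no, not RF: {bits_not_rf:.6f} bits, {TOTAL_HANDS - ROYAL_FLUSHES} hands remain")
print(f"exact hand: {bits_exact_hand:.2f} bits")
```

Under those assumptions, “yes” is worth about 19.3 bits and “no” is worth a tiny fraction of a bit, while knowing the exact hand is worth about 21.3 bits whether or not it happens to be a royal flush.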

### Misrepresentation Three: Explaining Entropy as a Measure

One more example of how misleading the paper is: the beginning of section two tries to explain why information theorists use entropy. He presents an explanation of the information associated with an event, then explains entropy *in terms of* that presentation of information as events, and then presents the IT notion of entropy as something *mysterious*:

>Nonetheless, this notion is almost entirely passed over in favor of a
>different notion, called entropy. Entropy, rather than being associated with a
>particular event, is associated with a partition of events for a given
>reference class of possibilities Ω….

This is followed by a rather obfuscatory presentation of the equation for the Shannon IT entropy of an event sequence. So far, this part *could* be taken as just typical of Dembski’s dreadfully obscure writing style. But the next step he takes is where he’s deliberately being misleading. He asks why entropy, rather than event probability, is the preferred measure for information content.

But he shifts the goalposts. Up until now, he’s been talking about mathematicians and information theorists. Now, suddenly, the question isn’t why entropy is the preferred measure among *information theorists*; it’s why it’s the preferred measure among *communication theorists* (who are a different group of people); and the *answer* that he quotes is about *communication engineers*.

So: again, he’s deliberately being misleading. He’s trying to convince you that you should think of information content in terms of probability. But instead of making that argument, he presents the definitions in a very strange way, and uses subtle unjustified changes to allow him to present the *weakest possible explanation* for why his preferred measure is not widely accepted.
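For reference, the two quantities being contrasted in this section can be written down in a few lines. This is a minimal sketch using the standard textbook definitions (the self-information of a single event, and entropy as the expected self-information over a partition of events), not Dembski’s notation; the function names and the example probabilities are made up for illustration.

```python
import math

def self_information(p: float) -> float:
    """Information, in bits, associated with one event of probability p."""
    return -math.log2(p)

def entropy(partition_probs: list[float]) -> float:
    """Shannon entropy, in bits, of a partition of events: the
    probability-weighted average of each event's self-information."""
    return sum(p * self_information(p) for p in partition_probs if p > 0)

fair_coin = [0.5, 0.5]       # two equally likely events
lopsided = [0.999, 0.001]    # one near-certain event, one rare event

print(entropy(fair_coin))            # average information per outcome
print(entropy(lopsided))             # much lower on average...
print(self_information(0.001))       # ...even though the rare event alone is large
```

The per-event measure and entropy are not rivals: entropy is just the average of the per-event measure over the whole partition, which is why a source with one very surprising but very rare event still has low entropy.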

### Conclusion

Dembski’s a damned lousy mathematician. Even when he isn’t trying to mislead people about evolution, he’s sloppy, obfuscatory, and prone to arguments based on strawmen. If this is an example of his best work as a mathematician, then someone really screwed up in giving him a PhD.

[imv]: http://www.iscid.org/pcid/2005/4/2/dembski_information_variation.php
[intro-it]: http://scienceblogs.com/goodmath/2006/06/an_introduction_to_information.php
[self-info]: http://en.wikipedia.org/wiki/Self-information

Comments

  1. #1 Tom Duff
    July 27, 2006

    Nice analysis. It doesn’t matter to your argument, but the number of non-RF poker hands is 52!/(47! 5!)-4 (about 2.6 million), not 52!-4 (a 68 digit number).

  2. #2 Blake Stacey
    July 27, 2006

    As always, the explanation of why the IDanista is off his rocker is much more interesting than the IDanista’s argument itself. It’s sort of like Phil Plait teaching good astronomy by discussing the horrible mistakes in movies, or how listening to people about why they can’t stand Atlas Shrugged leads to deep elements of philosophy, aesthetics and poetry.

  3. #3 Mumon
    July 27, 2006

    Dembski, of course, doesn’t generally acknowledge when real information theorists and communication engineers respond to his work.

    There are other areas of information theory and communication theory that make Dembski a joke amongst those of us who get paid to do this; in particular, the basic theory of statistical hypothesis testing refutes anything Dembski might say about “explanatory filters” and “complex specified information.”

    Kolmogorov also plays a role in the fundamentals of probability theory along with others; by positing that discipline as phenomenological.

  4. #4 EJ
    July 27, 2006

    Hi.

    In paragraph 2, Dembski says that IT is largely about (1) quantifying the complexity of character strings, and (2) other stuff. Looks to me like (1) is a Very Brief version of K-C theory, and (2) is Shannon theory. So he could argue that, contrary to your claim, he hadn’t ignored K-C theory in his introduction.

    Looks like in the paper he’s slipping inappropriately back and forth between the two, as he has done before, but it doesn’t look to me like your criticism #1 is precisely correct.

    I think this is my third nitpicky comment out of a total of three comments I’ve made to your blog. So let me say that I read your blog because I like it, and not for just wanting to pick pick pick.

  5. #5 Mark C. Chu-Carroll
    July 28, 2006

    EJ:

    He says it’s about quantifying the complexity of character strings, but he always discusses that in the context of a message. Both Shannon theory and K-C theory quantify the complexity of character strings; Shannon is the one that focuses on the idea of a message. There is no message in K-C.

    In his discussion of stuff, he consistently muddles ideas from Shannon and ideas from K-C in very sloppy ways; but for the purpose of the paper, he clearly tries to present things in the Shannon framework, because it makes his argument appear to be a bit stronger.

  6. #6 elspi
    July 28, 2006

    This is what MathSciNet has on Dembski.

    I know graduate students with better vitas.
    This is just someone who wandered into a math department and wouldn’t leave until they threw a sheepskin at him.
    It is a good thing that the Discovery Institute hired him; otherwise he would be cleaning toilets for a living. The only paper in a known journal is his first one. That is almost certainly his thesis, which his advisor likely wrote for him just to be rid of him (and who can blame him).

    [1] MR1884094 (2003b:00012) Dembski, William A. No free lunch. Why specified complexity cannot be purchased without intelligence. Rowman & Littlefield Publishers, Inc., Lanham, MD, 2002. xxvi+404 pp. ISBN: 0-7425-1297-5 (Reviewer: David H. Wolpert) 00A69 (92D15)

    [2] MR1650908 (2000g:00016) Dembski, William A. The design inference. Eliminating chance through small probabilities. Cambridge Studies in Probability, Induction, and Decision Theory. Cambridge University Press, Cambridge, 1998. xviii+243 pp. ISBN: 0-521-62387-1 (Reviewer: Zeno G. Swijtink) 00A30

    [3] MR1094070 Dembski, William A. Randomness by design. Noûs 25 (1991), no. 1, 75–106. 65C10 (68Q30)

    [4] MR1067671 (91j:28007) Dembski, William A. Uniform probability. J. Theoret. Probab. 3 (1990), no. 4, 611–626. (Reviewer: S. J. Taylor) 28A75 (28A12 28A33 60A10)

  7. #7 Coin
    July 28, 2006

    So… MathSciNet requires a subscription.

    Just to be clear, is it or is it not known exactly what Dembski’s thesis was for his Ph.D. in Mathematics? Might there be some way to look that up with the University of Chicago? I’d be curious about that.

  8. #8 One Brow
    July 28, 2006

    I believe you were mistaken about this paper, or any paper by Dembski, not being connected to evolution/IDC. His discussion of information in this paper was intended to set up using his “conclusions” to disprove evolution, create the “Law” of Conservation of Information, etc. There was no attempt at serious mathematics.