Good Math, Bad Math

PZ has already commented on this, but I thought that I’d throw in my two cents. A surgeon, Dr. Michael Egnor, posted a bunch of comments on a Time magazine blog that was criticizing ID. Dr. Egnor’s response to the criticism was to ask: “How much new information can Darwinian mechanisms generate?”

Of course, the Discovery Institute grabbed this as if it were something profound, and posted an article on it – and that’s where they really start to get stupid:

Egnor concludes:

I did a PubMed search just now. I searched for ‘measurement’, and ‘information’ and ‘random’ and ‘e coli’. There were only three articles, none of which have any bearing on my question. The first article, by Bettelheim et al, was entitled ‘The diversity of Escherichia coli serotypes and biotypes in cattle faeces’. I searched for an actual measurement of the amount of new information that a Darwinian process can generate, and I got an article on ‘cattle faeces’. I love little ironies.

Did the Darwinists respond to Egnor’s question? Most tried to explain how there can be an increase in Shannon information, but as Egnor explained, “Shannon information is not relevant to biological information.” Egnor points out: “Your example of Labbe’s paper on gene duplication is, presumably, not to be taken too seriously. If you count copies as new information, you must have a hard time with plagiarism in your classes. All that the miscreant students would have to say is ‘It’s just like gene duplication. Plagiarism is new information- you said so on your blog!’”

So now they’ve stepped into my territory – and “stepped in it” is definitely the appropriate phrase.

We’ve got Dr. Egnor demanding that he be shown “how much new information darwinian processes can generate”, and making it very clear that what he wants is an exact measure.

So people respond – showing how to compute the specific quantity of information generated by particular evolutionary processes. And of course, they do it in terms of the only mathematical or scientific framework that can assign specific values to quantities of information: Shannon theory.
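To make "assign specific values to quantities of information" concrete, here is a minimal sketch of the basic Shannon calculation on a symbol string. It is a toy illustration only: the function names are made up for this example, it uses nothing but single-symbol frequencies, and it is not the calculation from any of the papers that were cited to Dr. Egnor.

```python
import math
from collections import Counter


def shannon_entropy_bits(seq):
    """Per-symbol Shannon entropy, in bits, of the empirical symbol distribution."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def total_information_bits(seq):
    """Sequence length times per-symbol entropy (ignores correlations between symbols)."""
    return len(seq) * shannon_entropy_bits(seq)


if __name__ == "__main__":
    for s in ("AAAAAAAA", "AACCAACC", "ACGTACGT"):
        print(s, shannon_entropy_bits(s), total_information_bits(s))
```

The point is only that the quantity is well defined: give it a sequence and it returns a number, which is exactly what the undefined "biological information" cannot do.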

And Dr. Egnor plays the good old standard creationist game: move the goalposts. It’s not Shannon information he wants to know about. It’s something else. He wants to know how much biological information is created. And of course, “biological information” is undefined.

It’s exactly the same game that Dembski always plays with specified complexity. He challenges you to show him how Darwinian processes can create specified complexity. But specified complexity is undefinable. He’s careful to never precisely nail down just what SC is; it’s always vague, always undefined. The closest he ever comes is to present a list of definitions, saying “it could be this, or it could be that…”

Can Dr. Egnor define biological information as a precisely measurable quantity? In a way that is distinct from Shannon information? Of course not.

Here’s another example of him continuing with the same trick, this time in a comment on Pharyngula:

How much new specified information can random variation and natural selection generate? Please note that my question starts with ‘how much’- it’s quantitative, and it’s quantitative about information, not literature citations. I didn’t ask ‘how many papers can I generate when I go to PubMed and type ‘gene duplication evolution’. I asked for a measurement of new specified information, empirically determined, the reference(s)in which this measurement was reported, and a thoughtful analysis as to whether this ‘rate of acquisition’ of new specified information by random heritable variation and natural selection can account for the net information measured in individuals of the species in which the measurement was made. Mike Lemonick was wrong that this isn’t an important question in evolutionary biology. This is the central question.

Yes, it’s got to be a precise measure of information. It’s got to be empirically measured, with complete details of how the information was generated, and whether or not that matches with what evolutionary processes can produce.

But it’s not Shannon information. It’s undefined information – something allegedly measurable, but which is specifically defined as being different from the only tool that allows us to precisely measure information. And, even if someone did somehow come up with some other measure that defined “biological information” as distinct from Shannon information, it would still not be enough: because you’d need to show a complete model of evolution in terms of Dr. Egnor’s undefined and unspecified biological information theory.

Speaking as a math guy, this is wretched, dishonest garbage. In math, we don’t get to demand that people provide us with undefined measures. In fact, we expect the person making a claim to demonstrate that claim. Dr. Egnor is the one who’s making an appeal to mathematics – by arguing that evolution cannot explain the creation of “biological information”. To make that claim, it is incumbent on him to define his terms with sufficient precision to make it possible to refute him if he’s wrong. But Dr. Egnor’s claim is full of wiggle room, and attempts to assign the burden of proof to his opponents – which is an unreasonable thing to do, since he hasn’t defined his terms. There is no way to refute a claim like Dr. Egnor’s – because any refutation will just be met with “No, that’s not what I meant by biological information” – exactly the way he responded to Shannon calculations showing how information is created by evolutionary processes.

Comments

  1. #1 Jonathan Vos Post
    February 24, 2007

    Right on!

  2. #2 dileffante
    February 24, 2007

    I just did a search in Google with the keywords:

    god powerful invisible “hates darwinists”

    …and nothing came out, not even something about cattle faeces. Ergo, such a being doesn’t exist. QED

  3. #3 MarkP
    February 24, 2007

    Apparently an integral skill for successful crankery is the ability to consistently shift the burden of proof, and the responsibility of defining terms, onto one’s critics.

  4. #4 Reed A. Cartwright
    February 24, 2007

    There are two issues with respect to creationists’ “information” claims. Most people, like this blog, focus on the first issue: “information” is undefined. But there is a second issue, namely that once the creationists have given us a definition and way to measure “information”, they must then prove that evolution would require such “information” to increase.

    In other words, any creationist can come up with an irrelevant metric of information but that does not mean that said metric must be increasable for evolution to be true.

  5. #5 Blake Stacey
    February 24, 2007

    MarkCC:

    But it’s not Shannon information. It’s undefined information — something allegedly measurable, but which is specifically defined as being different from the only tool that allows us to precisely measure information.

    Reminds me of Douglas Adams’s recipriversexclusion: “a number whose existence can only be defined as being anything other than itself.”

  6. #6 _Arthur
    February 24, 2007

    Of course, few or even no PubMed articles concerning new genes or new mutations will attribute those new genes to “random processes”, because every biologist worth his salt already knows as scientifically demonstrated that mutations *are* random with respect to fitness, so contemporary PubMed articles will not belabor the point.

    So “random” is a poor keyword for a pubmed search on new gene function.

  7. #7 Tyler DiPietro
    February 24, 2007

    But there is a second issue, namely that once the creationists have given us a definition and way to measure “information”, they must then prove that evolution would require such “information” to increase.

    This is also a good point. Formulations of information are intended to solve certain technological problems. Shannon-Wiener theory is relevant to communication while KCS is relevant to effective computation of a given object. I’ve never seen any IDCist even begin to describe why either communication or effective computation is relevant to biology, much less evolution. It’s all confused metaphysical ramblings.

  8. #8 MarkP
    February 24, 2007

    Classic. So when they challenge you to explain how information can increase via evolution, counter-challenge them to prove there is any quantity of information there in the first place.

    It’s always the unstated with these guys…

  9. #9 paul
    February 24, 2007

    What strikes me about this is how dishonest Egnor is even on his own dishonest terms. Shannon (or Chaitin, if you want to get just a touch fancier) would agree that simply duplicating a gene doesn’t add a lot of information to a genome. You’ve added a bunch of text, but it’s easily encoded. Duplications are a potential starting point for adding lots of additional information to the genome because the copies get to evolve semi-independently, but that’s the kind of thing that a DI type would studiously avoid mentioning.

    (And that’s even before you get to the notion that shannon-style information content is in the way of a lower bound for “biological information content” because even a few bits of added information at the right level can produce a vastly different end product.)

  10. #10 Torbjörn Larsson
    February 24, 2007

    Since DI puts so much weight on Egnor being a professor of neurosurgery, he should first answer if he thinks in terms of how much information he will restore to his patients, or how much function. Biology is primarily about functions and populations.

    Btw, Egnor’s backstabbing debate tactics and lack of scientific grasp is rather like Cordova’s. I suggest we instate a “cordoviality measure” (reciprocal to sincere cordiality) for the former characteristic and an “egnorance measure” (reciprocal to scientific information) for the latter. Not that I can motivate the need or define the specific values. But why should that bother us?

  11. #11 Mr. Gunn
    February 25, 2007

    I was wondering where you’ve been, Mark! It’s not like you haven’t destroyed these crappy statistical “argument from large numbers” IDiots before.

    Of course, there’s no chance he’ll walk in here trying his bad math argument, but we could hope.

  12. #12 EvolEcol
    February 25, 2007

    I wonder whether Dr. Egnor considers the science underlying neurosurgery to be fundamentally different than the science underlying evolutionary biology. I wonder if he ignores epidemiological evidence, and if he insists on double-blind clinical trials that test the relative effectiveness of every medication and surgical procedure he uses. I wonder if Dr. Egnor thinks that AIDS is caused by HIV, and what he thinks of the purported link between autism and thimerosal.

    I bring this up not because I think the evidence for evolution is lacking, but rather because critics require a much greater standard of evidence for evolution than they require in their own professions.

  13. #13 Kenh
    February 25, 2007

    I find it interesting that IDists are so open with their intellectual dishonesty, yet they believe in a ‘judgement day’. Do they feel that their deity is somehow blind to this intellectual dishonesty and will forget all their past behavior on that judgement day?

  14. #14 Joshua
    February 25, 2007

    It’s becoming painfully clear that these creationists understand as little — possibly even less — information theory as biology. Anybody who’s studied it even briefly (I encountered Shannon as part of a communications theory course, myself, which is almost certainly more than Egnor can claim) knows what a load of crap these objections are.

    More importantly than the fact that there isn’t a purely quantitative way to define information aside from the Shannon theory, Egnor (which, now that I think about it, sounds suspiciously like “ignore”, as in “ignoring evidence”) never explains why Shannon is inappropriate. Seriously, if you want to claim that one of the two founding branches of information theory is invalid in the study of information, you need a damned good explanation for just why that is.

  15. #15 MarkP
    February 25, 2007

    Perhaps we now have a term for willful ignorance despite frequent exposure to the facts: “egnorance”.

  16. #16 Nullifidian
    February 25, 2007

    Obviously, the author of the DI page didn’t follow things too carefully. The only reference I recall to Shannon information came from me on the original Time magazine blog, as a citation of Kimura 1961. Now, Egnor has never specifically dealt with the material I referenced, and I won’t hold my breath since it shows that Shannon information is increased by natural selection. He has, as you’ve seen, waved around it claiming that Shannon information isn’t relevant (when the fact that Kimura was able to use it to evaluate the information increase generated by evolutionary processes shows it is relevant), but he has some private definition of “biological information” which somehow excludes gene duplication and divergence from counting as an increase in information. However, because Egnor could never resist bringing Shannon up at every context, despite never having read the paper I cited, the DI’s resident hack writer has inferred that nearly everyone must have been referring to Shannon information.

    The turn of phrase the writer uses is very interesting, as well.

    Most tried to explain how there can be an increase in Shannon information, but as Egnor explained, “Shannon information is not relevant to biological information.”

    My emphasis.

    Evidently the DI takes a bald assertion as some sort of an explanation, which may explain why they’ve not made a dent in the peer-reviewed lit.

  17. #17 Nullifidian
    February 25, 2007

    The hazards of not using preview. I forgot to include the full citation.

    Kimura, M. (1961) Natural selection as the process of accumulation of genetic information in adaptive evolution. Genetical Research, 2:127-140.

  18. #18 Joshua
    February 25, 2007

    Oh, and I vaguely recalled Claude Shannon doing some work in biology, so I hit the intertubeotron to have a look. I remembered the cellular automata stuff he did, but it turns out he also took a look at genetics for his PhD thesis.

    It doesn’t prove anything relevant (not least because the PhD was submitted well before Shannon wrote his Mathematical Theory of Communication which is usually what people mean when they refer to “Shannon theory”), but I find it amusing given Egnor’s claim that Shannon theory is not relevant to genetics. It makes me wish Shannon were still around so we could ask his opinion. ;)

  19. #19 Dave S.
    February 25, 2007

    MarkP says:

    So when they challenge you to explain how information can increase via evolution, counter-challenge them to prove there is any quantity of information there in the first place.

    I usually ask them how much biological information (as they understand that term) is in my cat?

    If they can’t even begin to give an answer for a single extant individual (because they don’t really have an acceptable biologically relevant quantifiable definition of ‘information’ to start with), how can they expect an answer from me for a process that occurred over the past 3-4 billion years and is still occurring involving millions of species?

  20. #20 MarkP
    February 25, 2007

    It doesn’t prove anything relevant (not least because the PhD was submitted well before Shannon wrote his Mathematical Theory of Communication which is usually what people mean when they refer to “Shannon theory”), but I find it amusing given Egnor’s claim that Shannon theory is not relevant to genetics. It makes me wish Shannon were still around so we could ask his opinion. ;)

    The IDers do seem to prefer quoting dead people who aren’t around to defend themselves.

  21. #21 MartinM
    February 25, 2007

    Reed Cartwright said:

    There are two issues with respect to creationists’ “information” claims. Most people, like this blog, focus on the first issue: “information” is undefined. But there is a second issue, namely that once the creationists have given us a definition and way to measure “information”, they must then prove that evolution would require such “information” to increase.

    Actually, I think the second issue is the solution to the first. Suppose we restrict ourselves to definitions where the information content of a genotype is independent of the path taken to produce it, which seems reasonable. Then it would seem that the only definitions of information for which it holds true that mutations cannot increase information are the ones for which it is also true that mutations cannot decrease information. Such definitions are rather irrelevant to evolution, no?

  22. #22 Tyler DiPietro
    February 25, 2007

    Such definitions are rather irrelevant to evolution, no?

    Even if Egnor didn’t actually provide a concrete definition of information in his initial post at the Time blog, however we choose to measure information, Egnor comes out flat on his face in the mud. PZ provided a concrete example of new genes with new functions that were, as Egnor demanded, produced without intelligent agency. Once he was provided with these examples, Egnor backpedalled into his current incoherent drivel, which boils down to “Well we can’t quantify biological information, therefore you can’t show me how it’s produced. NEENER NEENER!”

  23. #23 EntropyFails
    February 26, 2007

    Actually, you can defeat this ID clown even in his poorly defined “biological information” terms. Here’s some magic for your arguments.

    His assertion that “copying” doesn’t increase “biological information” is easily disproven. First, you take the average entropy of the “primordial soup” that the first replicators found themselves in. Obviously, this will be very high as it is inert dead matter. However, there will be local low entropy spots that the original life forms would use as power. The highest informational content would be stored in perhaps naturally occurring hydrocarbons and we will give this a specific numeric value called Base_Hydro_Information_Level. Of things belonging to that class it would have a Total_Hydro_Information_Amount.

    Then, after the creation of a biological lifeform, we have a new highest information area that makes up the primordial cell and its DNA. We calculate the informational content of the atoms associated with this cell and call it Base_Life_Information_Level. Again, we can create a Total_Life_Information_Amount by adding up the percentage of atoms that have the Base_Life_Information_Level.

    The business of life is to convert the relatively lower Base_Hydro_Information_Level into the higher Base_Life_Information_Level. Life does this by eating and creating replicators that eat. So when the cell eats, the Total_Hydro_Information_Amount of the area under study actually decreases and the Total_Life_Information_Amount increases. Of course it must do so in a way that makes the total Entropy of the system increase, but you can easily see how eating enhances the Total_Life_Information_Amount which is what any sane mind must declare “biological information” as.

    But all life has limits and no single lifeform can grow to cover any sufficiently large sample area. It is also dangerous for the information process for it to all be contained in a contiguous area. Hence at some point, the primordial cell divides, perhaps randomly at first, but later via a controlled process. On this division, we now have Total_Life_Information_Amount starting its exponential growth. The 1 becomes the 2. The 2 becomes the 4. And so on…

    Each of these new lifeforms eats from Total_Hydro_Information_Amount and turns it into Total_Life_Information_Amount. And due to imperfect copying, it also changes the Base_Life_Information_Level. Life below this level tends to die more readily, causing the Total_Life_Information_Amount to fall back into the Total_Hydro_Information_Amount. Life above this level tends to survive and replicate more readily. As the community of life grows, the Total_Life_Information_Amount steadily rises due to these factors. Copying increases the information content of the area under study.

    So don’t ever let anyone tell you that reproduction does not increase the total information content of an area that life can inhabit. You can rigorously define “biological information” as the information content of all atoms engaged in the process of living. It is measurably higher than the information of dead things. And reproduction increases that “biological information” total, no matter how you define it.

    The sad thing about these ID people is that they disparage the process that creates them. I don’t know how you could hate the world and processes that birthed you that much. I blame religion.

  24. #24 Anonymous
    February 26, 2007

    Why are IDers so silly?

    OK, I’ll translate the idea of genetic information into something a bit more accessible. Let’s say I have a robot that moves around according to an instruction sheet that it reads. Let’s say the instruction sheet says:

    “Go three feet ahead, turn right 45 degrees, go forward one foot, turn left 30 degrees.”

    Suppose we have a duplication event:

    “Go three feet ahead, turn right 45 degrees, go forward one foot, turn left 30 degrees, go three feet ahead, turn right 45 degrees, go forward one foot, turn left 30 degrees”.

    Is the second message different from the first from an information theory perspective? Of course. The fact that it results from a duplication event does not mean that new information has not been generated.

    I could do the same with reversals, insertions/deletions, and mutations.

    A final example:

    “Dr. Egnor is a wit.”

    (after insertion event)

    “Dr. Egnor is a twit.”

    Does this change represent “new information”???

  25. #25 Xanthir, FCD
    February 26, 2007

    Anon:
    To be fair, a duplication event like you describe adds very little information. I could notate it as:
    “Go three feet ahead, turn right 45 degrees, go forward one foot, turn left 30 degrees. Repeat 1 time.”

    This sort of thing is relevant when discussing compression, which directly applies to the information measure.

    It is, however, still a positive amount, as you say, and it introduces the possibility of creating much more information, as each copy can now mutate independently.
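The compression point made in this exchange can be checked directly. The sketch below is a rough illustration under stated assumptions: it uses zlib's compressed size as a crude stand-in for algorithmic information content, a randomly generated 2000-base string as a pretend "gene", and a made-up 10% per-base mutation rate. None of these choices come from the thread or from real genomic data.

```python
import random
import zlib

BASES = "ACGT"


def compressed_size(seq: str) -> int:
    """Bytes zlib (level 9) needs to encode the sequence; a crude information proxy."""
    return len(zlib.compress(seq.encode("ascii"), 9))


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Replace each base with a different base with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


rng = random.Random(0)  # fixed seed so runs are repeatable
gene = "".join(rng.choice(BASES) for _ in range(2000))  # hypothetical "gene"
duplicated = gene + gene                    # pure duplication
diverged = gene + mutate(gene, 0.10, rng)   # duplication, then the copy mutates

sizes = {
    "original": compressed_size(gene),
    "duplicated": compressed_size(duplicated),
    "duplicated_then_mutated": compressed_size(diverged),
}

if __name__ == "__main__":
    for name, size in sizes.items():
        print(f"{name}: {size} bytes")
```

Under these assumptions the pure duplicate should cost only a little more than the original (the compressor just says "repeat that"), while the copy that has been allowed to diverge costs noticeably more. That is the pattern described above: a small positive amount from duplication alone, and room for much more once the copies mutate independently.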

  26. #26 JohnK
    February 27, 2007

    From Dembski’s conclusion in the last chapter of The Design Inference, (his only reviewed work) page 229:

    How, then, does the design inference detect and measure information? For information to pass from a source s to a receiver r, a message M’ emitted at s must suitably constrain a message M” received at r. Moreover, the amount of information passing from s to r is by definition the negative logarithm to the base 2 of the probability of M’ (see Dretske, 1981, chs. 1 and 2). For the design inference to detect information, the message M’ emitted at s has to be identified with a pattern D, and the message M” received at r has to be identified with an event E. Moreover, for M’ to suitably constrain M” must then mean that D delimits E. Under the provisional assumption that E is due to the chance hypothesis H, one detects the transmission of information from s to r provided that D is detachable from E and P(D* | H) has small probability. Moreover, one measures the amount of information transmitted from s to r as –log2 P(D* | H). This account of information in terms of design is entirely consistent with the classical account in terms of symbol strings (cf. Shannon and Weaver, 1949).

    Egnor better let Dembski in on the straight dope. Or vice versa.
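The measure in the quoted passage, the negative base-2 logarithm of a probability, is the ordinary Shannon surprisal. A minimal sketch follows; the function name and the example probabilities are assumptions chosen for illustration and do not come from Dembski's book.

```python
import math


def information_bits(probability: float) -> float:
    """Surprisal in bits: the negative base-2 logarithm of an event's probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


if __name__ == "__main__":
    # A fair coin flip, and one specific 6-letter word over a uniform 4-letter alphabet.
    print(information_bits(0.5))
    print(information_bits(1 / 4**6))
```

Whatever else is layered on top of it, a quantity computed this way is a Shannon-style quantity, which is why the quoted passage sits so awkwardly next to the claim that Shannon information is irrelevant.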

  27. #27 IanC
    February 27, 2007

    (for a moment, I’m ignoring completely the issue of measuring or even defining information)

    If duplication doesn’t increase information, and mutation doesn’t, I’m assuming they’ll deny duplication and mutation together adding information.

    So:
    a -> aa no change.
    aa-> at no change.
    at -> atat no change
    atat -> atct no change.

    You can carry this on as long as you want, you could create a string of duplication/mutations to generate any dna sequence. If there is no change, then ‘a’ has the same information content as the human genome.

    And there are just so many other easy ways of showing it wrong, I can’t understand how they don’t see it.
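The chain in this comment can be turned into a constructive sketch: starting from a single "a", reach any target sequence using only duplication and point-mutation steps. This is a hypothetical variant of the construction above, not the commenter's exact procedure. It duplicates just the last base at each step (a one-base tandem duplication) instead of doubling the whole string, and the function name is invented for the example.

```python
def build_by_duplication_and_mutation(target: str, start: str = "a") -> list[str]:
    """Return every intermediate sequence on a path from `start` to `target`.

    Each step is either a duplication of the final base or a point mutation
    of a single base. Assumes `target` is non-empty and `start` is one base.
    """
    if not target or len(start) != 1:
        raise ValueError("need a non-empty target and a single-base start")
    seq = start
    steps = [seq]
    if seq != target[0]:
        seq = target[0]            # point mutation of the only base
        steps.append(seq)
    for base in target[1:]:
        seq = seq + seq[-1]        # duplicate the last base
        steps.append(seq)
        if seq[-1] != base:
            seq = seq[:-1] + base  # point mutation of the new copy
            steps.append(seq)
    return steps


if __name__ == "__main__":
    print(" -> ".join(build_by_duplication_and_mutation("atct")))
```

If every one of those steps really added zero information, the final sequence would carry no more information than the single starting base, whatever the target is. That is the absurdity the comment points out.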

  28. #28 Joshua
    February 27, 2007

    Anon is spot on, of course. Whether you consider message entropy or algorithmic complexity, the two working and rigorous definitions of information that we have, gene duplication is an increase in information even before mutation and selection take place.

    As for his requirement that the new information do something… Duplicated genes (not even modified! just duplicated!) typically result in an enhancement of whatever the original gene does. There was a paper published recently documenting a greater than expected variation in number of gene copies between human individuals, and the individuals with more copies expressed the gene more strongly. So, there’s your direct benefit from the increased information resulting from a duplication event alone. Again, not even counting the results of modification to the copy!

  29. #29 MarkP
    February 27, 2007

    Duplication doesn’t create new information you say? I’ve one word for you:

    Twins.

  30. #30 Jonathan Vos Post
    February 27, 2007

    Evolution is presumably behind the ability to digest milk, which developed after cattle farming was introduced in Europe some 9,000 years ago. That would be “new information.” But you have to believe that the world is at least 9,000 years old.

    =======================================

    Ability to drink milk a recent evolution

    LONDON, Feb. 27 (UPI) — British researchers say lactose intolerance, the inability to digest dairy products, in Europeans goes back to the Stone Age.

    Genetic research by a team from University College London and Mainz University, Germany suggests that all European adults living between 6,000 BC and 5,000 BC were unable to absorb lactose, The Telegraph newspaper reported.

    DNA tests on Neolithic skeletons from some of the earliest farming communities in Europe suggest the ability to digest milk was developed after cattle farming was introduced in Europe some 9,000 years ago, “making it the most rapidly evolved European trait of the past 30,000 years,” said Dr. Mark Thomas of UCL.

    In a release, Thomas said the ability to drink milk gave some early Europeans a big survival advantage, pointing to “the continuous supply of milk compared to the boom and bust of seasonal crops; its nourishing qualities; and the fact that it’s uncontaminated by parasites, unlike stream water, making it a safer drink.”

    He said it “is the most advantageous trait that’s evolved in Europeans in the recent past.”

    Copyright 2007 by United Press International. All Rights Reserved.

  31. #31 BC
    February 28, 2007

    How much new specified information can random variation and natural selection generate? Please note that my question starts with ‘how much’- it’s quantitative, and it’s quantitative about information, not literature citations.

    Quantitatively, how much information can intelligent design produce?

  32. #32 Jonathan Vos Post
    February 28, 2007

    Here’s some nice simple Math about how long it takes for new information to appear as specific strings of nucleotides.

    http://arxiv.org/pdf/math.PR/0702883

    From: Richard Durrett
    Date: Wed, 28 Feb 2007 15:05:28 GMT (204kb,S)

    Waiting for regulatory sequences to appear

    One possible explanation for the substantial organismal differences between humans and chimpanzees is that there have been changes in gene regulation. Given what is known about transcription factor binding sites, this motivates the following probability question: given a 1000 nucleotide region in our genome, how long does it take for a specified six to nine letter word to appear in that region in some individual? Stone and Wray [Mol. Biol. Evol. 18 (2001) 1764--1770] computed 5,950 years as the answer for six letter words. Here, we will show that for words of length 6, the average waiting time is 100,000 years, while for words of length 8, the waiting time has mean 375,000 years when there is a 7 out of 8 letter match in the population consensus sequence (an event of probability roughly 5/16) and has mean 650 million years when there is not. Fortunately, in biological reality, the match to the target word does not have to be perfect for binding to occur. If we model this by saying that a 7 out of 8 letter match is good enough, the mean reduces to about 60,000 years.

    Authors: Richard Durrett, Deena Schmidt
    Comments: Published in the Annals of Applied Probability by the Institute of Mathematical Statistics
    Report-no: IMS-AAP-AAP0209
    Subj-class: Probability
    MSC-class: 92D10 (Primary) 60F05 (Secondary)
    Journal-ref: Annals of Applied Probability 2007, Vol. 17, No. 1, 1-32
    DOI: 10.1214/105051606000000619
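For a feel of why waiting times depend so strongly on word length, here is a back-of-the-envelope sketch. It is emphatically not the population-genetic model of the paper: it assumes a region of independent, uniformly random nucleotides and simply counts the expected number of exact matches to one specified word, using linearity of expectation. The function name and defaults are made up for the example.

```python
def expected_occurrences(region_length: int, word_length: int,
                         alphabet_size: int = 4) -> float:
    """Expected exact matches of one fixed word in a uniformly random region.

    There are (region_length - word_length + 1) start positions, and each one
    matches with probability alphabet_size ** -word_length.
    """
    positions = region_length - word_length + 1
    if positions <= 0:
        return 0.0
    return positions / alphabet_size ** word_length


if __name__ == "__main__":
    for k in (6, 7, 8, 9):
        print(k, expected_occurrences(1000, k))
```

Each extra letter cuts the expected count by a factor of four, so a word that is not already present by chance has to be reached by mutation, and the abstract's question is how long that takes.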

  33. #33 Luna_the_cat
    March 1, 2007

    Jonathan Vos Post — fantastic paper, thank you for finding that.

    Re: Dr. Egnor — there seems to be a basic logical fallacy that he is committing, at the root of the deliberate vagueness about how one defines “biological information”.

    He argues that “biological information” cannot be defined and/or accurately quantified. Ok, for the sake of argument, let us accept that as being the present case. Note that at this moment in time, he is NOT arguing that the impossibility of accurately defining or quantifying it means that it doesn’t exist. He seems to accept the existence of such “information”, despite his claim that it cannot be quantified.

    However, he then turns around and claims that, because we cannot accurately quantify information, we therefore cannot accurately quantify change in this information, and we thus cannot quantify how Darwinian processes (no, they are evolutionary processes anyway, dammit) function to increase this information. Therefore we cannot either claim or show that such processes do increase information. In other words, because we can’t put an exact number to it*, we cannot make predictions about it or even demonstrate that it exists.

    The problem to me seems that the world — especially the rather squishy world of biology — is full of continua, and difficult-to-quantify analog phenomena. Take, for example, pain. An ongoing medical challenge is how to quantify the entirely subjective experience of “chronic pain”. About the best that can be managed so far is to have people self-rate their experience on a scale of 1 to 10, which is about as close as you can get to completely undefined and still have something you can write down on paper.

    Nevertheless, I’m going to go out on a limb here (so to speak), and propose the idea that if I take a baseball bat to someone’s knees, this process will vastly increase their pain, both in the short term and on a long-term basis.

    But I cannot quantify how this does so.

    Therefore, by Egnor’s claim, I would not legitimately be able to predict that it would increase the target individual’s pain at all.

    I wonder if the good Dr. would be amenable to experiment?

    To my mind, what he’s doing is an example of “confusing the map with the territory.” If a phenomenon is demonstrable with predictable results — even if you cannot exactly quantify the level of those results — then the lack of quantification reveals a hole in the tools you use to describe reality, not a hole in reality.

    Or would anyone like to pick that apart?

    ———————————-

    * except that, as has been pointed out, a variety of people actually posted up papers specifically demonstrating how to put a number to change in “biological information”.

  34. #34 Torbjörn Larsson
    March 1, 2007

    Luna, you nailed it.

  35. #35 Jonathan Vos Post
    March 1, 2007

    “Put that scalpel down, right now!” department.

    Some people become VERY angry when they can’t handle the math. Or, in the words of the police report: “upset when she couldn’t grasp the material.”

    Frustrated by math, student stabs her teacher: cops

    January 10, 2007

    BY ANNIE SWEENEY Crime Reporter
    http://suntimes.com/news/metro/203989,CST-NWS-stab10.article

    A 51-year-old Malcolm X College instructor who was demonstrating a math problem on her blackboard Tuesday was stabbed in the back by a student who apparently became frustrated with the exercise, authorities said.
    The veteran instructor, who was teaching a GED class, was taken to Stroger Hospital, where she received stitches and was released, officials said.

    The incident occurred around 11 a.m., when a 40-year-old female student in the class repeatedly asked the instructor to explain a math problem and apparently became upset when she couldn’t grasp the material, Chicago Police said.

    The student pulled a small steak knife and stabbed the teacher in the left shoulder area, said Monroe District Capt. John Kenny.

    Kenny said that before the stabbing the student repeatedly said: “I don’t understand. . . . Explain to me.”

    At least one student intervened to help the instructor, and the suspect soon was caught by school security in a stairwell, said Zerrie Campbell, president of Malcolm X, which is at 1900 W. Van Buren.

    Charges were pending against the suspect, who has an arrest record that includes 27 misdemeanors.

    There are no metal detectors at Malcolm X — “there is no need for it” because violence is almost unheard of at the school, Campbell said.

    asweeney@suntimes.com

  36. #36 Corkscrew
    March 3, 2007

    Xanthir:

    To be fair, a duplication event like you describe adds very little information. I could notate it as:

    You’re conflating Shannon information (the “surprisal” of a given string) with Kolmogorov information (the compressibility of a given string). A duplication event adds very little Kolmogorov information. Shannon information, by contrast, is effectively history-free – it’s the sum of the informations of the component symbols – so duplicating the string doubles the Shannon information.

    That’s if I remember my introductory information theory course correctly, anyway. It’s been a while.

  37. #37 Xanthir, FCD
    March 3, 2007

    Sorry, you’re right. This is what I get for having only a sophisticated layman’s understanding of IT. Most of what I know comes from KC, so I naturally think about information in terms of compressibility. But of course Shannon information is a function of the length of the string and the probabilities of each symbol, so doubling the string length produces a corresponding increase in (Shannon) information.
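
    To make the Shannon side of this exchange concrete, here is a minimal Python sketch. It assumes a memoryless source whose symbol probabilities are just the symbol frequencies of the string itself; the example sequence and the helper names (empirical_probs, self_information_bits) are illustrative and are not taken from the thread. Under that assumption the information of a string is the sum of the surprisals of its symbols, so a duplicated string carries exactly twice as many bits.

```python
import math
from collections import Counter


def empirical_probs(text):
    """Estimate symbol probabilities from the symbol frequencies in text."""
    counts = Counter(text)
    total = len(text)
    return {symbol: count / total for symbol, count in counts.items()}


def self_information_bits(text, probs):
    """Sum of the surprisals -log2 P(symbol) over every symbol in text."""
    return sum(-math.log2(probs[symbol]) for symbol in text)


# Hypothetical sequence: each of A, C, G, T occurs twice, so P = 0.25 each.
gene = "ACGTTGCA"
probs = empirical_probs(gene)

single = self_information_bits(gene, probs)
double = self_information_bits(gene + gene, probs)

print(single, double)
```

    Kolmogorov information has no such closed formula. It is uncomputable in general, and in practice the output size of a general-purpose compressor is used as a rough upper bound.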

    I have also taken this opportunity to increase my knowledge of Game Theory. Isn’t Wikipedia wonderful?

  38. #38 Torbjörn Larsson
    March 4, 2007

    Convergent sequences are how we define limits of functions.

    I must thank John for his clear exposition here, since I had some vague problems in this direction with Enigman’s comment as I read it, problems I couldn’t really formulate.

    The Banach-Tarski paradox is not just a problem for set theory, John, not if set theory is our way of defining the real number line. The trouble with the Axiom of Choice is that it is not particularly troubling, if a realistic view of sets is taken.

    I think this is the third thread I have seen where Enigman whips forth Banach-Tarski paradox as a problem for the real number line and measure theory. I’m not a mathematician, but it seems to me that the paradox points out a bizarre consequence when using AC with non-measurable sets.

    As I understand it AC is a simplifying tool like the use of infinities. If you choose not to use one or more of them you get different types of math (such as Zermelo-Fraenkel set theory without AC or constructive math) than when you do.

    As for measure theory, the only use of AC I saw in my old book (Cohn, “Measure Theory”) was for set definition. And to show that not all subsets of R are Lebesgue measurable, which can only be done by ZF set theory with AC added, apparently. Which presumably shows one eminent use of AC.

    Enigman seems to be concerned with philosophical “truth” and “the metaphysics of continuity”. I hope the above doesn’t illustrate a preconceived notion of where a problem lies, because that seems less fruitful in such a quest.

    Perhaps it shows the implausibility of Lebesgue measure or its generality.

    As I understand it, the path integral in physics isn’t Lebesgue integrable, suggesting just the latter problem with the Lebesgue measure: it isn’t general enough for all the things we want to do. (Already improper Riemann integrals show that the Lebesgue measure isn’t quite enough, I think.) Apparently it is as yet impossible to calculate path integrals except in very simple cases.

    I just recently learned about another integral which also seems inspired by physics namely the gauge integral, which seems somewhat easier to handle.

    From an introduction discussing its potential use in graduate classes to deepen the understanding of Lebesgue integrals: “In fact, the Henstock-Kurzweil formulation — the gauge integral — is considerably simpler than the Lebesgue idea, and its definition is only slightly different from the definition of the Riemann integral. [...] For instance, every Lebesgue integrable function is also gauge integrable. [...] This analogy may be helpful: The gauge integrable functions are like convergent series; then the Lebesgue integrable functions are like absolutely convergent series.” ( http://www.math.vanderbilt.edu/%7Eschectex/ccc/gauge/ )

  39. #39 Torbjörn Larsson
    March 4, 2007

    Sorry, please disregard my last comment, I posted in the wrong browser window.

  40. #40 Jonathan Vos Post
    March 4, 2007

    Charles Darwin
    by
    Jonathan Vos Post

    Origin of Species. What is the origin?
    What shapes the flower to the honey-bees,
    what paintbrush stripes the tigerskin?
    Origin of Species. What is the origin?
    Why such dissimilarities
    from masculine to feminine?

    What is the nature of Natural Selection?
    The 10-year-old boy with a beetle collection
    stares at their wings in silent reflection.
    His father’s a doctor, but his affection
    is life; upset by blood and by infection
    he leaves the theatre in mid-dissection.

    A Naturalist’s Voyage. Saw the world in five years
    with the moody but brilliant Captain Fitz-Roy.
    Horseback through Patagonia, with mountaineers
    and ten mules crossing the Andes twice. Joy
    among Volcanic Islands, in the atmospheres
    of learning, working, growing to man from boy.

    The Structure and Distribution of Coral Reefs,
    viewed by moonlight from the H.M.S. Beagle,
    the cattleman’s domesticated beefs,
    the cross-bred wheat in heavy golden sheafs
    and the fierce expression of the pinioned eagle
    all added evidence to the same beliefs.

    The Descent of Man. And what is our descent?
    Did we evolve by accident or plan?
    What was the source of our development?
    He studied the native South American;
    the Tierra del Fuegan, equally intent,
    studied young Charles Darwin with astonishment.

    2050-2400
    6 May 1981

  41. #41 Daniel Morgan
    April 14, 2007

    Mark,

    The man won’t. Shut. Up.

    Actually, all I did was ask a question: how much biologically relevant information can Darwin’s mechanism of chance and necessity actually generate? I didn’t settle for hand-waving or for reassurances that “Darwin’s theory is a fact.” I wanted a measurement of biological complexity, with empirical verification, in a way that was meaningful to biology. I never got an answer to my question. [emphasis mine]

    This was written April 13th.

  42. #42 Caledonian
    April 14, 2007

    He doesn’t get it – and it’s because he doesn’t choose to get it.

    Evolutionary processes can produce any arbitrarily-chosen structure.

  43. #43 Jonathan Vos Post
    April 14, 2007

    Caledonian: “Evolutionary processes can produce any arbitrarily-chosen structure.”

    Although some, i.e. fire-breathing dragons on wheels, take more mutations…

  44. #44 Randy Stimpson
    February 11, 2008

    Which of the following two paragraphs has more information?

    George Washington was the first president of the United States.

    George Washington was the first president of the United States. George Washington was the first president of the United States.

    Now if you make some random modifications to the second sentence of the second paragraph will you get more information?

  45. #45 Mark C. Chu-Carroll
    February 12, 2008

    Randy:

    Mathematically, the second paragraph, with two copies of the original statement, is slightly more information than the first, with only one copy. If you modify the second, you get even more information.

  46. #46 Randy Stimpson
    February 13, 2008

    How do you mathematically compute the amount of information in those paragraphs? You don’t need to explain it, just give me a reference. I’ll go look it up and educate myself.

    Also I don’t see how randomly modifying the second sentence gives you more information. What definition of information would allow you to come to that conclusion?

  47. #47 Mark C. Chu-Carroll
    February 13, 2008

    Randy:

    There are two main branches of information theory. They have similar, but not identical, definitions of information.

    There’s Shannon’s theory – look for “A Mathematical Theory of Communication”, which is his original paper on the subject.

    Then there’s Kolmogorov/Chaitin information theory. Greg Chaitin has written multiple books on the topic – “Algorithmic Information Theory” by Gregory Chaitin is a very thorough introduction; he’s also got a ton of popular-level books: “Exploring Randomness” is a very good one.

    The basic issue here is that people confuse “information” with “meaning”. They’re not the same thing. When you’re looking at “information” in mathematical terms, you’re just looking at sequences of symbols. What they mean doesn’t matter.

    Take a sentence like “george washington was the first president.” From an information theory standpoint, that’s a string of 42 characters, taken from an alphabet consisting of the 26 letters, plus spaces and punctuation. How much information it contains in bits is tricky, but speaking loosely, it’s a measure of the smallest number of bits you can compress it into.

    If you make it “george washington was the first president.george washington was the first president.”, then it’s more information, because a compression needs to say “2 copies of …”. Make a random modification to a single character – say, change the “w” in “washington” to a “K” – and now you can’t just say “2 copies of …”. Now you’ve got something extra – an extra character – which you need to somehow specify besides just copying the first string literally. You need an extra character to describe it, so it’s more information.
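
    A rough way to try this at home is to let a real compressor stand in for “the smallest number of bits you can compress it into”. The sketch below uses Python’s zlib for that purpose. This is an assumption of the example, not the definition: zlib only gives an upper bound on the true minimum, and its fixed header overhead is noticeable on strings this short. The variable names are illustrative.

```python
import zlib


def compressed_size(text):
    """Length in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


sentence = "george washington was the first president."

# Two literal copies of the sentence.
duplicated = sentence + sentence

# Two copies, but the first "w" of the second copy is changed to "K".
mutated = sentence + sentence.replace("w", "K", 1)

size_single = compressed_size(sentence)
size_duplicated = compressed_size(duplicated)
size_mutated = compressed_size(mutated)

print(size_single, size_duplicated, size_mutated)
```

    The duplicated string should come out only a few bytes larger than the single sentence, and the mutated one a little larger again, because the compressor can no longer describe the second half as one straight copy.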

    The meaning that a string encodes is separate from the information in that string.

  48. #48 Jonathan Vos Post
    February 14, 2008

    In fact, there’s a paradox if you assume that semantic meaning enters into the syntactic definition of information, and that things are the same as what they are called.

    Assume, for the sake of argument, that “george washington was the first president” is interpreted as mathematical equality:
    “george washington” = “the first president”

    Then, by the law of substitution, where something can be replaced by its equal:

    “george washington” has 16 letters, not counting blanks, but “the first president” has 17 letters, not counting blanks.

    Hence 16 = 17. Contradiction.

    Hence, one or more of these 3 assumptions is logically incorrect:
    (1) semantic meaning enters into the syntactic definition of information;
    (2) things are the same as what they are called;
    (3) “george washington was the first president” is interpreted as mathematical equality.

    Additionally, “make a random modification to a single character” presupposes:
    (a) a mapping from what things are to what they are called (as character strings in a formal language);
    (b) a definition of “random” that applies Probability Theory of some kind to either the ensemble of things represented or to the ensemble of the character strings that represent those things.

    It is a Category Error, philosophically, to confuse a string of numerals with the number that they represent in a given base representation (i.e. “111” which has 3 characters with “100 + 10 + 1” which has 8 characters not counting blanks);

    It is a Category Error, philosophically, to confuse a string of numerals with something that they represent outside of the abstract world of integer arithmetic (i.e. 666 = Satan, or 888 = Jesus);

    It is a Category Error, philosophically, to confuse a string of nucleotide triples (codons) with the amino acid that this codes for in the genetic code (i.e, more generally, to confuse genotype with phenotype).

    I should not have to say that all three category errors are routinely made by the ignorant, egnorant, lunatic, and liars that Mark Chu-Carroll so brilliantly deconstructs in “Good Math, Bad Math” — but I’ll say it anyway.

    There’s Good Math; there’s Bad Math; there’s Good Biology; there’s Bad Biology; there’s Good Theology; there’s Bad Theology; and woe unto those who can’t tell one from the other, for, as it is written in Jeremiah 31:29 [King James Bible]:
    “In those days they shall say no more, The fathers have eaten a sour grape, and the children’s teeth are set on edge.”

    [where sour is to be taken as a pH value at the Chemistry level, and something else at the symbolic level; and fathers and children are taken as two different generations at one level, and something else at the symbolic level]

    cf. Lamentations 5:7 “Our fathers sinned, and are no more; It is we who have borne their iniquities.”
    [which makes Darwinian fitness hard to compute]

    cf. Ezekiel 18:2 “What do you mean by using this proverb concerning the land of Israel, saying, ‘The fathers eat the sour grapes, But the children’s teeth are set on edge’?”
    [which makes it hard to relate scripture with the human, physical, and spiritual world it is meant to describe].

    In conclusion, as it is written in John 5:14 [King James Bible]:
    “Afterward Jesus findeth him in the temple, and said unto him, Behold, thou art made whole: sin no more, lest a worse thing come unto thee.”

    Amen!

    Thus endeth my sermon on Theomathematics and Theobiology.

    QED!

  49. #49 Randy Stimpson
    February 14, 2008

    So Mark, if we think of information as you have described it, we would say that the second string below has more information than the first, right?

    George Washington was the first President of the United States

    epowk;vneuiwpfdmnra ajdiddg8kfl adgmqapk mdjkajdn alkjd,auihdioqklakj akjdmn [pqioryrqpz]a.,,dkaidjknadkjn f klajkdl,n lakdjnyykal jlidlkn lan alkjl kjad lj;lakjd l;kjesszoiqoi[dfp[lkj;’a alkn agmkjdlkj a; lkjsg91g2@3#dlkjalkjd ;^)

  50. #50 Jonathan Vos Post
    February 14, 2008

    Randy Stimpson:

    (1) Actually, you fell into my first trap: George Washington was NOT the first president of the United States. He was 7th.

    (2) You should go read the citations that Mark Chu-Carroll provided. You still don’t get it.

    (3) Here are some of the things that you must learn to even be able to ask a meaningful question (I used MathWorld as a guide and source of excerpts and references). Believe it or not, it takes MATH to understand Information and Evolution. It takes the denial of MATH to believe Intelligent Design or Governor Huckabee or George W. Bush. Are you willing to do some homework, or just complain that nobody appreciates your dazzling qualitative insights?

    ============

    Information Theory

    The branch of mathematics dealing with the efficient and accurate storage, transmission, and representation of information.

    SEE ALSO: Coding Theory, Compression, Entropy, Quantum Information Theory.

    REFERENCES:

    Goldman, S. Information Theory. New York: Dover, 1953.

    Hankerson, D.; Harris, G. A.; and Johnson, P. D. Jr. Introduction to Information Theory and Data Compression. Boca Raton, FL: CRC Press, 1998.

    Lee, Y. W. Statistical Theory of Communication. New York: Wiley, 1960.

    Pierce, J. R. An Introduction to Information Theory. New York: Dover, 1980.

    Reza, F. M. An Introduction to Information Theory. New York: Dover, 1994.

    Singh, J. Great Ideas in Information Theory, Language and Cybernetics. New York: Dover, 1966.

    Weisstein, E. W. “Books about Information Theory.” http://www.ericweisstein.com/encyclopedias/books/InformationTheory.html.

    Zayed, A. I. Advances in Shannon’s Sampling Theory. Boca Raton, FL: CRC Press, 1993.

    ============

    Coding Theory

    Coding theory, sometimes called algebraic coding theory, deals with the design of error-correcting codes for the reliable transmission of information across noisy channels. It makes use of classical and modern algebraic techniques involving finite fields, group theory, and polynomial algebra. It has connections with other areas of discrete mathematics, especially number theory and the theory of experimental designs.

    SEE ALSO: Encoding, Error-Correcting Code, Finite Field, Hadamard Matrix.

    REFERENCES:

    Alexander, B. “At the Dawn of the Theory of Codes.” Math. Intel. 15, 20-26, 1993.

    Berlekamp, E. R. Algebraic Coding Theory, rev. ed. New York: McGraw-Hill, 1968.

    Golomb, S. W.; Peile, R. E.; and Scholtz, R. A. Basic Concepts in Information Theory and Coding: The Adventures of Secret Agent 00111. New York: Plenum, 1994.

    Hill, R. First Course in Coding Theory. Oxford, England: Oxford University Press, 1986.

    Humphreys, O. F. and Prest, M. Y. Numbers, Groups, and Codes. New York: Cambridge University Press, 1990.

    MacWilliams, F. J. and Sloane, N. J. A. The Theory of Error-Correcting Codes. New York: Elsevier, 1978.

    Roman, S. Coding and Information Theory. New York: Springer-Verlag, 1992.

    Stepanov, S. A. Codes on Algebraic Curves. New York: Kluwer, 1999.

    Vermani, L. R. Elements of Algebraic Coding Theory. Boca Raton, FL: CRC Press, 1996.

    van Lint, J. H. An Introduction to Coding Theory, 2nd ed. New York: Springer-Verlag, 1992.

    Weisstein, E. W. “Books about Coding Theory.” http://www.ericweisstein.com/encyclopedias/books/CodingTheory.html.

    ============

    Entropy

    In physics, the word entropy has important physical implications as the amount of “disorder” of a system. In mathematics, a more abstract definition is used. The (Shannon) entropy of a variable X is defined as
    $$H(X) = -\sum_{x} P(x) \log_2 P(x)$$

    bits, where P(x) is the probability that X is in the state x, and $P \log_2 P$ is defined as 0 if P = 0. The joint entropy of variables X_1, …, X_n is then defined by
    $$H(X_1, \ldots, X_n) = -\sum_{x_1} \cdots \sum_{x_n} P(x_1, \ldots, x_n) \log_2 P(x_1, \ldots, x_n).$$
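
    As a small illustration of these two definitions, here is a hedged Python sketch. The function name entropy_bits and the example distributions are made up for this note and are not part of the MathWorld entry. It writes -log2 P as log2(1/P) and skips states with P = 0, which implements the convention that such terms contribute 0.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits; states with P = 0 contribute nothing."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)


# Entropy of a single variable X.
fair_coin = entropy_bits([0.5, 0.5])
four_equal_states = entropy_bits([0.25, 0.25, 0.25, 0.25])
with_impossible_state = entropy_bits([0.5, 0.5, 0.0])

# Joint entropy: the same sum taken over every combination (x_1, x_2).
# Hypothetical example: two independent fair coins.
joint = {(a, b): 0.25 for a in "HT" for b in "HT"}
joint_entropy = entropy_bits(joint.values())

print(fair_coin, four_equal_states, with_impossible_state, joint_entropy)
```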

    SEE ALSO: Differential Entropy, Information Theory, Kolmogorov Entropy, Kolmogorov-Sinai Entropy, Maximum Entropy Method, Metric Entropy, Mutual Information, Nat, Ornstein’s Theorem, Redundancy, Relative Entropy, Shannon Entropy, Topological Entropy.

    REFERENCES:

    Ellis, R. S. Entropy, Large Deviations, and Statistical Mechanics. New York: Springer-Verlag, 1985.

    Havil, J. “A Measure of Uncertainty.” §14.1 in Gamma: Exploring Euler’s Constant. Princeton, NJ: Princeton University Press, pp. 139-145, 2003.

    Khinchin, A. I. Mathematical Foundations of Information Theory. New York: Dover, 1957.

    Lasota, A. and Mackey, M. C. Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics, 2nd ed. New York: Springer-Verlag, 1994.

    Ott, E. “Entropies.” §4.5 in Chaos in Dynamical Systems. New York: Cambridge University Press, pp. 138-144, 1993.

    Rothstein, J. “Information, Measurement, and Quantum Mechanics.” Science 114, 171-175, 1951.

    Schnakenberg, J. “Network Theory of Microscopic and Macroscopic Behavior of Master Equation Systems.” Rev. Mod. Phys. 48, 571-585, 1976.

    Shannon, C. E. “A Mathematical Theory of Communication.” The Bell System Technical J. 27, 379-423 and 623-656, July and Oct. 1948. http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf.

    Shannon, C. E. and Weaver, W. Mathematical Theory of Communication. Urbana, IL: University of Illinois Press, 1963.
    ============

    Mutual Information

    The mutual information between two discrete random variables X and Y is defined to be

    $$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} P(x,y) \log_2 \frac{P(x,y)}{P(x)\,P(y)}$$

    bits. Additional properties are

    $$I(X;Y) = I(Y;X)$$

    $$I(X;Y) \geq 0,$$

    and
    $$I(X;Y) = H(X) + H(Y) - H(X,Y),$$

    where H(X) is the entropy of the random variable X and H(X,Y) is the joint entropy of these variables.
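
    The definition and the entropy identity can be checked against each other numerically. The following is only a sketch under stated assumptions: the joint distributions are toy examples (two perfectly correlated fair bits, and two independent fair bits), and the helper names are hypothetical.

```python
import math
from collections import defaultdict


def entropy_bits(probs):
    """Shannon entropy in bits; states with P = 0 contribute nothing."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)


def marginals(joint):
    """Marginal distributions P(x) and P(y) from a joint table {(x, y): P}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return px, py


def mutual_information_bits(joint):
    """I(X;Y) computed directly from the double-sum definition."""
    px, py = marginals(joint)
    return sum(
        p * math.log2(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0
    )


def mutual_information_from_entropies(joint):
    """I(X;Y) computed as H(X) + H(Y) - H(X,Y)."""
    px, py = marginals(joint)
    return (
        entropy_bits(px.values())
        + entropy_bits(py.values())
        - entropy_bits(joint.values())
    )


correlated = {(0, 0): 0.5, (1, 1): 0.5}
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}

mi_correlated = mutual_information_bits(correlated)
mi_independent = mutual_information_bits(independent)

print(mi_correlated, mi_independent)
```

    Knowing one of the correlated bits tells you the other completely (1 bit), while independent bits tell you nothing about each other (0 bits).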

    SEE ALSO: Entropy.

    This entry contributed by Erik G. Miller

    REFERENCES:

    Cover, T. M. and Thomas, J. A. Elements of Information Theory. New York: Wiley, pp. 18-26, 1991.
    ============

  51. #51 Jonathan Vos Post
    February 14, 2008

    [this version has URLs stripped out, unlike the one an hour or two ago, to get through the queue]

  52. #52 Randy Stimpson
    February 15, 2008

    Wow Jonathan, you are smart. I don’t know anyone who can copy and paste from webpages as well as you can.

  53. #53 Jonathan Vos Post
    February 15, 2008

    Re: #52,

    As someone who’s taught far in excess of 2,000 students in the classroom, ages 13-93, and who has been an Adjunct Professor of Mathematics and an Adjunct Professor of Astronomy, with many refereed papers on Mathematical Biology published in the proceedings of international conferences, I believe that I am entitled to the Fair Use provision of copyright law. I can cut and paste as I see fit, for scholarly and educational purposes, on a non-profit basis. But knowing WHAT to cut and paste, well, that comes from 50 years of Good Math experience on my part. Are you willing to read the ad hoc textbook? Or did Pavlov’s dog eat your first homework assignment?

    It should not surprise you then, Randy Stimpson, that the Magic Dragon Multimedia web domain of original information by myself and my wife, now more than 12 years online, gets over 15,000,000 hits per year, and ranks in the top 10 according to several different key words, such as “science fiction”, according to Google.

    Wriggle on the end of the hook all you like, d00d, the fact is that people who actually know something are telling you what you need to learn to be able to ask an intelligent question.

    And you do what? You resort to ad hominem attack, as if you remain unable or unwilling to ask an intelligent question.

    Now, if you intend actual dialog, and not just knee-jerk Creationist vomitus, can you explain from what ensemble of possible messages, over what alphabet, with what statistical distribution, comes your string:

    epowk;vneuiwpfdmnra ajdiddg8kfl adgmqapk mdjkajdn alkjd,auihdioqklakj akjdmn [pqioryrqpz]a.,,dkaidjknadkjn f klajkdl,n lakdjnyykal jlidlkn lan alkjl kjad lj;lakjd l;kjesszoiqoi[dfp[lkj;’a alkn agmkjdlkj a; lkjsg91g2@3#dlkjalkjd ;^)

    If you can do that, we can then calculate an entropy, and say how much information there is.

    Or are you mired in the Creationist rut that dogmatically claims that information cannot be generated except by an Intelligent Designer?

  54. #54 Mark C. Chu-Carroll
    February 15, 2008

    So Mark, if we think of information as you have described it, we would say that the second string below has more information than the first, right?

    George Washington was the first President of the United States

    epowk;vneuiwpfdmnra ajdiddg8kfl adgmqapk mdjkajdn alkjd,auihdioqklakj akjdmn [pqioryrqpz]a.,,dkaidjknadkjn f klajkdl,n lakdjnyykal jlidlkn lan alkjl kjad lj;lakjd l;kjesszoiqoi[dfp[lkj;’a alkn agmkjdlkj a; lkjsg91g2@3#dlkjalkjd ;^)

    First, I want to be clear. The definitions of information that I’m using aren’t my definitions. I’m not nearly smart enough to have invented the mathematical abstractions for describing information.

    Second, to answer your question, I need you to be more specific.

    (1) There are two very different mathematical abstractions for dealing with information. They’re related, but have different foci. Are you talking about Shannon information theory, or Kolmogorov/Chaitin information theory?

    (2) If Shannon, then what’s the alphabet? And what is the expected probability distribution within that alphabet?

    (3) If Kolmogorov/Chaitin, then what computational device and what alphabet are we talking about?
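
    To see why the alphabet and distribution matter for the Shannon case, here is a small Python sketch. It makes the simplest possible assumption, that every symbol of the chosen alphabet is equally likely, which is certainly not true of English text. The two alphabets and the function name are illustrative choices, not anything Randy specified. The same string gets a different number of bits under each model.

```python
import math
import string


def uniform_information_bits(text, alphabet):
    """Bits for text if each symbol of alphabet is equally likely."""
    if not set(text) <= set(alphabet):
        raise ValueError("text uses symbols outside the assumed alphabet")
    return len(text) * math.log2(len(alphabet))


sentence = "george washington was the first president"

# Model 1 (assumed): 26 lowercase letters plus space, 27 symbols.
letters_and_space = string.ascii_lowercase + " "

# Model 2 (assumed): the 95 printable ASCII characters.
printable_ascii = [chr(code) for code in range(32, 127)]

bits_small_alphabet = uniform_information_bits(sentence, letters_and_space)
bits_large_alphabet = uniform_information_bits(sentence, printable_ascii)

print(round(bits_small_alphabet, 1), round(bits_large_alphabet, 1))
```

    Neither number is “the” information content of the sentence. Each is the answer to a question that only becomes well-posed once the model is fixed.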

  55. #55 Torbjörn Larsson, OM
    February 15, 2008

    Actually, all I did was ask a question: how much biologically relevant information can Darwin’s mechanism of chance and necessity actually generate? I didn’t settle for hand-waving or for reassurances that “Darwin’s theory is a fact.”

    This is of course egnoring the basic and simple to understand fact that if the theory, which is primarily about functions and populations, survives predictive testing while conforming to the definition of evolution, it shows that information isn’t “biologically relevant”.

    Unfortunately for Egnor this means that he has to define his measure and show how it applies to biology. A task which, as the post and now comment #41 testify, he is frenetically avoiding.

    Meanwhile, biologists who aren’t afraid of doing science have studied how information is collected by the genome. (Google for example the work on ev, where Egnor also finds the requested quantifications.) Of course they find that Shannon information can illuminate aspects of that, and the reason isn’t hard to understand:

    Note that Shannon’s definition of the quantity of information is independent of whether it is true. The measure he came up with was ingenious and intuitively satisfying. Let’s estimate, he suggested, the receiver’s ignorance or uncertainty before receiving the message, and then compare it with the receiver’s remaining ignorance after receiving the message. The quantity of ignorance-reduction is the information content.

    [...]

    Mutation is not an increase in true information content, rather the reverse, for mutation, in the Shannon analogy, contributes to increasing the prior uncertainty. But now we come to natural selection, which reduces the “prior uncertainty” and therefore, in Shannon’s sense, contributes information to the gene pool.

    Dawkins finishes rather poetically with the observation that:

    If natural selection feeds information into gene pools, what is the information about? It is about how to survive. Strictly it is about how to survive and reproduce, in the conditions that prevailed when previous generations were alive. To the extent that present day conditions are different from ancestral conditions, the ancestral genetic advice will be wrong. In extreme cases, the species may then go extinct. To the extent that conditions for the present generation are not too different from conditions for past generations, the information fed into present-day genomes from past generations is helpful information.

    [...]

    “And isn’t it an arresting thought? We are digital archives of the African Pliocene, even of Devonian seas; walking repositories of wisdom out of the old days. You could spend a lifetime reading in this ancient library and die unsated by the wonder of it.”

    So information is learned from the environment and the realized functions of the organisms themselves. It is fed into the genome by the true and tested method of dumb (redoes previous mistakes) trial-and-error. The information is coded by the frequencies of alleles in the population.
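    As a toy illustration of the Shannon analogy in the quoted passage, the sketch below tracks the entropy of allele frequencies while a simple selection step is applied. The four alleles, their fitness values, and the function names are all made up for illustration; this is not a model taken from the ev work or from population genetics texts.

```python
import math

def entropy_bits(freqs):
    """Shannon entropy, in bits, of a frequency distribution."""
    return -sum(p * math.log2(p) for p in freqs if p > 0)

def select(freqs, fitness):
    """One generation of selection: reweight each allele by its fitness."""
    weighted = [p * w for p, w in zip(freqs, fitness)]
    total = sum(weighted)
    return [x / total for x in weighted]

# Hypothetical starting point: four equally common alleles (maximum uncertainty).
freqs = [0.25, 0.25, 0.25, 0.25]
# Hypothetical fitness values: the first allele has a modest advantage.
fitness = [1.0, 0.9, 0.9, 0.9]

before = entropy_bits(freqs)
for _ in range(50):
    freqs = select(freqs, fitness)
after = entropy_bits(freqs)

print(f"uncertainty before selection: {before:.3f} bits")
print(f"uncertainty after selection:  {after:.3f} bits")
print(f"reduction in uncertainty:     {before - after:.3f} bits")
```

    In this toy setting the drop in entropy is the "quantity of ignorance-reduction" that the quote describes, measured per allele drawn from the population.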

    And this information happens to be exactly what population genetics and quantitative genetics both describe. In other words, we have gone full circle and Egnor has the answer I provided in the beginning – evolution is about functions and populations.

    Now, is anyone going to bet on Egnor avoiding the loopiness of creationism, or will we see another endless iteration of the scammers?

    (Btw, I find it mildly humorous that Randy repeats Egnor’s misconceptions and his rejection of observed biological mechanisms. But only mildly.)

  56. #56 Randy Stimpson
    February 15, 2008

    Hi Mark, let’s focus on the Shannon definition of information. Is there any alphabet and expected probability distribution within that alphabet that would identify the first string as having more information than the second string?

  57. #57 Mark C. Chu-Carroll
    February 15, 2008

    Randy:

    That’s a very bad question. The reason is that I can always define a probability distribution that makes a given string as unlikely as I want. For example, if I use UTF-8, and I make the uppercase letters *extremely* rare, then within that framework, I can assign a higher information value to the string.

    Shannon information is sometimes informally described as “surprise”. Given a string of symbols, presented to you one at a time, you ask “How surprised am I that X is the next character?”. Then the amount of information in the string is the sum of how much you were surprised at each step. With a strict definition of how you measure how surprised you were by each character, that’s pretty much Shannon information.

    The way that the probability distribution figures in is that it’s a way of saying how surprised you should be by each symbol. If you have an alphabet of two symbols, “X”, and “Y”, and you expect to see 10,000,000 “X”s for each “Y”, then seeing a “Y” is always much more surprising than seeing an X. Whenever you’re waiting for a character, you expect to see an X; it’s no surprise. Whenever you get a Y, it’s an unexpected rare event.
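    To put rough numbers on the X/Y example, here is a minimal sketch that measures "surprise" as the negative base-2 logarithm of a symbol's probability, which is the usual Shannon self-information. The function names are illustrative, and the sketch assumes each symbol is drawn independently.

```python
import math

def surprisal_bits(p: float) -> float:
    """Shannon self-information, in bits, of an event with probability p."""
    return -math.log2(p)

# Two-symbol alphabet where 10,000,000 "X"s are expected for each "Y".
p_x = 10_000_000 / 10_000_001
p_y = 1 / 10_000_001
PROBS = {"X": p_x, "Y": p_y}

def string_information(s: str) -> float:
    """Total surprise of a string, assuming symbols are independent."""
    return sum(surprisal_bits(PROBS[ch]) for ch in s)

print(f"surprise at an X: {surprisal_bits(p_x):.9f} bits")
print(f"surprise at a Y:  {surprisal_bits(p_y):.2f} bits")
print(f"'XXXXXXXX': {string_information('XXXXXXXX'):.6f} bits")
print(f"'XXXXYXXX': {string_information('XXXXYXXX'):.2f} bits")
```

    Under this distribution a single Y carries a little over 23 bits, while each X carries almost nothing, so nearly all of a string's information sits in its rare symbols.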

    So if I take Unicode characters as an alphabet, and set the probability distribution so that uppercase letters are incredibly rare relative to anything else, I can easily make your string be very surprising, since it has 5 uppercase letters in a short span.

    So I can define a distribution by saying that you’ll select characters by random selection. You’ll initially pick from a set consisting of the printable non-uppercase Unicode characters under 128, plus one extra. When you randomly select a symbol, if you pick the extra, then you generate another random number between 0 and 1. If you generate something smaller than 1/1,000,000,000,000 then you randomly select an uppercase character; otherwise, you randomly select one of the non-uppercase. So your probability of getting an uppercase character is at most 1 in 1 trillion; that means that the odds of getting 5 of them in a span of 60 or so symbols are vanishingly small.
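    The selection procedure just described can be written out as an explicit probability table. The sketch below is one reading of it, assuming the alphabet is printable ASCII (codes 32 to 126), that every pick is uniform within its set, and that symbols are independent; the names are illustrative. It compares the information in the George Washington sentence under this skewed distribution with a plain uniform one.

```python
import math
import string

PRINTABLE = [chr(c) for c in range(32, 127)]            # printable ASCII
UPPER = set(string.ascii_uppercase)
NON_UPPER = [c for c in PRINTABLE if c not in UPPER]
RARE = 1e-12                                            # the 1-in-a-trillion draw

def skewed_probability(ch: str) -> float:
    """Probability of ch under the 'uppercase is incredibly rare' procedure."""
    n = len(NON_UPPER)
    p_extra = 1 / (n + 1)              # chance of picking the one extra slot
    if ch in UPPER:
        return p_extra * RARE / len(UPPER)
    if ch in NON_UPPER:
        # Picked directly, or reached through the extra slot's fallback.
        return 1 / (n + 1) + p_extra * (1 - RARE) / n
    raise ValueError(f"character outside the assumed alphabet: {ch!r}")

def uniform_probability(ch: str) -> float:
    """Every printable ASCII character equally likely."""
    return 1 / len(PRINTABLE)

def information_bits(text: str, prob) -> float:
    """Sum of per-character surprise, assuming independent characters."""
    return sum(-math.log2(prob(ch)) for ch in text)

sentence = "George Washington was the first President of the United States"
print(f"uniform: {information_bits(sentence, uniform_probability):.1f} bits")
print(f"skewed:  {information_bits(sentence, skewed_probability):.1f} bits")
```

    The same 62 characters score noticeably higher under the skewed distribution, and almost all of the difference comes from the five uppercase letters.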

    -
    If the odds of any printable, non-uppercase character in the UTF range under 128 as equal

  58. #58 Randy Stimpson
    February 15, 2008

    Thanks Mark. What I am getting at is how applicable the math is. Doing the math is generally a lot easier than figuring out how and when to apply it. The hardest class I took in graduate school was Mathematical Modeling. It seems that the Shannon information model is not very good at measuring how much meaningful information is in a sentence. What would be more applicable is a model that would satisfy these criteria:

    1) Given two almost identical sentences, but one with spelling errors, the sentence with no spelling errors would be chosen as having more information.

    2) A random sequence of characters that doesn’t convey any meaningful information would have a measure of 0 or near 0.

  59. #59 Mark C. Chu-Carroll
    February 15, 2008

    Randy:

    The problem with arguments like that is that they are philosophical, not scientific.

    What does it mean for something to be meaningful?

    We can choose all sorts of definitions of “meaningful”. And we can create all sorts of examples that show that no matter how you define “meaning”, there is no way of distinguishing between a piece of potential information with meaning, and one without.

    In terms of math, there are very good reasons for defining information the way that we do. And no one has ever come up with a reasonable way of identifying or quantifying “meaningful” information, distinct from “random” information. In fact, one of the fundamental facts that we get from Greg Chaitin’s work is a solid proof that you can’t do that.

    Further – the whole “meaning/nonmeaning” thing is almost always brought up in the context of some creationist argument, that relies on some assumption that “you can’t create information”. The thing is, you can. By any definition that measures information of any form, it’s downright easy to show that random processes create information. By any definition that shows that living things have information encoded in DNA, there are a thousand easy to find natural processes that also encode information by that definition. As a proof for theism, it’s a total disaster.

  60. #60 Jonathan Vos Post
    February 15, 2008

    I apologize if I ruffled anyone’s feathers. I was getting a bit irrational, myself.

    You don’t actually need me in the loop here.

    In no particular order, correct and insightful things are being said in this thread by (recently):
    (1) Caledonian
    (2) Mark C. Chu-Carroll
    (3) Torbjörn Larsson, OM

    Until there is consensus with Randy Stimpson, whose intentions may indeed be perfectly sincere willingness to learn, and be in Socratic dialogue, there is no need for me to get to deep issues that interest me, nor to summarize the conversations that I had with Claude Shannon.

    Except to say that there is a grain of truth in Randy Stimpson’s statement that “the Shannon information model is not very good at measuring how much meaningful information is in a sentence.” Claude Shannon agreed with a statement that I made (rather more subtle and sophisticated in presentation, but overlapping what Randy Stimpson said if I take his statement in the most favorable light, the way Federal judges lean over backwards to make sense of semi-coherent arguments by In Pro Per parties).

    In briefest summary, Claude Shannon told me that the artificial assumption that he made was that the receiver of a message is not structurally changed by having received it from the sender. Whereas, in a common sense way, the whole point of writing or speaking to persuade or to teach or to induce a population to rise up and overthrow a tyrant or to read a love poem and have sex with the author (masterful command of language does increase Darwinian fitness!) is for the sender to succeed in structurally changing the receiver. But how do we put that in equations, prove theorems, and relate it to what we culturally feel about “meaning”?

    I’ll save that conversation for some time in the future with fellow experts. With all due respect, it’s a little over the head of the typical undergraduate or the equivalent.

    I just had a second very good meeting with Harry Gray at Caltech (google him to see who he is). If what we agree on happens, it will change my availability for blogging.

    Until I know for sure, have a beautiful weekend.

    ** returns to lurking **

  61. #61 Randy Stimpson
    February 18, 2008

    Well we haven’t performed any experiments here so we can’t say we are being scientific. We can try and be logical – that’s where math and philosophy help. But when a line of reasoning leads to an absurd conclusion, a mistake has been made somewhere along the line. That is the basis of my argument. Your line of reasoning has led you to believe that a sequence of randomly typed characters can contain more information than a well constructed sentence. Also, if you haven’t noticed, “random information” is an oxymoron.

    I can’t speak for creationists in general because I haven’t read any of their recent publications, but I doubt that they would say that you can’t create information. They are probably asserting that some types of information can only be created by intelligence or that some types of information can’t be created by a random process. Luckily computer programs can’t be generated by random processes or you and I would be out of a job.

    Maybe it’s not possible to mathematically distinguish meaningful information from random data, but you and I can identify some types of meaningful information and we have two ways of doing that – either we understand the language or observe the effects. You know this sentence is meaningful because you understand English. We also know that computer programs contain meaningful information – not because we can read and understand the binary code of an exe — but because we can observe what the program can do. That also is how we know that DNA contains biological information that is considerably more sophisticated than what teams of software engineers like you and I are capable of generating on purpose.

    Lastly I don’t want to go on record as someone who is dismissing Claude Shannon or information theory. Who wants to argue with a guy that has 10 honorary doctorates? I am only saying that the application of information theory has its limits and I think you agree with that. Shannon information theory will not tell you how much meaningful information is in this paragraph, how complex a computer program is, or anything about the quality of biological information in DNA.

  62. #62 Mark C. Chu-Carroll
    February 18, 2008

    Randy:

    The problem with your argument is that it’s based on total ignorance of what information theory says. You can babble until you’re blue in the face – but it doesn’t make a bit of difference if you don’t have the slightest clue of what you’re talking about.

    For example, “Random information” is not an oxymoron.

    When we look at information mathematically, one of the important things that we need to do is to understand just how much information is contained in a particular string. What makes that hard is that most strings contain a huge amount of redundancy.

    The ultimate definition of information content is, roughly, how long the string is in its maximally compressed form.

    What is a maximally compressed string? It’s something where looking at any subsequence of the string gives you no clue of what the rest of the string looks like. In other words, it’s a string that meets any reasonable definition of randomness.
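    A rough way to see the link between redundancy and compression is to run a general-purpose compressor over different strings. The sketch below uses Python's `zlib` as a stand-in; that is only an upper bound, since true minimal description length (Kolmogorov complexity) is uncomputable. The seeded pseudo-random bytes are also a stand-in: strictly, they have a short description (generator plus seed) that zlib has no way to discover.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Length in bytes after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))

sentence = b"George Washington was the first President of the United States. "

# Highly redundant input: the same sentence repeated many times.
redundant = sentence * 64

# Stand-in for a random string of the same length (seeded for repeatability).
rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(len(redundant)))

for label, data in [("redundant", redundant), ("noise", noise)]:
    size = compressed_size(data)
    print(f"{label:>9}: {len(data)} bytes -> {size} bytes compressed")

# Compressing already-compressed output gains essentially nothing:
# the compressor's output looks random to the compressor itself.
once = zlib.compress(redundant, 9)
print(f"recompressed: {len(once)} bytes -> {compressed_size(once)} bytes")
```

    The repeated sentence shrinks to a small fraction of its length, while the noise does not shrink at all, which is the sense in which a maximally compressed string is indistinguishable from a random one.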

    The problem with your examples is that you just don’t understand what we’re looking at in terms of information. We’re looking at strings of symbols. You want to look at a different level – you want to say “Those symbols represent something in the English language, and so they contain more meaning”. But the string doesn’t encode the information that “this string is English text”.

    Another of the tricky things in information theory is that we can only talk about the quantity of information relative to a particular computing system – that is, relative to a system that assigns meaning to the string. When you try to add English meaning to one string, what you’re really doing is asserting that that string implicitly contains a shitload of exterior information. But in information theory, you don’t get to do that. If you want English semantics, you have to find some way of encoding those semantics into a string.

    If you really wanted to talk about a string like “George Washington was the first president”, and you wanted to say that that string included the information about what that string represents in English, you would need to add a description of how the ISO-Latin1 characters encode English words, and how English words encode the semantics that express the information that you want to count in that string. (Robert Carpenter has a decent, if somewhat difficult, book on how to encode natural language semantics in terms of Lambda calculus, which is enough to express the meaning that you want.) The thing is, if you did that, what you’d end up with is a tremendously long, random-looking string – which, if you looked at it without knowing what it was, you would then assert doesn’t contain any information, because it’s just random.

    It’s just a big old stupid game where you pretend like you’re not totally ignorant of what you’re talking about, and keep babbling the same phrases – which are, in fact, meaningless, because you don’t know what you’re talking about.

    Go read Chaitin’s textbook, and then maybe we can have an intelligent conversation.