Both in comments, and via email, I've received numerous requests to take a look at the work of Dembski and Marks, published through Professor Marks's website. The site is called the "Evolutionary Informatics Laboratory". Before getting to the paper, it's worth taking just a moment to understand its provenance - there's something deeply fishy about the "laboratory" that published this work. It's not a lab - it's a website; it was funded under very peculiar circumstances, and hired Dembski as a "post-doc", despite his being a full-time professor at a different university. Marks claims that his work for the EIL is all done on his own time, and has nothing to do with his faculty position at the university. It's all quite bizarre. For details, see here.
On to the work. Marks and Dembski have submitted three papers. They're all in a very similar vein (as one would expect for three papers written in a short period of time by collaborators - there's nothing at all peculiar about the similarity). The basic idea behind all of them is to look at search in the context of evolutionary algorithms, and to analyze it using an information theoretic approach. I've picked out the first one listed on their site: Conservation of Information in Search: Measuring the Cost of Success
There are two ways of looking at this work: on a purely technical level, and in terms of its presentation.
On a technical level, it's not bad. Not great by any stretch, but it's entirely reasonable. The idea of it is actually pretty clever. They start with NFL (the No Free Lunch theorems). NFL says, roughly, that if you don't know anything about the search space, you can't select a search that will perform better than a random walk. If we have a search for a given search space that does perform better than a random walk, in information theoretic terms, we can say that the search encodes information about the search space. How can we quantify the information encoded in a search algorithm that allows it to perform as well as it does?
So, for example, think about a search algorithm like Newton's method. It generally homes in extremely rapidly on the roots of a polynomial equation - dramatically better than one would expect in a random walk. For example, if we look at something like y = x^2 - 2, starting with an approximation of a zero at x=1, we can get to a very good approximation in just two iterations. What information is encoded in Newton's method? Among other things, it's working in a Euclidean space on a continuous, differentiable curve. That's rather a lot of information. We can actually quantify that in information theoretic terms by computing the average time to find a root in a random walk, compared to the average time to find a root in Newton's method.
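To make the Newton's method example concrete, here's a minimal Python sketch. The function names, the interval [0, 2] used for blind guessing, and the choice to measure "success" as landing within Newton's two-step error are all my own assumptions for illustration; none of this comes from the paper.

```python
import math


def newton_sqrt2(x0: float, iterations: int) -> list[float]:
    """Return the Newton iterates for f(x) = x^2 - 2, starting from x0."""
    xs = [x0]
    for _ in range(iterations):
        x = xs[-1]
        # Newton update x - f(x)/f'(x), with f(x) = x^2 - 2 and f'(x) = 2x.
        xs.append(x - (x * x - 2) / (2 * x))
    return xs


def expected_blind_guesses(tolerance: float, lo: float, hi: float) -> float:
    """Expected number of uniform guesses on [lo, hi] before one lands
    within `tolerance` of a single root (assumes the whole window of
    width 2 * tolerance fits inside the interval)."""
    hit_probability = (2 * tolerance) / (hi - lo)
    return 1 / hit_probability


iterates = newton_sqrt2(1.0, 2)          # 1.0, then 1.5, then about 1.41667
error_after_two = abs(iterates[-1] - math.sqrt(2))

# How many blind uniform guesses on [0, 2] would we expect to need
# to get as close as Newton's method does in two steps?
blind_guesses = expected_blind_guesses(error_after_two, 0.0, 2.0)
print(iterates, error_after_two, blind_guesses)
```

Two Newton steps against hundreds of blind guesses is exactly the kind of gap that the paper wants to express as an amount of information.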
Further, when a search performs worse than what is predicted by a random walk, we can say that with respect to the particular search task, that the search encodes negative information - that it actually contains some assumptions about the locations of the target that actively push it away, and prevent it from finding the target as quickly as a random walk would.
That's the technical meat of the paper. And I've got to say, it's not bad. I was expecting something really awful - but it's not. As I said earlier, it's far from being a great paper. But technically, it's reasonable.
Then there's the presentation side of it. And from that perspective, it's awful. Virtually every statement in the paper is spun in a thoroughly dishonest way. Throughout the paper, they constantly make statements about how information must be deliberately encoded into the search by the programmer. It's clear the direction that they intend to go - they want to say that biological evolution can only work if information was coded into the process by God. Here's an example from the first paragraph of the paper:
That's the first one, and the least objectionable. But just half a page later, we find:
The significance of COI [MarkCC: Conservation of Information - not Dembski's version, but from someone named English] has been debated since its popularization through the NFLT [30]. On the one hand, COI has a leveling effect, rendering the average performance of all algorithms equivalent. On the other hand, certain search techniques perform remarkably well, distinguishing themselves from others. There is a tension here, but no contradiction. For instance, particle swarm optimization [10] and genetic algorithms [13], [26] perform well on a wide spectrum of problems. Yet, there is no discrepancy between the successful experience of practitioners with such versatile search algorithms and the COI imposed inability of the search algorithms themselves to create novel information [5], [9], [11]. Such information does not magically materialize but instead results from the action of the programmer who prescribes how knowledge about the problem gets folded into the search algorithm.
That's where you can really see where they're going. "Information does not magically materialize, but instead results from the action of the programmer". The paper harps on that idea to an inappropriate degree. The paper is supposedly about quantifying the information that makes a search algorithm perform in a particular way - but they just hammer on the idea that the information was deliberately put there, and that it can't come from nowhere.
It's true that information in a search algorithm can't come from nowhere. But it's not a particularly deep point. To go back to Newton's method: Newton's method of root finding certainly codes all kinds of information into the search - because it was created in a particular domain, and encodes that domain. You can actually model orbital dynamics as a search for an equilibrium point - it doesn't require anyone to encode in the law of gravitation; it's already a part of the system. Similarly in biological evolution, you can certainly model the amount of information encoded in the process - which includes all sorts of information about chemistry, reproductive dynamics, etc.; but since those things are encoded into the universe, you don't need to find an intelligent agent to have coded them into evolution: they're an intrinsic part of the system in which evolution occurs. You can think of it as being like a computer program: programmers don't need to specifically add code to a program to specify the fact that the computer it's going to run on has 16 registers; every program for the computer has that wired into it, because it's a fact of the "universe" for the program. For anything in our universe, the basic facts of our universe - of basic forces, of chemistry - are encoded in their existence. For anything on earth, facts about the earth, the sun, the moon - are encoded into their very existence.
Dembski and Marks try to make a big deal out of the fact that all of this information is quantifiable. Of course it's quantifiable. The amount of information encoded into the structure of the universe is quantifiable too. And it's extremely interesting to see just how you can compute how much information is encoded into things. I like that aspect of the paper. But it doesn't imply anything about the origin of the information: in this simple initial quantification, information theory cannot distinguish between environmental information which is inevitably encoded, and information which was added by the deliberate actions of an intelligent agent. Information theory can quantify information - but it can't characterize its source.
If I were a reviewer, would I accept the paper? It's hard to say. I'm not an information theorist; so I could easily be missing some major flaw. The style of the paper is very different from any other information theory paper that I've ever read - it's got a very strong rhetorical bent to it which is very unusual. I also don't know where they submitted it, so I don't know what the reviewing standards are - the reviewing standards of different journals are quite different. If this were submitted to a theoretical computer science journal like the ones I typically read, where the normal ranking system is (reject/accept with changes and second review/weak accept with changes/strong accept with changes/strong accept), I would probably rank it either "accept with changes and second review" or "weak accept with changes".
So as much as I'd love to trash them, a quick read of the paper seems to show that it's a mediocre paper, with an interesting idea. The writing sucks: it was written to try to make a point that it can't make technically, and it makes that point with all the subtlety of a sledgehammer, despite the fact that the actual technical content of the paper can't support it.


Comments
Even if the information theoretic part is okay, I would be appalled to see this paper in a journal. What if someone wrote an (otherwise) excellent paper about a new quantum programming technique, but constantly throughout the paper kept saying how it proved coat hangers are edible?
Posted by: Andrew | September 17, 2007 10:40 AM
Read "Intelligent Design and the NFL Theorems: Debunking Dembski" by Olle Häggström (pretty smart probability theorist) http://www.math.chalmers.se/~olleh/papers.html
Posted by: mo | September 17, 2007 11:56 AM
one can say: this paper tells a lot of interesting and true things. the only problem is that true things in it are not interesting, and interesting things are not true.
Posted by: krisztian pinter | September 17, 2007 1:00 PM
Mark,
Your analysis is very well stated. As you point out, the technical side of their work and the presentation thereof are two separate matters. I agree that the presentation has some serious problems. It serves to obscure rather than clarify concepts that I find rather trivial, and it seems intended to create an impression of ID-friendliness.
As for the technical side, they simply explore the various ways that searches can be adjusted to increase their efficiency. I'm sure that this has been done many times before, even in homework assignments.
The parts that I have problems with are:
a) Casting the concepts in terms of information
b) The quantification of the increase in efficiency
c) The arbitrary nature of selecting a baseline search
In classical information theory, one information measure of a message is the surprisal, which is the negative log of the probability of the message. This is the measure that D&M use, but they use it in a strange way: They take the probability that a search will succeed -- that is, the "message" is binary, either "success" or "failure" -- but instead of associating the information with the success/failure outcome of the search, they associate it with the search itself, which is confusing to say the least.
What's even more confusing is their definition of active information. AI is not the negative log of a probability; rather, it's the negative log of a ratio of probabilities. So how do we pinpoint the "message" that contains this information?
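For readers who want numbers to hang these two definitions on, here is a minimal Python sketch of surprisal and of active information as described above (the negative log of a ratio of success probabilities). The function names and the probabilities are hypothetical, picked only so the arithmetic is easy to follow.

```python
import math


def surprisal_bits(p: float) -> float:
    """Surprisal of an event with probability p, in bits."""
    return -math.log2(p)


def active_information_bits(p_baseline: float, p_search: float) -> float:
    """Negative log of the ratio p_baseline / p_search, in bits.

    Positive when the search succeeds more often than the baseline,
    negative when it succeeds less often.
    """
    return -math.log2(p_baseline / p_search)


# Hypothetical success probabilities, chosen as powers of two.
p_blind = 1 / 1024    # baseline (blind) search
p_good = 1 / 4        # a search that beats the baseline
p_bad = 1 / 4096      # a search that does worse than the baseline

print(surprisal_bits(p_blind))                   # 10 bits
print(active_information_bits(p_blind, p_good))  # 8 bits
print(active_information_bits(p_blind, p_bad))   # -2 bits
```

Note that the measure is just the difference of two surprisals, so everything it reports is already contained in the two success probabilities that go into it.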
Here's an attempt. The event A associated with the active information is defined such that P(A)=P(B)/P(E) where B is the success of a baseline search and E is the success of an efficient search. Let's pretend that the parameters of the efficient search were chosen from all possible parameters; we'll call this selection event X. NFL tells us that P(X)*P(E|X) = P(X). After the negative log transformation, I(A)
But how is that useful? I'm having a hard time seeing the significance of this measure. Can we conclude anything useful from it that we didn't already have to know in order to calculate it? As far as I can tell, no.
As far as the baseline search, D&M tell us only that it's a blind search. But what is the search space? And what is the search structure? For instance, in this paper, they explore two blind searches, one of which is much more effective than the other (or so they say) because of its search structure. Which of those should be used as a baseline? It seems rather arbitrary.
(If you've read this far, I'll throw in another tidbit. I'm pretty sure that the numerical results and conclusions in the paper cited above are completely wrong, so D&M will have to rewrite it. You heard it here first.)
Posted by: secondclass | September 17, 2007 1:14 PM
Dang less-than's and greater-than's. Here's another try:
"Here's an attempt. The event A associated with the active information is defined such that P(A) = P(B)/P(E) where B is the success of a baseline search and E is the success of an efficient search. Let's pretend that the parameters of the efficient search were chosen from all possible parameters; we'll call this selection event X. NFL tells us that P(X)*P(E|X) <= P(B), so P(X) <= P(B)/P(E|X), so P(A) >= P(A). After the negative log transformation, I(A) <= I(X). That is, the active information measure gives us a lower bound on the information associated with the selection of the particular search."
Posted by: secondclass | September 17, 2007 1:24 PM
i gotta say, i'm with Andrew. the technical ideas in the paper are acceptable, yeah, but they're being abused (blatantly and inelegantly abused at that) in a vain attempt at making a theological point. it ruins the paper's conclusions, it stops any less critical reader from getting the most out of the information they provide and it gives them free publicity; it doesn't belong in a computing journal, imo.
Lepht
Posted by: Lepht | September 17, 2007 3:08 PM
So, to generalize, what if one would write a very simple but very general algorithm (let's call it POKE-AROUND) that would quickly but randomly create other algorithms, a tiny fraction of which might turn out to be useful for search. Would it be conceivable that, in time, our POKE-AROUND algorithm would stumble across a more efficient search algorithm that may actually encode, by chance, some knowledge about the search space? And because POKE-AROUND itself is a very simple program, could it itself be a result of chance, especially given a long time?
Posted by: n3w6 | September 17, 2007 3:11 PM
n3w6:
Nope. Using a meta-search to find a search doesn't work. A meta-search that chooses from some set of options or random inputs to find a search is really just itself another search - and falls victim to NFL in the same way as any other search.
The thing is, that's not a problem. Dembski likes to try to claim NFL as a much stronger result than it actually is. NFL only talks about the properties of searches averaged over all possible search spaces. It doesn't say anything about how searchable particular sets of spaces are.
For some search spaces, it's very easy to find search algorithms that converge on solutions very quickly. NFL relies on the fact that you're talking about performance averaged over all possible search spaces - and like most mathematical structures, most theoretically possible search spaces are highly irregular, and have no properties that are easily exploitable.
Think of it like this: most functions - the overwhelming majority of functions - are neither differentiable nor continuous. Given an arbitrary function which you know nothing about, there's no way to find its zeros faster than just randomly guessing until you get one. But if you're working with polynomials, then you can easily find their zeros using a simple search process.
Search is exactly the same way. Given a search space about which you know nothing, there's no way to pick an algorithm that will do better than random. But if you know that your landscape is a smooth, continuous, differentiable surface in R^3, there are a ton of search algorithms that will perform better than random.
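The averaging argument can be checked by brute force on a toy problem. The sketch below is only an illustration under my own assumptions: a 4-point domain, cost values 0 to 2, "performance" measured as the best value seen after k evaluations, and three made-up deterministic searches. Averaged over every possible function, all three score the same; restricted to non-decreasing functions, a search that knows to start at the top wins.

```python
from itertools import product

DOMAIN = range(4)   # toy search space: points 0..3
VALUES = range(3)   # toy cost values: 0..2

ALL_FUNCTIONS = list(product(VALUES, repeat=len(DOMAIN)))
# A structured subset: functions that never decrease from left to right.
MONOTONE = [f for f in ALL_FUNCTIONS if list(f) == sorted(f)]


def fixed_order(order):
    """Build a search that evaluates points in a fixed order."""
    def search(f, k):
        return [f[x] for x in order[:k]]
    return search


def adaptive(f, k):
    """A search whose next point depends on the value it just saw."""
    visited, seen, x = [], [], 0
    for _ in range(k):
        visited.append(x)
        seen.append(f[x])
        remaining = [p for p in DOMAIN if p not in visited]
        if not remaining:
            break
        # Deterministic rule: use the last value to index the unvisited points.
        x = remaining[seen[-1] % len(remaining)]
    return seen


def average_best(search, k, functions):
    """Average, over `functions`, of the best value seen in k evaluations."""
    return sum(max(search(f, k)) for f in functions) / len(functions)


from_bottom = fixed_order((0, 1, 2, 3))
from_top = fixed_order((3, 2, 1, 0))

for name, s in [("bottom", from_bottom), ("top", from_top), ("adaptive", adaptive)]:
    print(name, average_best(s, 2, ALL_FUNCTIONS), average_best(s, 1, MONOTONE))
```

The first column of output is identical for every search; the second is not. The extra performance on the monotone functions comes entirely from the fit between the search and the structure of that particular set of landscapes.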
Posted by: Mark C. Chu-Carroll | September 17, 2007 4:21 PM
"those things are encoded into the universe"
Yep, and who encoded the universe ...
I think it's just another way to say "if there's evidence of evolution it's only because God wanted it to be that way". Same reason he went through the trouble of burying those fern leaves in the coal; artistic license so to speak.
Posted by: Mu2 | September 17, 2007 4:53 PM
Correction to my correction: Replace "so P(A) >= P(A)" with "so P(A) >= P(X)" in my second post above.
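With both corrections applied, the argument can be laid out as a chain of inequalities. This is only a restatement of the comment above, under the assumption that P(E) in the definition of A is read as P(E|X), the success probability of the efficient search once its parameters have been selected:

```latex
\begin{align*}
P(A) &= \frac{P(B)}{P(E \mid X)}
  && \text{(definition of } A \text{, with } P(E) \text{ read as } P(E \mid X)\text{)} \\
P(X)\,P(E \mid X) &\le P(B)
  && \text{(the NFL bound as stated in the comment)} \\
P(X) &\le \frac{P(B)}{P(E \mid X)} = P(A) \\
I(A) = -\log P(A) &\le -\log P(X) = I(X)
\end{align*}
```

So the active information I(A) is a lower bound on I(X), the information associated with selecting that particular search.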
Posted by: secondclass | September 17, 2007 6:43 PM
Mark-
Interesting post, but you should prepare yourself psychologically for the inevitable distortion of what you wrote over at Dembski's blog. A while back I wrote a similar post about one of Dembski's papers, describing things in much the same way as you did. I concluded that technically the paper was acceptable, but that it was abysmally written and that its broader conclusions were not correct. Along the way I remarked that the proofs seemed to be correct and that Dembski knew how to manipulate his symbols. It wasn't long before Salvador Cordova was gushing over at Uncommon Descent that I had said Dembski's paper was correct. Ugh.
Also, I think you mean “random search” as opposed to “random walk.” A random walk usually refers to a situation where you are moving through some discrete space in which from every point you have known probabilities of moving to certain subsequent points. A random search is when you choose the points to sample at random from the search space. Thus, there is no connection between the point you sample next and the previous points you have already sampled. I think it was the latter meaning that is intended in the NFL theorems.
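The distinction is easy to see in code. Here is a minimal Python sketch, assuming a toy search space of n points arranged in a ring; the function names and the ring layout are illustrative assumptions only.

```python
import random


def random_search(n_points: int, steps: int, rng: random.Random) -> list[int]:
    """Each sample is drawn uniformly from the whole space,
    with no connection to the points sampled earlier."""
    return [rng.randrange(n_points) for _ in range(steps)]


def random_walk(n_points: int, steps: int, rng: random.Random, start: int = 0) -> list[int]:
    """Each sample is a neighbour of the previous one: from every point we
    move one step left or right on the ring with equal probability."""
    path = [start]
    for _ in range(steps - 1):
        path.append((path[-1] + rng.choice((-1, 1))) % n_points)
    return path


rng = random.Random(0)  # seeded only so repeated runs give the same samples
print(random_search(100, 10, rng))
print(random_walk(100, 10, rng))
```

The walk stays close to where it started for a long time, while the search scatters its samples over the whole space from the first step.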
Posted by: Jason Rosenhouse | September 17, 2007 10:38 PM
Let me put your critique in a nutshell to check my understanding. If a given search method works better than a random walk in some search space, that fact tells us about (some) properties of the search space. However, it tells us nothing about where those properties came from, and nothing about how that search method came to be used on that search problem. Is that it?
Posted by: RBH | September 17, 2007 11:09 PM
See Mathworld for the hotlinks and citations to the mathematical literature for this excerpted definition:
Random Walk. From MathWorld--A Wolfram Web Resource.
"A random process consisting of a sequence of discrete steps of fixed length. The random thermal perturbations in a liquid are responsible for a random walk phenomenon known as Brownian motion, and the collisions of molecules in a gas are a random walk responsible for diffusion. Random walks have interesting mathematical properties that vary greatly depending on the dimension in which the walk occurs and whether it is confined to a lattice."
Posted by: Jonathan Vos Post | September 18, 2007 1:13 AM
Also see the recent PT thread
http://www.pandasthumb.org/archives/2007/09/how_does_evolut.html
where at least one Tom English weighs in, not so much on the math as on the Baylor controversy.
I agree that the technical points about "active information" are valid, but only mildly interesting. re his Chengdu keynote, I find it hard to believe that the "ev" algorithm contributed negative information per these definitions. Is Marks being misleading or is he cherry picking an example, the way UD bloggers like to harp on Dawkins' WEASEL?
ev is itself an interesting case. Schneider makes some effort to make it a "biologically realistic" model, and makes some argument that this realism is important. But then so much about the model isn't realistic, it kind of undercuts his argument.
Posted by: David vun Kannon | September 18, 2007 4:47 AM
Mark,
I appreciate your balanced response. I'm "somebody named English" who wrote in 1996 that NFL is a consequence of conservation of Shannon information in search. I'm affiliated now with Bob Marks and the EvoInfo (virtual) lab, but I'm an adversary of ID. You can find more about that at The Panda's Thumb.
Not to nitpick, but in the NFL framework the search space (aka solution space) is the domain of the cost functions, which is fixed. I wish Wolpert and Macready hadn't spoken so much of averages in their plain-language remarks. Their theorems actually show that all algorithms have identically distributed results. A search result is, loosely, the sequence of costs obtained by the algorithm, and search performance is a function of the search result. If all algorithms have identically distributed results, then all algorithms have identically distributed performance and identical average performance.
Easy for an intelligent agent like you, but easy for some algorithm? Dembski wants precisely to show that you, and not an algorithm, can easily design a search algorithm with higher performance.
Yes, almost all cost functions are algorithmically random, or nearly so. A consequence is that for the typical cost function, almost all algorithms obtain good solutions rapidly. (Intuitively, there are just as many good solutions as bad ones, and they're scattered all about. It's hard not to bump into one.) To put it another way, all search algorithms are almost universally efficacious. See Optimization Is Easy and Learning Is Hard in the Typical Function.
My next paper will demonstrate clearly that this is a theoretical result, not a practical one.
Some care is required in speaking of an algorithm being better or worse than random search. There is the performance of a particular realization of random search of a cost function, and there is also the expected performance of random search of that function.
When there is no information to suggest that any (deterministic) algorithm will perform better than any other on a cost function to be drawn randomly (not necessarily uniformly), selecting an algorithm uniformly is the best you can do. But a single run of random search is precisely equivalent to uniformly selecting a deterministic algorithm and running it (see No More Lunch). So picking an algorithm randomly vs. applying random search is a distinction without a difference.
Hope I didn't talk your ear off. I love this stuff.
Posted by: Tom English | September 18, 2007 8:07 AM
The Newton's method comparison impressed me with how much information the problem space contributes. In evolution the problem space is constrained by both physical and chemical laws.
And as evolution is also a natural process there is a lot of contribution of inherent information here too. A lot of mechanisms aren't accounted for in these papers. If we measure information as randomness, there is randomness inherent in evolution. Both in evolutionary mechanisms, such as crossovers in sexual reproduction, and evolutionary processes, such as fixation during selection, or drift.
For the papers the discussion we have here and elsewhere will help creationists generally (the interest and rational analysis stroking their egos) and specifically (in making the papers better). But as they can't support what they pretend to support (teleology in evolution) it will not matter much.
Perhaps they can find that biologically inspired software models have amounts of "artificial" information inserted. But they don't need to as targets can be randomly selected and constraints are natural. After all, evolutionary theory is explicitly non-teleological in its description of biological systems behavior so we can make natural models the same.
Ironically, observations in sciences are most often artificially produced from experiments, and they can still be used to test predictions. But ID wants to tear the whole of science down, so they ignore that.
Posted by: Torbjörn Larsson, OM | September 18, 2007 8:23 AM
He is cherry-picking for all I can see. He runs a comparison with ev over a wide range of parameters outside the area Schneider expressly says will model the biological situation and hence evolution in action. There isn't really a discussion in the paper, but in the conclusion Marks and Dembski weasel-word the following:
Which is exactly true, of course. The fortunate matching is when the parameters match the biological model.
Also, Marks totally disregards that the ev perceptron itself models the genetic machinery that has earlier resulted from evolution. So as I understand it there are at least two technical problems in that paper.
Yes, Schneider's model isn't fully realistic, he discusses a lot of approximations and omissions. Also, he assumes (which is the fortunate matching) that the program should mimic independent variation. This is the usual situation in nature, but note that the biological theory itself neither demands nor requires that, only "variation".
But as a demonstration that the genome accepts Shannon information from the environment, it is an interesting experiment. (I wouldn't say test, since evolutionary theory doesn't concern information.) But I wish Schneider had picked a simpler and more clearcut model to demonstrate it with.
Posted by: Torbjörn Larsson, OM | September 18, 2007 8:57 AM
Oops. Too hasty there:
As it was put here, misleading or cherry-picking, he is also misleading IMO. The cherry-picking is in using an evolutionary algorithm modeling a natural system instead of examples of designed algorithms. The misleading is in the discussion of Schneider's choice of parameters and the weasel words in the conclusion.
biological theory itself neither demands nor predicts that,
Posted by: Torbjörn Larsson, OM | September 18, 2007 9:06 AM
RBH:
Yes, exactly. Well said!
Posted by: Mark C. Chu-Carroll | September 18, 2007 9:33 AM
When I beta tested John Holland's book on the Genetic Algorithm (1975-1976) and was the first to use it to find solutions to an unsolved problem in the scientific literature, I was guided by Oliver Selfridge (Father of Machine Perception).
He had a bunch of grad students compete in a learning problem, the repeated "coin guessing" problem (2x2 payoff bimatrix, but similar in learning complexity to rock-paper-scissors).
My program came in second of over a dozen competing. It was a GA with its own parameters coded into the "gene" string, which varied its chromosome length and other parameters based on the history of the competition.
Mine lost to a program that bundled many smaller programs within it, and passed the token between them to the one whose score would have been best so far if it had been the one with the token all along.
The (British) author of the winner was hired by IBM, and moved to Florida to work on the not-yet-released PC.
Few people today seem to understand the implicit parallelism in GA (in that it is exponentially evolving sampled "schema" of chromosomes in a larger search space concurrently with evolving chromosomes in the base search space).
I still have unanswered questions on the meta-GA which evolves its own parameters concurrently with evolving populations. My questions and partial results of 1/3 century ago have been cited in refereed papers by Prof. Philip V. Fellman.
In the No Free Lunch Theorem, and its abuses, I am still unsure about the definition of "algorithm" and "search space" and "cost function" and the like. I suspect that there are hidden assumptions about distributions and the spaces of possible spaces.
Posted by: Jonathan Vos Post | September 18, 2007 1:05 PM
Easy for an intelligent agent like you, but easy for some algorithm? Dembski wants precisely to show that you, and not an algorithm, can easily design a search algorithm with higher performance.
What is an "intelligent agent"?
Posted by: Coin | September 18, 2007 3:11 PM
Hmm, coin. Know anything about collective intelligences (COINs)?
I was trying to echo Dembski. I should have written "an intelligence like yours" to stay clear of embodiment and agency. In intelligent design, an intelligence is a supernatural (ID proponents used to say "non-natural," and now say "non-material") source of information. If the active information in a search seems too much to have arisen naturally, Dembski will say it must have come from an intelligence.
Few who seriously investigate intelligence in animals believe there is any one thing that constitutes intelligence, or that intelligence is anything but a hypothetical construct. The norm is to define intelligence operationally, and definitions differ hugely from study to study. Unfortunately, some very bright scientists and engineers slip into treating intelligence as a vital essence that inheres in some systems and not in others. They haplessly play into the hands of ID advocates who are better philosophers than they.
No doubt many ID proponents secretly equate "unembodied" intelligence with spirit. That is, humans are able to create information because they, created in the image of God, are spiritual, and not just physical entities.
Posted by: Tom English | September 19, 2007 1:55 AM
Hi, David.
The measure is relative, and is negative simply because the ev doesn't perform as well as random search does on average.
There have been several times over the years that I have suggested in reviews of conference papers that the authors compare their fancy new algorithms to random search. Given that random search is, loosely speaking, the average search, using it to establish a baseline makes a lot of sense.
Posted by: Tom English | September 19, 2007 2:32 AM
Tom English:
"So picking an algorithm randomly vs. applying random search is a distinction without a difference."
Isn't this true when the algorithms being selected contain no information about the target, but not true if they do?
Posted by: Anonymous | September 19, 2007 10:07 AM
Tom English:
"No doubt many ID proponents secretly equate "unembodied" intelligence with spirit. That is, humans are able to create information because they, created in the image of God, are spiritual, and not just physical entities."
Well said. So why don't you agree?
Posted by: Anonymous | September 19, 2007 10:21 AM
Anonymous:
That would be because it's nonsense. First - when we're talking about information theory, the idea of "unembodied", "spiritual", "not just physical" are all undefinable concepts. They just don't mean anything in terms of the theory. If you want to adopt information theory for an argument, you're stuck working in terms of the concepts that are defined in the framework of information theory.
Second, according to the definition of information in information theory, information is *constantly* produced by what are presumed to be un-intelligent, purely physical entities, by natural processes that are effectively random.
The ID folks want to create some special distinguished kind of "information" which can only be produced by intelligent agents. That's really the idea behind specified complexity, irreducible complexity, and several other similar arguments. The problem is, they can't define what an intelligent agent *is* by anything other than a silly circular argument. What's an intelligent agent according to Dembski? An agent that can produce specified complexity. What's specified complexity? Complexity which has a property that could only be created by an intelligent agent.
They drown those ideas in dreadful prose and massive amounts of hedging, to try to distract people from noticing the ultimate circularity of it. But look at anything by Dembski: does he *ever* offer a precise definition of specification, which doesn't contradict his definition of complexity?
Posted by: Mark C. Chu-Carroll | September 19, 2007 11:12 AM
Tom English:
You're entirely welcome to talk my ear off all you want.
Two of my favorite things on my blog are having people who know more than me drop by and teach me something; and having someone involved in something I'm writing about come by to join the conversation.
Posted by: Mark C. Chu-Carroll | September 19, 2007 11:15 AM
It is unlikely that this conversation will resolve the question: "is intelligence the result of entirely physical processes?"
That is a metaphysical question.
Most neurophysiologists assume that intelligence can be reduced to an emergent property of neurons (possibly with DNA, RNA, Protein interaction of some sort as well as electrochemical) of specific structure in a network of specific structure which learns by specific structural changes.
Most practitioners or theorists of Artificial Intelligence assume that intelligence is an emergent property of software (perhaps in AI languages) running on commercial hardware.
The argument about existence or nonexistence of "spirits" of various kinds, elves, angels, demons, gods, hinges on the metaphysical stance.
The argument about animal rights, on the basis that animals are intelligent in the same way (albeit a different quantity) as humans hinges on the metaphysical stance.
After spending several years operating within the cult of Strong AI, back in the early and mid 1970s in grad school, I have retreated to being a strong AI agnostic.
The late Alex the Parrot slightly shifted my belief in animal intelligence in the direction that John Lilly tried to persuade me decades earlier about dolphins.
I think that ID is a metaphysical stance trying to pretend that it is a Scientific Theory. It is rather difficult to apply Math to Metaphysics. I have joked here before about Theomathematics and Theophysics. But the ID advocates are not joking.
Posted by: Jonathan Vos Post | September 19, 2007 12:12 PM
Anonymous says:
Really? Have you read what Dembski says about "unembodied designers" in NFL? It's quite brilliant, you know.
Dembski has no problem incorporating "unembodied designers" into information theory via quantum mechanical probabilities. You should read it.
Keith Devlin, I believe it is, wrote a review of NFL. In it he points out the rather severe limitations of both Shannon information and Kolmogorov complexity. CSI is a much more realistic concept of what we generally mean by "information". Let's remember that both Shannon and Kolmogorov were dealing with digital codes; hardly the stuff of everyday life (except for code writers).
Really? Is it a circular argument? Let's see: we find in nature something that is both complex and specified by some independent pattern; and if the complexity is of sufficient magnitude, then design is inferred. What's circular about that? It's the conjunction of a specified pattern and a high level of complexity that allows us to draw such an inference. There is no special property of complexity. Complexity ends up being simply the inverse of Shannon information. You seem entirely comfortable with that notion, right?
The problem with defining 'specification' is that it involves a simultaneous intellectual act, and to define its mathematical constituents is not easy, nor does it lend itself to simple exposition. It's generally the recognition of a pattern which induces a rejection region in the extremal ends of a uniform probability distribution of such magnitude as to exceed the universal probability bound of 1 in 10^150.
Now, if you want circularity, how's this: Who survives? The fittest. Who are the fittest? Those who survive.
Posted by: Lino D'Ischia | September 19, 2007 12:55 PM
Lino:
I've read Dembski's NFL stuff, and I've commented on it on this blog multiple times. It doesn't do anything to define just what an "unembodied" intelligence is.
Specified complexity is, as I've argued numerous times, a nonsensical term. Dembski is remarkably careful in presentations and writings to never precisely define just what specification means.
There's a good reason for that: specification, as he defines it informally, translated into formal terms, means one of two things.
One possibility, the more charitable one, is that specification is a kind of subset property of information. That is, a specification of a system is a partial description of it - a description which includes some set of properties that the full information must have. A system that matches the specification contains the properties described by the specification - in information theoretic terms, the embodiment of the specification contains a superset of the information in the specification. The problem with this one is that under this definition of specification, every complex system is specifiable. You can always extract a subset of the information in a system, and use it to create a specification of that system; and every specification can be realized by an infinite number of complex systems. If everything complex has specified complexity; and every specification can be realized by a variety of complex systems, then SC is useless and meaningless.
The other possible sense of specification is the opposite of complexity. Under this definition, a specifiable system is a system that can be completely described by a simple specification. But if the specification is simple and completely describes the system, then according to information theory, the system cannot be complex. Using this definition (which Dembski implies is the correct one in several papers, while leaving enough weasel-space to wiggle out), a system with "specified complexity" is a system which has both high information content (complex) and low information content (specification) at the same time.
And I'll point out that you engage in exactly the same kind of weaseling as Dembski. You can't define specification. You want to claim that Dembski's math defines some new kind of information theory, and that that theory gives you a handle on how to capture ideas which cannot be represented in conventional information theory. But you can't give a mathematical definition. You can't define what specification means, or how to compute it. Why is that?
Finally, your question about the circularity of survival of the fittest: any time you reduce a complex scientific theory down to a trivial one-sentence description, you're throwing out important parts of it. If all evolution said was embodied in "survival of the fittest", then you'd be right that it would be an empty, meaningless thing that explained nothing: the individuals that live to reproduce are the individuals that live to reproduce.
But in fact, that description of evolution is an example of the first possible definition of "specification" as given above. The fact that some individuals survive and some do not, and only the ones that survive reproduce - is a crucial ingredient in the process of evolution. You could call it a specification of one necessary aspect. But just like that definition of specification doesn't do what Dembski wants it to do, it doesn't work well in this case. Because it's incomplete, and can be matched by both the real observed phenomenon of evolution, and numerous other phenomena as well.
"Survival of the fittest" leaves out crucial parts of the real definition of evolution. Evolution isn't just the fact that some survive and reproduce, and some don't. It also includes change: the population of individuals is undergoing a constant process of change. Every individual has mutations in their genes. When those mutations help, the individual might manage to survive when others wouldn't. When those mutations hurt, the individual might not survive where others would. The effect of change combined with differential success means that the genetic makeup of the population is changing over time.
Even that is a simplification, but a far more informative and complete one than "survival of the fittest". And it demonstrates why there's no real circularity.
On the other hand, Dembski, by refusing to provide real definitions of specification, intelligence, etc., turns his voluminous writings into a meaningless pile of rubbish, because at its foundation, it has no meaning. Because it lacks any actual meaningful foundations, the whole thing collapses under its own weight. It's just a smoke-screen, trying to hide the fact that there's nothing really there.
Posted by: Mark C. Chu-Carroll | September 19, 2007 1:55 PM
Lino:
Dembski's CSI is utter crap. See my paper with Elsberry, http://www.talkreason.org/articles/eandsdembski.pdf , which explains in detail why CSI is incoherent and doesn't have the properties Dembski claims.
Posted by: Jeffrey Shallit | September 19, 2007 2:36 PM
Lino D'Ischia :
And they differ from classical probabilities how?
This is crap à la Dembski.
First, there is no "universal probability bound". Sometimes it is useful to exclude improbable events, but that is always made in a specific model which tells you what limits to use.
Second, you assume that the process you observe has a uniform probability. That is uncommon in natural processes. Every energy driven process that dissipates energy will see the system visit improbable states where it is driven. Dissipation requires such states or the energy would be conserved. And the biosphere is energy driven by the sun and dissipating into space.
Third, we know that selection enhances evolution rates so that new traits appear and fixate on much shorter time scales than the above bound implies. For example, human populations have evolved lactose tolerance several times in recent history, when effective population sizes have been a few thousand in the herders' areas. So we are discussing evolution rates for new traits of at least 10^-6 traits/generation or so, in sexual populations. Those traits aren't planned but are the process's response to the environment.
Posted by: Torbjörn Larsson, OM | September 20, 2007 2:37 PM
And they differ from classical probabilities how?
Well, they're complex. And as we all know, God is an imaginary number.
Posted by: Coin | September 20, 2007 2:49 PM
Mark C. Chu-Carroll wrote, "It also includes change: the population of individuals is undergoing a constant process of change."
Which was precisely the argument that disconcerted a couple of evangelicals who came to my door yesterday.
I pointed out that things designed by humans are largely identical. There is little variation in the shape of a door or a window, extruded vinyl siding is incredibly homogeneous; things that are designed by an intelligence are often made as identical copies to the best of our abilities. (Which is one of the points of the six-sigma initiatives.)
Things growing by natural processes have far greater differences than those designed by man. (With obvious exceptions of course.)
I pulled a few leaves off the ivy and showed them the vast differences found even on the same plant. Size, shape, and color, were all explainable by natural processes but not even close to what we see in items which are designed.
I didn't convince them, but I think they might have seen my point. To them, I suspect, it made their creator even more impressive.
Posted by: Flex | September 20, 2007 3:25 PM
I'm giving this a very casual response. For any algorithm with information there's a corresponding algorithm with misinformation (negative information). If you fix the cost function and randomly draw algorithms a large number of times, the positive and negative information cancel one another out.
Posted by: Tom English | September 20, 2007 9:35 PM
Mark,
I've read and enjoyed your comments many times. When I was in grad school, a friend of mine used to say, as he headed off to teach, "Well, guess I'll go stomp me out some ignorance." You keep on stomping, guy.
Tom
Posted by: Tom English | September 20, 2007 10:15 PM
Tom,
Thanks for responding to my comment. I understand the claim that ev was relatively worse than random search and therefore contributed "negative information". My question, looking at the slides in chengdu.ppt on the EvoInfo resources page, was how that claim is supported. Even allowing for all the skipped steps that I would expect in a keynote, not a rigorous presentation, I find Marks' claims difficult to believe. The numbers thrown around on those slides just don't make a coherent argument to me.
I'm happy to discuss the weaknesses of ev if it comes to that, just like I'm happy to discuss the weaknesses of WEASEL. That's what I call cherrypicking a weak example. However, if Marks' numbers are wrong, that is what I would (charitably) call misleading.
Posted by: David vun Kannon | September 20, 2007 11:58 PM
http://www.thenation.com/doc/20071008/hacking
Root and Branch
by IAN HACKING
The Nation
[from the October 8, 2007 issue]
First the bright side. The anti-Darwin movement has racked up one astounding achievement. It has made a significant proportion of American parents care about what their children are taught in school. And this is not a question of sex or salacious novels; the parents want their children to be taught the truth. None of your fancy literary high jinks here, with truth being "relative." No, this is about the real McCoy.
According to a USA Today/Gallup poll conducted this year, more than half of Americans believe God created the first human beings less than 10,000 years ago. Why should they pay for schools that teach the opposite? These people have a definite and distinct idea in mind. Most of the other half of the population would be hard-pressed to say anything clear or coherent about the idea of evolution that they support, but they do want children to learn what biologists have found out about life on earth. Both sides want children to learn the truth, as best as it is known today.
The debate about who decides what gets taught is fascinating, albeit excruciating for those who have to defend the schools against bunkum. Democracy, as Plato keenly observed, is a pain for those who know better. The public debate about evolution itself, as opposed to whether to teach it, is something else. It is boring, demeaning and insufferably dull.
[truncated]
The Discovery Institute, a conservative think tank, states that "neo-Darwinism" posits "the existence of a single Tree of Life with its roots in a Last Universal Common Ancestor." That tree of life is enemy number one, for it puts human beings in the same tree of descent as every other kind of organism, "making a monkey out of man," as the rhetoric goes. Enemy number two is "the sufficiency of small-scale random variation and natural selection to explain major changes in organismal form and function." This is the doctrine that all forms of life, including ours, arise by chance. Never underestimate the extraordinary implausibility of both these theses. They are, quite literally, awesome.
[truncated]
Posted by: Jonathan Vos Post | September 22, 2007 2:53 PM
Jeff Shallit:
"Dembski's CSI is utter crap. See my paper with Elsberry, http://www.talkreason.org/articles/eandsdembski.pdf , which explains in detail why CSI is incoherent and doesn't have the properties Dembski claims."
It's taken me a little while to work through your paper; so sorry for the delay. As to the paper, I don't see any substantive criticism by you and Elsberry that makes any serious dents in Dembski's explanation of CSI. What I detect in your criticism, in most instances, is a confusion between the notion of "information" and "Complex-Specified-Information" = CSI, and between "specifying" and the more formal "specification". Now, having said that, the whole notion of what a "specification" is is no easy task. (That's what I alluded to in the previous post.) So it's very understandable that there is a struggle to fully grasp the concept (it proves to be a rather slippery concept), but most, if not all, of your objections, I believe, can be countered.
Not being of the mind to write a 90 page paper to rebut every argument you make, I would be happy to discuss any of these arguments with you. Just select one.
If I may, to get things started, I'll just give one (almost glaring) example of where you fail to distinguish between "information" and CSI, with the result that your argument ends up dissolving away.
In Section 9, "The Law of Conservation of Information", your argument runs along these lines: "Ω0 ⊆ Σ*, where Σ and Δ are finite alphabets . . . Dembski justifies his assertion by transforming the probability space Ω1 by f^-1. This is reasonable under the causal-history-based interpretation. But under the uniform probability interpretation, we may not even know that j is formed by f(i). In fact, it may not even be mathematically meaningful to perform this transform, since j is being viewed as part of a larger uniform probability space, and f^-1 may not even be defined there.
This error in reasoning can be illustrated as follows. Given a binary string x we may encode it in "pseudo-unary" as follows: append a 1 on the front of x, treat the result as a number n represented in base 2, and then write down n 1's followed by a 0. . . . If we let f: Σ* → Σ* be the mapping on binary strings giving a unary encoding, then it is easy to see that f can generate CSI. For example, suppose we consider a 10-bit binary string chosen randomly and uniformly from the space of all such strings, of cardinality 1024. The CSI in such a string is clearly at most 10 bits. Now, however, we transform this space using f. The result is a space of strings of varying length l, with 1025 ≤ l ≤ 2048. If we viewed this event f(i) for some i we would, under the uniform probability interpretation of CSI, interpret it as being chosen from the space of all strings of length l. But now we cannot even apply f^-1 to any of these strings, other than f(i)! Furthermore, because of the simple structure of f(i) (all 1's followed by a 0), it would presumably be easily specified by a target with tiny probability. The result is that f(i) would be CSI, but i would not be."
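For anyone who wants to see the quoted construction concretely, here is a minimal Python sketch of the pseudo-unary map as the passage describes it. The function names are hypothetical (they are not from the paper), and the inverse is only an assumption about what f^-1 would have to do on strings that really are of the form "all 1's followed by a 0".

```python
def pseudo_unary(x: str) -> str:
    """Prepend a 1 to x, read the result as a base-2 number n,
    and return n ones followed by a single zero."""
    if any(c not in "01" for c in x):
        raise ValueError("x must be a binary string")
    n = int("1" + x, 2)
    return "1" * n + "0"


def pseudo_unary_inverse(y: str) -> str:
    """Recover x from pseudo_unary(x). Defined only on strings of the
    form 1...10 with at least one 1; anything else is rejected."""
    if len(y) < 2 or not y.endswith("0") or "0" in y[:-1]:
        raise ValueError("not in the image of pseudo_unary")
    n = len(y) - 1
    return bin(n)[3:]  # drop the '0b' prefix and the prepended leading 1


if __name__ == "__main__":
    shortest = pseudo_unary("0" * 10)
    longest = pseudo_unary("1" * 10)
    print(len(shortest), len(longest))
```

The two printed lengths are the extremes of the range given in the quoted passage for 10-bit inputs. Note that the inverse raises an error on almost every binary string of a given length, which is the sense in which f^-1 "may not even be defined" on the larger space.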
The first error I see is that you have equated CSI with a 10-bit string. But Dembski very clearly assigns an upper probability bound of 10^150, or 2^500, or 500 bits. You acknowledge the upper probability bound in Section 11 (CSI and Biology). Since 10-bits falls well short of the 500 bits necessary, it is meaningless to speak of CSI. IOW, both i and f(i) do not exhibit CSI. Now, if you were to use string lengths i of sufficient length (i.e., ≥500), using this "pseudo-unary" program, we would find that the output f(i) would then be between 10^140 and 10^150 1's. Now there are only 10^80 particles in the entire universe, so even if you lined up all the atoms that exist in the world, you would be way short of what you needed.
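The magnitudes in this paragraph are easy to check numerically. The following is only a rough sketch, assuming an input of exactly 500 bits and taking the 10^80 particle count at face value; the variable names are illustrative.

```python
import math

# Universal probability bound quoted in the comment: 1 in 10^150.
bound = 10**150
bits_for_bound = math.log2(bound)  # a little under 500 bits

# Assumption: a 500-bit input. Prepending a 1 gives a 501-bit number n,
# so the pseudo-unary output holds between 2^500 and 2^501 - 1 ones.
min_ones = 2**500
max_ones = 2**501 - 1

# Particle count used in the comment.
particles = 10**80

print(f"log2(10^150) = {bits_for_bound:.1f} bits")
print(f"decimal digits in 2^500: {len(str(min_ones))}")
print(f"decimal digits in 2^501 - 1: {len(str(max_ones))}")
print(f"more ones than particles: {min_ones > particles}")
```

Under that assumption the number of 1's is on the order of 10^150, so the output could not be physically written down, which is the point being made about the particle count.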
The second error occurs in the next paragraph on p. 26 where you invoke the Caputo case as an instance of "specification", much like you did in the penultimate sentence I quoted above. The "reference class of all possibilities" in the Caputo case was about a half a trillion. The 40 D's and 1 R was simply one "event" that belonged to that reference class. In order for CSI to be present, the reference class would have to be comprised of at least 10^150 elements/events. So, indeed, the 40 D's and 1 R of the Caputo case is certainly "specified", but it doesn't constitute a "specification" because the "rejection region" it defines is not of sufficient complexity.
The third error I see again involves "specification". As I just mentioned, in Dembski's technical definition of CSI, a "specification" is a true "specification" when the pattern that is identified by the intelligent agent induces a rejection region such that, including replicational resources and specificational resources, the probability of the conceptual event that coincides with the physical event is less than 1 in 10^150.
I've already gone farther than I intended. But, before I leave, I want to ask you something about your SAI, formulated in Appendix A. Below are two bit strings, A and B. Using any compression programs you have available to you (I have none; or if I do have them available I sure don't know how to get to them), which of the two ends up with the smallest input string; i.e., which has the greater SAI? And, then, if you can tell me, which of the two is "designed"?
Here they are:
A:
1001110111010101111101001
1011000110110011101111011
0110111111001101010000110
1100111110100010100001101
1001111100110101000011010
0010101000011110111110101
0111010001111100111101010
11101110001011110
B:
1001001101101000101011111
1111110101000101111101001
0110010100101100101110101
0110010111100000001010101
0111110101001000110110011
0110100111110100110101011
0010001111110111111011010
00001110100100111
A:
1001110111010101111101001
1011000110110011101111011
0110111111001101010000110
1100111110100010100001101
1001111100110101000011010
0010101000011110111110101
0111010001111100111101010
11101110001011110
B:
1001001101101000101011111
1111110101000101111101001
0110010100101100101110101
0110010111100000001010101
0111110101001000110110011
0110100111110100110101011
0010001111110111111011010
00001110100100111
Posted by: Lino D'Ischia | September 22, 2007 4:35 PM
Sorry. I don't know how the two bit-strings got duplicated. But that is what it is: a simple duplication. So please ignore the repeat.
Posted by: Lino D'Ischia | September 22, 2007 4:38 PM
I'd heard that Salvador was going offline. Is that true?
In any case, with regard to disembodied designers, I do wonder what the bandwidth of information transfer is "at the limit" as the energy approaches zero.
Posted by: Unsympathetic reader | September 22, 2007 5:34 PM
Unsympathetic Reader:
"I'd heard that Salvador was going offline. Is that true?
In any case, with regard to disembodied designers, I do wonder what the bandwidth of information transfer is "at the limit" as the energy approaches zero."
If you're interested in just what "unembodied designers" can do, Dembski talks about that very thing in NFL. He has a very interesting QM take on it. It's really quite brilliant.
As to Sal, what kind of commentary on the biology community is it when someone like Sal has to disappear from blogs so as to not threaten his newly-started up university education?
Is this modern-day Lysenkoism?
Posted by: Lino D'Ischia | September 22, 2007 7:23 PM
Sure, it's quite brilliant, provided what you mean by "quite brilliant" is utter nonsense cleverly written to make it appear as if it says something deep while actually saying absolutely nothing.
Dembski is a master at weaseling around, making compelling looking arguments while leaving enough gaping holes in the argument to allow him to weasel out of any possible critique.
One of the sad things about quantum theory is how it's become a magnet for liars. Because pretty much no one really understands it, it's easy for people like Dembski to jump in, wave his hands around shouting "quantum, quantum", and pretending that it somehow supports what he's saying.
As for what Sal's disappearance says about the biology community, I'd argue that what it really says is: "If you want to have any chance of being taken seriously as a researcher, you probably don't want to be known as a slimy, quote-mining, lying sycophant to a bunch of loonie-tune assholes".
Posted by: Mark C. Chu-Carroll | September 22, 2007 9:02 PM
Mark C. Chu-Carroll:
"One of the sad things about quantum theory is how it's become a magnet for liars. Because pretty much no one really understands it, it's easy for people like Dembski to jump in, wave his hands around shouting "quantum, quantum", and pretending that it somehow supports what he's saying."
Mark, I would agree with you on this point. You quite frequently find people extending and extrapolating QM to places and in ways that should never be. But what Dembski does is quite legitimate. He simply points out that the statistical nature of QM permits events taking place that don't involve the imparting of energy but simply a rearranging of the elements of the probability distribution. I don't think I would have ever thought of it.
Posted by: Anonymous | September 22, 2007 10:16 PM
He simply points out that the statistical nature of QM permits events taking place that don't involve the imparting of energy but simply a rearranging of the elements of the probability distribution.
The last part of your sentence doesn't make any sense. Are you talking about measurement of entangled states? This is a cop-out: which distribution?
Here's the quote from Dembski (via talk.origins):
"Thermodynamic limitations do apply if we are dealing with embodied designers who need to output energy to transmit information. But unembodied designers who co-opt random processes and induce them to exhibit specified complexity are not required to expend any energy. For them the problem of "moving the particles" simply does not arise. Indeed, they are utterly free from the charge of counterfactual substitution, in which natural laws dictate that particles would have to move one way but ended up moving another because an unembodied designer intervened. Indeterminism means that an unembodied designer can substantively affect the structure of the physical world by imparting information without imparting energy." [p. 341]
"For now, however, quantum theory is probably the best place to locate indeterminism." [p. 336]
The problem: the processes are not random, they're stochastic. The results will follow QM distributions.
Where is this information being imparted? In atoms? Fermions? Bosons? Spin states? Momentum states? Will I always roll a spin-up? 1st excited state? Left circular polarization? You still need energy to create the perturbation that would favor a quantum state with certain information.
Consider teleportation: If the "unembodied designer" wanted to simply copy his quantum information into the quantum information of another atom, he would still require two extra atoms (or photons, electrons, Josephson junctions) along with some perturbations to both couple and change the atoms' state.
Posted by: creeky belly | September 23, 2007 3:52 AM
...Josephson junctions) along with some perturbations to both couple and change the atoms' state.
I should mention that the perturbations in this case are creating the two extra atoms, since they can't be co-opted from others (what state would they be in?).
More information on teleportation here.
Posted by: creeky belly | September 23, 2007 4:06 AM
Lino:
Well, I'll give you credit for one thing: at least you've actually read the paper and responded to it, which is more than Dembski has done.
To respond to your critiques: first, you claim that one must have 500 bits to constitute CSI. I say, take that up with Dembski, then, because on page 159 of his book "Intelligent Design", Dembski says, "The sixteen-digit number on your VISA card is an example of CSI".
Second, you object to our simple example of how CSI can be generated if one doesn't specify the probability space correctly. But you have failed to understand the objection. The point is that f(i), when viewed as an element of the space of binary strings, does exhibit CSI, since it has 1024 bits. i itself does not because it is too short, but that is precisely our point! Here we have constructed CSI out of applying a function to something that isn't -- something that Dembski claims is impossible.
As for specification, I think you also fail to understand that Dembskian concept. Specification only deals with the assignment of an event to a subset of a reference class of events; there is nothing inherent in a specification that says it must refer to a subset with low probability. Go read section 1.4 of No Free Lunch again. Or go to page 111, where Dembski writes, "The 'complexity' in 'specified complexity' is a measure of improbability". So if the word complexity refers to improbability, it follows that the specified part must not, in itself, be related to probability.
As for your last question, I think you are confused. I am not claiming that Dembski or SAI can "detect design". It is the whole point of our paper that "detecting design" is not something one can determine by mathematical arguments alone.
Posted by: Jeffrey Shallit | September 23, 2007 7:52 AM
You still need energy to create the perturbation that would favor a quantum state with certain information.
No you don't. Unembodied designers can do whatever the hell they want. All Dembski said was that, "for now, however, blah blah indeterminism, blah blah." (I'm paraphrasing.)
Note the tentative "for now, blah blah blah blah."
Lol, "unembodied designers". What hooey!
Posted by: 386sx | September 23, 2007 4:27 PM
"Indeterminism means that an unembodied designer can substantively affect the structure of the physical world by imparting information without imparting energy."
So essentially, the "unembodied designer" is a perpetual motion machine?
Posted by: Tyler DiPietro | September 23, 2007 5:31 PM
Anonymous:
That's exactly what I mean by chanting "quantum, quantum" while waving hands around.
Quantum physics says that there's some level where we don't understand what's going on, and which we can only describe in terms of a probability distribution.
Dembski's argument is, basically, saying that because we don't understand what's happening on that level, that he can stick the actions of his "disembodied designer" into that unexplained level.
It's a clever way of arguing, because it's playing with something that is, genuinely, deeply mysterious. And since we don't have a particularly good understanding of what's going on on that level - even the best experts find it largely incomprehensible - it's very hard for a layman to make any argument against it. So the laymen can't really respond. But Dembski *also* doesn't actually show where/how his "unembodied designer" fits into the intricate and subtle math of quantum physics - so it's too vague for an expert to form a good argument against.
In other words, it's classic Dembski. It sounds very impressive, it's full of obfuscatory math to make it look and sound complicated, but it's so vague and ultimately meaningless that you can't pin it down enough to conclusively debunk it as the nonsense that it is: any attempt at debunking it will simply be met with "But that's not what I meant".
Posted by: Mark C. Chu-Carroll | September 23, 2007 7:35 PM
Jeff:
To respond to your response: First, you dispute my claim that 500 bits of information are necessary to have CSI. But then in responding to my second objection, you say: "The point is that f(i), when viewed as an element of the space of binary strings, does exhibit CSI, since it has 1024 bits." And in disputing my claim you quote Dembski from his book "Intelligent Design", which is why, I guess, in the preamble of your paper you indicate that unless Dembski refutes something from his prior writings, you consider everything he wrote in play (since, as is clear to anyone who compares, the section on Visa cards and phone numbers in "Intelligent Design" has been deleted from NFL).
Secondly, in responding to my objection to your example of the "pseudo-unary" function, you say that the output represents 1024 bits, far beyond the 500 bits necessary. But, of course, these are 1024 "pseudo-bits", since the output is a unary output in binary form. Prescinding from this for the moment, for the sake of argument, let's say this really did represent 1024 bits of information. The question is this: Does this, or does it not, represent CSI? I guess you think that this output bit string represents CSI because, like Caputo's string of 40 D's and 1 R, this bit string is "specified". Well here, as I mentioned the first time, I would say you've missed the technical meaning of "specification". CSI is an ordered pair of events (T,E) with T inducing a rejection function that in turn forms a rejection region within the reference class of events. IOW, CSI represents the conjunction of a physical event and a conceptual event. [This is all abundantly clear in "No Free Lunch"]. In the case of this 1024 bit-string, which represents the "physical event", what is the "conceptual event" that describes it and, in describing it, induces a rejection region? You don't provide any such description nor rejection region. We're left with one-half of CSI, and so we can't call an ordinary bit-string CSI.
Further, as is clear in Dembski's discussion on pp. 152-154, T induces a rejection region onto Ω0. In the example you use, no such rejection region is mentioned or specified in any way. So, for the sake of argument, let's say that your definition of a 10-bit string, the input parameter, represents T0, the rejection region in the reference class Ω0. The cardinality of this rejection region, as you point out, is 1024. Now let us suppose that the "conceptual event", C0, falls in this rejection region, and is identical with the physical event E0. Then the probability of C0 = E0 = 1 in 1024. Now let's look at the output reference class Ω1. The function f transforms T0 to T1, the rejection region in Ω1, C0 to C1, and E0 to E1. Now if the size of the re