While I'm away, I'll leave you with this introduction to likelihood theory (originally published Nov. 22, 2005).
In the Washington Post last week, Charles Krauthammer boldly opposed the Tin Foil Helmet wing of the Republican Party by calling intelligent design a "fraud." The best part of his column was when he pointed out that, just like biology, chemistry and physics are also godless:
The school board thinks it is indicting evolution by branding it an "unguided process" with no "discernible direction or goal." This is as ridiculous as indicting Newtonian mechanics for positing an "unguided process" by which Earth is pulled around the sun every year without discernible purpose. What is chemistry if not an "unguided process" of molecular interactions without "purpose"? Or are we to teach children that God is behind every hydrogen atom in electrolysis?
This might be the only time I ever agree with Krauthammer, so enjoy it while it lasts. He also writes (italics mine):
A "theory" that violates the most basic requirement of anything pretending to be science -- that it be empirically disprovable. How does one empirically disprove the proposition that God was behind the lemur, or evolution -- or behind the motion of the tides or the "strong force" that holds the atom together.
Many of the arguments against ID rely on the Popperian concept of falsification; I'm not going to jump all over Krauthammer or anyone else for making this argument, because I have made the same argument. Nonetheless, I want to use this opportunity to do two things:
- Talk about a likelihood approach to statistical inference. Why? Because it's my blog, and because likelihood statistics are often used to reconstruct evolutionary histories. But I promise I'll lighten it up with cute pictures.
- Point out that ID is ridiculous regardless of what statistical method one prefers.
So what exactly are likelihood statistics? The place to start is not likelihood, but traditional parametric statistics and hypothesis testing. These are the statistics most people are familiar with.
Imagine you flip a coin 100 times, and it comes up heads 52 times and tails 48 times. Suppose you want to determine whether this is a weighted coin (i.e., whether it really is biased towards heads). With a falsification approach, you would establish a null hypothesis: the coin is unbiased, so the ratio of heads to tails should not differ significantly from 50:50. (In the example I gave, it does not.) Here you use a distribution built from the relative frequencies of expected ratios of coin flips, given an unbiased coin. As long as the observed ratio isn't out on the extremes of the distribution, you do not reject the hypothesis of an unbiased coin.
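For the code-inclined, here's a minimal sketch of that test in Python, assuming an exact binomial test stands in for the distribution just described (the coin numbers are the same illustrative ones as above):

```python
# A sketch of the falsification-style test: the null hypothesis is an unbiased
# coin, and we ask whether 52 heads in 100 flips is extreme enough to reject it.
from math import comb

def binom_pmf(k, n, p=0.5):
    """Probability of exactly k heads in n flips of a coin with heads-probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, heads = 100, 52
observed = binom_pmf(heads, n)

# Two-sided exact test: add up the probability of every outcome at least as
# unlikely as the observed one, under the null hypothesis of an unbiased coin.
p_value = sum(binom_pmf(k, n) for k in range(n + 1) if binom_pmf(k, n) <= observed)

print(f"p-value = {p_value:.2f}")  # ~0.76: nowhere near the extremes, so we
                                   # do not reject the unbiased-coin hypothesis
```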
The point that's relevant to likelihood, however, is that you assume a model of how your data will be distributed: each group should follow a bell curve, for example. There is an a priori assumption of how the world works: data are distributed according to a particular pattern.
(An aside: there are many statistical distributions one can use. Also, there are tests one can do if your data don't follow any known distribution. I'm aware of this; I'm just trying to make a point here, ok?)
There are many reasons why certain distributions actually describe naturally occurring data, but they won't be discussed here. Instead, you get a baby panda break:
Ok, onto likelihood. With a likelihood approach, you don't have a null hypothesis per se. Instead, you test an underlying model of 'coin flippiness.' In other words, you ask: given a coin that is unbiased (50:50), what is the probability that you would get 52 heads and 48 tails? You then ask the same question for a hypothetical coin of 51:49 heads:tails, a coin of 52:48, and so forth. Obviously, the most likely hypothetical coin, given the observation of 52:48 heads to tails, is a coin that is biased 52:48. But it's not much more likely than a coin biased 51:49, or an unbiased coin (50:50).
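If you want to see that spelled out, here's a minimal sketch in Python (same illustrative coin as above) that computes the likelihood of a few hypothetical coins given the observed 52:48:

```python
# For a grid of hypothetical coins, compute the probability of observing
# 52 heads and 48 tails in 100 flips -- i.e., each coin's likelihood.
from math import comb

n, heads = 100, 52

def likelihood(p):
    """Likelihood of a coin with heads-probability p, given 52 heads in 100 flips."""
    return comb(n, heads) * p**heads * (1 - p)**(n - heads)

for p in (0.50, 0.51, 0.52, 0.53):
    print(f"heads-probability {p:.2f}: likelihood = {likelihood(p):.4f}")

# The 52:48 coin comes out most likely, but only barely more likely
# than the unbiased (50:50) coin.
```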
The difference between the two approaches is that falsification tests data against an a priori model, while likelihood uses the data to build the most likely model given the existing data. The strength of likelihood is that it does not assume how the world works. It also allows you to judge the relative likelihood of different models (or processes). The disadvantage is a garbage-in, garbage-out problem: the most likely model is built from whatever data you happen to have, noise and all. Roughly 92% of the time, an unbiased coin flipped 100 times will yield an observed ratio that is not exactly 50:50, so the 'most likely' coin will usually be a slightly biased one even when the true coin is fair.
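And to back up that 92% figure, a quick check for the curious:

```python
# Probability that a fair coin, flipped 100 times, comes up with
# exactly 50 heads (versus anything else).
from math import comb

p_exactly_50 = comb(100, 50) / 2**100
print(f"P(exactly 50 heads) = {p_exactly_50:.3f}")      # ~0.080
print(f"P(anything else)    = {1 - p_exactly_50:.3f}")  # ~0.920
```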
So why would evolutionary biologists use a likelihood approach when reconstructing evolutionary histories ("phylogenies")? First, a puppy break:
When evolutionary biologists construct phylogenies with DNA sequence data, one of the most common methods used (I think the most common method, but I haven't tried to quantify it) is maximum likelihood. With this method, you take the sequence data and simultaneously estimate three things:
- The underlying model of molecular evolution. Are certain nucleotides (A, C, G, T) more or less likely to change to certain other nucleotides? Do all sites within each sequence change at the same rate, or are some sites more likely to change than others?
- The topology of the phylogeny. To put this in English, this is how things are related to each other: for example, A and B are each other's closest relatives, and their common ancestor is the closest relative of C.
- The branch lengths of the phylogeny. Again, to translate into English, this is an estimate of how much time has passed since each divergence event (each evolutionary 'split').
For each set of parameters, you estimate the likelihood, and choose the evolutionary history with the best likelihood; hence the term "maximum likelihood" (one could choose the least likely evolutionary history, but I'm not sure why anyone would want to do that).
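To make "maximum likelihood" concrete on sequence data, here's a minimal toy sketch, assuming the simple Jukes-Cantor substitution model and two made-up aligned sequences (real analyses estimate the model, topology, and branch lengths jointly across many sequences):

```python
# A toy illustration of maximum likelihood with sequence data, assuming the
# simple Jukes-Cantor model of molecular evolution and hypothetical sequences.
from math import exp, log

seq_a = "ACGTACGTACGTACGTACGT"
seq_b = "ACGTACGAACGTACTTACGT"   # two made-up aligned sequences

n_sites = len(seq_a)
n_diff = sum(a != b for a, b in zip(seq_a, seq_b))

def log_likelihood(d):
    """Log-likelihood of evolutionary distance d (substitutions per site) under
    Jukes-Cantor: each site differs with probability 3/4 * (1 - exp(-4d/3))."""
    p_diff = 0.75 * (1.0 - exp(-4.0 * d / 3.0))
    return n_diff * log(p_diff) + (n_sites - n_diff) * log(1.0 - p_diff)

# Grid search over candidate distances: keep the one with the highest likelihood.
grid = [i / 1000 for i in range(1, 1000)]
best_d = max(grid, key=log_likelihood)
print(f"{n_diff}/{n_sites} sites differ; maximum-likelihood distance ~ {best_d:.3f}")
```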
The point here is that this really isn't a falsification method; it's a likelihood method. You are picking the most likely evolutionary history based on the data (being a pessimist, I actually think you're choosing the least awful history, but that's just me).
Anyway, this is a very long digression from a simple point: evolutionary biologists (and other scientists) use criteria other than falsification to test evolutionary hypotheses. I find it a little ironic that ID is always decried because it can't be falsified, when we evolutionary biologists often don't use the falsification criterion ourselves.
All three of you who are still with me are probably wondering, "Is the Mad Biologist about to embrace ID?" Of course not, you silly! This brings me to my second (and mercifully much shorter) point: ID is untenable from a likelihood perspective too. How does one estimate the likelihood of "God did it"? Or to use Krauthammer's phrase, the likelihood of "the proposition that God was behind the lemur" cannot be scientifically estimated.
There endeth the lesson.
(An aside: there are Popperian falsification methods one can use to discern evolutionary histories. One such method is maximum parsimony. Here, you choose the evolutionary history that requires the fewest evolutionary events. The underlying model is that evolution happens in the most parsimonious manner: you provisionally choose an evolutionary history that requires the fewest events, and you falsify it by finding another history with fewer steps.)
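For completeness, here's a minimal toy sketch of that parsimony criterion, using Fitch's small-parsimony count on made-up data for four taxa: each of the three possible unrooted trees is scored by the minimum number of changes it requires, and the tree needing the fewest wins.

```python
# A toy illustration of maximum parsimony: score each candidate tree by the
# minimum number of character changes it requires (Fitch's algorithm) and
# keep the tree needing the fewest. The four-taxon data are hypothetical.
def fitch(node):
    """Return (state set, minimum changes) for a tree given as nested tuples,
    where a leaf is a single character state."""
    if isinstance(node, str):            # leaf: observed state, no changes
        return {node}, 0
    (left_set, left_cost), (right_set, right_cost) = fitch(node[0]), fitch(node[1])
    common = left_set & right_set
    if common:                           # children agree: no extra change needed
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

# Observed states (e.g., the nucleotide at one site) for taxa A-D, made up.
states = {"A": "G", "B": "G", "C": "T", "D": "T"}

# The three possible unrooted four-taxon trees, written as rooted tuples.
trees = {
    "((A,B),(C,D))": ((states["A"], states["B"]), (states["C"], states["D"])),
    "((A,C),(B,D))": ((states["A"], states["C"]), (states["B"], states["D"])),
    "((A,D),(B,C))": ((states["A"], states["D"]), (states["B"], states["C"])),
}

for name, tree in trees.items():
    _, cost = fitch(tree)
    print(f"{name}: {cost} change(s)")
# ((A,B),(C,D)) needs only 1 change; the other two need 2, so parsimony picks it.
```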
Bravo! A masterful explanation of something I never really had bothered to understand.
What, no kittens?
I read all the way to the end, & no kittens?
Actually, if I remember correctly, there's a Chick tract where Chick denounces the godless theory of "gluons" and insists that God must be responsible for the force holding the atoms together. The argument was something along the lines that if God were not intervening, the electromagnetic force surely would cause the protons to repel each other until the atom were pulled apart.
I don't have a lot of time to browse blogs or just surf for info on anything, but occasionally something urges me to sit and follow some thread of info around. Catching the tail end of the movie "Flock of Dodos" got me here. I'm glad there is a sanely "mad" point of view. Thanks for taking the time to write for people that actually wish to learn. The lowest common denominator mentality has been really killing me for years, along with soundbyte fed crap from all sides.....I dig science...carry on...we need you....before we stop evolving.
Yow, agreed, what he said, and all thats. Thanks.
One objection:
Those two things are not the same. Evolutionary biologists (and many others) may not always use the falsification criterion to test every hypothesis. However, that doesn't mean that they go around posing hypotheses that cannot be falsified. They of course must be falsifiable to be scientific. So, I dare say we can decry ID because it cannot be falsified. Hereby done.
I see the maximum parsimony argument a little differently than you do. Parsimony is accepting the simplest estimation of nature your data supports. It is not an argument that your interpretation is correct, just that it is the best you can do. One can find in the literature comments that there is no reason evolution is parsimonious. In the early days of cladistics, workers accepted parsimony as the test of their hypothesis, and thought their work was done. Others of us objected. The test of your hypothesis is its predictive power. Say you have erected a most parsimonious tree based on jaw morphology. How well this hypothesis predicts the distribution of various behaviors, digestive enzymes, DNA, geographic distribution, etc. is, in my mind, the test of the original hypothesis. I'm not sure if we are talking falsification or pragmatic utility here. If we had the "one true tree," however, I would expect it to have outstanding predictive power, parsimonious or not.