Messing with big numbers: using probability badly

By goodmath on August 15, 2006.

After yesterday's post about the sloppy probability from Ann Coulter's chat site, I thought it would be good to bring back one of the earliest posts on Good Math/Bad Math, from back when it was on Blogger. As usual with reposts, I've revised it somewhat, but the basic meat of it is still the same.

--------------------

There are a lot of really bad arguments out there written by anti-evolutionists based on incompetent use of probability. A typical example is [this one][crapcrap]. This article is a great example of the mistakes that commonly get made with probability-based arguments, because it makes so many of them. (In fact, it makes every single category of error that I list below!)

Tearing down probabilistic arguments takes a bit more time than tearing down the information theory arguments. 99% of the time, the IT arguments are built around the same fundamental mistake: they've built their argument on an invalid definition of information. But since they explicitly link it to mathematical information theory, all you really need to do is show why their definition is wrong, and then the whole thing falls apart.

The probabilistic arguments are different. There isn't one mistake that runs through all the arguments. There are many possible mistakes, and each argument typically stacks up multiple errors.

For the sake of clarity, I've put together a taxonomy of the basic probabilistic errors that you typically see in creationist screeds.

Big Numbers
-------------

This is the easiest one. This consists of using our difficulty in really comprehending how huge numbers work to say that beyond a certain probability, things become impossible. You can always identify these arguments by the phrase "the probability is effectively zero."

You typically see people claiming things like "Anything with a probability of less than 1 in 10^60 is effectively impossible". It's often conflated with some other numbers, to try to push the idea of "too improbable to ever happen". For example, they'll often throw in something like "the number of particles in the entire universe is estimated to be 3x10^78, and the probability of blah happening is 1 in 10^100, so blah can't happen".

It's easy to disprove. Take two distinguishable decks of cards. Shuffle them together. Look at the ordering of the cards - it's a list of 104 elements. What's the probability of *that particular ordering* of those 104 elements?
The likelihood of the resulting deck of shuffled cards having the particular ordering that you just produced is roughly 1 in 10¹⁶⁶. There are more possible unique shuffles of two decks of cards than there are particles in the entire universe.
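
To make the size of that number concrete, here is a minimal Python sketch that counts the orderings exactly. It assumes only what the example states (104 distinguishable cards) and reuses the 3x10^78 particle estimate quoted above, which is itself a rough figure rather than a measured value.

```python
import math

# Number of distinct orderings of two distinguishable 52-card decks shuffled together.
orderings = math.factorial(104)

# Particle-count estimate quoted in the argument above (a rough figure, not a measurement).
particles = 3 * 10**78

# Digit counts give the order of magnitude without any floating-point overflow.
print(f"104! has {len(str(orderings))} digits")
print(f"orderings / particles is about 10^{len(str(orderings // particles)) - 1}")
```

Python integers are arbitrary precision, so the factorial is exact and the comparison is plain integer arithmetic.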

If you look at it intuitively, it *seems* like something whose probability is nearly 90 orders of magnitude worse than the odds of picking out a specific particle in the entire observable universe *should* be impossible. Our intuition says that any probability with a number that big in its denominator just can't happen. Our intuition is wrong - because we're quite bad at really grasping the meanings of big numbers.

Perspective Errors
---------------------

A perspective error is a relative of the big numbers error. It's part of an argument to try to say that the probability of something happening is just too small to be possible. The perspective error is taking the outcome of a random process - like the shuffling of cards that I mentioned above - and looking at the outcome *after* the fact, and calculating the likelihood of it happening.

Random processes typically have a huge number of possible outcomes. Anytime you run a random process, you have to wind up with *some* outcome. There may be a mind-boggling number of possibilities; the probability of getting any specific one of them may be infinitesimally small; but you *will* end up with one of them. The probability of getting an outcome is 100%. The probability of your being able to predict which outcome is terribly small.

The error here is taking the outcome of a random process which has already happened, and treating it as if you were predicting it in advance.

The way that this comes up in creationist screeds is that they do probabilistic analyses of evolution built on the assumption that *the observed result is the only possible result*. You can view something like evolution as a search of a huge space; at any point in that space, there are *many* possible paths. In the history of life on earth, there are enough paths to utterly dwarf numbers like the card-shuffling above.

By selecting the observed outcome *after the fact*, and then doing an *a priori* analysis of the probability of getting *that specific outcome*, you create a false impression that something impossible happened. Returning to the card shuffling example, shuffling a deck of cards is *not* a magical activity. Getting a result from shuffling a deck of cards is *not* improbable. But if you take the result of the shuffle *after the fact*, and try to compute the a priori probability of getting that result, you can make it look like something inexplicable happened.

Bad Combinations
--------------------

Combining the probabilities of events can be very tricky, and easy to mess up. It's often not what you would expect. You can make things seem a lot less likely than they really are by making an easy-to-miss mistake.

The classic example of this is one that almost every first-semester probability instructor tries in their class. In a class of 20 people, what's the probability of two people having the same birthday? Most of the time, you'll have someone say that the probability of any two people having the same birthday is 1/365²; so the probability of that happening in a group of 20 is the number of possible pairs over 365², or 400/365², or about 1/3 of 1 percent.

That's the wrong way to derive it. There's more than one error there, but I've seen three introductory probability classes where that was the first guess. The correct answer is a bit over 41%, and with 23 people it passes 50%.
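
For comparison, here is a small Python sketch of the standard way to compute it: work out the chance that all the birthdays are *different*, and subtract from 1. It assumes 365 equally likely birthdays and independence between people (no leap years, no twins), and the function name is just one chosen for this illustration. Under those assumptions a class of 20 comes out at roughly 41%, and the probability first passes 50% at 23 people.

```python
def p_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday,
    assuming birthdays are uniform over `days` and independent."""
    p_all_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct

naive_guess = 400 / 365**2  # the classroom guess described above
print(f"naive guess: {naive_guess:.4f}")
print(f"20 people:   {p_shared_birthday(20):.4f}")
print(f"23 people:   {p_shared_birthday(23):.4f}")
```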

Fake Numbers
--------------

To figure out the probability of some complex event or sequence of events, you need to know some correct numbers for the basic events that you're using as building blocks. If you get those numbers wrong, then no matter how meticulous the rest of the probability calculation is, the result is garbage.

For example, suppose I'm analyzing the odds in a game of craps. (Craps is a casino dice game using six-sided dice.) If I say that in rolling a fair die, the odds of rolling a 6 are 1/6th the odds of rolling a 1, then any probabilistic prediction that I make is going to be wrong. It doesn't matter that from that point on, I do all of the analysis exactly right. I'll get the wrong results, because I started with the wrong numbers.
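
Here is a hedged Python sketch of how a bad starting number propagates through an otherwise exact calculation. The "fake" die below is a hypothetical completion of the example: it assumes faces 1 through 5 are equally likely and a 6 is one sixth as likely as a 1. The rest of the arithmetic is done exactly with fractions, and the answers still come out wrong.

```python
from fractions import Fraction
from itertools import product

def p_sum(face_probs, target):
    """Exact probability that two independent dice with the given
    per-face probabilities sum to `target`."""
    return sum(
        face_probs[a] * face_probs[b]
        for a, b in product(face_probs, repeat=2)
        if a + b == target
    )

fair = {face: Fraction(1, 6) for face in range(1, 7)}

# Hypothetical "fake numbers" die: a 6 is 1/6 as likely as a 1.
# Treating faces 1-5 as equally likely is an assumption made for this sketch.
weights = {1: 6, 2: 6, 3: 6, 4: 6, 5: 6, 6: 1}
total = sum(weights.values())
fake = {face: Fraction(w, total) for face, w in weights.items()}

for target in (7, 12):
    print(target, p_sum(fair, target), p_sum(fake, target))
```

The calculation is identical for both dice; only the inputs differ, which is the whole point of the error.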

This one is incredibly common in evolution arguments: the initial probability numbers are just pulled out of thin air, with no justification.

Misshapen Search Space
-------------------------

When you model a random process, one way of doing it is by modeling it as a random walk over a search space. Just like the fake numbers error, if your model of the search space has a different shape than the thing you're modeling, then you're not going to get correct results. This is an astoundingly common error in anti-evolution arguments; in fact, this is the basis of Dembski's NFL arguments.

Let's look at an example to see why it's wrong. We've got a search space which is a table. We've got a marble that we're going to roll across the table. We want to know the probability of it winding up in a specific position.

That's obviously dependent on the surface of the table. If the surface of the table is concave, then the marble is going to wind up in nearly the same spot every time we try it: the lowest point of the concavity. If the surface is bumpy, it's probably going to wind up in a concavity between bumps. It's *not* going to wind up balanced on the tip of one of the bumps.

If we want to model the probability of the marble stopping in a particular position, we need to take the shape of the surface of the table into account. If the table is actually a smooth concave surface, but we build our probabilistic model on the assumption that the table is a flat surface covered with a large number of uniformly distributed bumps, then our probabilistic model *can't* generate valid results. The model of the search space does not reflect the properties of the actual search space.
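
A toy version of the marble example can be sketched in Python. Everything here is a simplifying assumption made for illustration: the table is one-dimensional, it is cut into 101 discrete positions, and the marble simply steps to its lowest neighbor until it can't go any lower. The function names are hypothetical.

```python
import random

def settle(heights, start):
    """Roll downhill from `start` until neither neighbor is lower."""
    pos = start
    while True:
        neighbors = [p for p in (pos - 1, pos + 1) if 0 <= p < len(heights)]
        lowest = min(neighbors, key=lambda p: heights[p])
        if heights[lowest] >= heights[pos]:
            return pos
        pos = lowest

def landing_counts(heights, trials=10_000, seed=0):
    """Drop the marble at uniformly random positions and tally where it stops."""
    rng = random.Random(seed)
    counts = [0] * len(heights)
    for _ in range(trials):
        counts[settle(heights, rng.randrange(len(heights)))] += 1
    return counts

n = 101
bowl = [(x - 50) ** 2 for x in range(n)]  # smooth bowl-shaped table, lowest at x = 50
counts = landing_counts(bowl)
print("uniform model predicts P(x=50) =", 1 / n)
print("bowl simulation gives  P(x=50) =", counts[50] / sum(counts))
```

On a perfectly flat table (`[0] * n`) the same `settle` function leaves the marble wherever it was dropped, which is the only case where the uniform 1-in-101 model would be the right one.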

Anti-evolution arguments that talk about search are almost always built on invalid models of the search space. Dembski's NFL is based on a sum of the success rates of searches over *all possible* search spaces.

False Independence
---------------------

If you want to make something appear less likely than it really is, or you're just not being careful, a common statistical mistake is to treat events as independent when they're not. If two events with probabilities p₁ and p₂ are independent, then the probability of both occurring is p₁×p₂. But if they're *not* independent, then multiplying them that way is going to give you the wrong answer.

For example, take all of the spades from a deck of cards. Shuffle them, and then lay them out. What are the odds that you laid them out in numeric order? It's 1/13! = 1/6,227,020,800. That's a pretty ugly number. But if you wanted to make it look even worse, you could "forget" the fact that the sequential draws are dependent, in which case the odds would be 1/13¹³ - or about 1 in 3×10¹⁴ - about 50,000 times worse.
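
The two figures are easy to check directly. This short Python sketch just evaluates both counts; the variable names are chosen for the illustration.

```python
import math

# Correct count: 13 distinct cards drawn without replacement, so each draw
# leaves one fewer choice for the next position.
correct = math.factorial(13)

# "Forgetting" dependence: every position treated as an independent 1-in-13 draw.
wrong = 13 ** 13

print(f"1 in {correct:,}")
print(f"1 in {wrong:,}")
print(f"ratio: about {wrong / correct:,.0f}")
```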

[crapcrap]: http://www.parentcompany.com/creation_essays/essay44.htm


Hi
Love your blog - would you be OK with me translating this post and using it on a wiki, or rather I will base an article on our wiki on your post.

The Wiki is
http://wiki.ateist.net/

A wiki for a nordic debate forum, based on atheism

I think you've got some unclosed <sub> tags that are wreaking havoc with the rest of the layout.

Although not at all a proponent of chain probability calculations, I never understood why people bring up the "every hand of cards is equally unlikely" argument as a rebuttal. Nobody is arguing, for example, that a royal flush is less likely than any other specified hand. The criticism totally misses the boat.

I will point out that virtually nobody in modern cosmology is satisfied with this sort of explanation. Nobody that I know of argues: "our collection of physical constants is just as likely as any other, so why should we be surprised?"

Instead, at the risk of oversimplifying, the argument is closer to: yes, if the constants come (as some theories suggest) from essentially a random draw, then our set of which we are so proud is no less likely than any other set. However if only our set, or sets in the neighborhood of ours, will produce habitable universes, then that does demand an explanation, and the typical explanation is that there must have been, somehow, many, perhaps infinite, shufflings in the form of multiple universes.

There is an acknowledgement in modern cosmology that even if all hands are equally likely, it appears that the probability of a winning hand (a fertile universe) is significantly less than a losing hand (a sterile universe). The String landscape uses its 10^500 hands to explain the fact that a few of them (such as ours, obviously) are the equivalent of a royal flush. It does not argue: "what's the big deal? Our universe is just as unlikely as any other."

The "all hands are equally [un]likely" rebuttal often misses that point. All hands may be equally likely, but that doesn't mean a good hand is as likely as a bad hand.

If your life depends on being dealt any extremely unlikely hand from a shuffled deck, then you are in no danger. If it demands a specific no-less-likely-than-any-other hand, then you will face the reality of an "effectively zero" chance.

Big number arguments are not always wrong. If they were, no physicist would give a damn about fine-tuning. But instead what we see are research programs aimed at explaining the fine-tuning--when in fact it would be a lot cheaper to say, were it true, that "it's just a big numbers argument."

David:

When it comes to the "fundamental constants" and fine-tuning, I don't find *any* probabilistic argument compelling. We simply do not understand *where* those constants came from enough to be able to assign meaningful probabilities to them.

Where I think combinatorial probabilistic arguments get screwed up is in things like the article I posted about yesterday: they assume that human beings (or some other species) are the *only possible result* of an evolutionary process, and then work out the a priori probability of human beings evolving.

That's pretty much an *exact* parallel to the shuffling-cards example: you're taking an event that has already occurred, and whose specific outcome was randomly determined, and then *going back* to before the event occurred, and computing its probability as if it were the only possible outcome.

Re: the whole fine-tuning issue, the good folks over at Cosmic Variance pointed me to an interesting article entitled "A Universe Without Weak Interactions", the abstract of which I quote below:

A universe without weak interactions is constructed that undergoes big-bang nucleosynthesis, matter domination, structure formation, and star formation. The stars in this universe are able to burn for billions of years, synthesize elements up to iron, and undergo supernova explosions, dispersing heavy elements into the interstellar medium. These definitive claims are supported by a detailed analysis where this hypothetical "Weakless Universe" is matched to our Universe by simultaneously adjusting Standard Model and cosmological parameters. For instance, chemistry and nuclear physics are essentially unchanged. The apparent habitability of the Weakless Universe suggests that the anthropic principle does not determine the scale of electroweak breaking, or even require that it be smaller than the Planck scale, so long as technically natural parameters may be suitably adjusted. Whether the multi-parameter adjustment is realized or probable is dependent on the ultraviolet completion, such as the string landscape. Considering a similar analysis for the cosmological constant, however, we argue that no adjustments of other parameters are able to allow the cosmological constant to raise up even remotely close to the Planck scale while obtaining macroscopic structure. The fine-tuning problems associated with the electroweak breaking scale and the cosmological constant therefore appear to be qualitatively different from the perspective of obtaining a habitable universe.

Of course, this kind of speculation is by its very nature, well, speculative. But I think it says a great deal about our "anthropic" reasoning (and what a vain name that "anthropic" is!). Imposing the requirement that intelligent life must exist might restrict you to the peaks of a fitness landscape, but if many maxima exist then what progress have you made?

And as I wrote several posts ago, but feel like bringing up again, cosmological arguments do a very poor job of supporting any religious conclusions, for a very simple but Goedel-esque reason:

OK, say the cosmological parameters of the Universe were "fine-tuned". Then, the argument goes, there had to be a Fine Tuner. But the Fine Tuner does not --- indeed, cannot --- live within the Universe we know. Ergo, intelligence can exist in a realm which is not at all like our Universe. Yet the whole argument was based on the idea that all the peculiarities of our Universe are essential for intelligent life!

All fall down.

Minor nitpick - unless I've made a mistake somewhere, with 20 people the probability of having at least 2 with matching birthdays is a bit over 41% (which is certainly much closer to 50% than the wrong guess you cite, but I wouldn't call "very close"). The "magic" number I usually use for this problem is 23 people, which puts the probability at just over 50%.

The cosmological arguments David Heddle uses are another example of a perspective error. If you need the universe to come about to produce intelligent life something like us then, yes, probably we could consider it fine-tuned. But we don't have any idea whether intelligent life could come to be if the cosmological constants were different. We might be able to say that human life can't come about, but that doesn't rule out something entirely different. Any argument dealing with fine-tuning requires one to show that there are no (or few) other possible good outcomes -- that's a damn hard argument to make.

JBL,

"But we don't have any idea whether intelligent life could come to be if the cosmological constants were different."

Yes we do. If you change the CC just a bit you get either a universe that is only Hydrogen and Helium (no stars, therefore no heavy elements) or one that re-collapses. In either case, no life--not just 'no life as we know it', but no life period.

Of course you might postulate that somehow, in violation of the laws of physics and chemistry, and without the aid of large molecules to store information, inert Hydrogen gas might organize itself into a Star-Trek like cosmic intelligence--but then since most of the universe is just that, Hydrogen, we should see such life, and see it abundantly.

The bottom line is no stars means no elements means no life. So life of any kind, not just human life, is dependent on the CC fine-tuning.

This reminds me of my favourite statistics problem, "Let's Make a Deal."

Given three doors with a prize behind only one, you choose a door. Then the host exposes a non-chosen non-winning door, and offers you a chance to change your choice to the other still closed door. What should you do to maximize your chance of winning?

Example analysis: Odds are 50/50, it makes no difference to switch.

My favourite analysis: Since you have more information after the exposure, you must switch to take advantage of it!

Anecdotal studies suggest math geeks struggle with this problem more than would be expected.

Man, that problem messed me up *so* bad. I was *convinced* that the odds were 50/50 for a long time. I finally changed my mind when I ran a simulation. ^_^

By far the easiest way to understand the "let's make a deal" problem is to realize that if your strategy was never to switch, then it's exactly as if you never had the opportunity to switch, and thus the probability of winning if you never switch is 1/3. So switching gives you a 2/3 probability of winning.
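
For anyone who, like the commenter above, only believes it after running a simulation, here is a minimal Python sketch of one. It assumes the usual rules of the puzzle: the host always opens a door that is neither the player's pick nor the prize, and chooses at random when two such doors are available. The function names and trial count are arbitrary choices for this illustration.

```python
import random

def play(switch, rng):
    """One round of the three-door game; returns True if the player wins."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    choice = rng.choice(doors)
    # Host opens a door that is neither the player's choice nor the prize.
    opened = rng.choice([d for d in doors if d != choice and d != prize])
    if switch:
        # Move to the one door that is neither the original pick nor the opened one.
        choice = next(d for d in doors if d != choice and d != opened)
    return choice == prize

def win_rate(switch, trials=100_000, seed=0):
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

print("stay:  ", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

The win rates should land near 1/3 for staying and 2/3 for switching, matching the argument above.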

David:

"But instead what we see are research programs aimed at explaining the fine-tuning--when in fact it would be a lot cheaper to say, were it true, that 'it's just a big numbers argument.'"

This is a strawman. The reason for the research is that scientists wish to know more about fundamental theories. They want to find out why fundamental parameters have their values. They also want to know which vacuum is picked from the string landscape, and how.

As I understand it there are at least 4 basic ideas here:
1) A mechanism is found to pick vacua. One or several universes, deterministic.
2) Vacua are found to be nearly equiprobable.
2a) One universe, low a priori probability for our vacua.
2b) Multiuniverse, high a priori probability for our vacua.
3) Vacua are found to be unequiprobable. One or several universes, high a priori probability for our vacua.

As Mark says, a posteriori even 2a) is okay. But physicists miss out on a mechanism that helps them understand more, so they hope it isn't so. Jefferys' alternative is 2b), to connect back to yesterday's comments. 3) doesn't help much, since without a better description the problem of finding the correct vacua is typically NP-hard and sometimes NP-complete ( http://www.arxiv.org/abs/hep-th/0602072 ).

"The String landscape uses its 10^500 hands"
This is a number that changes often. What I think has happened is that from 10^500 it went down to 10^100 - 10^200 when some compactions were found. Then a part of the landscape was found with infinitely many solutions. And now they have found a mechanism that constrains that to decidedly realisable vacua, probably around 10^100 or so.

Yep. After I used the simulation to prove myself wrong, I thought about *why* I was wrong, and that's what I came up with.

"When it comes to the "fundamental constants" and fine-tuning, I don't find *any* probabilistic argument compelling. We simple do not understand *where* those constants came from enough to be able to assign meaningful probabilities to them."

I stumbled on a paper that discusses naturalness (constants, flatness, entropy, et cetera) and probability arguments. (In the same thread Blake found the Weakless universe in.)

First, it describes the problem in more detail.

"What makes a situation "natural"?

Ever since Newton, we have divided the description of physical systems into two parts: the configuration of the system, characterizing the particular state it is in at some specific time, and the dynamical laws governing its evolution. For either part of this description, we have an intuitive notion that certain possibilities are more robust than others.

For configurations, the concept of entropy quantifies how likely a situation is.

For dynamical laws, the concept of naturalness can be harder to quantify. As a rule of thumb, we expect dimensionless parameters in a theory (including ratios of dimensionful parameters such as mass scales) to be of order unity, not too large nor too small."

Second, it manages to describe probability arguments for a multiverse with a probability measure but finds that they are "at best beyond our current abilities, and at worst completely hopeless".

"Instead, the possible epistemological role of the multiverse is to explain why our observed parameters are natural. In principle, the multiverse picture allows us to predict the probability distribution for these parameters. ... Just to mention the most obvious difficulty: in the context of eternal inflation, there is every reason to believe that the volumes Vn of some (if not all) vacua are infinite, and the expression is simply undefined."

It then transforms Weinberg's prediction of the cosmological constant to this form and shows that it is equivalent to making wrong assumptions about the multiverse.

"At the present time, then, there is not a reliable environmental explanation for the observed value of the cosmological constant. Meanwhile, other attempts to use anthropic reasoning lead to predictions that are in wild disagreement with observations [66]."

Which pretty much confirms Mark's view.

BTW, the paper also makes some other good observations, for example how spontaneous eternal inflation may explain the low entropy start of each universe and the arrow of time. And it ends by observing on multiverses:

"The multiverse is not a theory; it is a consequence of certain theories (of quantum gravity and cosmology), and the hope is that these theories eventually prove to be testable in other ways. Every theory makes untestable predictions, but theories should be judged on the basis of the testable ones.

The ultimate goal is undoubtedly ambitious: to construct a theory that has definite consequences for the structure of the multiverse, such that this structure provides an explanation for how the observed features of our local domain can arise naturally, and that the same theory makes predictions that can be directly tested through laboratory experiments and astrophysical observations."
( http://arxiv.org/PS_cache/hep-th/pdf/0512/0512148.pdf )

On the same note, the same cosmologist has an earlier paper on why "adding God would just make things more complicated, and this hypothesis should be rejected by scientific standards" in cosmology. ( http://pancake.uchicago.edu/~carroll/nd-paper.html )

I'm not sure I understand your dismissal of the "Big Number Argument." There is a difference between stating that something is "impossible" and "that the probability is effectively zero." If you randomly mix a deck of cards you get a result. No big deal. But what about the odds of predicting or replicating that result? 1 in 10^166... not impossible... but effectively zero for a single attempt.

I see the "Big Number" error from exactly the opposite stance... Most people focus on the numerator. When confronted with the odds of winning a state lottery, a dupe will often and correctly argue that someone has to win. This is, of course, true... but the probability that they will win is effectively zero.

Glenn - You've actually hit the nail on the head, but I don't think you realize it.

If you have a lottery, *someone* is bound to win it eventually. Any specific person almost certainly won't, but if you have enough people trying it will eventually happen.

This is the basic issue with "Big Number Arguments" that are used against evolution. They're looking at the one guy who won the lottery and asking what the chances of that were. They are *completely* ignoring the fact that there are a large number of possible winners, all of whom are roughly equally likely to win. The identity of the winner isn't important, but they're making it the basis of their calculations.

That's the problem here. Genes can adapt in any number of ways, producing a nigh-infinite number of possible creatures. The fact that we currently see only a relatively small number of them simply means that they were the lottery winners, not that they were somehow special or central (though they *were* higher probability than many others, due to the various kinds of fitness selection).

When IDists analyze this, though, they assume that the currently existing creatures were the only ones who *could* exist at this moment, and they try to challenge evolution to explain how that could happen with 'random chance'. This is as ridiculous as trying to make statisticians explain *why* a particular person won the lottery.
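
The gap between "this particular person wins" and "somebody wins" is easy to put numbers on. The Python sketch below uses made-up figures (a 1-in-10-million ticket and 30 million tickets sold, both hypothetical) and assumes every ticket is an independent random pick.

```python
import math

def p_at_least_one_winner(p_ticket, tickets):
    """Chance that at least one of `tickets` independent tickets wins,
    when each wins with probability `p_ticket`."""
    # 1 - (1 - p)^n, written with log1p/expm1 to keep precision when p is tiny.
    return -math.expm1(tickets * math.log1p(-p_ticket))

p_ticket = 1e-7        # hypothetical odds for any single ticket
tickets = 30_000_000   # hypothetical number of tickets sold

print("a given ticket wins: ", p_ticket)
print("some ticket wins:    ", p_at_least_one_winner(p_ticket, tickets))
```

With these assumed numbers any single ticket is all but certain to lose, while the chance that some ticket wins is about 95%.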

"But instead what we see are research programs aimed at explaining the fine-tuning--when in fact it would be a lot cheaper to say, were it true, that 'it's just a big numbers argument.'"

This is an important point--cosmologists' desire to know why things are as they are does not make them all devotees of the strong anthropic principle. The design argument is no more legit in physics than anywhere else.

Yes, virtually all cosmologists think it's worth investigating how fundamental universal constants got their current values. But that's not the same as asking why those values fall in "life-friendly" regions in particular. Some care about the latter question (I would imagine most theists in the field do); some don't.

Aside from anything else, how do you know you aren't? As far as I'm aware, astronomers can't yet determine by observation whether a given interstellar gas cloud embodies a cosmic intelligence....

"Aside from anything else, how do you know you aren't? As far as I'm aware, astronomers can't yet determine by observation whether a given interstellar gas cloud embodies a cosmic intelligence...."

Perhaps, and we must assume that said intelligence wishes to remain hidden, but the problem remains that chemists and physicists know that, in the environment of interstellar space, Hydrogen and Hydrogen makes--um--Hydrogen.

You are entitled to your religious beliefs that such an intelligence exists, but nothing from science supports it.

"Perhaps, and we must assume that said intelligence wishes to remain hidden,"

Now we're making claims about hypothetical sentient interstellar gas cloud psychology? My word, I had no idea the field was so advanced.

"but the problem remains that chemists and physicists know that, in the environment of interstellar space, Hydrogen and Hydrogen makes--um--Hydrogen."

I hear it does that here on Earth too...

"You are entitled to your religious beliefs that such an intelligence exists, but nothing from science supports it."

Nothing from science really covers the matter at all. Which, surprisingly enough, makes claims hinging on the nonexistence of sentient gas clouds across the multiverse rather unscientific.

Anton Mates

"I hear it does that here on Earth too..."

Yes, but it should be rather obvious that here we have the advantage of the presence of other elements such as oxygen and carbon and iron and potassium...--all of which were manufactured by stars. Life that formed out of just hydrogen would have to be made of, you guessed it, just hydrogen.

"Nothing from science really covers the matter at all. Which, surprisingly enough, makes claims hinging on the nonexistence of sentient gas clouds across the multiverse rather unscientific."

Since science understands that making self-replicating structures that can store information out of just interstellar hydrogen violates the known laws of physics, it is not outside the realm of science to say no such life exists. In fact, I think you'd get an overwhelming majority of scientists to agree that a universe of just hydrogen gas would be lifeless.

By the way, you used the term "multiverse" fairly cavalierly. Since there is no experimental test for the multiverse, I would argue that using that term as if it is an established fact is far less scientific in the Popperian sense than claiming that there is no intelligent life comprised of just hydrogen.

Yep, which would be rather a problem if you were trying to make a hamster, or indeed any intelligence running on chemical reactions. (Though I suppose, depending on the temperatures involved, you could at least have periodic ionization or dissociation of the hydrogen molecules.) Now, the proof that you can't make an intelligence without such reactions is...?

"Since science understands that making self-replicating structures that can store information out of just interstellar hydrogen violates the known laws of physics"

My goodness; must be different laws than I learned about in my physics classes. Self-replication isn't a prerequisite to intelligence, incidentally.

"In fact, I think you'd get an overwhelming majority of scientists to agree that a universe of just hydrogen gas would be lifeless."

Virtually every scientist I've ever seen asked whether [insert extremely alien environment here] could develop life has responded with "Probably not our sort of life; beyond that I have no clue." But perhaps they're atypical.

"By the way, you used the term "multiverse" fairly cavalierly. Since there is no experimental test for the multiverse"

No, I was just using the term in the sense of whatever universe or ensemble of universes you may happen to believe in. Which I suppose might seem cavalier, but I do read a lot of comic books.

As regards the actual theorizing about a multiverse in physics, and the experimental untestability thereof, Torbjörn had a nice quote about that upthread. Certainly it's far from an established fact.
