Redefining the Binomial

By evolgen on May 16, 2007.

There's an interesting post over at Statistical Modeling, Causal Inference, and Social Science on calculating probabilities. Traditionally, if you observe a certain number of events (y) in some number of trials (n), you would estimate the probability (p) of the event as y/n. To calculate the variance around this estimate, you would use this equation: p(1-p)/n.

This leads to two problems. First, if you never observe the event, your estimate of the probability of the event is zero; if you observe the event in every trial, your estimate is one. This leads to a deterministic model even if the unobserved event is possible. Second, if p is estimated as zero or one, then the estimated variance is zero (once again, suggesting a deterministic model).

To get around these problems, the formula (y+1)/(n+2) is proposed for calculating p. Using this formula, you can never get a probability of zero or one, and the variance will always be greater than zero. There is further discussion of the implications of this calculation at SMCISS.
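To make the comparison concrete, here is a minimal Python sketch of the two estimators. The function names are made up for illustration, and plugging the adjusted p back into p(1-p)/n is an assumption on my part; the original discussion may use a different denominator for the variance.

```python
def plain_estimate(y, n):
    """Traditional estimate: p = y/n, variance = p(1-p)/n."""
    if n <= 0:
        raise ValueError("n must be a positive number of trials")
    p = y / n
    return p, p * (1 - p) / n


def adjusted_estimate(y, n):
    """Adjusted estimate: p = (y+1)/(n+2).

    Assumption: the variance reuses p(1-p)/n with the adjusted p.
    """
    if n <= 0:
        raise ValueError("n must be a positive number of trials")
    p = (y + 1) / (n + 2)
    return p, p * (1 - p) / n


if __name__ == "__main__":
    # Hypothetical data: the event is seen never, half the time, and always.
    for y, n in [(0, 10), (5, 10), (10, 10)]:
        print(y, n, plain_estimate(y, n), adjusted_estimate(y, n))
```

At the extremes (y = 0 or y = n) the plain estimate gives p of zero or one and a variance of zero, while the adjusted estimate stays strictly between zero and one and keeps a positive variance.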


So the chance of me having a three-way with Madeleine Albright and a 17-tentacled alien from Arcturus humming show tunes on the roof of Sogo department store in downtown Osaka is 0.5? Sweet!

A related concept is the use of pseudocounts when building position-specific weight matrices. Since the matrices are often log-transformed, zero counts are a royal pain, and they overstate how improbable it is for that character to appear at a position. The problem is particularly acute for amino acid matrices: if you have aligned fewer than 20 proteins, then you can't possibly have seen all amino acids at a position, and at any position that is at all conserved you need to align many more proteins before you might see them all.
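A rough Python sketch of the pseudocount idea follows. Everything specific in it is an assumption for illustration: the toy DNA alignment (used instead of amino acids to keep the alphabet short), a pseudocount of 1 per character, a uniform background, and the function name `log_odds_column`.

```python
import math

ALPHABET = "ACGT"  # assumed alphabet; an amino acid matrix would use 20 letters


def log_odds_column(column, pseudocount=1.0, alphabet=ALPHABET):
    """Log2-odds scores for one alignment column, with pseudocounts added."""
    # Start every character at the pseudocount so no count is ever zero.
    counts = {ch: pseudocount for ch in alphabet}
    for ch in column:
        counts[ch] += 1
    total = len(column) + pseudocount * len(alphabet)
    background = 1.0 / len(alphabet)  # assumed uniform background frequency
    return {ch: math.log2((counts[ch] / total) / background) for ch in alphabet}


# Hypothetical alignment of three short sequences.
sequences = ["ACGT", "ACGA", "ACTT"]
columns = list(zip(*sequences))
matrix = [log_odds_column(col) for col in columns]

if __name__ == "__main__":
    for position, scores in enumerate(matrix):
        print(position, scores)
```

With a pseudocount of zero, any character that never appears in a column would have a frequency of zero, and its logarithm is undefined. With the pseudocount, unseen characters get a low but finite score.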
