evolgen

AT THE CONVERGENCE OF EVOLUTION AND GENETICS

About evolgen

We talk about molecular population and evolutionary GENETICS and GENOMICS. You know, the caliper measurement of a gene's evolvability in moles.

Eschewing obfuscation ever since Morgan.


Redefining the Binomial

Category: Statistics
Posted on: May 16, 2007 4:15 PM, by RPM

There's an interesting post over at Statistical Modeling, Causal Inference, and Social Science on calculating probabilities. Traditionally, if you observe a certain number of events (y) in some number of trials (n), you would estimate the probability (p) of the event as y/n. To calculate the variance around this estimate, you would use this equation: p(1-p)/n.

This leads to two problems. First, if you never observe the event, your estimate of the probability of the event is zero; if you observe the event in every trial, your estimate is one. This leads to a deterministic model even if the unobserved event is possible. Second, if p is estimated as zero or one, then the estimated variance is zero (once again, suggesting a deterministic model).
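Both problems are easy to see with a few lines of Python. This is only a minimal sketch of the traditional formulas described above; the function name `naive_estimate` and the example counts are made up for illustration.

```python
def naive_estimate(y, n):
    """Traditional estimate: p = y/n, with variance p(1-p)/n.

    `naive_estimate` is a hypothetical name used only for this example.
    """
    if n <= 0 or not 0 <= y <= n:
        raise ValueError("need n > 0 and 0 <= y <= n")
    p = y / n
    variance = p * (1 - p) / n
    return p, variance


# Illustrative counts (not from any real data set): 10 trials each.
for y in (0, 3, 10):
    p, variance = naive_estimate(y, 10)
    print(f"y={y:2d}, n=10: p={p:.3f}, variance={variance:.5f}")
```

With y = 0 or y = n, both the estimate and its variance collapse to the boundary: the estimate is exactly zero or one, and the variance is exactly zero.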

To get around these problems, the formula (y+1)/(n+2) is proposed for calculating p. Using this formula, you can never get a probability of zero or one, and the variance will always be greater than zero. There is further discussion of the implications of this calculation at SMCISS.
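Here is a minimal Python sketch of the adjusted estimate. One assumption on my part: the variance is still computed as p(1-p)/n, with the adjusted p plugged in, which this post doesn't spell out. The function name `adjusted_estimate` and the example counts are hypothetical.

```python
def adjusted_estimate(y, n):
    """Adjusted estimate: p = (y+1)/(n+2).

    Assumption: the variance reuses p(1-p)/n with the adjusted p.
    `adjusted_estimate` is a hypothetical name used only for this example.
    """
    if n <= 0 or not 0 <= y <= n:
        raise ValueError("need n > 0 and 0 <= y <= n")
    p = (y + 1) / (n + 2)
    variance = p * (1 - p) / n
    return p, variance


# Illustrative counts (not from any real data set): 10 trials each.
for y in (0, 3, 10):
    p, variance = adjusted_estimate(y, 10)
    print(f"y={y:2d}, n=10: p={p:.3f}, variance={variance:.5f}")
```

Because 1 <= y+1 <= n+1, the adjusted p always lies strictly between 0 and 1, so p(1-p) can't be zero. For what it's worth, (y+1)/(n+2) is the same number you get from Laplace's rule of succession, i.e. the posterior mean of p under a uniform prior.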

Comments

1

So the chance of me having a three-way with Madeleine Albright and a 17-tentacled alien from Arcturus humming show tunes on the roof of Sogo department store in downtown Osaka is 0.5? Sweet!

Posted by: Janne | May 16, 2007 10:05 PM

2

A related concept is the use of pseudocounts in position-specific weight matrix building -- since the matrices are often log-transformed, zero counts are a royal pain -- and an overestimate of the improbability of that character appearing at a position. The problem is particularly acute for amino acid matrices -- if you have aligned fewer than 20 proteins, then you can't possibly have seen all amino acids -- and at any position at all conserved you need to align many more proteins before you might see them all.

Posted by: Keith Robison | May 17, 2007 12:49 AM


© 2006-2009 Seed Media Group LLC. ScienceBlogs is a registered trademark of Seed Media Group. All rights reserved.
