

evolgen

AT THE CONVERGENCE OF EVOLUTION AND GENETICS

About evolgen

We talk about molecular population and evolutionary GENETICS and GENOMICS. You know, the caliper measurement of a gene's evolvability in moles.

Eschewing obfuscation ever since Morgan.



Redefining the Binomial

Category: Statistics
Posted on: May 16, 2007 4:15 PM, by RPM

There's an interesting post over at Statistical Modeling, Causal Inference, and Social Science on estimating probabilities. Traditionally, if you observe a certain number of events (y) in some number of trials (n), you would estimate the probability (p) of the event as y/n. The variance of this estimate is then calculated as p(1-p)/n.
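As a minimal sketch of the traditional estimator just described, here it is in Python. The function name `binomial_estimate` is hypothetical; it simply computes y/n and plugs that into p(1-p)/n.

```python
def binomial_estimate(y: int, n: int) -> tuple[float, float]:
    """Traditional estimate: p = y/n, with variance p(1-p)/n."""
    if n <= 0:
        raise ValueError("need at least one trial")
    if not 0 <= y <= n:
        raise ValueError("y must be between 0 and n")
    p = y / n                 # observed proportion of trials with the event
    variance = p * (1 - p) / n
    return p, variance


# Example: 3 events observed in 10 trials.
p, var = binomial_estimate(3, 10)
print(p, var)
```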

This leads to two problems. First, if you never observe the event, your estimate of its probability is zero; if you observe the event in every trial, your estimate is one. Either way, you end up with a deterministic model even though the outcome you never saw may well be possible. Second, if p is estimated as zero or one, the estimated variance is also zero (once again suggesting a deterministic model).
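To make both problems concrete, here is the arithmetic for an illustrative case of n = 20 trials (the count is chosen only for illustration). Whether the event is never seen or always seen, the point estimate lands on a boundary and the variance collapses to zero:

```latex
% y = 0 events observed in n = 20 trials
\hat{p} = \frac{0}{20} = 0, \qquad
\widehat{\operatorname{Var}}(\hat{p}) = \frac{0\,(1-0)}{20} = 0

% y = n = 20 events observed in n = 20 trials
\hat{p} = \frac{20}{20} = 1, \qquad
\widehat{\operatorname{Var}}(\hat{p}) = \frac{1\,(1-1)}{20} = 0
```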

To get around these problems, the formula (y+1)/(n+2) is proposed for calculating p. With this formula the estimate can never be exactly zero or one, so the variance will always be greater than zero. There is further discussion of the implications of this calculation at SMCISS.
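The adjusted estimate can be sketched as follows. (y+1)/(n+2) is the mean of a Beta(y+1, n-y+1) distribution, which is the posterior you get from a uniform prior on p. This sketch therefore *assumes* you also want the variance of that posterior, p̃(1-p̃)/(n+3). The post does not say which variance formula to pair with the adjusted estimate, and plugging p̃ into p(1-p)/n is another common choice. The function name `adjusted_estimate` is hypothetical.

```python
def adjusted_estimate(y: int, n: int) -> tuple[float, float]:
    """(y+1)/(n+2) estimate of p.

    Assumption of this sketch: the variance reported is that of the
    Beta(y+1, n-y+1) posterior under a uniform prior, i.e. p(1-p)/(n+3).
    """
    if n < 0 or not 0 <= y <= n:
        raise ValueError("need 0 <= y <= n")
    a, b = y + 1, n - y + 1          # Beta posterior parameters
    mean = a / (a + b)               # equals (y+1)/(n+2)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, variance


# Zero events in 20 trials no longer gives p = 0 or a zero variance.
print(adjusted_estimate(0, 20))
# With no data at all (y = n = 0) the estimate falls back to 0.5.
print(adjusted_estimate(0, 0))
```

The y = n = 0 case is exactly what the first commenter below is poking fun at. With no trials the formula returns 1/2, which reflects the uniform prior rather than any evidence about the event.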


Comments

1

So the chance of me having a three-way with Madeleine Albright and a 17-tentacled alien from Arcturus humming show tunes on the roof of Sogo department store in downtown Osaka is 0.5? Sweet!

Posted by: Janne | May 16, 2007 10:05 PM

2

A related concept is the use of pseudocounts in position-specific weight matrix building -- since the matrices are often log-transformed, zero counts are a royal pain -- and an overestimate of the improbability of that character appearing at a position. The problem is particularly acute for amino acid matrices -- if you have aligned fewer than 20 proteins, then you can't possibly have seen all amino acids -- and at any position that is at all conserved, you need to align many more proteins before you might see them all.
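Here is a rough sketch of the pseudocount idea for a log-odds weight matrix. Several things are assumptions for illustration only: the hypothetical function `log_odds_pwm`, a DNA alphabet by default (pass a 20-letter alphabet for proteins), a uniform background, and one pseudocount per letter. Without the pseudocount, any letter never seen at a position has frequency 0, and its log is undefined.

```python
import math
from collections import Counter


def log_odds_pwm(seqs, alphabet="ACGT", pseudocount=1.0):
    """Log2-odds matrix from aligned sequences, with an additive pseudocount.

    Assumes a uniform background frequency of 1/len(alphabet); letters
    outside `alphabet` are ignored when counting.
    """
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    background = 1.0 / len(alphabet)
    matrix = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)
        total = sum(counts[a] for a in alphabet) + pseudocount * len(alphabet)
        column = {}
        for letter in alphabet:
            freq = (counts[letter] + pseudocount) / total  # never 0 if pseudocount > 0
            column[letter] = math.log2(freq / background)  # log2(0) would raise ValueError
        matrix.append(column)
    return matrix


pwm = log_odds_pwm(["ACGT", "ACGA", "ACGT"])
print(pwm[0])
```

With a two-letter alphabet and a pseudocount of 1, each column frequency reduces to (y+1)/(n+2), the same adjustment discussed in the post.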

Posted by: Keith Robison | May 17, 2007 12:49 AM


© 2006-2011 ScienceBlogs LLC. ScienceBlogs is a registered trademark of ScienceBlogs LLC. All rights reserved.