statistics

There's an interesting post over at Statistical Modeling, Causal Inference, and Social Science on calculating probabilities. Traditionally, if you observe a certain number of events (y) in some number of trials (n), you would estimate the probability (p) of the event as y/n. To calculate the variance around this estimate, you would use this equation: p(1-p)/n. This leads to two problems. First, if you never observe the event, your estimate of the probability of the event is zero; if you observe the event in every trial, your estimate is one. This leads to a deterministic model even if the…
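A minimal sketch of the traditional estimate described in the excerpt above may help. It computes y/n and p(1-p)/n and shows how the variance collapses to zero when the event is never observed. The function name and the sample counts are illustrative assumptions, not taken from the post.

```python
def binomial_estimate(y, n):
    """Return (p_hat, variance) using the traditional y/n estimate."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = y / n                        # estimated probability of the event
    variance = p_hat * (1 - p_hat) / n   # variance around that estimate
    return p_hat, variance


# Hypothetical counts: 3 events in 10 trials, then 0 events in 10 trials.
p, var = binomial_estimate(3, 10)
p_zero, var_zero = binomial_estimate(0, 10)

print(p, var)
print(p_zero, var_zero)   # both are zero: the estimate looks deterministic
```

With zero observed events, both the estimate and its variance are zero, which is the first problem the excerpt points to.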
Want to know when to use Standard Deviation (SD) as opposed to Standard Error (SE) or a Confidence Interval (CI)? Then you should read this really useful paper in JCB about error bars in scientific papers. Here is just a sampling of their useful rules: Rule 3: error bars and statistics should only be shown for independently repeated experiments, and never for replicates. If a "representative" experiment is shown, it should not have error bars or P values, because in such an experiment, n = 1... Rule 4: because experimental biologists are usually trying to compare experimental results with…
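To make the distinction between SD, SE, and CI concrete, here is a rough sketch of how the three are computed from the same sample. The measurements are made-up values standing in for independently repeated experiments, and the 1.96 multiplier is a normal approximation assumed here for simplicity, not something the paper prescribes.

```python
import math
import statistics

# Hypothetical results of five independently repeated experiments (made-up values).
results = [9.8, 10.4, 10.1, 9.6, 10.6]

n = len(results)
mean = statistics.mean(results)
sd = statistics.stdev(results)   # sample SD: spread of the individual results
se = sd / math.sqrt(n)           # SE: uncertainty in the estimated mean

# Approximate 95% CI using the normal multiplier 1.96 (an assumption; with n
# this small, a t-distribution multiplier would give a wider interval).
ci_low, ci_high = mean - 1.96 * se, mean + 1.96 * se

print(f"mean={mean:.2f} SD={sd:.2f} SE={se:.2f} 95% CI=({ci_low:.2f}, {ci_high:.2f})")
```

The SD describes the scatter of the data, while the SE and CI shrink as n grows because they describe how well the mean is known.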
In the comments of my dinosaur genome size post, Shelley asked: So do ALL birds have equally small genomes or is there variation among species? I don't think she was looking for a trite response along the lines of: "Of course there's variation among species." What she was asking, I presume, is how much variation in genome size do we see in birds? As you can see in this phylogeny, all birds (and nearly all theropods) have small genomes. But that tree only presents data from a few species. To get a better idea of genome size variation within birds, I downloaded C-values (amount of DNA in a…
Yet another reader sent me a great bad math link. (Keep 'em coming guys!) This one is an astonishingly nasty sleight of hand, and a great example of how people misuse statistics to support a political agenda. It's by someone named "Dr. Deborah Schurman-Kauflin", and it's an attempt to paint illegal immigrants as a bunch of filthy criminal lowlifes. It's titled "The Dark Side of Illegal Immigration: Nearly One Million Sex Crimes Committed by Illegal Immigrants in the United States." With a title like that, you'd think that she has actual data showing that nearly one million sex crimes were…
When we look at the data for a population, often the first thing we do is look at the mean. But even if we know that the distribution is perfectly normal, the mean isn't enough to tell us what we need to know to understand what the mean is telling us about the population. We also need to know something about how the data is spread out around the mean - that is, how wide the bell curve is around the mean. There's a basic measure that tells us that: it's called the standard deviation. The standard deviation describes the spread of the data, and is the basis for how we compute things like the degree…
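As a small illustration of why the mean alone isn't enough, the sketch below compares two made-up data sets that share the same mean but have very different spreads. The data and variable names are assumptions for illustration; `statistics.pstdev` treats each list as a whole population.

```python
import statistics

# Two hypothetical populations with the same mean but different spread.
narrow = [4, 5, 6]
wide = [1, 5, 9]

for name, data in (("narrow", narrow), ("wide", wide)):
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)   # population standard deviation
    print(f"{name}: mean={mean} sd={sd:.3f}")
```

Both lists report a mean of 5, but the standard deviation of the second is several times larger, which is the information the mean hides.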
In general, when we gather data, we expect to see a particular pattern to the data, called a normal distribution. A normal distribution is one where the data is evenly distributed around the mean in a very regular way, which when plotted as a histogram will result in a bell curve. There are a lot of ways of defining "normal distribution" formally, but the simple intuitive idea of it is that in a normal distribution, things tend towards the mean - the closer a value is to the mean, the more you'll see it; and the number of values on either side of the mean at any particular distance are…
Statistics is something that surrounds us every day - we're constantly bombarded with statistics, in the form of polls, tests, ratings, etc. Understanding those statistics can be an important thing, but unfortunately, most people have never been taught just what statistics really mean, how they're computed, or how to distinguish the difference between statistics used properly, and statistics misused to deceive. The most basic concept in statistics is the idea of an average. An average is a single number which represents the idea of a typical value. There are three different numbers which can…
The curtain has been drawn and the secrets to data analysis revealed. Do you have a data set sitting around in need of analysis? Read this and learn how to find significant results somewhere -- anywhere -- in your data. Because negative results won't get you published; and you won't get hired/tenure if you don't publish; and your career will be a failure. Here's a taste: There are always anomalies. The World Series has been swept 17 times, five more than the model would predict. Plug this into the BINOMDIST function in Excel. (Understanding how this function works is optional and may in some…
Pim van Meurs has a blog post at The Panda's Thumb about the recent paper on translational selection on a synonymous polymorphic site in a eukaryotic gene (DOI link). He points out that this was predicted in a paper from 1987. In short, the rate of translation depends on the tRNA pool -- amino acids encoded by more abundant tRNA anti-codons will be incorporated more quickly than amino acids with rare tRNAs. Because protein folding begins during translation, codon usage can influence protein secondary structure. That's because rare codons could stall translation, allowing for protein…
John Wilkins has replied to Larry Moran on the role of "chance" in evolution (incidentally, Moran replies to Wilkins on the same topic, but a different post by Wilkins). Here's what Larry wrote: Nobody denies the power of natural selection and nobody claims that natural selection is random or accidental. However, the idea that everything is due to natural selection is the peculiar belief of a relatively small number of people, of whom Richard Dawkins is the most outspoken. A great deal of evolution is the result of chance or accident, as is a great deal of the rest of the universe. It's…
So after reading Brad DeLong's post about how the Democrats won with a 13.4% majority in the Senate (if you total all the votes cast for each party), I decided to do the same with the Congressional races, since everyone votes for a congresscritter. Before I get to the results, here are some caveats: I didn't include uncontested races. Since I was pulling data from CNN.com which didn't have totals for uncontested races, I'm underestimating the number of Democratic votes cast (the Democrats had far more uncontested seats than Republicans). This could result in an additional 1.2-1.5 Democratic…
When the Lancet study first came out, I argued that conservatives couldn't just criticize--they had to offer their own alternative, credible numbers for the civilian death toll. Matt Yglesias goes one further: why not a second study using credible methods of which conservatives approve?
Or at least 655,000 (± 140,000) of them. Before I get to the news reports, I think it's important to make something clear. These statistical techniques are routinely used in public health epidemiology and nobody complains about them. Critics of this estimate can't play the same game the creationists do. They can't just debunk the numbers. They have to propose an alternative, reliable method, otherwise this estimate has to be viewed as the best available estimate. (I can't wait to hear Bill O'Reilly talk about statistics...) From the NY Times: A team of American and Iraqi public health…
Adam Eyre-Walker has published a review of adaptive evolution in a few well studied systems: Drosophila, humans, viruses, Arabidopsis, etc. These organisms have been the subject of many studies that used DNA polymorphism, DNA divergence, or a combination of the two to detect natural selection in both protein coding and non-coding regions of the genomes. Now that we have whole genome sequences for multiple closely related species from a few different taxa, many researchers are interested in determining the role of natural selection in the evolution of DNA sequences. Eyre-Walker claims that the…
It's Saturday on the second weekend of the college football season. Tomorrow (Sunday) marks the opening of the NFL season (okay, the season really kicked off Thursday night). Also, we're hitting the home stretch of the major league baseball season, and the playoffs are just around the corner. With all of that in mind, this marks a good time to ask, How good is my city when it comes to sports? If you live in Cleveland, you don't need any scientific study to tell you you've suffered through some miserable seasons. But what about the rest of the United States and Canada? The blog Urban Sports…
ABC News has an article by mathematician John Allen Paulos on how creationists misuse probability in their anti-science arguments. This article is inspired by the Science article on public acceptance of evolution. I especially like how he distinguishes between a priori and a posteriori probabilities: Now if we shuffle this deck of cards for a long time and then examine the particular ordering of the cards that happens to result, we would be justified in concluding that the probability of this particular ordering of the cards having occurred is approximately 1 chance in 10 to the 68th power. This…
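The "1 chance in 10 to the 68th power" figure in the quoted passage can be checked directly, assuming a standard 52-card deck (the deck size is an assumption here, since the excerpt does not state it). The number of possible orderings is 52 factorial:

```python
import math

# Number of distinct orderings of a standard 52-card deck (assumed deck size).
orderings = math.factorial(52)

# Probability of any one particular ordering after a thorough shuffle.
probability = 1 / orderings

print(f"orderings   = {orderings:.3e}")
print(f"probability = {probability:.3e}")
```

The count of orderings is a 68-digit number, so the chance of any specific ordering is on the order of 1 in 10 to the 68th power, as quoted.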
While I was on vacation, I got some email from Chris Noble pointing me towards a discussion with some thoroughly innumerate HIV-AIDS denialists. It's really quite shocking what passes for a reasonable argument among true believers. The initial stupid statement is from one of Duesberg's papers, [AIDS Acquired by Drug Consumption and Other Noncontagious Risk Factors][duesberg], and it's quite a whopper. During a discussion of the infection rates shown by HIV tests of military recruits, he says: >(a) "AIDS tests" from applicants to the U.S. Army and the U.S. Job >Corps indicate that…
Given the expected frequency of a certain outcome of a replicate in an experiment, we can estimate the expected variance around that mean (either by deriving it or performing simulations). I have heard that laboratory experiments tend to have greater variances than expected due to conditions not included in the model (i.e., we can't control for every variable in an experiment) when determining the expected variance. I am looking for a citation that addresses the issue of variance in laboratory experiments. Specifically, I am interested in an article that deals with higher than expected variance…
After yesterday's post about the sloppy probability from Ann Coulter's chat site, I thought it would be good to bring back one of the earliest posts on Good Math/Bad Math back when it was on Blogger. As usual with reposts, I've revised it somewhat, but the basic meat of it is still the same. -------------------- There are a lot of really bad arguments out there written by anti-evolutionists based on incompetent use of probability. A typical example is [this one][crapcrap]. This article is a great example of the mistakes that commonly get made with probability-based arguments, because it makes…
A reader sent me a copy of an article posted to "chat.anncoulter.com". I can't see the original article; anncoulter.com is a subscriber-only site, and I'll be damned before I *register* with that site. Fortunately, the reader sent me the entire article. It's another one of those stupid attempts by creationists to assemble some *really big* numbers in order to "prove" that evolution is impossible. >One More Calculation > >The following is a calculation, based entirely on numbers provided by >Darwinists themselves, of the number of small selective steps evolution would >have to…