I am a professor of statistics and political science at Columbia University and author of Bayesian Data Analysis (with John Carlin, Hal Stern, and Donald Rubin), Teaching Statistics: A Bag of Tricks (with Deborah Nolan), Data Analysis Using Regression and Multilevel/Hierarchical Models (with Jennifer Hill), and, most recently, Red State, Blue State, Rich State, Poor State: Why Americans Vote the Way They Do (with David Park, Boris Shor, Joe Bafumi, and Jeronimo Cortina).

Why most discovered true associations are inflated: Type M errors are all over the place

Posted on: November 21, 2009 3:22 PM, by Andrew Gelman

Jimmy points me to this article, "Why most discovered true associations are inflated," by J. P. Ioannidis. As Jimmy pointed out, this is exactly what we call type M (for magnitude) errors. I completely agree with Ioannidis's point, which he seems to be making more systematically than David Weakliem and I did in our recent article on the topic.
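To see the inflation concretely, here is a minimal simulation sketch. It is not taken from either article: it assumes the estimate is normally distributed around a true effect with a known standard error, and the function name, the effect size, and the standard error are all hypothetical choices made for illustration.

```python
import random
import statistics

def simulate_type_m(true_effect, se, n_sims=20000, z_crit=1.96, seed=0):
    """Draw estimates ~ Normal(true_effect, se) and summarize the significant ones.

    Assumption: a simple normal sampling model with known standard error.
    Returns (power, exaggeration ratio), where the exaggeration ratio is the
    average absolute significant estimate divided by the true effect.
    """
    rng = random.Random(seed)
    estimates = [rng.gauss(true_effect, se) for _ in range(n_sims)]
    # Keep only the replications that would be declared statistically significant.
    significant = [e for e in estimates if abs(e) > z_crit * se]
    power = len(significant) / n_sims
    exaggeration = statistics.mean(abs(e) for e in significant) / abs(true_effect)
    return power, exaggeration

# Hypothetical low-power setting: true effect of 1 measured with standard error 2.
power, exaggeration = simulate_type_m(true_effect=1.0, se=2.0)
print(f"power = {power:.2f}, exaggeration ratio = {exaggeration:.1f}")
```

In this setting an estimate has to exceed 1.96 standard errors, that is 3.92, to be significant, so every "discovered" effect is at least about four times the assumed true value.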

My only suggestion beyond what Ioannidis wrote has to do with potential solutions to the problem. His ideas include: "being cautious about newly discovered effect sizes, considering some rational down-adjustment, using analytical methods that correct for the anticipated inflation, ignoring the magnitude of the effect (if not necessary), conducting large studies in the discovery phase, using strict protocols for analyses, pursuing complete and transparent reporting of all results, placing emphasis on replication, and being fair with interpretation of results."

These are all good ideas. Here are two more suggestions:

1. Retrospective power calculations. See page 312 of our article for the classical version or page 313 for the Bayesian version. I think these can be considered as implementations of Ioannidis's ideas of caution, adjustment, and correction.

2. Hierarchical modeling, which partially pools estimated effects and reduces Type M errors as well as handling many multiple comparisons issues. Fuller discussion here (or see here for the soon-to-go-viral video version).
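As a rough illustration of the partial pooling in the second suggestion, the sketch below shrinks a set of noisy estimates toward their common mean. It is a simplified empirical-Bayes stand-in, not a full hierarchical model fit: the between-group variance is a crude moment estimate, the standard errors are treated as known, and the data and the `partial_pool` name are made up for the example.

```python
import statistics

def partial_pool(estimates, std_errors):
    """Shrink each raw estimate toward a precision-weighted grand mean.

    Assumption: normal-normal model with known standard errors; tau2 (the
    between-group variance) is a simple moment estimate floored at zero.
    """
    var_obs = statistics.variance(estimates)
    mean_se2 = statistics.mean(se ** 2 for se in std_errors)
    tau2 = max(var_obs - mean_se2, 0.0)
    weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    mu = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    pooled = []
    for y, se in zip(estimates, std_errors):
        shrink = se ** 2 / (se ** 2 + tau2)  # 1 = complete pooling, 0 = no pooling
        pooled.append(shrink * mu + (1.0 - shrink) * y)
    return mu, tau2, pooled

# Made-up estimates and standard errors for eight hypothetical studies.
estimates = [2.8, 0.8, -0.3, 0.7, -0.1, 0.1, 1.8, 1.2]
std_errors = [1.0, 0.5, 0.8, 0.5, 0.4, 0.5, 0.6, 1.0]
mu, tau2, pooled = partial_pool(estimates, std_errors)
for y, se, p in zip(estimates, std_errors, pooled):
    print(f"raw {y:5.2f} (se {se:.1f}) -> partially pooled {p:5.2f}")
```

The noisiest and most extreme estimates are pulled in the most, which is how partial pooling tempers the overestimates that Type M errors produce.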

Comments

1

Hi Andrew,

I read Russ Lenth's article

(Lenth, R. V. (2001), "Some Practical Guidelines for Effective Sample Size Determination," The American Statistician, 55, 187-193)

where he described retrospective power calculations based on the observed effect and sample size as an "empty question," because if the study had been powerful enough, the result would have been significant (not necessarily scientifically/clinically significant).

Are you talking about retrospective power calculations with the observed effect and sample size? If so how is this useful?


Posted by: PeterT | November 24, 2009 12:58 PM

2

I think Lenth means something else than I do. Basically all I know about retrospective power calculations is in my article with Weakliem.

Posted by: Andrew Gelman | November 25, 2009 2:40 PM

3

Retrospective power calculations are conceptually (and computationally) very straightforward. Counterfactually, if everything about the study was exactly the same except the sample size was different... So if I had twice the sample size, would the result have been statistically significant? OK, copy the data twice and redo the analysis (or just change the recorded "n" appropriately in the formulas). Getting some sense of the fraction of the sample size that would have given statistical significance might be a useful metric for some.

And with a study in hand, it's a quick and dirty way to do a power calculation for a future study (especially when there are multiple covariates that need to be adjusted for). Stephen Senn has published something on finessing the calculations, and someone else a paper on refining them for power calculations for a future study.

To me it's of interest in that the differential reaction of people to copying the data twice versus changing the "n" in formulas demonstrates the difficulty of understanding random and systematic approximation (i.e., simulation versus quadrature).

Keith

Posted by: Keith O'Rourke | November 28, 2009 9:46 AM
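The calculation described in the comment above can be sketched as follows for the simplest case, a one-sample test of a mean against zero. The data and function names are made up for illustration, and the one-sample setup is an assumption; the comment does not specify an analysis.

```python
import math
import statistics

def z_stat(data):
    """One-sample test statistic: mean divided by its estimated standard error."""
    return statistics.mean(data) / (statistics.stdev(data) / math.sqrt(len(data)))

def z_stat_with_n(data, n):
    """Same mean and standard deviation, but with a different n plugged into the formula."""
    return statistics.mean(data) / (statistics.stdev(data) / math.sqrt(n))

# Made-up sample of ten observations.
data = [0.9, -0.4, 1.7, 0.2, 2.1, -1.0, 0.8, 1.3, -0.2, 0.6]

z_original = z_stat(data)
z_copied = z_stat(data + data)                   # copy the data twice, redo the analysis
z_formula = z_stat_with_n(data, 2 * len(data))   # just change the recorded n

print(f"original n:        z = {z_original:.2f}")
print(f"data copied twice: z = {z_copied:.2f}")
print(f"n doubled:         z = {z_formula:.2f}")
```

Changing n in the formula multiplies the statistic by exactly the square root of 2. Copying the data gives a slightly larger value, because the sample standard deviation is recomputed with a divisor of 2n - 1 rather than 2(n - 1).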
