Via Amy Perfors at Harvard's Social Science Statistics Blog, I learned of the Jeffreys-Lindley paradox in statistics. The paradox is that, with a large enough sample, you can get a p-value very close to zero, and so reject the null hypothesis, even though a Bayesian analysis of the same data strongly favors the null. You can read a very in-depth explanation of the paradox here.
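To see the paradox in numbers, here is a minimal sketch of the textbook setup. It assumes normally distributed data with a known standard deviation sigma, a point null hypothesis mu = 0, and a Normal(0, tau^2) prior on mu under the alternative. The choice sigma = tau = 1 is arbitrary, and the function names `two_sided_p` and `bayes_factor_01` are made up for illustration. The z statistic is held fixed at 2.0, just past the usual 5% cutoff, while the sample size grows.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def bayes_factor_01(z, n, sigma=1.0, tau=1.0):
    """Bayes factor in favor of H0: mu = 0 over H1: mu ~ Normal(0, tau^2),
    for the mean of n observations with known standard deviation sigma,
    given the observed statistic z = xbar / (sigma / sqrt(n)).
    Values above 1 favor the null hypothesis."""
    r = n * tau**2 / sigma**2
    return math.sqrt(1 + r) * math.exp(-0.5 * z**2 * r / (1 + r))

z = 2.0  # same test statistic, hence the same p-value, at every sample size
for n in [10, 100, 10_000, 1_000_000]:
    print(f"n={n:>9,}  p={two_sided_p(z):.4f}  BF01={bayes_factor_01(z, n):.2f}")
```

The p-value stays at about 0.046 in every row, but the Bayes factor goes from slightly favoring the alternative at n = 10 to favoring the null by more than 100 to 1 at n = 1,000,000.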
I don't find this as surprising or worrisome as Perfors does. While I'd never heard of the paradox before (it's really pretty cool, if you're into statistics or Bayesian reasoning), everyone who's taken a statistics course understands the perils of large sample sizes. The fact is, two real-world groups, even ones we'd think of as coming from the same population, are by definition made up of different individuals, so their true means will almost never be exactly equal. As a result, for any measure influenced by random variation, the group means will differ at least slightly, and with a large enough sample, even that slight difference will be statistically significant. Since everyone is aware of this, I can't imagine it's a problem. If someone's sample looks so large that any significant differences he or she finds are likely to be theoretically and practically uninteresting, people will pick up on it, either through effect size calculations or through subsequent research.
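Here is a quick way to see how sample size alone drives significance while the effect size stays trivial. This is a sketch under simplifying assumptions: a two-sample z-test with equal, known variances, and an observed standardized difference equal to the true effect size d (Cohen's d), rather than simulated data. The value d = 0.02 and the helper names `two_sided_p` and `expected_p` are hypothetical choices for illustration.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def expected_p(d, n_per_group):
    """p-value of a two-sample z-test when the observed standardized
    difference equals the effect size d, with n_per_group subjects per
    group. The standard error of the difference is sqrt(2 / n) in SD
    units, so z = d / sqrt(2 / n) = d * sqrt(n / 2)."""
    z = d * math.sqrt(n_per_group / 2)
    return two_sided_p(z)

d = 0.02  # a difference of 2% of a standard deviation: practically negligible
for n in [100, 10_000, 100_000, 1_000_000]:
    print(f"n per group={n:>9,}  d={d}  p={expected_p(d, n):.3g}")
```

The effect size never changes from row to row; only the p-value does, dropping from about 0.89 to far below any conventional threshold. That is why reporting d, or a similar measure, exposes the problem.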
I just noticed this post of yours, so I feel compelled to comment. :)
I definitely agree with you that, to the extent people keep effect sizes in mind, this isn't a worrisome result. What I'm more worried about (and failed to say in my post) relates to what, for lack of a better word, I'd call "meta" cognitive science, or the sociology of science. As we know from much cognitive science research, people tend to think categorically, and a significance level provides a convenient "category" to sort results into. So even if an effect size is reported and it's small, it's easy to notice and remember only the significance level. This tendency is made worse by the fact that sometimes there is no accepted notion of how big an effect has to be to count as "interesting", the way there are accepted p-value thresholds. Thus, if we often run subjects until we get a significant p-value, even if we report the effect size, what ends up staying in memory is just the result and the knowledge that it was significant. It might be better to stop collecting data earlier, possibly overlooking findings with small effect sizes, in order to focus on and pinpoint the interesting and robust results.
Honestly, I don't think this is a big worry in practice, at least for a lot of reported work. I made the post mainly because I think the paradox is cool and wanted to talk about it. :) But effect size does matter, for many different reasons, and the more salient we make this point and the more often we emphasize it, the less I worry about being led astray by cognitive factors like those I described in the last paragraph.
Hi Amy, thanks for commenting. I agree with your point about cognitive factors, though I think that's where further research sorts things out. However, like you, I mostly wrote this post because I thought the paradox was cool.