A month ago, Eric Schwitzgebel wrote a post critical of meta-analysis, suggesting that studies finding null results don't tend to get published, thus skewing meta-analysis results. I objected to some of his reasoning, my most important point being that the largest studies are going to get published, so most of the data collected actually does appear in the literature.
Now Schwitzgebel's got a new post about meta-analysis, again taking a critical stance. First, he discusses experimenter bias:
An experimenter who expects an effect of a certain sort is more likely to find such an effect than another experimenter, using the same method, who does not expect such an effect. For example, Rosenthal found that undergraduate experimenters who were told their rats were "bright" recorded better performance when testing the rats than others who were given "dull" rats (though the rats were from the same population). Experimenter effects can be surprisingly resistant to codification of method -- showing up, for example, even in subjects' reaction times recorded by computer.
But isn't experimenter bias a potential problem in any line of research, whether it's meta-analyzed or not? Schwitzgebel explains why it might be a bigger problem with meta-analysis:
Reviews and meta-analyses are typically performed by experts in the subfield. You might think this is good -- and it is, in several ways. But it's worth noting that experts in any subfield are usually committed to the value of research in that subfield, and they are commenting on the work of friends and colleagues they may not want to offend for both personal and self-serving reasons.
So experimenter bias can show up in the meta-analysis itself, not just in the original experiments. That's a fair criticism, although I'd argue that this sort of bias can also appear in traditional literature reviews. I'm still not convinced that meta-analysis is any more vulnerable to experimenter bias than any other method.
One thing Schwitzgebel does offer is a pattern that he expects to find when there's really no effect:
(1.) Some null results, but maybe not even as many as half the published studies.
(2.) Positive results, but not falling into a clearly interpretable pattern.
(3.) Some researchers tending consistently to find positive results, across a variety of methods and subtopics, while others do not.
(4.) A substantial number of methodologically dubious studies driving the apparent effect.
(5.) (Maybe) a higher rate of null effects found in the sophomore years of the research (after the first experiments that generated excitement about the area but before referees start complaining that there have already been a number of null effect studies published).
Now we're getting somewhere. This can be an excellent diagnostic tool: when you see a pattern like this in a research subfield, you can make a pretty solid case that a particular meta-analysis is dubious. Of course, not all research follows this pattern, which is why I still think meta-analyses can often be useful. Clearly meta-analysis needs to be viewed with skepticism, but consider effects like the link between smoking and lung cancer. No single study demonstrates a causal relationship in humans, but converging evidence from a number of different studies makes it clear that smoking is extremely dangerous.
" Experimenter effects can be surprisingly resistant to codification of method -- showing up, for example, even in subjects' reaction times recorded by computer."
Would someone please explain to me how a computer is capable of skewing reaction time data in favor of a particular hypothesis? Am I going to have to send my lab's computers to a research methods class?
I think he must mean that the human experimenter's attitude can have an effect on the subject's responses, thus skewing results. Training rats to navigate mazes, for example, can be biased by the physical position the experimenter takes when placing the rats in the maze.
This is really interesting. It makes me wonder what kinds of psychological biases come into play. There are probably individual factors, not just in terms of personal beliefs, but also in terms of the interaction between publisher and researcher.
There are actually statistical tools available, and used fairly frequently, for dealing with each of the problems Schwitzgebel mentions. For example, you can calculate the number of null results necessary to reduce a particular finding in a meta-analysis to non-significance (sometimes called the "failsafe N"). And you can include the quality of a study as a variable, using various subjective and objective measures of quality. Schwitzgebel's criticisms are well known to people who use meta-analyses, and are generally taken into account. If they aren't, reviewers are likely to complain pretty loudly.
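To make the "failsafe N" idea concrete, here is a minimal sketch of the calculation, assuming Rosenthal's version based on Stouffer's combined z; the z values are hypothetical, chosen only to mirror the 50-study scenario discussed in this thread:

    def failsafe_n(z_values, z_alpha=1.645):
        # Rosenthal's fail-safe N: how many unpublished studies averaging
        # z = 0 would have to exist to pull the Stouffer combined z for
        # all k + N studies below the one-tailed .05 cutoff (z_alpha).
        k = len(z_values)
        sum_z = sum(z_values)
        # Solve sum_z / sqrt(k + N) = z_alpha for N:
        return max(0.0, (sum_z / z_alpha) ** 2 - k)

    # Hypothetical: 25 studies at z = 2.0 plus 25 exact nulls at z = 0.
    zs = [2.0] * 25 + [0.0] * 25
    print(round(failsafe_n(zs)))  # roughly 874 hidden null studies

The number is purely diagnostic: it says how much unpublished null data would be needed to overturn the combined result, under the assumption that deviations from the null are driven only by sampling error.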
I'm not sure, Chris. There is of course the particular analysis you mention, but I think that's a false comfort. Let's say I'm right that about half the studies will show a significant positive relationship even when there is no real relationship, and let's say there are 50 studies. If you just mindlessly do the calculation you mention, you might find that you need 500 null studies to counterbalance the 25 positive studies (since some of those studies will have high z values). That sounds comforting, but to my way of thinking the overall pattern of results may still be best interpreted as showing no effect. Such an analysis excludes the possibility that it's mere *statistical chance* that the published studies show positive relationships; but that's not what I'm positing. Experimenter effects, results due to spurious relationships unconnected with the hypothesis under study, etc., are not simply randomness that can be eliminated by a simple statistical procedure of the sort you mention.
The meta-analyses I have seen are often *not* sufficiently attuned to the likelihood of a substantial number of positive findings even if there's no relationship between the variables. I don't even see Rosenthal (1991) being very careful about it, in his classic text, though he was the discoverer of the experimenter effect.
P.S. -- Thanks for the comments and discussion, Dave, by the way!
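To illustrate the distinction Eric is drawing (a toy simulation, not anything from his post): if every study carries a small systematic shift, say from an experimenter effect, a large fraction of studies comes out "significant" even though the true effect is zero, and no amount of averaging over sampling error will reveal that:

    import random
    import statistics

    random.seed(1)

    def simulated_z(n_per_group, bias=0.0):
        # One fake two-group study in which the true effect is zero.
        # 'bias' is a constant shift added to the treatment group's scores
        # (standing in for an experimenter effect); it is not sampling error.
        treat = [random.gauss(bias, 1.0) for _ in range(n_per_group)]
        control = [random.gauss(0.0, 1.0) for _ in range(n_per_group)]
        diff = statistics.mean(treat) - statistics.mean(control)
        se = (2.0 / n_per_group) ** 0.5  # standard error of the difference, sd = 1
        return diff / se

    crit = 1.645  # one-tailed p < .05
    unbiased = sum(simulated_z(50) > crit for _ in range(1000)) / 1000
    biased = sum(simulated_z(50, bias=0.3) > crit for _ in range(1000)) / 1000
    print(unbiased)  # about .05: false positives from sampling error alone
    print(biased)    # roughly .4-.5: "positive" studies despite no true effect

A fail-safe N computed on the biased batch would come out reassuringly large, since the calculation treats every positive z as evidence without asking where it came from, which is the worry being raised here.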
Eric:
Any meta-analysis text has a chapter (or several) on diagnostics, ranging from the simple (like funnel plots) to the more complex. As for your example of 50 studies, 25 showing no effect and 25 showing an effect: that's actually evidence for an effect (though a small one). With a single variable and a p value of .05, you'd expect 2 to 3 studies to show an effect by chance. Your reasoning is intuitive but wrong, which is why we have statistics.
Look at data from studies of homeopathy (e.g., British Journal of Clinical Pharmacology, v.54(6), 577-582, December 2002) - there's a clear bias, yet a decent meta-analysis reveals this and points to an overall effect size of 0. Similarly, look at the effects of intercessory prayer: multiple studies claim effects, but when you analyze them and control for criterion-switching, the evidence points to no effect.
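A quick back-of-the-envelope check of the "2 to 3 studies" figure above, assuming each study is independent and the only source of positives is the 5% false-positive rate (which is precisely the assumption Eric questions in the next comment):

    from math import comb

    k, alpha = 50, 0.05
    print(k * alpha)  # 2.5 false positives expected under the null

    # Probability of 25 or more "significant" results out of 50 if each
    # study independently has only a 5% chance of a false positive:
    p = sum(comb(k, i) * alpha**i * (1 - alpha)**(k - i) for i in range(25, k + 1))
    print(p)  # vanishingly small under pure sampling error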
Stewart, it seems to me that your comments reveal exactly the problem I'm complaining about. Meta-analyses often assume that when the null hypothesis is true, the only source of positive results is random sampling error. That's evidently how you get 2-3 positive studies out of 50 with p less than .05. The problem with your suggestion is that there are many reasons studies might skew positive other than sampling error! Which was exactly my point.
Your suggestion seems to me to be a knee-jerk mathematical reaction, rather than actual thinking. There's nothing like publishing regularly on a blog to reveal to oneself one's capacity for amazing stupidity, but I'd have hoped for a *little* more charitable reading (as Dave gave me, bless him!) than to assume that I simply don't understand p values (per Stewart) or basic meta-analytic methodology (per Chris).
Thanks for the tip on the homeopathy article and the thought about intercessory prayer. I do have two thoughts, though, about why non-effects might look a little different in these areas than in the areas I study:
(1.) It's much easier to do random assignment and blinded studies in those areas than it is when you're looking at uncontrollable, unblindable variables like religiosity and self-reported imagery skill.
(2.) Homeopathy and intercessory prayer are outside the mainstream scientific view, so (a.) articles purporting to show positive effects may be read much more critically, and (b.) reviewers aren't likely to alienate a community of peers by claiming there's no effect.