Respectful Insolence

The single most necessary task for a physician practicing science- and evidence-based medicine is the evaluation of the biomedical literature to extract from it just what science and the evidence support as the best medical therapy for a given situation. It is rare for the literature to be so clear on a topic that different physicians won’t come to at least somewhat different conclusions. Far more common is the situation where the studies are conflicting, although usually with a preponderance of studies tending to support one or two interventions more than others, or where there are few or only low quality studies (usually for diseases and conditions that are not very common and thus not easy to study in large randomized clinical trials). The entire paradigm of evidence-based medicine is one manner of ranking the quality of evidence supporting a therapy, but even that is not without its problems, not the least of which is the relatively low weight it gives to basic scientific principles and prior probability. In essence, the paradigm of EBM ranks equivocal clinical trial evidence over well established basic science, even when that basic science demonstrates a proposed intervention to be utterly ridiculous on the basis of well-established physics and chemistry, as in the case of homeopathy (1, 2, 3). That is why I have become more interested in the concept of “science-based” medicine, in which medicine is informed not just by clinical trial data, but by science as well.

One increasingly common method of trying to make sense of the morass of data addressing various clinical questions is the medical literature phenomenon known as meta-analysis. A meta-analysis is different from a clinical trial in that it is a statistical reanalysis of data from existing trials. Generally the highest quality trials are chosen in accord with the principles of EBM, and the data from these trials is all lumped together and analyzed in order to provide in essence a more rigorous treatment of existing data than a systematic review of the literature. To be subject to meta-analysis, a medical question must have multiple studies addressing it, and the results must be quantitative. Better still is if the studies analyzed are of high quality (randomized, placebo-controlled, double-blind). Of course, when this is the case, if the studies trend in the same direction, a meta-analysis is not necessary. Most of the time, meta-analyses are done when there is conflicting data in the hope that amalgamating the various studies will result in a statistically significant trend one way or another that will allow one to draw inferences over whether an intervention works.
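To make "lumping the data together" a little more concrete, here is a minimal sketch of the simplest pooling scheme, a fixed-effect, inverse-variance average of log odds ratios. The trial counts are made up purely for illustration and are not taken from any study discussed in this post; the function names are likewise hypothetical, and real meta-analysis software handles zero cells, confidence intervals, and much else that is ignored here.

```python
import math

def log_odds_ratio(events_t, n_t, events_c, n_c):
    """Log odds ratio and its approximate variance from one 2x2 table.

    Assumes no cell is zero (no continuity correction is applied).
    """
    a, b = events_t, n_t - events_t   # treatment arm: events, non-events
    c, d = events_c, n_c - events_c   # control arm: events, non-events
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance

def fixed_effect_pool(studies):
    """Inverse-variance weighted mean of the log odds ratios.

    `studies` is a list of (events_t, n_t, events_c, n_c) tuples.
    Returns (pooled log OR, standard error of the pooled log OR).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for study in studies:
        y, v = log_odds_ratio(*study)
        w = 1.0 / v                   # more precise trials get more weight
        weighted_sum += w * y
        total_weight += w
    return weighted_sum / total_weight, math.sqrt(1.0 / total_weight)

# Hypothetical trials: (deaths on treatment, n treated, deaths on placebo, n placebo)
hypothetical_trials = [(10, 100, 20, 100), (5, 50, 8, 50), (30, 400, 45, 400)]
pooled, se = fixed_effect_pool(hypothetical_trials)
print("pooled OR:", math.exp(pooled), "SE of log OR:", se)
```

The point of the sketch is only that the pooled estimate is a weighted average dominated by the most precise trials, which is also why the choice of which trials go in matters so much.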

Unfortunately, meta-analyses are a favorite of mavens of so-called “complementary and alternative medicine” (CAM), where they are frequently used to take several weak studies and try to make them strong by lumping them together, based on the apparent belief that lumping a bunch of weak studies together will somehow produce a strong result. It seldom works that way. Worse, one weak study with a strong result can have inordinate influence on the results, which makes study selection important–another huge potential source of bias if selection criteria are not spelled out prospectively or are insufficiently rigorous. More often, we get dubious meta-analyses purporting to show that acupuncture helps fertility. Alternatively, ideologues sometimes use meta-analyses to push ideologically-motivated “science,” such as this meta-analysis claiming that oral contraceptive pills increase the risk of breast cancer.

For these reasons and others, I’ve always been leery of meta-analyses, preferring instead to rely on my own reading of the literature. However, that isn’t always possible, particularly for questions that are not part of my expertise. Unfortunately, I’ve just come across a study that provides quantitative data about how prone to bias meta-analyses are. The study is a few months old, but DB’s Medical Rants pointed it out.

The investigators from McGill University took a clever approach. From the abstract:

We searched the literature for all randomized clinical trials (RCT) and review articles on the efficacy of intravenous magnesium in the early post-myocardial infarction period. We organized the articles chronologically and grouped them in packages. The first package included the first RCT, and a summary of the review articles published prior to first RCT. The second package contained the second and third RCT, a meta-analysis based on the data, and a summary of all review articles published prior to the third RCT. Similar packages were created for the 5th RCT, 10th RCT, 20th RCT and 23rd RCT (all articles). We presented the packages one at a time to eight different reviewers and asked them to answer three clinical questions after each package based solely on the information provided. The clinical questions included whether 1) they believed magnesium is now proven beneficial, 2) they believed magnesium will eventually be proven to be beneficial, and 3) they would recommend its use at this time.

What makes this study interesting is that the reviewers to whom these packages were sent had all published meta-analyses themselves and were thus experienced in interpreting meta-analyses. In addition, each package was constructed based only on what was known at the time of the most recent randomized clinical trial in each package. Moreover, the analyses of the articles in each package were performed by strict criteria:

Data were abstracted from the articles by a trained research assistant using standardized data abstraction forms, and verified by a second trained person. Differences were resolved by consensus. We assessed the quality of original manuscripts using the Jadad scale [16,17] and included the information in the reports to the reviewers (there was no a priori exclusion criteria or subgroup analysis). We had initially also used the Chalmers scale [17,18] but abandoned it when the reliability between data abstractors was very poor. After data abstraction, we conducted separate meta-analyses (comparison treatment was always placebo) based on the first RCT, the first 3 RCTs, 5 RCTs, 10 RCTs, 20 RCTs and 23 RCTs. At each time point, the reviewer was given a meta-analysis for mortality, and a separate meta-analysis for arrhythmias. Each meta-analysis included random and fixed effects analyses, a forest plot [8], cumulative forest plot [8], Galbraith plot [19], L’Abbe plot [20] and publication bias statistics and/or plots [8].

All of these aspects are standard statistical measures of the quality of studies included in the meta-analysis. If meta-analyses are objective analyses of the studies included in them, then we would expect that skilled investigators who do meta-analyses routinely would look at the same data, the same papers, and the same analyses and come to fairly similar conclusions. That’s not what happened, though. In fact, there was considerable heterogeneity in the interpretations of the same data, with some subjects concluding that magnesium was effective and some that it wasn’t, with a couple calling the evidence equivocal. Moreover, as the number of studies increased, so did the heterogeneity in interpretation and conclusion:

The discrepancies increased after 20 RCTs, when heterogeneity increased and the OR from the fixed effects and random effects models diverged; 1 reviewer strongly agreed the effect was beneficial, 4 reviewers agreed it was beneficial, and 3 reviewers disagreed it was beneficial. Similar discrepancies were observed when the reviewers were asked if they believed the treatment would eventually be proven beneficial. Finally, when asked if they would recommend the treatment, 4 reviewers fairly consistently said yes (excluding the meta-analysis based on 1 RCT), and 4 reviewers fairly consistently said no.

In other words, given the same studies and the same extracted and abstracted data, different investigators came to very different conclusions. This is in contrast to the dogma that tells us that meta-analyses represent the most objective method of reviewing large bodies of studies. This study casts considerable doubt on this contention, as the authors point out:

Although systematic reviews with meta-analyses are considered more objective than other types of reviews, our results suggest that the interpretation of the data remains a highly subjective process even among reviewers with extensive experience conducting meta-analyses. The implications are important. The evidence-based movement has proposed that a systematic review with a meta-analysis of RCTs on a topic provides the strongest evidence of support and that widespread adoption of its results should lead to improved patient care. However, our results suggest that the interpretation of a meta-analysis (and therefore recommendations) are subjective and therefore depend on who conducts or interprets the meta-analysis.

The significance of this study is that it doesn’t look at differences in the selection of studies for the meta-analysis or the interpretation of or extraction of data from the studies included in the meta-analysis. Every reviewer was given the same package, the same data, and the same statistical analyses of the included studies, thus eliminating those sources of disagreement. Even given that, reviewers still interpreted the results of the meta-analyses very differently. Not surprisingly, the more studies there were and the more heterogeneity between them, the more divergent the interpretations of the reviewers became. The results of this clever exercise provide just one more bit of evidence that leads me to believe that meta-analyses are nothing more than systematic reviews of the literature with attitude. That’s not to say that meta-analyses aren’t often useful. They are, in the same way that systematic reviews are: They boil down a large number of studies and suggest an interpretation. Let’s just not pretend that meta-analyses are so much more objective than systematic reviews as to be considered anything more.
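For readers curious what "heterogeneity" and the divergence between fixed and random effects models look like numerically, here is a rough sketch. It uses Cochran's Q, the I² statistic, and the DerSimonian-Laird estimate of between-study variance, which is one common random-effects method; I am assuming it for illustration, not claiming it is the one the paper's authors used. The effect sizes below are invented: three small trials showing benefit and one large, precise trial showing essentially none.

```python
def pool_fixed_and_random(effects):
    """Compare fixed-effect and DerSimonian-Laird random-effects pooling.

    `effects` is a list of (log_odds_ratio, variance) pairs, one per trial.
    """
    y = [e for e, _ in effects]
    w = [1.0 / v for _, v in effects]          # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw

    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    df = len(effects) - 1

    # DerSimonian-Laird between-study variance, truncated at zero
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # I^2: share of total variation attributed to heterogeneity
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    # Random-effects weights add tau2 to every trial's variance,
    # which flattens the weights and gives small trials more say
    w_star = [1.0 / (v + tau2) for _, v in effects]
    random_ = sum(ws * yi for ws, yi in zip(w_star, y)) / sum(w_star)

    return {"fixed": fixed, "random": random_, "Q": q, "tau2": tau2, "I2": i2}

# Invented data: three small positive trials, one large null trial
invented = [(-0.8, 0.10), (-0.7, 0.12), (-0.9, 0.15), (0.05, 0.005)]
print(pool_fixed_and_random(invented))
```

With data like these, the fixed-effect estimate sits close to the large null trial while the random-effects estimate is pulled toward the small positive ones. Neither number is "wrong"; which one a reviewer trusts is exactly the kind of judgment call on which the reviewers in the study disagreed.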


Shrier, I., Boivin, J., Platt, R.W., Steele, R.J., Brophy, J.M., Carnevale, F., Eisenberg, M.J., Furlan, A., Kakuma, R., Macdonald, M., Pilote, L., Rossignol, M. (2008). The interpretation of systematic reviews with meta-analyses: an objective or subjective process? BMC Medical Informatics and Decision Making, 8(1), 19. DOI: 10.1186/1472-6947-8-19


  1. #1 Markk
    August 29, 2008

    100% agreement. Any meta-analysis always has two strikes against it for me. Just like the term says, a good meta-analysis is more about getting information about the studies that make it up rather than the underlying variables. In that case, they are justified. The easy availability of R’s metabin and other similar packages makes them relatively easy to do mechanically, but oh, those details.

    The underlying studies never are measuring exactly the same thing in the same way (or they wouldn’t need to be “meta”). Combining these underlying data sets introduces whole new classes of errors and they are generally not categorized or even understood by anyone.

    I don’t interact with them anymore except as an interested layman, but when I do read papers of meta-analysis I always feel that a good review article going into depth about the data of each study would be more useful. But those kind of papers aren’t “original” so I guess this is what we get.

  2. #2 Danimal
    August 29, 2008

    Yawn. I am still a second hand smoke denialist. Read here (see last posts) and yes I am as leery as you. Signed formerly Dan, now Danimal.

  3. #3 wilsontown
    August 29, 2008

    Very interesting post. You say “Unfortunately, meta-analyses are a favorite of mavens of so-called “complementary and alternative medicine” (CAM), where they are frequently used to take several weak studies and try to make them strong by lumping them together…”.

    I’ve also found the opposite, where CAM advocates attack meta-analyses that seem to provide evidence against CAM. The obvious example is the Shang et al. meta-analysis of homeopathy, which showed that the largest and highest quality trials of homeopathy are negative. But the article is never criticised on the sensible grounds you give for being cautious of meta-analyses in general. Instead, the authors are accused of research misconduct, or of peddling junk science, based on transparent misunderstandings of the paper or stuff that is just blatantly made up. The problem here is misinterpretation, whether deliberate or not, rather than a disagreement among reasonable people.

  4. #4 Orac
    August 29, 2008

    Actually, when a meta-analysis comes up with a negative result, I’m more inclined to believe it, still recognizing the shortcomings of meta-analyses. In the case of homeopathy, of course, there are so many other reasons to conclude that homeopathy does not work that a meta-analysis is nice simply as confirmatory data. However, the same could be accomplished with a good quality systematic review of the literature.

  5. #5 Danimal
    August 29, 2008

    wilsontown one of the things that I frequently comment about is the failure of public health to communicate with the public. I do not mean Orac or PalMD as they do good jobs, but they really do not have the exposure they need to make it happen.

  6. #6 Eric
    August 29, 2008

    I, on the other hand, am a huge fan of the meta-analysis, if for nothing else than its ability to examine publication bias (something you *cannot* do with a non-systematic literature review, no matter how good) and to uncover patterns in a body of publication that exist either outside the language of the paper or in the statistical underpinnings of a body of literature.

    Faulting its use by various woo-hucksters is hardly a condemnation of the technique itself, but rather its inappropriate use by researchers. If we use the ‘used by woo’ criteria, RCTs and basic statistics are out the window too, considering how vulnerable to bias and distortion they are.

  7. #7 D. C. Sessions
    August 29, 2008

    The problem here is misinterpretation, whether deliberate or not, rather than a disagreement among reasonable people.

    Let us distinguish “misinterpretation” from other interpretations-to-invalid-conclusions:

    * noninterpretation: the refusal to draw reasonable conclusions
    * misinterpretation: (possibly innocent) invalid interpretation.
    * malinterpretation: the deliberate drawing of invalid conclusions

  8. #8 daedalus2u
    August 29, 2008

    Orac, a slight correction. A meta-analysis isn’t “data”, it is analysis. It is a model of reality based on data. The “data” may be the trials that have been done and the papers that have been written, but the meta-analysis itself is analysis, not data.

    There is a comment on the website following the paper (worth reading). It notes that the two cardiologists came up with essentially the same (and presumably the correct) response because they knew more about the physiology of MIs.

    I don’t think it is so much a question of the others being “subjective” in their analysis, simply that they didn’t have the background in cardiology to be objective and to include important data that was not bundled together in a nice little package that could be read in a few hours.

    You really can’t give an “expert” opinion unless you are an expert. The only way to become an expert is to read and understand a large fraction of the relevant literature. There is no short-cut to that. Anyone who thinks they have a short-cut way of becoming an expert is afflicted by the arrogance of ignorance.

  9. #9 Dr. T
    August 29, 2008

    Questions about the study:

    1. Were the reviewers paid for their time? I suspect not, and some of them may not have been thorough.

    2. Did they measure the reviewers’ understanding of the articles? Just choosing physicians with meta-analysis experience is not enough. Other commenters pointed out that few physicians are experts in MI pathophysiology and treatment.

  10. #10 TherExtras
    August 29, 2008

    Several cogent comments.

    “Actually, when a meta-analysis comes up with a negative result, I’m more inclined to believe it, still recognizing the shortcomings of meta-analyses.”

    This statement and, I’ll say – Eric’s comment -have caused you to statistically-woo, Orac.

  11. #11 WotWot
    August 30, 2008

    If the results of meta-analysis are appropriately qualified in relation to any deficiencies in the available database (including in the basic experimental and clinical trial methodologies used to construct that database), then they work well.

    If they do not explicitly and fully take account of deficiencies, and temper their conclusions accordingly, then they are a real problem.

    Unfortunately, I have read far too many that fall into the second category.

  12. #12 Charlotte
    August 31, 2008

    From the paper: “In order to ensure that all reviewers based their decisions on the same information, they were instructed to ignore any knowledge they might have through their personal experience or other readings and to base their responses only on the information provided to them through the review articles, original research articles and meta-analyses.”

    Since the researchers chose the treatment because it was contentious, I’m not convinced that the reviewers would have been able to simply ignore their prior knowledge. I’m sure they all approached it honestly, but the importance of blinding revolves around the way in which biases manifest themselves in subtle ways.

    I think there’s work to be done in improving meta-analyses, but I’m not as suspicious of them as Orac. In the acupuncture example, the problem seems to me to be more GIGO than a reflection on the concept of meta-analysis.

  13. #13 NM
    August 31, 2008

    I’ve written a meta-analysis before. One important thing I’ve learned doing MAs and RCTs is that all clinicians ought to have an epidemiologist/biostats person stapled to them when undertaking clinical research. But the reverse statement is also true. You need information that is both methodologically exact and clinically interpretable.

    The issue with this study of MA interpretation is that the sample size is 8. The reviewers are repeatedly sampled (with the mail-out packages, and thus the observations are not independent) and they are not going to be initially naive to the debate (as Charlotte helpfully points out). Having said that, it is a little worrying that there was such divergence.

    The issue may not be an MA issue. The issue might be the compatibility of data from different RCTs. And this is a major issue in MA research. Overcoming it seems to be possible through the establishment of global cooperative networks that agree to trial harmonisation. This allows the development of prospective MAs that await the addition of new trial data.

    Through this blog (and others) I’ve observed that sCAM will misuse any study type in order to ‘prove’ anything they have already concluded is true. It’s not just a problem with MAs. As with anything, GIGO.

    Meta-analyses are no different from any other epidemiological research tool. If you are dishonest or biased you can manipulate the study findings. This is true of all science: it depends on its users being scrupulously honest with each other and themselves. They are still a step up from the old qualitative reviews because they are auditable.