Respectful Insolence

Pity poor John Ioannidis.

The man does provocative work about the reliability of scientific studies as published in the peer-reviewed literature, and his reward for trying to point out shortcomings in how we as scientists and clinical researchers do studies and evaluate evidence is to be turned into an icon for cranks and advocates of pseudoscience–or even antiscience. I first became aware of Ioannidis two years ago around the time of publication of a paper by him that caused a stir, entitled Contradicted and Initially Stronger Effects in Highly Cited Clinical Research. In that study, Ioannidis concluded that approximately 1/3 of highly cited clinical studies are later found to be incorrect and that therapeutic effects initially found in clinical trials are often found in later studies to be smaller or even nonexistent. Ioannidis then followed up with an editorial entitled Why Most Published Research Findings Are False. In response, I noted how alternative medicine mavens and others who are into pseudoscience jumped all over the study and how it even explained why antivaccinationists should not be surprised that effects attributed to thimerosal in vaccines in early iterations of studies disappeared on further analysis. Indeed, I even pointed out how some prominent credulous bloggers were citing Ioannidis as evidence that most scientists are “lousy.”

It’s happening again.

Fellow ScienceBlogger Mark Hoofnagle pointed out that this resurgence of crank interest in Ioannidis’ work seems to be due to a rather poorly written Wall Street Journal editorial, apparently inspired by Ioannidis’ most recent article, that almost totally missed the point of his studies. It seemingly concluded, as prominent bloggers did two years ago, that the problems with published research were because of “lousy” scientists.

Predictably, quacks and advocates of pseudoscience, including HIV/AIDS denialists and global warming “skeptics,” have jumped all over Ioannidis’ findings, erroneously and foolishly representing them as “proof” that the science they don’t like is hopelessly wrong. Sadly, Ioannidis’ work rather easily lends itself to being misinterpreted by the pseudoscientists as a reason to dismiss the findings of science–as if doing so would make their pseudoscience correct absent good evidence for their position.

None of Ioannidis’ work should come as any shock to clinical investigators or scientists. Indeed, it did not. Two years ago, I actually took the opportunity to present Ioannidis’ JAMA article for our weekly journal club. Although it provoked a lively discussion, not a single one of my surgical colleagues was the least bit surprised or disturbed by its findings. Of course first attempts to answer a clinical question often produce incorrect or exaggerated results! It is the totality of evidence that has to be examined, and, until it is, new findings should be treated with care and skepticism.

There are, of course, many systemic problems in biomedical research. No one “in the biz” would deny that. But, in many ways, the present system of randomized clinical trials and peer review is, to paraphrase Winston Churchill regarding democracy, the worst system for finding the best treatments–except for all the rest. It is indeed true that there are far too many crappy studies published. There is indeed a bias towards publishing studies that actually show a correlation, treatment effect, or other positive result, rather than a negative result. There has been a push towards encouraging the publication of negative studies, but the bias has not disappeared. Despite all that, it is a big mistake to take Ioannidis’ findings as “proof” that science is not the best methodology we have for answering fundamental questions about how the universe works, for understanding the pathogenesis of disease, or for identifying the most efficacious treatments. Certainly, it far surpasses any alternatives.

Perhaps the best analyses of the real significance of Ioannidis’ findings come from Steve Novella and Alex Tabarrok. Tabarrok, for example, explains very eloquently why, even under “perfect” conditions, as many as 25% of statistically significant findings may be false positives:

Suppose there are 1000 possible hypotheses to be tested. There are an infinite number of false hypotheses about the world and only a finite number of true hypotheses so we should expect that most hypotheses are false. Let us assume that of every 1000 hypotheses 200 are true and 800 false.

It is inevitable in a statistical study that some false hypotheses are accepted as true. In fact, standard statistical practice guarantees that at least 5% of false hypotheses are accepted as true. Thus, out of the 800 false hypotheses 40 will be accepted as “true,” i.e. statistically significant.

It is also inevitable in a statistical study that we will fail to accept some true hypotheses (Yes, I do know that a proper statistician would say “fail to reject the null when the null is in fact false,” but that is ugly). It’s hard to say what the probability is of not finding evidence for a true hypothesis because it depends on a variety of factors such as the sample size but let’s say that of every 200 true hypotheses we will correctly identify 120 or 60%. Putting this together we find that of every 160 (120+40) hypotheses for which there is statistically significant evidence only 120 will in fact be true or a rate of 75% true.
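The arithmetic in the quoted passage is easy to check. What follows is only a rough sketch in Python of that illustrative example (200 true and 800 false hypotheses, a 5% false positive rate, and a 60% chance of detecting a true effect); the function name and its parameters are hypothetical labels chosen here, not anything from the quoted post.

```python
def significant_findings(n_true, n_false, alpha=0.05, power=0.60):
    """Split statistically significant results into true and false positives.

    alpha: fraction of false hypotheses that reach significance anyway.
    power: fraction of true hypotheses that are correctly detected.
    """
    true_positives = n_true * power
    false_positives = n_false * alpha
    share_true = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, share_true


# Numbers from the quoted example: 200 true and 800 false hypotheses.
tp, fp, share_true = significant_findings(n_true=200, n_false=800)
print(f"true positives:  {tp:.0f}")
print(f"false positives: {fp:.0f}")
print(f"share of significant findings that are true: {share_true:.0%}")
```

With the quoted numbers this works out to 120 true positives, 40 false positives, and 75% of significant findings being true, which leaves 25% that are not.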

Thus, even if the research is “perfect,” with no flaws in the experimental design and no biases, it is not unreasonable to predict that at least 25% of the positive results would ultimately be found to be incorrect, assuming the standard cutoff for statistical significance of p < 0.05. I've pointed out before in the context of discussing clinical trials for homeopathy that, even under “perfect” conditions, at least 5% of trials studying homeopathy would appear to be “positive” just due to random chance alone, but I didn’t take into consideration all the factors that Tabarrok did. In fact, in retrospect, I realize that I was wildly naïve and optimistic in my estimate. Taking into account deficiencies in study design, it’s not at all difficult to see how around half of medical studies could come up with incorrect results. Unfortunately, one way to reduce that number is a way that cranks will not like at all–not one bit.

What I’m talking about is the “prior probability” of a hypothesis (i.e., its scientific plausibility, or the likelihood of its being correct based on basic science and previous data). If we take prior probability into account, we can decrease the likelihood of false positive results. As Steve Novella put it:

Tabarrok points out that the more we can rule out false hypotheses by considering prior probability the more we can limit false positive studies. In medicine, this is difficult. The human machine is complex and it is very difficult to determine on theoretical grounds alone what the net clinical effect is likely to be of any intervention. This leads to the need to test a very high percentage of false hypotheses.

What struck me about Tabarrok’s analysis (which he did not point out directly himself) is that removing the consideration of prior probability will make the problem of false positive studies much worse. This is exactly what so-called complementary and alternative medicine (CAM) tries to do. Often the prior probability of CAM modalities – like homeopathy or therapeutic touch – is essentially zero.

If we extend Tabarrok’s analysis to CAM it becomes obvious that he is describing exactly what we see in the CAM literature – namely a lot of noise with many false-positive results.
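To see how much prior probability matters, here is a rough sketch that repeats the same kind of arithmetic for different fractions of true hypotheses. The 5% significance cutoff and 60% detection rate are carried over from the example quoted earlier; the specific prior values and the function name are illustrative assumptions, not figures from either quoted author.

```python
def share_of_positives_that_are_true(prior, alpha=0.05, power=0.60):
    """Fraction of statistically significant results that reflect a real effect.

    prior: assumed fraction of tested hypotheses that are actually true.
    """
    true_positives = prior * power
    false_positives = (1.0 - prior) * alpha
    return true_positives / (true_positives + false_positives)


# Hypothetical priors, from a coin flip down to a highly implausible claim.
for prior in (0.5, 0.2, 0.05, 0.001):
    share = share_of_positives_that_are_true(prior)
    print(f"prior {prior:>5}: {share:.1%} of positive studies are true")
```

Under these assumptions, the share of positive studies that are true falls from roughly 92% at a prior of 0.5 to about 1% at a prior of 0.001, so nearly every “positive” study of a highly implausible treatment would be a false positive.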

That is exactly what we see, particularly when homeopaths can rattle off studies that appear to support the efficacy of homeopathy. However, if one examines the totality of evidence, these apparent “positive” studies turn out to be just background noise. What’s instructive is to look at Tabarrok’s observations and suggestions about what can be done about the problems that plague biomedical research:

  1. In evaluating any study try to take into account the amount of background noise. That is, remember that the more hypotheses which are tested and the less selection which goes into choosing hypotheses the more likely it is that you are looking at noise.
  2. Bigger samples are better. (But note that even big samples won’t help to solve the problems of observational studies which is a whole other problem).
  3. Small effects are to be distrusted.
  4. Multiple sources and types of evidence are desirable.
  5. Evaluate literatures not individual papers.
  6. Trust empirical papers which test other people’s theories more than empirical papers which test the author’s theory.
  7. As an editor or referee, don’t reject papers that fail to reject the null.

All of these are good bits of advice for evaluating the scientific literature. From my perspective, it’s critical to evaluate the totality of the scientific evidence. Single studies may produce questionable results, but eventually science will correct them. To me, this is the most telling difference between evidence-based medicine and “alternative” medicine, between scientific medicine and pseudoscience: the ability and willingness to change hypotheses based on the evidence. Indeed, Skeptico shows us an example of just this difference in discussing an HIV vaccine that was abandoned because studies showed that it doesn’t work:

The difference between this and complementary and alternative “medicine” (CAM) is starkly shown. Real medicine is tested for efficacy, and abandoned if it doesn’t work. When was the last time any CAM treatment was publicly abandoned by its practitioners because they discovered it didn’t work?

The answer is: Never.

Alternative medicine mavens frequently accuse us “conventional” doctors of being “dogmatic” or otherwise unwilling to consider “different” ideas (specifically their ideas) about medicine and the treatment of disease. In actuality, it is supporters of alternative medicine and pseudoscience who tend to be far more dogmatic than any conventional physician. If Ioannidis’ results are correct, the results of 1/3 of seemingly very important papers were ultimately refuted, resulting in the abandonment of accepted treatments once thought sound. Doesn’t that tell you something? It should! It tells me that “conventional” medicine changes its tests, understanding of disease, and treatments on the basis of new evidence and, more importantly, abandons treatments found to be ineffective or not as effective as newer therapies. This winnowing and optimization process may be very messy to watch. It may not happen as fast as we’d like, but happen it does eventually. Contrast this to alternative medicine, where there are still alternative medicine practitioners pushing Laetrile (despite the fact that it was shown to have no efficacy against cancer in well-designed clinical trials 25 years ago), chelation therapy for coronary artery and peripheral vascular disease (despite multiple randomized studies during the 1990s showing it to be no better than placebo), and homeopathy (despite 200 years of science showing that it, too, is no better than an elaborate placebo). Unlike conventional medicine, alternative medicine does not change, other than to gussy up its woo with “science-y”-sounding terminology, especially quantum theory, or to add another scientifically highly improbable “treatment” to its panoply of scientifically improbable treatments. (Hulda Clark’s Zapper, anyone?) More importantly, it almost never abandons therapies found by sound research to be ineffective, as Skeptico so ably (and sarcastically) pointed out.

In the end, contrary to the best efforts of cranks, pseudoscientists, and quacks to portray its conclusions as indicating that science is “basically fraudulent,” remember that Ioannidis’ work does not give any succor at all to advocates of pseudoscience, be they alternative medicine mavens, HIV/AIDS denialists, or any other. In fact, it is work like his that differentiates science and evidence-based medicine from pseudoscience and alternative medicine. Ioannidis looks at how we as a profession do biomedical research and clinical trials and finds the faults even in studies thought to be the gold standard, all with a mind to improve how we do research, suggesting more replication, more care, and more caution about initial findings. There’s also an irony in this, given what Ioannidis is saying and how cranks are representing it. That’s because Ioannidis would have no way of determining what percentage of scientific findings are “wrong” if science weren’t a self-correcting enterprise, if scientists hadn’t continued to work on the same problems and published findings that contradicted and ultimately refuted the early findings. Again, contrast that to how alternative medicine operates, where, once a treatment becomes popular (homeopathy, for example), no matter how much science shows it to be implausible or not to work, it is defended for over 200 years, even to the point of trying to explain it based on torturing the findings of the latest science.

The misuse and abuse of Ioannidis’ work is, when you come right down to it, nothing more than a variant of the old crank chestnut of “science has been wrong before.” My usual response is: So what? Science has been wrong before, but it was generally scientists, not pseudoscientists, who found the error and corrected it. Moreover, they did it based on the evidence, not on cranks’ favored techniques of logical fallacies, cherry picking data (HIV/AIDS denialists’ and creationists’ favorite technique), and misrepresenting what science actually says (creationists’ favorite technique). It does not follow from the past mistakes of science that the science now is necessarily wrong. If you want to show that, then you need evidence, not appeals to past findings of science that were later found to be incorrect.

Physicians and scientists are generally aware of the shortcomings of the biomedical literature. Most of us, but sadly not all, know that early findings that haven’t been replicated yet should be viewed with skepticism and that we can become more confident in results the more they are replicated and built upon, particularly if multiple lines of evidence (basic science, clinical trials, epidemiology) all converge on the same answer. The public, on the other hand, tends not to understand this. Where we in science see changes in proclamations about a health question as the natural outgrowth of the self-correcting nature of science, the public sees them as confusing, particularly when the evidence is not yet as clear as we would like and science shifts back and forth between two positions. This problem is frequently exacerbated by shoddy science reporting, where each new study is breathlessly reported as the latest, greatest, and seemingly final word on a topic, even though the paper itself will often contain a detailed discussion of the uncertainties and possible sources of error in its results. Somehow, this uncertainty gets lost in the reporting. Certainty is so much more interesting and satisfying. That’s why it’s a good thing that Ioannidis and others like him remind us from time to time of the uncertainty inherent in biomedical science. Even if cranks have a field day with these sorts of findings, that’s just the price of good science.

Comments

  1. #1 hoary puccoon
    September 24, 2007

    I think Tabarrok may be overstating the extent to which scientists fail to reject the null hypothesis when they have a real effect. Often, the official alpha may be <.05, but the actual alpha (probability of type 1 error) can be something more like <.0001. In fact, I wonder whether clinicians really pay as much attention to studies skimming in at just under .05 as they do to studies where the effect is much stronger.

    Also, does it matter what the available alternatives are? I can’t see physicians using herbal remedies instead of insulin to treat type 1 diabetes, even if the herbals did come in at <.05. I suspect the problem with false positives is much stronger when there isn’t an accepted remedy and physicians are flailing around, trying anything that might work.

  2. #2 blf
    September 24, 2007

    Apropos of the point of “not rejecting the null hypothesis”, something I learned a long time ago when writing reports or proposals (I’m a software engineer, so this is in the context of software and/or hardware projects, issues, et al.) is to always include the option of “Do nothing” (or “Didn’t work” or whatever is appropriate). And seriously evaluate it. I have also learned it’s good to explicitly state that I’m not necessarily enumerating all the possibilities and what meaning (if any) should be attached to the order of enumeration. For instance:

    … Possible actions include, but may not be limited to (and are not listed in any particular order):

    1. Do nothing. Advantages: … Disadvantages: …

    2. …

    Recommendation and rationale: …

    I’m still amazed at how some people, people who should know better, will take a mere listing or discussion as complete.

  3. #3 Alvaro
    September 24, 2007

    Great post. Ioannidis’ work is very useful, but I am not sure why he’d use the title “Why Most Published Research Findings Are False”.

    Did you read the NYT magazine’s criticism of epidemiological studies last week? http://www.nytimes.com/2007/09/16/magazine/16epidemiology-t.html?_r=1&em&ex=1190260800&en=6e827ef7fafd4895&ei=5087&oref=slogin

    It is no wonder that there is a disconnect between scientists, health practitioners and journalists, even among the ones with science-based predispositions. What we are saying is that peer review is not enough. That large epidemiological studies aren’t, either. What is? How do we explain it clearly to people who aren’t professional scientists? Do we need an additional “threshold” to consider some claims as validated/ready to influence policy? Who assesses that threshold? Maybe the NIH should publish such a journal? Is there something like this?

  4. #4 Sastra
    September 24, 2007

    Pity poor John Ioannidis.

    Oh. I have a good suggestion for poor John. He should take a nice vacation where he can lay back on a lounge chair in the sun and — for fun — turn his razor-sharp analysis skills on alt med and CAM studies. Other than the effort of punctuating the air with occasional giggles, I foresee a relaxed but productive two weeks for him. It’ll be like reading science fiction, after all that brain work. Comic relief.

    Then he can publish those results. Suggested title: “Hey, If You Thought Mainstream Medicine Was Bad…Getta Loada This!”

  5. #5 Monado
    September 24, 2007

    Very nice! I’m linking to this one.

  6. #6 James
    September 25, 2007

    I’m a regular reader of Marginal Revolution, where Tabarrok blogs. I’ve always thought he was a sharp customer.

  7. #7 Dan
    September 25, 2007

    Part of the problem with medical science is that research appears in the press which has not been peer reviewed and is used to advance a political objective. Dr. Michael Siegel discusses this as Science By Press Release.

    http://tobaccoanalysis.blogspot.com/2007/09/science-by-press-release-new-tobacco.html

    These studies may never be published because they are flawed, but the political objective of the study will have been accomplished. I have warned about this stuff before.

  8. #8 jspreen
    September 25, 2007

    The difference between this and complementary and alternative “medicine” (CAM) is starkly shown. Real medicine is tested for efficacy, and abandoned if it doesn’t work. When was the last time any CAM treatment was publicly abandoned by its practitioners because they discovered it didn’t work?

    So, if I get it well, the fact that a certain kind of treatment was never publicly abandoned by the people who practice it is held as an argument AGAINST that treatment. Am I really the only one who thinks we might consider things the other way around? That maybe they were never publicly abandoned because no practitioner ever discovered they didn’t work?

    This article sure is a fine example of scientific goal keeping against the herds of presumed charlatans who threaten to take over control and increasingly find the ear of many people who might not be so totally ignorant as the promoters of real medicine (whatever that may be) seem to think they are.

  9. #9 Joe
    September 25, 2007

    No, jspreen, you are not the only person to note that quacks do not abandon their methods because they do not discover the methods don’t work. However, the reason they never make that discovery is incompetence. That is why they subscribe to quack notions in the first place.

  10. #10 jspreen
    September 25, 2007

    However, the reason they never make that discovery is incompetence.

    Yeah, any person on earth subscribing methods other than chemo poison and the like is automatically put on the list of incompetent people. Like people who question the HIV=Aids nonsense. Because, and if that’s not a catch then I don’t know what a catch is: Somebody who questions when there is nothing to question can only be an inferior creature.

  11. #11 Jonathan Ramlow
    September 25, 2007

    The recent New York Times article by Gary Taubes, cited above, is a very useful discussion of some very real problems that resurface time and again and genuinely puzzle everyone except “true believers”, for whom no evidence of any kind will ever lead them to change their minds. This is more of an issue for epidemiologists than clinicians and bench scientists, perhaps, because we epidemiologists have come to depend so heavily on observational studies that cost a lot of money and take a long time to complete.

    Taubes’ major point is one that has been made over and over again for years: even the largest, most well-executed prospective studies are still observational studies and distressingly likely to yield erroneous results even with the best possible data collection methods and statistical controls for those potential confounding factors that have in fact been considered. Taubes describes the (fortunately) small number of circumstances in which purely observational studies yield unambiguous results from which causality can reasonably be inferred–a very strong association between an exposure and an outcome (cigarette smoking and lung cancer) and a “bolt from the blue” (maternal DES exposure and vaginal cancer) are good examples–but he correctly points out that when associations observed in observational studies are weak or only moderately strong and/or are statistically convoluted, randomized controlled trials are the only way to confirm or refute causality. The fact that such trials often fail to support previously observed associations naturally produces confusion among professionals and laypersons alike when those persons have been led to believe by epidemiologists and/or journalists that the original association was more or less bullet-proof.

    Epidemiologists, like other professionals, sometimes forget to be scientists first and advocates only after the evidence has become unambiguous. That’s why Bradford Hill and others put so much time and effort into trying to impress upon their peers the importance of establishing–and sticking to–a set of logically and scientifically rigorous criteria for causal inference in observational studies. It’s quite discouraging to see these fundamental principles of epidemiologic research either ignored altogether (e.g. the so-called “precautionary principle”), underemphasized (e.g. HRT had to be a good thing because so many people already thought so), or hijacked by mercenaries (e.g. an apparent relative risk of 2.0 can be said to meet the standard of “more likely than not” in toxic tort cases). It’s no wonder that epidemiologists working in many areas are increasingly skeptical about the real value of the results we produce, with some, as Taubes mentions, going so far as to suggest that it might be time to “call it a day”.

  12. #12 hoary puccoon
    September 26, 2007

    Arggh. Only half my post appeared. My point was, alpha levels (probabilities of type 1 error) of officially “not more than .05” are frequently much lower–.0001 and less. I know in social science effects that skim in under the wire are less regarded than stronger effects. The number of studies where an alpha of .0001 appears when there is no effect is, of course, one in ten thousand.

  13. #13 Paul Power
    September 26, 2007

    “Small effects are to be distrusted”.

    What is a “small” effect ? Why should it be mistrusted more than any other?

  14. #14 Linda
    September 27, 2007

    The difference between this and complementary and alternative “medicine” (CAM) is starkly shown. Real medicine is tested for efficacy, and abandoned if it doesn’t work. When was the last time any CAM treatment was publicly abandoned by its practitioners because they discovered it didn’t work?

    So, if I get it well, the fact that a certain kind of treatment was never publicly abandoned by the people who practice it is held as an argument AGAINST that treatment. Am I really the only one who thinks we might consider things the other way around? That maybe they were never publicly abandoned because no practitioner ever discovered they didn’t work?

    I think that’s the point. Practitioners don’t test whether or not they work, so they are unable to discover the truth of the matter. Instead they rely on a method of evaluation that is essentially unfalsifiable, which makes the passing of such a test meaningless.

    This article sure is a fine example of scientific goal keeping against the herds of presumed charlatans who threaten to take over control and increasingly find the ear of many people who might not be so totally ignorant as the promoters of real medicine (whatever that may be) seem to think they are.

    I’m not sure you’re helping your cause by displaying ignorance of just how we fooled ourselves in the past by attributing the effects of chance and bias to a specific treatment effect. Despite the tremendous success achieved only after we began to remove those effects, the herds of presumed charlatans depend upon keeping the general public stuck on using pre-twentieth century methods of evaluation. That these methods resulted in blood-letting as the treatment of choice for infection, rather than antibiotics, doesn’t seem to register.

    Linda

  15. #15 progenitor4life
    September 27, 2007

    Slamming the WSJ is missing the point of the article. It is not that scientists are lousy but that the system is lousy. As Ioannidis in his PLoS article and you point out, it is the totality that counts, not only what comes after the hypothesis but before. Instead of pooling data, how about better studies?

    There are brilliant hypotheses (some right some wrong) and there are lousy, self-serving hypotheses fabricated just to “prove a point” and then there are the flailing around hypotheses where every idea has equal merit unless proven otherwise. NOT.

    Contrary to Clintonian (Bill not Hill) thinking, the current scientific literature is crafted towards spin. Say it enough times and it becomes the “truth.” Not every finding is meaningful but to get published it has to be, leading to intellectual overreach and persistent specious reasoning that becomes part of the vernacular. Good scientists now find themselves spending time, energy and money debunking the bunk. They also run the real risk of not being heard at all.

    The trouble with all this is that true medical progress gets lost in the muck and mire. Forget debating about the quacks,I can’t help but be worried about how this bias and intellectual overreach is preventing us from achieving real medical progress. Whether it is stem cells or global warming, the truth lies only one place, nature, no matter what we say or feel about it. The question is, how can the system be fixed so that those with a true contribution to make can be heard and noticed? Ioannidis himself thinks that changing the system at its root is unlikely. I still hold out hope that support of high intellectual, analytical and statistical standards could do wonders. Ban the university press release!

  16. #16 Linda
    September 27, 2007

    Posted by Paul Power
    “Small effects are to be distrusted”.

    What is a “small” effect ? Why should it be mistrusted more than any other?

    It is always possible that differences between groups may be due to factors unrelated to the treatment, like unrecognized biases in the formation of the groups. If we look carefully, we are pretty good at discovering anything that might have an effect, but it’s not perfect. If we see a medium or large effect (standardized mean difference of 0.5 or 0.8), the chance that we missed something of that magnitude is very small. But if we see a small effect (standardized mean difference of 0.2), the cumulative effect of small (and therefore unrecognizable) differences unrelated to the treatment being tested could mean the result is spurious.

    Linda

  17. #17 Alan Viech
    September 28, 2007

    Hmmm, was it the scholarly doctor who wrote this article or his grandfather who “fabricated” and then foisted the idea that stomach ulcers originated in the patient’s mind through the expression of “bad thoughts”?
    Let’s see, stomach ulcers are the cause of over 80 pct of all stomach cancers. There are over a million cases of stomach cancer annually. That’s about 1 million dead people, screaming in pain while they are dying and conventional medicine told us for a century that “it was all in your head”. Just who was it that thought up this sick, malicious and vicious diagnosis and then “forced” us to accept it under pains and penalty of being shipped off to the medical Gulag Archipelago (i.e. the psychiatrist). Think of the beauty of the evilness that these sick doctors who “made up this diagnosis” in their heads accomplished. If your disease was a figment of your imagination and workings of your own mind, if you protested that it was not, it was further evidence of your mental sickness!!!! Beautiful. He must have been a Harvard doctor. Let’s see, one million dead stomach cancer patients annually for 100 years of “lying and fabricating false evidence”, that’s murdering about 100 million people. More than perished globally in the entire conflict of the second world war. Think about that. A bunch of doctors just got together and made it all up and forced the sheep of America and the world to accept it. Can’t get any more evil than that. And, the Australian doctor who discovered the cause of stomach ulcers by accident? First they tried to run him out of the medical profession and then they gave the Nobel prize to an American for the discovery, which he stole from the Australian. Yes, the lofty ivy towers of our medical schools. If it were not so sick and evil it would be funny. But, millions dead on a fabricated diagnosis isn’t funny. The next time they pull that one on you, that something’s in your head, ask them if they speak Romany and have crystal balls instead of real ones.

  18. #18 Linda
    September 28, 2007

    Excellent parody!