Randomized trial versus observational study challenge, IV: causation, concluded

Continuing our discussion of causation and what it might mean (this is still a controverted question in philosophy and should be in science), let me address an issue brought up by David Rind in his discussion of our challenge. He discussed three cases where a rational person wouldn't wait for an RCT before taking action, even though there remained uncertainty. The first was a single case report of rabies survival after applying an ad hoc protocol. The second was the use of parachutes when skydiving. The third was the first reports of antiretroviral therapy for AIDS. Here's what Rind said in his post:

For instance, rabies is historically a 100% fatal illness once clinical symptoms appear in someone who has never been vaccinated. In 2005, a 15-year-old girl survived rabies after treatment with a novel experimental regimen. You could imagine that anything can happen once, and it may have been coincidence that this girl survived and received this novel regimen. However, if one more person with rabies were to receive the regimen and survive, it would seem spectacularly unlikely that the explanation would be anything but that the regimen works. A rational clinician would treat any new patient with rabies with that regimen from that moment until a superior regimen was found. (As far as I know, no other patient has survived rabies on this regimen.)

Similarly, in response to the humorous attack on EBM, asking what the RCT data are supporting parachutes when skyjumping, the GRADE group would respond that the magnitude of effect of parachutes is sufficient to constitute high quality evidence for their use (okay, clinical epidemiologists can be somewhat humor-challenged). That is, we have all sorts of historical evidence about what happens when people fall from great heights (and that we might consider only slightly indirect to falling from an airplane), as well as lots of observational data about what happens when people fall from airplanes wearing parachutes. Not everyone who falls from a great height without a parachute dies, and not everyone wearing a parachute lives, but the effect size is so large that we have high quality evidence for parachutes in the absence of a clinical trial.

To give one example that might feel more real, I was doing a lot of AIDS care in the 1990s. In 1995 an abstract was published from one of the ID meetings about the effects of ritonavir in about 50 people with late AIDS. (I've tried in vain in the recent past to find this abstract -- if anyone can point me to it, I would be grateful.) The results were like nothing we had seen before -- patients' CD4 counts rose dramatically, opportunistic infections improved, and many patients who would have been expected to die improved instead. We did not know what would happen long-term, but it was obvious, without any RCT, that ritonavir was effective therapy for AIDS, at least in the short term. By 1996, we were treating people with triple therapy "cocktails" for HIV, again without any RCTs with clinical endpoints, and watching people who had been dying walk out of hospitals and hospice care as their OIs resolved. The magnitude of effect was such that we had high quality evidence for these cocktails based on observational data alone. (Not that this actually prevented researchers from proceeding with an RCT of triple therapy, but that's a post for another day.) (David Rind, Evidence in Medicine)
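Rind's rabies argument can be put in rough numbers. The sketch below is illustrative only. It assumes, hypothetically, that a symptomatic unvaccinated patient survives with some small probability p whether or not the regimen does anything, and that patients' outcomes are independent. Under that "regimen does nothing" assumption, the chance that every one of n treated patients survives is p to the power n. The values of p are placeholders, not estimates; the post says only that rabies has historically been essentially 100% fatal.

```python
def chance_all_survive(p_survive: float, n_patients: int) -> float:
    """Probability that all n independent patients survive by chance,
    assuming (hypothetically) each survives with probability p_survive
    whether or not the regimen has any effect."""
    if not 0.0 <= p_survive <= 1.0:
        raise ValueError("p_survive must be a probability")
    return p_survive ** n_patients


# Hypothetical baseline survival probabilities, for illustration only.
for p in (0.01, 0.001):
    for n in (1, 2):
        print(f"p={p}, n={n}: {chance_all_survive(p, n):.0e}")
```

Each additional survivor multiplies the "coincidence" probability by another factor of p, which is why a second case would be so persuasive even without a control group.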

In all three cases Rind identifies the salient feature as the size and clinical importance of the effect. This prompts me to bring up the Hill viewpoints (dearly beloved by epidemiologists, although they rarely use most of them in practice). The eponymous Hill here is A. Bradford Hill, an influential biostatistician of the mid-twentieth century who has a double connection to our subject. The first is through his nine viewpoints an epidemiologist should consider when trying to judge whether an association has the characteristics of a causal one (or, as he put it, after considering these factors, "is there any other way of explaining the set of facts before us, is there any other answer equally, or more, likely than cause and effect?" [emphasis in original]). The second is that Hill is credited with successfully promoting the introduction of randomized trials into medicine. (That was in 1948, so it is surprisingly recent. Note that R.A. Fisher developed the theory of randomization in statistics, but scholars have shown he had little influence in medicine.)

We intend to discuss randomization later and won't discuss all of Hill's nine viewpoints (often mistakenly referred to as criteria and even more mistakenly used in checklist form). In fact, I'm only going to discuss one of them. If you want a probing, in-depth discussion of the Hill viewpoints, you can't do better than epidemiologist Kenneth Rothman's text (with Greenland and Lash), Modern Epidemiology (now in its third edition, but all three editions have essentially the same version of the Hill viewpoints).

The one aspect of Hill's list of nine I want to discuss (the list is often shortened to a sublist of five) is biological plausibility. Interestingly, Hill stated it in a highly qualified form:

(6) Plausibility: It will be helpful if the causation we suspect is biologically plausible. But this is a feature I am convinced we cannot demand. What is biologically plausible depends upon the biological knowledge of the day.

To quote again from my Alfred Watson Memorial Lecture (Hill 1962),

there was ‘no biological knowledge to support (or to refute) Pott’s observation in the 18th century of the excess of cancer in chimney sweeps. It was lack of biological knowledge in the 19th that led to a prize essayist writing on the value and the fallacy of statistics to conclude, amongst other “absurd” associations, that “it could be no more ridiculous for the stranger who passed the night in the steerage of an emigrant ship to ascribe the typhus, which he there contracted, to the vermin with which bodies of the sick might be infected.” And coming to nearer times, in the 20th century there was no biological knowledge to support the evidence against rubella.’

In short, the association we observe may be one new to science or medicine and we must not dismiss it too light-heartedly as just too odd. As Sherlock Holmes advised Dr. Watson, ‘when you have eliminated the impossible, whatever remains, however improbable, must be the truth.’ (Austin Bradford Hill, “The Environment and Disease: Association or Causation?,” Proceedings of the Royal Society of Medicine, 58 (1965), 295-300.)

Despite Hill's strong caveats, biological plausibility is an extremely powerful factor. Rind's parachute example is probably more related to biological plausibility than to the size of the effect. Here's another example. For many years it was my pleasure(?) to teach epidemiology to graduate students. When we came to the subject of clinical trials I used this example, which I thought I got from a PBS Nova show on plants called "The Green Machine" (truthfully, I'm not sure I got it from that show, but it doesn't matter for purposes of illustration here). Part of the show was an experiment involving a man (I seem to remember he was a clergyman) who said that if he prayed over plants they would grow taller. So they set up two hoods, one with plants that this gentleman prayed over and one with plants that were just given the usual care. After 3 months it was found that the objects of prayer were indeed taller. I asked my students to explain this.

There was always a lively discussion and year after year the same explanations were advanced: the carbon dioxide exhalations involved in praying made a difference; more care was taken over these plants than the control plants; it was a chance event -- some plants will grow taller for random reasons; etc. The one explanation that year upon year was almost never advanced was the one the experiment set out to test, i.e., that prayer makes plants grow taller.

There could be several reasons for this (including belief on the part of a student that his professor didn't believe it), but it was clear that one very strong reason was that graduate students at a health sciences school didn't find this a biologically plausible explanation. There is more to this than the prejudice of scientists. If you were to accept a biologically implausible explanation (think homeopathy, where the number of molecules left after 25 serial dilutions is essentially zero), then you would have to make a great many adjustments to other scientific facts, perhaps to whole disciplines, to accommodate it. Biological plausibility exists within a very complex web of evidence, theory and experience, all of which might need to be rethought on the basis of a single, potentially flawed experiment or observation. Given that, it's no wonder plausibility is a powerful factor in causation judgments.
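The homeopathy arithmetic is easy to check. The post does not say how large each dilution step is, so the sketch below assumes both of the usual homeopathic step sizes, 1:10 and 1:100, and, generously, starts from a whole mole of active substance. The Poisson approximation for "at least one molecule left" is also our assumption.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def expected_molecules(moles_start: float, dilution_factor: int, n_dilutions: int) -> float:
    """Expected number of solute molecules left after n serial dilutions,
    each diluting by `dilution_factor` (e.g. 100 for a 1:100 step)."""
    return moles_start * AVOGADRO / dilution_factor ** n_dilutions


def prob_at_least_one(expected: float) -> float:
    """Chance that at least one molecule remains, treating the molecule
    count as Poisson with the given mean."""
    return -math.expm1(-expected)


# Assumption: start from one full mole of the active substance.
for factor in (10, 100):
    lam = expected_molecules(1.0, factor, 25)
    print(f"1:{factor} x 25: expected {lam:.3g} molecules, "
          f"P(at least one) = {prob_at_least_one(lam):.3g}")
```

Even with the milder 1:10 steps, the expected count is about 0.06 molecules; with 1:100 steps it is around 10^-27.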

This has some implications. One is that a seemingly adequate clinical trial that produced biologically ("scientifically") implausible results wouldn't be taken as seriously as one that produced plausible results, even though the design might be identical. The fault isn't with the judgment but with the notion that a particular design trumps all other evidence. By the same token, as scientific knowledge changes, the way evidence is viewed might change. Suppose someone found a reproducible and biologically plausible mechanism for certain kinds of acupuncture anesthesia. That would likely alter the way existing clinical trials of acupuncture are viewed (NB: this is not a defense of acupuncture; it's an example that is imaginable as a scenario). Another implication is that taking RCTs or any other kind of study in isolation, even when they are combined in a meta-analysis, doesn't tell the whole story, nor should it. There is an interlocking body of evidence that exists as context. It might be just as reasonable to say that a clinical trial showing no effect of a vaccine that clearly raised neutralizing antibodies was itself not biologically plausible.
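One way to see why two trials with identical designs can deserve different weight is a simple Bayesian calculation (our framing, not Hill's). Take a conventional significance level alpha and power, and treat the prior probability that the hypothesis is true as a stand-in for its biological plausibility. The probability that a "statistically significant" result reflects a real effect is then power × prior divided by the total probability of a significant result. The alpha, power, and prior values below are illustrative assumptions.

```python
def prob_real_given_significant(prior: float, alpha: float = 0.05, power: float = 0.8) -> float:
    """Posterior probability that an effect is real, given a significant result,
    for a trial with the stated alpha and power and prior probability `prior`."""
    true_positive = power * prior          # real effect, and the trial detects it
    false_positive = alpha * (1.0 - prior)  # no effect, but the trial is "significant"
    return true_positive / (true_positive + false_positive)


# Same design, different prior plausibility (illustrative priors).
for label, prior in [("plausible", 0.5), ("long shot", 0.05), ("implausible", 0.001)]:
    print(f"{label:12s} prior={prior:<6} -> {prob_real_given_significant(prior):.3f}")
```

Under these assumptions the same "positive" trial moves a plausible hypothesis to roughly 94% but leaves an implausible one under 2%.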

The reason for raising this question in the context of the challenge was to indicate that when we interpret studies, of any design, there is (or there should be) a lot going on besides looking at p-values and at the label the study carries. Most of you know this, but it is hard not to be seduced by the latest headlines saying "Drug X shown not to work for disease Y" or "Clinical trial shows efficacy of latest anticancer drug," or by the ones announcing that a study doesn't meet some set of standards.

I'll move on to other things in the next post, so now is your chance to weigh in on this subject.


Great series, Revere, thank you.

I think 'plausibility' is the proper framework for viewing the causality issue here. We could reframe questions like your challenge in terms of the 'interlocking body of evidence that exists as context'. We don't need a black-and-white answer to 'Does the drug work, based on the experiment (or based on an RCT)?'. Instead: given the interlocking body of evidence, does the experiment (with all its faults) lower the uncertainty about what treatment is best for my refractory patients? It is a new piece of evidence. How much it affects my practice will probably depend on that context (for example, is there anything else that lowers my uncertainty more?)

This is the way we judge plausibility (biological or otherwise): is a given explanation 'A' more plausible because of new evidence 'B'? Bayesian statistical methods even allow us to put numbers on the levels of uncertainty before and after 'B'. So the argument will be about how much weight you give to, say, a meta-study based on RCTs versus, say, centuries of clinical experience of medical practitioners. Both are evidence in the interlocking body of context.
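The Bayesian point in this comment can be sketched in odds form: posterior odds equal prior odds multiplied by the likelihood ratio of each new piece of evidence. The sketch assumes the pieces of evidence are conditionally independent, which real bodies of evidence often are not, and the prior and likelihood ratios are hypothetical numbers chosen to mirror the prayer-and-plants example.

```python
from typing import Iterable


def update(prior_prob: float, likelihood_ratios: Iterable[float]) -> float:
    """Combine a prior probability with likelihood ratios (odds form of Bayes'
    theorem), assuming the pieces of evidence are conditionally independent."""
    if not 0.0 < prior_prob < 1.0:
        raise ValueError("prior_prob must be strictly between 0 and 1")
    odds = prior_prob / (1.0 - prior_prob)
    for lr in likelihood_ratios:
        odds *= lr  # each piece of evidence scales the odds
    return odds / (1.0 + odds)


# Hypothetical: a one-in-a-million prior that prayer makes plants grow,
# and one experiment that favours the hypothesis by a factor of 5.
print(f"{update(1e-6, [5.0]):.1e}")  # posterior stays tiny
# The same experiment applied to a hypothesis with a plausible prior.
print(f"{update(0.3, [5.0]):.2f}")   # posterior moves substantially
```

With a very low prior, one favourable experiment barely moves the posterior, which is one way of answering the "Why not?" about the prayer experiment.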

The prayer example is great. Even doing a direct experiment doesn't lower our uncertainty about the central question. Why not?

Plausibility can lead us astray when there are gaps in our biological knowledge. When cholera was rampant in London, and was found especially in sections of town with bad sanitation (where the cholera miasmas would emanate from the stagnant sewage) it made good sense to get rid of the source of miasmas by constructing sewers that would flush it all into the Thames. Unless you had good data on water use at the household level, the theory of miasmas made very good sense, and the measures needed to deal with the situation seemed clear.

The other examples (rabies and parachutes) are related to Hill's viewpoint of the strength of the observed association; there are also consistent associations seen in a variety of populations and settings. How many studies are needed to satisfy the consistency viewpoint depends on the strength of the association.

In the early 1980s, I was a member of Physicians for Social Responsibility, one of whose policy positions was that the detonation of a 20 megaton hydrogen bomb over a major American metropolitan area would have adverse public health consequences. There were really only two observations in human populations to support this position, both of them done with nuclear fission bombs in an Asian setting 40 years earlier, and neither of which involved dropping a placebo bomb over a similar-sized city. However, two such "experiments" were enough, considering the fact that many principles of biology, together with studies in the Bikini Atoll and elsewhere, made it reasonable to issue strong precautions against repeating the human experiment with current weapons.

Hill summarized his discussion of causality by reflecting on the consequences of our evidence-based decisions, and on the need for introducing differential standards before we arrive at a verdict. His example was restricting the use of a drug for morning sickness in early pregnancy. "If we are wrong in deducing causation from association no great harm will be done. The good lady and the pharmaceutical industry will doubtless survive." Other kinds of decision would require stronger evidence. But he added, "In asking for very strong evidence I would, however, repeat emphatically that this does not imply crossing every 't', and swords with every critic, before we act."

So, in addition to considering the variability of refractory hypertension, and the adequacy of the evidence, you will have to end by deciding on the consequences of making a wrong decision to introduce a new treatment. One doctor may arrive at one decision, based on the perceived balance of benefits and consequences; another may arrive at a different decision, depending on a different balancing of factors for which there are no Bureaus of Weights and Measures to determine the exact weight of each factor.

By Ed Whitney on 11 Jan 2010
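Hill's differential standards, and the point about weighing the consequences of a wrong decision, can be given a simple decision-theoretic form. This is a textbook expected-cost threshold rule, not something Hill or the commenter wrote down, and the costs are hypothetical numbers in arbitrary units. Let p be the probability that the causal hypothesis is true; acting when it is false costs one amount, and failing to act when it is true costs another. Acting has the lower expected cost once p exceeds the first cost divided by the sum of the two.

```python
def action_threshold(cost_act_if_false: float, cost_wait_if_true: float) -> float:
    """Probability that the hypothesis is true above which acting on it has
    lower expected cost than waiting.

    Expected cost of acting  = (1 - p) * cost_act_if_false
    Expected cost of waiting = p * cost_wait_if_true
    Acting wins when p > cost_act_if_false / (cost_act_if_false + cost_wait_if_true).
    """
    total = cost_act_if_false + cost_wait_if_true
    if total <= 0:
        raise ValueError("costs must be positive")
    return cost_act_if_false / total


# Hypothetical costs. Restricting a morning-sickness drug: restricting it
# wrongly costs little, failing to restrict a harmful drug costs a lot.
print(f"restrict drug:   act if p > {action_threshold(1, 20):.3f}")
# A decision where acting wrongly is expensive demands much stronger evidence.
print(f"costly action:   act if p > {action_threshold(20, 1):.3f}")
```

Two clinicians who assign different costs will get different thresholds from the same evidence, which is the commenter's point about there being no Bureau of Weights and Measures for these factors.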

Ed: Yes, plausibility can lead us astray. Many things can lead us astray. And lack of plausibility can also lead us astray. The lack of any demonstrable mechanism for health effects of EMF may or may not be relevant in dismissing those concerns. But it is very powerful, nonetheless.

As for London sewers, I think they came after the cholera pandemics. Miasma theory (in the US) was instrumental for piped water supplies in towns (you used the water to flush the streets). The sewering came when they realized they didn't have a way to remove the water they brought in (the history is more complicated, of course, and also had to do with firefighting).

Hill comments on the potential weakness of plausibility, but I don't think he gave its power over our judgments enough credit.

Regarding plausibility and the parachute example, let's not forget that we do have more than just biological plausibility to consider. There is a great deal of actual science-based physical evidence for what goes on when someone splats and when someone avoids it by parachute. Statistical mechanics tells us that the air will not hold me up but that it will do a pretty decent job if I use a parachute. No need to bring biological plausibility into it when there is good physical law to back it up.
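MikeS's physics can be sketched with the standard quadratic drag equation: terminal velocity is reached when drag, 0.5 × air density × v² × (drag coefficient × area), balances weight. The mass and the two drag areas below are hypothetical round numbers, not measured values, and the "equivalent drop height" ignores air resistance.

```python
import math

G = 9.81          # m/s^2
RHO_AIR = 1.225   # kg/m^3, sea-level standard atmosphere


def terminal_velocity(mass_kg: float, drag_area_m2: float, rho: float = RHO_AIR) -> float:
    """Speed at which drag (0.5 * rho * v^2 * Cd*A) balances weight (m*g)."""
    return math.sqrt(2.0 * mass_kg * G / (rho * drag_area_m2))


def equivalent_drop_height(speed: float) -> float:
    """Height of a drag-free fall that ends at the same speed: v^2 / (2g)."""
    return speed ** 2 / (2.0 * G)


# Hypothetical inputs: an 80 kg jumper; Cd*A of about 0.42 m^2 falling
# belly-down, versus about 50 m^2 under an open round canopy.
for label, cda in [("no parachute", 0.42), ("parachute", 50.0)]:
    v = terminal_velocity(80.0, cda)
    print(f"{label:12s}: {v:5.1f} m/s, like a {equivalent_drop_height(v):6.1f} m drop")
```

With these assumed values the unprotected jumper lands at roughly 55 m/s, as if dropped about 155 m in a vacuum, while the parachutist lands at about 5 m/s, roughly a jump from a 1.3 m ledge.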

MikeS: Interesting you should put it that way. I would have said the reverse: no reason to bring physical law into it when we know what happens when you go splat.

Revere: I agree with you. But if we're going to try to apply science to decide if a parachute is better than nothing, I'd save myself a lot of time and use physics instead of medicine :-)

Fascinating discussion and very valid points.

Some years ago I had cause (for the first time) to take a look at studies involving nutritional interventions, as opposed to pharmacological interventions.

1. I was surprised at the sheer volume of them (significantly more in the last decade).
2. I was surprised at the level of significance achieved in many of these trials, despite the fact that they were small and often (but not always) used the RCT model.

Because RCTs of nutritional interventions are generally deemed too small to be valid, little significance is placed on their findings. In recent years there have been large numbers of meta-analyses which, to me at least, usually seem designed to show no effect of the interventions (the sole exception being the recent vitamin D studies).

We would not accept the standards or criteria applied in these meta-analyses for the assessment of a pharmacological intervention. Yet when meta-analyses are published for non-pharmacological studies, we seem to take the results at face value and don't examine the standards and study exclusion criteria all that closely, and I wonder: why are we doing this?

It seems to me that meta-analyses have become synonymous with researchers proving whatever point the study funders wish to prove through great selectivity in study inclusions and exclusions, and yet so much weight is placed on their findings. I could give loads of examples, but this is not the place for it.
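The mechanics behind this worry are easy to show. The sketch uses standard inverse-variance (fixed-effect) pooling on four invented studies; the numbers are hypothetical and come from no real meta-analysis. It shows how far the pooled estimate can move depending on which studies are admitted. Random-effects models and formal sensitivity analyses are the usual refinements and are not shown.

```python
import math
from typing import List, Tuple

# Hypothetical studies: (log risk ratio, standard error). Invented for illustration.
STUDIES: List[Tuple[float, float]] = [
    (-0.30, 0.15),
    (-0.25, 0.20),
    (0.05, 0.10),
    (0.10, 0.12),
]


def fixed_effect_pool(studies: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    weights = [1.0 / se ** 2 for _, se in studies]  # more precise studies weigh more
    total_w = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, studies)) / total_w
    return pooled, 1.0 / math.sqrt(total_w)


for label, subset in [("all four", STUDIES),
                      ("first two only", STUDIES[:2]),
                      ("last two only", STUDIES[2:])]:
    est, se = fixed_effect_pool(subset)
    print(f"{label:15s}: log RR = {est:+.3f} (SE {se:.3f}), RR = {math.exp(est):.2f}")
```

With these invented inputs, the pooled estimate is close to no effect with all four studies, clearly protective with only the first two, and slightly harmful with only the last two, which is why inclusion and exclusion criteria deserve scrutiny.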

The other question I have is why we are so wedded to the 'large' scale clinical drug trial, suitably blinded and crossed over. Years ago in the pharma sector (I used to work in it), one company would dismiss other companies' trials on the grounds that they were 'too small' or insufficiently powered to detect differences in all population groups, and were therefore invalid. The driving force here was not science; it was competitive economics. Small companies could not afford to fund trials of a size that was deemed sufficiently large, and so big pharma could lock out small and medium-sized competitors. Now this mantra goes unquestioned and unchallenged, and studies, it seems, have to be huge to have any validity.
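Whether a trial is "too small" depends on the size of the effect it is meant to detect; the same arithmetic is why very large effects, like those in Rind's examples, show up clearly in small series. A minimal sketch using one common normal-approximation formula for comparing two proportions; the response rates, alpha, and power are assumptions for illustration, and other formulas (pooled variance, continuity corrections, exact methods) give somewhat different numbers.

```python
import math
from statistics import NormalDist


def n_per_arm(p_control: float, p_treated: float,
              alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate patients per arm to distinguish p_control from p_treated
    with a two-sided test (normal approximation, unpooled variance,
    no continuity correction)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(power)           # quantile giving the desired power
    variance = p_control * (1 - p_control) + p_treated * (1 - p_treated)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_control - p_treated) ** 2)


# Hypothetical response rates: a dramatic effect versus a modest one.
print("10% -> 50%:", n_per_arm(0.10, 0.50))
print("10% -> 12%:", n_per_arm(0.10, 0.12))
```

Under these assumptions the dramatic effect needs fewer than twenty patients per arm, while the modest one needs nearly 4,000; required size grows roughly with the inverse square of the difference being sought.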

When a study arm fails to show statistically different results, it is usually either stopped on economic (not scientific) grounds or, if the funding can be found, its power is increased to a level where differences can emerge.

Now few companies, even large ones, can afford trials that meet the standards scientists and medics have come to expect. Their global 'blockbuster' model is dead in the water, and even they cannot commit that level of funding to run huge trials.

I am certainly not an epidemiologist, but it is my sincere hope that those of you who are will come together and solve this conundrum. We need fresh thinking and a new approach.

By being so wedded to the blinded RCT of size 'x' and power 'y', we may be missing the woods for the trees, not just here but in a host of 'emerging' biosciences. We need a new paradigm.

Example: emerging sciences such as bioenergetics (to name but one field) do not fit the old research and RCT models, and funds for traditional trial work of the size and scale we have come to expect are not available to these sectors. This holds back new advances in medical science and understanding, and continues to lock thinking within the existing pharmacological paradigm.

I don't have the answer, but IMHO it's time to rethink how we evaluate research, to accommodate smaller companies (which may have new and innovative approaches and developments worthy of our attention) and to find ways to assess the benefits of health approaches that fall outside the traditional pharmacological models.

Excellent points on meta-studies. I have also had that impression.

Hi, I'm really interested to find your blog, and specifically this post!! I don't know if you will even know that I have written here; I hope so. I am writing a thesis at the moment for my MD (I'm in the UK) and I am interested in clinical trial methodology. I am currently trying to find some literature about this lack of need for controlled trial evidence in the face of large treatment effects (like those seen with the introduction of penicillin). My supervisor tells me this is called the parachute effect, and that there are some papers about it. Can you point me in the right direction?!



Alas, Caroline, we have closed down the blog so we could do more science rather than write about it. Check the Categories in the left sidebar under "scientific method" for more on this topic. I recommend both of Paul Rosenbaum's books on observational study design, published by Springer-Verlag. He spends a good deal of time on the subject of randomization, despite the titles. Good luck.