Let me start with an apology. This post is again fairly long (for a blog post). Blog readers don’t like long posts (at least I don’t). But once I started writing about this I was unable to stop at some intermediate point, although I might have made it more concise and less conversational. I haven’t done either. Even worse, I didn’t quite finish with the single point I wanted to make, so it will be continued in the next post. Hence the apology. Now to recap a bit and then get down to business.
My “challenge” from 10 days ago has drawn quite a response: over 40 quite substantive comments on the two posts (not counting mine) and a long and substantive post on the issues I raised from David Rind on his blog, Evidence in Medicine. It has also challenged me, since you have all raised many issues that deserve comment. Since I can’t comment on them all, I am going to pick certain themes that appear to me as more fundamental or that implicitly underlie many of the concerns you have raised. I’m going to have to do multiple posts, so if you don’t see your issue addressed, even obliquely in this post, it may very well appear in a subsequent one. If you are coming in in the middle of this and want to catch up, you can go back to the two previous posts (here and here) and Rind’s observations (here). But if you don’t want to do that or want a reminder, here’s the basic idea.
In my challenge I described a fictitious inquiry made by a cardiologist who had reason to believe, on the basis of emerging scientific literature, that an acceptably safe and FDA approved drug for epilepsy might be useful in treating refractory hypertensives (people with high blood pressure who do not respond to any of the conventional treatments). The science suggested this would work for a certain subset of refractory hypertensives, so she drew up a list of criteria obtainable from her practice’s medical records and tried it by giving the drug to patients who met the criteria, measuring their blood pressure at baseline and after a month or so of taking the medication. There was a statistically and clinically relevant drop in blood pressure in her convenience sample of 29 patients. If you read the original post you will find more details of how this fictitious cardiologist went about making information collection systematic, reliable and relatively unaffected by the fact that this was an unblinded trial — both the blood pressure measurer and the patients knew they were being treated. There was no placebo group. My challenge was for you to say whether it was reasonable for the cardiologist to alter her practice on the basis of this single arm, non blinded, uncontrolled, non randomized small trial of a convenience sample. I raised the stakes by saying she later published this clinical trial in a peer reviewed medical journal and hence it might also affect the practices of others.
I had two motives for this particular construction, one of which I will keep for later. The other was to use the polar opposite of what is considered the gold standard for medical evidence — a double-blind randomized controlled trial (RCT). The question was not whether an RCT was the best form of evidence (something that seems to be assumed by many people although it isn’t always true). It was whether you could rely at all on anything from an inquiry which had none of these characteristics.
My first move in this game will probably be unwelcome to some (most?) readers. I intend to make a tiny foray into the field known as the philosophy of science. I hope it will be interesting and not too onerous. A disclaimer: I am not a philosopher of science (although I confess to having given a couple of papers in venues inhabited by this academic species). To the extent that it informs my practice (and it does), it is always from the perspective of a scientist. But there are some philosophical questions that should not be avoided, if only because they are always there and avoiding them actually commits you to one or another view, sometimes multiple and incompatible ones. The philosophical question I want to raise is fundamental to science and the challenge itself: the nature of causation. There is a big philosophical literature on this, only a fraction of which I am familiar with, but much of it is so abstract and technical I can’t relate to it as a practicing scientist. So instead I will reframe the questions in the terms in which we meet them every day in the clinic or laboratory or daily life.
First let me anchor the question in the challenge example. The underlying question is whether this anti-epilepsy drug is “doing anything” to lower blood pressure (we will have to deal later with the ultimate objective, helping the patient). At first blush, it appears to do something, but many of your comments and critiques give reasons for why what seems to be true (after patients took the drug their blood pressure went down) may not in fact have anything to do with the drug. This is a causation question. Did the drug cause the decrease in blood pressure or is there some other explanation? What do we mean by this question?
Let me pause for a moment to call your attention to something that is seldom mentioned. What philosophers call “causal necessity” is a very weird animal. We are used to hearing that epidemiology can only show association, not causation. This always infuriates me, because while not false, it is true of every empirical science. Whether it is chemistry, physics, molecular biology or whatever, all we see is associations. The correlations or associations we see in the field, clinic or lab don’t come with labels on them that say “causation.” We have to infer, i.e., make a judgment, that certain associations are of a particular kind we like to call “causal.” The philosophy of science, and more specifically what we call “scientific method,” is about how we make those judgments and what warrants making them. Said another way, scientific method is about how to trick Nature into revealing to us which associations are causal and which aren’t.
There is both a way in which the meaning of causality seems obvious and a way that it isn’t obvious and we are able to see them both simultaneously in the form of a “yes-but” response. One of the (many) problems with the notion of “cause” is that the word has multiple meanings and connotations and we often don’t keep them separate. Consider this from a famous philosopher of the law, HLA Hart (in work with Tony Honoré):
Human beings have learnt, by making appropriate movements of their bodies, to bring about desired alterations in objects, animate or inanimate, in their environment, and to express these simple achievements by transitive verbs like push, pull, bend, twist, break, injure. The process involved here consists of an initial immediate bodily manipulation of the thing affected and often takes little time. (Hart HLA and Honoré A, Causation in the Law, Oxford University Press, 1959; h/t DO for cite)
While this is a very intuitive notion of causation it isn’t sufficient (or necessary) for scientific investigation. But it is often lurking in the background and carries with it both connotations of “agency” (an agent that does something) and responsibility (what got done is due to something). Assuming that cause includes natural laws and not just human agency, it also raises the question of whether every event has a cause. Philosophers and scientists have tried to sharpen the notion of causation for better application at least since Hume’s time (18th century and the “scientific revolution”) and arguably before. I’ll just add two other versions, since they are the ones that we scientists seem to use most as our unspoken mental models (if we have a mental model; a lot of science is done by rote, unfortunately). Both of these versions are directly applicable to the example. Unfortunately when you lift up the hood to see what’s inside, you immediately see deep problems, a realization which makes most scientists quickly slam the hood down again. Nothing to see here. Move along. We choose to assume there are or have been experts who know how that mechanism works and we don’t want to be bothered with metaphysical nitpicking. But I want to bother you with it, at least a little.
So consider these two typical scientist models of causation:
Version I: A causes B if, all things otherwise the same, a change in A brings about a change in B.
Version II: A causes B if, all things otherwise the same, but for A, B wouldn’t happen.
Version I is the classic form of an experiment. Version II is often the way we think about observational studies, although it also applies to experiments. In either case we have the problem of “all things otherwise the same.” Version II also has the counterfactual problem: it relies for its content on an event that didn’t happen (that A didn’t occur and therefore B didn’t occur but everything else was the same). Neither version requires placebo controls, randomization or blinded measurement, except insofar as they try to deal with either the association itself, “all things otherwise the same” or the counterfactual condition.
Note that there are two comparisons here: one is whether A and B are associated (move together); the other is related to whether “all things are otherwise the same” (where the opposite of “the same” is “different,” that is, a comparison between two states of affairs). Placebo control is more related to the A and B association, especially in the sense of Version I, as is blinding (it relates to the validity of measurement). Randomization is more related to the second comparison (are all things otherwise the same). However there are interacting influences, so this isn’t a hard and fast distinction.
The most mind bending problem here for most people (once they think about it) is the counterfactual condition. Suppose patient A has refractory hypertension. I give him the anti-epilepsy drug. There is no meaningful change in blood pressure. What are the possibilities? One is that the drug has no effect on this patient. Another is that something else raised his blood pressure so the effect of the drug was masked (the something else could be measurement error or some external real effect). On the other hand, suppose his blood pressure changes. This could be because he is responsive to the drug (the drug “works”) or because of some other factor (which our commenters have been quite ingenious in positing). In either case we have the possibility of a causal effect or a non-causal one. How do we decide?
The two techniques favored by commenters are placebo control and a cross-over design. A placebo control doesn’t completely address the “all things otherwise the same” criterion of an experiment because there is one other thing that is extremely different: the placebo is given to a different person. A randomized placebo controlled trial doesn’t get around this. Instead of giving it to one different person it just gives it to a whole group of different people. Randomization doesn’t make things otherwise the same, either. It just makes the two groups approximately the same on average. But the counterfactual or “otherwise the same” criterion isn’t about groups, it is about individuals. That’s why a persistent concern about RCTs in general and meta-analyses in particular is that they might hide responsive or susceptible groups in the average.
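For readers who like to see the arithmetic, here is a minimal simulation sketch in Python of how an average can hide a responsive subgroup. Every number in it is invented for illustration (a 20% responder fraction, a 20 mmHg drop in responders, no effect in anyone else, 2000 patients); none of it comes from the challenge example, and the function name simulate_trial is hypothetical.

```python
import random
import statistics


def simulate_trial(n=2000, responder_fraction=0.2, responder_drop=20.0, seed=1):
    """Simulate a randomized placebo-controlled trial with hidden responders.

    All parameter values are made-up assumptions for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    treated, placebo = [], []
    treated_responders, treated_nonresponders = [], []
    for _ in range(n):
        is_responder = rng.random() < responder_fraction
        # Blood pressure change (mmHg) that has nothing to do with the drug.
        background_change = rng.gauss(0.0, 8.0)
        if rng.random() < 0.5:  # coin-flip randomization into the drug arm
            change = background_change - (responder_drop if is_responder else 0.0)
            treated.append(change)
            if is_responder:
                treated_responders.append(change)
            else:
                treated_nonresponders.append(change)
        else:
            placebo.append(background_change)
    return {
        # What the trial reports: difference in group averages.
        "average_effect": statistics.mean(treated) - statistics.mean(placebo),
        # What only the simulation can see: the two kinds of individuals.
        "responder_mean": statistics.mean(treated_responders),
        "nonresponder_mean": statistics.mean(treated_nonresponders),
    }


result = simulate_trial()
for name, value in result.items():
    print(f"{name}: {value:.1f} mmHg")
```

Under these assumptions the trial's average effect should come out at a modest few mmHg, even though in this made-up population no individual has a modest effect: each one either drops by about 20 mmHg or does not respond at all. The responder labels are available only because this is a simulation; in a real trial they are exactly what we cannot see.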
A cross-over design is one where the person stays the same but the treatment changes. In this case we might wait a month or two for the drug’s effect to dissipate and then see if the patient’s blood pressure goes back up. This has the advantage that it is the same person so more “other” things are the same. But of course over the span of time other things might have changed and there are still those concerns about whether lack of blinding might affect how good the measurements are. Also observe that the original challenge design is implicitly a cross-over design and is in an important sense also controlled for alternative treatments, because these were people who had been treated with medications by the same cardiologist but the medications didn’t work. One might argue there is no need for a placebo because we have evidence of something even better, comparison with a medication known to work for blood pressure control.
Note also that a cross-over design requires the ability to see what happens if you don’t treat someone. For example, consider a new chemotherapy drug where we are measuring effect by 5-year survival. If we treat someone with the drug and they die much later than expected, we don’t know if it was the drug or they were just lucky. We don’t have a do-over where we don’t treat them with the new agent to see if they would be a long-term survivor regardless. That’s why the counterfactual is a bit mind bending. It depends on something we can’t ever see directly. We have to infer it by trying to find a “substitute” for the treated patient to see what happens if they aren’t treated.
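The chemotherapy example can be written down as a small Python sketch, assuming we imagine each patient as carrying two possible outcomes of which only one is ever observed. The class name Patient, its fields and the survival figures are all hypothetical, chosen only to illustrate the point.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Both outcomes exist in principle (years survived, invented numbers) ...
    survival_if_treated: float
    survival_if_untreated: float
    treated: bool

    def observed(self) -> float:
        # ... but only the one matching what actually happened is ever seen.
        if self.treated:
            return self.survival_if_treated
        return self.survival_if_untreated

    def individual_effect(self) -> float:
        # Computable here only because the sketch knows both numbers;
        # in a real clinic one of the two is always missing.
        return self.survival_if_treated - self.survival_if_untreated


# Two treated patients with identical observed survival.
lucky = Patient(survival_if_treated=9.0, survival_if_untreated=9.0, treated=True)
helped = Patient(survival_if_treated=9.0, survival_if_untreated=2.0, treated=True)

same_observation = lucky.observed() == helped.observed()
print(same_observation, lucky.individual_effect(), helped.individual_effect())
```

The two patients look identical from the outside, yet under these invented numbers the drug did nothing for one and added seven years for the other. The difference lives entirely in the outcome we never get to observe.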
I was hoping to finish this subject in one gulp, but this post is already too long, so I’ll have to finish the subject of causation in the next installment, where I will connect it up with some comments of Rind’s on the importance of size of the effect, although I will take a different tack.
But that’s for next time. Assuming you are still hanging in there with me.