Randomized trial versus observational study challenge, III: metaphysics

Let me start with an apology. This post is again fairly long (for a blog post). Blog readers don't like long posts (at least I don't). But once I started writing about this I was unable to stop at some intermediary point, although I might have made it more concise and less conversational. I haven't done either. Even worse, I didn't quite finish with the single point I wanted to make, so it will be continued in the next post. Hence the apology. Now to recap a bit and then get down to business.

My "challenge" from 10 days ago has drawn quite a response: over 40 quite substantive comments on the two posts (not counting mine) and a long and substantive post on the issues I raised from David Rind on his blog, Evidence in Medicine. It has also challenged me, since you have all raised many issues that deserve comment. Since I can't comment on them all, I am going to pick certain themes that appear to me as more fundamental or that implicitly underlie many of the concerns you have raised. I'm going to have to do multiple posts, so if you don't see your issue addressed, even obliquely in this post, it may very well appear in a subsequent one. If you are coming in in the middle of this and want to catch up, you can go back to the two previous posts (here and here) and Rind's observations (here). But if you don't want to do that or want a reminder, here's the basic idea.

In my challenge I described a fictitious inquiry made by a cardiologist who had reason to believe, on the basis of emerging scientific literature, that an acceptably safe and FDA approved drug for epilepsy might be useful in treating refractory hypertensives (people with high blood pressure who do not respond to any of the conventional treatments). The science suggested this would work for a certain subset of refractory hypertensives, so she drew up a list of criteria obtainable from her practice's medical records and tried it by giving the drug to patients who met the criteria, measuring their blood pressure at baseline and after a month or so of taking the medication. There was a statistically and clinically relevant drop in blood pressure in her convenience sample of 29 patients. If you read the original post you will find more details of how this fictitious cardiologist went about making information collection systematic, reliable and relatively unaffected by the fact that this was an unblinded trial -- both the blood pressure measurer and the patients knew they were being treated. There was no placebo group. My challenge was for you to say whether it was reasonable for the cardiologist to alter her practice on the basis of this single arm, non blinded, uncontrolled, non randomized small trial of a convenience sample. I raised the stakes by saying she later published this clinical trial in a peer reviewed medical journal and hence it might also affect the practices of others.

I had two motives for this particular construction, one of which I will keep for later. The other was to use the polar opposite of what is considered the gold standard for medical evidence -- a double-blind randomized controlled trial (RCT). The question was not whether an RCT was the best form of evidence (something that seems to be assumed by many people although it isn't always true). It was whether you could rely at all on anything from an inquiry which had none of these characteristics.

My first move in this game will probably be unwelcome to some (most?) readers. I intend to make a tiny foray into the field known as the philosophy of science. I hope it will be interesting and not too onerous. A disclaimer: I am not a philosopher of science (although I confess to having given a couple of papers in venues inhabited by this academic species). To the extent that it informs my practice (and it does), it is always from the perspective of a scientist. But there are some philosophical questions that should not be avoided, if only because they are always there and avoiding them actually commits you to one or another view, sometimes multiple and incompatible ones. The philosophical question I want to raise is fundamental to science and the challenge itself: the nature of causation. There is a big philosophical literature on this, only a fraction of which I am familiar with, but much of it is so abstract and technical I can't relate to it as a practicing scientist. So instead I will reframe the questions in the terms in which we meet them every day in the clinic or laboratory or daily life.

First let me anchor the question in the challenge example. The underlying question is whether this anti-epilepsy drug is "doing anything" to lower blood pressure (we will have to deal later with the ultimate objective, helping the patient). At first blush, it appears to do something, but many of your comments and critiques give reasons for why what seems to be true (after patients took the drug their blood pressure went down) may not in fact have anything to do with the drug. This is a causation question. Did the drug cause the decrease in blood pressure or is there some other explanation? What do we mean by this question?

Let me pause for a moment to call your attention to something that is seldom mentioned. What philosophers call "causal necessity" is a very weird animal. We are used to hearing that epidemiology can only show association, not causation. This always infuriates me, because while not false, it is true of every empirical science. Whether it is chemistry, physics, molecular biology or whatever, all we see is associations. The correlations or associations we see in the field, clinic or lab don't come with labels on them that say "causation." We have to infer, i.e., make a judgment, that certain associations are of a particular kind we like to call "causal." The philosophy of science, and more specifically what we call "scientific method," is about how we make those judgments and what warrants making them. Said another way, scientific method is about how to trick Nature into revealing to us which associations are causal and which aren't.

There is both a way in which the meaning of causality seems obvious and a way that it isn't obvious and we are able to see them both simultaneously in the form of a "yes-but" response. One of the (many) problems with the notion of "cause" is that the word has multiple meanings and connotations and we often don't keep them separate. Consider this from a famous philosopher of the law, HLA Hart (in work with Tony Honoré):

Human beings have learnt, by making appropriate movements of their bodies, to bring about desired alterations in objects, animate or inanimate, in their environment, and to express these simple achievements by transitive verbs like push, pull, bend, twist, break, injure. The process involved here consists of an initial immediate bodily manipulation of the thing affected and often takes little time. (Hart HLA and Honoré A, Causation in the Law, Oxford University Press, 1959; h/t DO for cite)

While this is a very intuitive notion of causation it isn't sufficient (or necessary) for scientific investigation. But it is often lurking in the background and carries with it both connotations of "agency" (an agent that does something) and responsibility (what got done is due to something). Assuming that cause includes natural laws and not just human agency, it also raises the question of whether every event has a cause. Philosophers and scientists have tried to sharpen the notion of causation for better application at least since Hume's time (18th century and the "scientific revolution") and arguably before. I'll just add two other versions, since they are the ones that we scientists seem to use most as our unspoken mental models (if we have a mental model; a lot of science is done by rote, unfortunately). Both of these versions are directly applicable to the example. Unfortunately when you lift up the hood to see what's inside, you immediately see deep problems, a realization which makes most scientists quickly slam the hood down again. Nothing to see here. Move along. We choose to assume there are or have been experts who know how that mechanism works and we don't want to be bothered with metaphysical nitpicking. But I want to bother you with it, at least a little.

So consider these two typical scientist models of causation:

Version I: A causes B if, all things otherwise the same, a change in A brings about a change in B.
Version II: A causes B if, all things otherwise the same, but for A, B wouldn't happen.

Version I is the classic form of an experiment. Version II is often the way we think about observational studies, although it also applies to experiments. In either case we have the problem of "all things otherwise the same." Version II also has the counterfactual problem: it relies for its content on an event that didn't happen (that A didn't occur and therefore B didn't occur but everything else was the same). Neither version requires placebo controls, randomization or blinded measurement, except insofar as they try to deal with either the association itself, "all things otherwise the same" or the counterfactual condition.

Note that there are two comparisons here: one is whether A and B are associated (move together); the other is related to whether "all things are otherwise the same" (where the opposite of "the same" is "different," that is, a comparison between two states of affairs). Placebo control is more related to the A and B association, especially in the sense of Version I, as is blinding (it relates to the validity of measurement). Randomization is more related to the second comparison (are all things otherwise the same). However there are interacting influences, so this isn't a hard and fast distinction.

The most mind bending problem here for most people (once they think about it) is the counterfactual condition. Suppose patient A has refractory hypertension. I give him the anti-epilepsy drug. There is no meaningful change in blood pressure. What are the possibilities? One is that the drug has no effect on this patient. Another is that something else raised his blood pressure so the effect of the drug was masked (the something else could be measurement error or some external real effect). On the other hand, suppose his blood pressure changes. This could be because he is responsive to the drug (the drug "works") or because of some other factor (which our commenters have been quite ingenious in positing). In either case we have the possibility of a causal effect or a non-causal one. How do we decide?

The two favored techniques by commenters are placebo control or a cross-over design. A placebo control doesn't completely address the "all things otherwise the same" criterion of an experiment because there is one other thing that is extremely different: the placebo is given to a different person. A randomized placebo controlled trial doesn't get around this. Instead of giving it to one different person it just gives it to a whole group of different people. Randomization doesn't make things otherwise the same, either. It just makes the two groups approximately the same on average. But the counterfactual or "otherwise the same" criterion isn't about groups, it is about individuals. That's why a persistent concern about RCTs in general and meta-analyses in particular is that they might hide responsive or susceptible groups in the average.
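To make the point about averages concrete, here is a minimal simulation sketch in Python. Everything in it is an assumption made up for illustration, not data from the challenge: the function name `simulate_trial`, the baseline pressures, the 20% share of hidden responders and the 20 mmHg drop they get. It shows randomization producing two groups with nearly equal average baselines, and an average treatment effect that is only a fraction of what the responders actually experience.

```python
import random
import statistics


def simulate_trial(n=2000, responder_fraction=0.2, responder_effect=-20.0, seed=1):
    """Simulate a randomized placebo-controlled trial with hidden responders.

    All numbers are illustrative assumptions, not data from the post.
    """
    rng = random.Random(seed)

    # Each hypothetical patient has a baseline systolic pressure (mmHg) and a
    # hidden flag saying whether the drug works for that individual.
    patients = []
    for _ in range(n):
        baseline = rng.gauss(160.0, 10.0)
        responder = rng.random() < responder_fraction
        patients.append((baseline, responder))

    # Randomization: shuffle, then split into two arms of equal size.
    rng.shuffle(patients)
    treated, control = patients[: n // 2], patients[n // 2:]

    def outcome(baseline, responder, gets_drug):
        # Only responders who actually get the drug see any effect.
        effect = responder_effect if (gets_drug and responder) else 0.0
        return baseline + effect + rng.gauss(0.0, 5.0)  # plus measurement noise

    treated_out = [outcome(b, r, True) for b, r in treated]
    control_out = [outcome(b, r, False) for b, r in control]

    treated_base = statistics.mean([b for b, _ in treated])
    control_base = statistics.mean([b for b, _ in control])

    return {
        # Close to zero: the arms are alike on average, not patient by patient.
        "baseline_gap": treated_base - control_base,
        # Roughly responder_fraction * responder_effect, diluted by non-responders.
        "average_effect": statistics.mean(treated_out) - statistics.mean(control_out),
        "responder_effect": responder_effect,
    }


result = simulate_trial()
print(result)
```

Under these assumed numbers the trial reports a modest average drop, even though no single simulated patient experiences a modest drop: each one gets either the full effect or none at all.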

A cross-over design is one where the person stays the same but the treatment changes. In this case we might wait a month or two for the drug's effect to dissipate and then see if the patient's blood pressure goes back up. This has the advantage that it is the same person so more "other" things are the same. But of course over the span of time other things might have changed and there are still those concerns about whether lack of blinding might affect how good the measurements are. Also observe that the original challenge design is implicitly a cross-over design and is in an important sense also controlled for alternative treatments, because these were people who had been treated with medications by the same cardiologist but the medications didn't work. One might argue there is no need for a placebo because we have evidence of something even better, comparison with a medication known to work for blood pressure control.

Note also that a cross-over design requires the ability to see what happens if you don't treat someone. For example, consider a new chemotherapy drug where we are measuring effect by 5 year survival. If we treat someone with the drug and they die much later than expected, we don't know if it was the drug or they were just lucky. We don't have a do-over where we don't treat them with the new agent to see if they would be a long term survivor regardless. That's why the counterfactual is a bit mind bending. It depends on something we can't ever see directly. We have to infer it by trying to find a "substitute" for the treated patient to see what happens if they aren't treated.
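The chemotherapy example can be sketched as a toy simulation in Python. It is only an illustration under invented assumptions: the helper `make_patient` is hypothetical, and so are the 30% chance of surviving untreated and the 40% chance that the drug helps. Because it is a simulation, it lets us peek at both outcomes for each patient, which is exactly what the real world never allows.

```python
import random


def make_patient(rng):
    """Return both potential outcomes for one hypothetical patient.

    The probabilities are illustrative assumptions, not clinical estimates.
    """
    survives_untreated = rng.random() < 0.3   # would be "lucky" with no drug
    helped_by_drug = rng.random() < 0.4       # drug rescues this patient
    survives_treated = survives_untreated or helped_by_drug
    return survives_untreated, survives_treated


rng = random.Random(0)
patients = [make_patient(rng) for _ in range(10000)]

# In the real world everyone here gets the drug, so we observe only the
# second value of each pair. Every treated survivor looks the same to us.
treated_survivors = [p for p in patients if p[1]]

# Only the simulation can look at the first value: the counterfactual.
would_have_survived_anyway = sum(1 for p in treated_survivors if p[0])
share_lucky = would_have_survived_anyway / len(treated_survivors)

print(f"treated survivors: {len(treated_survivors)}")
print(f"share who would have survived without the drug: {share_lucky:.2f}")
```

With these made-up probabilities, about half of the treated survivors would have survived anyway, yet nothing in the observed data distinguishes them from the patients the drug actually saved.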

I was hoping to finish this subject in one gulp, but this post is already too long, so I'll have to finish the subject of causation in the next installment, where I will connect it up with some comments of Rind's on the importance of size of the effect, although I will take a different tack.

But that's for next time. Assuming you are still hanging in there with me.



One seemingly trite comment but with actual significance: In truth it is better referred to not as "The philosophy of science" but as "Science: the philosophy". Science is a branch of philosophy and in truth traces its origins back even farther. What IS the ontogeny of ontology? Part of human nature is our well developed (sometimes over-developed) desire to detect agency, and thereby to predict and perhaps even control future events. To tie in with one of your other favorite subjects, that was one of the first functions of religion: to create stories of agency and to then hope to influence those agents (with sacrifices, prayers, etc.). Science's big improvement as an epistemology was to not rely on revealed truth for those conclusions but to instead make testable hypotheses and to then modify our beliefs about agency based on the results. Science recognizes that the truth may be out there but that we are not in possession of it; we instead can only possess models that by that hypothesis generation/test/new hypothesis process increasingly resemble the truth and thereby give us better control over future events.

ecologist, great cartoon!

Again sounding the theme of variation being the phenomenon that motivates the disciplines of biostatistics and epidemiology, I would add Version III of causation: A is inferred to cause B if a change in the frequency distribution of A is associated with a change in the frequency distribution of B, when other measured variables are adjusted for and competing explanations of the association appear to be forced and implausible.

Hence the need for numbers of patients sufficient to be able to compare frequency distributions; the need for large numbers depends on the variability of the phenomena being studied. Two rabies cures make medical history; changes in blood pressure measurements for refractory HTN require larger numbers, and more comparisons.

By Ed Whitney on 08 Jan 2010

Ed: This is an interesting point. The idea that "causation" means that my risk went up or down as a result is pretty familiar to statisticians and epidemiologists, but whether it accords with most people's notion of causality is less clear (once they think hard about it). Granted, if you tell me exposure to a chemical increased my risk, I want that chemical gone because I'm afraid it will "cause" me to get a disease. But as an individual either I get it or I don't. If I don't, there are still the two possibilities: I am immune or I am lucky. And you can't tell them apart. If you are a clinician you want to know if the drug works for this patient or not and you can't do that test. So you try it, but the results of trying it don't tell you what you want to know because the two possibilities remain. If there is corroborating evidence from other people you might be more likely to say that a positive result was real and not luck, and that will be some of the theme in the follow up post.

I agree that the questions about causation for specific results in specific subjects are quite interesting to discuss. But I'm not sure how relevant they are to the original challenge. I think it depends on what we're really interested in.

If our main question is whether the AE drug lowered BP in those specific patients, then the meaning of causation, the inability to observe the counterfactual, etc., are highly relevant. IMO, you probably can't ever 'know' whether the drug lowered BP in any of those patients.

But for most of us, the real question is not about causation in those specific patients during that specific study. The question is whether the results of that study make it reasonable to expect that future administrations of this drug will correlate with lowered BP (or other endpoints).

I admit I'm relatively naive about philosophy, but I think that may sidestep at least some of these philosophical issues around causation.

Ed makes a great point. Counterfactuals as they usually are presented (i.e. Version I or II) don't account for random chance. In other words, if all things are equal, the outcome for the same individual may still change just by chance. Version III is an improvement by looking at populations and averages, which maps nicely to how we do observational studies. Talking about causation on a population level with counterfactuals Version III makes sense. Talking about counterfactuals using individuals and events that occur in the counterfactual world, not the real world, with all things the same except for A -- it's all a little Santa Claus to me.

Well-reasoned and well-written.

I appreciate your efforts to dig out the kernel of what epidemiology is about. Epi can seem jargon-heavy and complicated to an outsider or even a new student (e.g. I remember getting frequently tangled up with the question "Is this an example of confounding or effect modification?..."). Ultimately, though, epidemiology is about logic - as you so clearly illustrate in this post. Which is why logical non-scientists, non-physicians and non-epidemiologists have valuable things to add to the post. And I think, as you also cover here, epi gets at the heart of the "scientific method" as well as any scientific discipline. It forces you to think through the logic of how we know things - which can be lost when a student is trying to remember all of the facts and mechanisms and structures that science has illuminated over the years. Your current series is a great reminder.

I'm still hanging in there with you, revere. Intently.

Alex: Counterfactuals do take into account variation. Suppose I get a chemotherapy drug and am cured. Was it the drug or was I someone who would have survived anyway? It doesn't matter whether my surviving was just "luck" (I had a 50-50 chance and it came up heads) or I was going to survive with probability one ("immune"). Likewise if I die from the disease was it that I was "doomed" (nothing would have made a difference) or just "unlucky" (I wasn't in the half that the drug worked for by chance)?

This may be a digression, but I am finding it interesting that in an attempt to get a better grip on the concept of causation the words 'luck' and 'chance' are still being used without any further examination. This is a step away from the question of whether A can be said to cause B, to look at a single event and ask whether it is 'caused' or 'random'. Isn't science predicated on the assumption that all events are caused by something, i.e., there is no such thing as random chance? So when we use the word 'luck' what we really mean is that the ultimate cause is both unclear to us and highly unlikely to occur again under the same circumstances. Thus we say the outcome of a coin toss is 'chance', when in fact it is the result of the conjunction of angles and velocities and masses of coin and air that produce a situation sufficiently complex we are unable to predict the outcome with any useful degree of accuracy. So a better way of looking at 'luck' might be to consider the likelihood of a recurrence of the same set of circumstances that produced this outcome. In the case of a patient's response to a drug, instead of wondering whether the outcome was affected by an unaccounted-for variable or luck, we are really wondering whether the unaccounted-for variable would have a significant chance of happening again. Designing a study to reduce the effect of random chance is really designing a study that reduces the effect of low-probability variables.

Sorry, I'm in way over my head and probably babbling about things that are irrelevant, but it struck me how the discussion was setting up a dichotomy between cause and chance, when it seems to me it is actually just a continuum on which we draw a line somewhere between things that we consider of significant probability and those of insignificant probability. Whether this helps us understand causation any better I am not sure.

Fred: Yes, you are in way over your head, but so are the rest of us. You raise some deep questions that have confounded centuries of scientists and philosophers. The question of what "random" means is related to the question of what "probability" means, and the other issues, such as the Laplacian determinism you espouse in the coin flip example, are highly controversial and capable of several different constructions. I hope to talk a bit about randomization in an upcoming post, but suffice it to say all our methods depend on the assumption that there are some random variables that, for whatever reason, are not reducible to any defined initial condition (and the existence of sensitive dependence on initial conditions in non-linear systems in the chaotic regime suggests they may never be reducible to them). Then there's the challenge to causality produced by the world's most successful scientific theory, quantum mechanics. It's a messy world but a beautiful one.

Just a comment on the differences between parallel versus crossover study designs. Both can be done with placebo control, or with active control. Both can be either blind or unblind. You correctly point out that the parallel design trial compares two groups of patients who are different, but you miss the fact that the crossover design compares patients at different times, whereas the parallel design trial is to some degree concurrent (if for instance the recruitment time is short compared to the treatment duration in-study). So for instance, a flu epidemic might affect the second half of the crossover trial. The crossover trial is susceptible to carry-over effects (which a correctly designed parallel design trial is not prone to), and the crossover trial is susceptible to a unique kind of bias in the event of informative censoring: if patients with a particular characteristic are more likely to discontinue the trial after treatment with one of the study drugs, then the trial results will be biased, sometimes beyond repair. Crossover design trials are not suitable for studying conditions with time-varying severity (unless the sample sizes are very large). Crossover designs are impractical for comparing more than a small number of different treatment options. Finally, the likelihood that a subject will "guess" which treatment arm is which is higher for the crossover design, introducing a significant source of bias.

Of course, the parallel-design study has its drawbacks too, but the way your post is written makes the crossover study seem like an easy win.

David: Your point is correct, although I don't think I missed it:

A cross-over design is one where the person stays the same but the treatment changes. In this case we might wait a month or two for the drug's effect to dissipate and then see if the patient's blood pressure goes back up. This has the advantage that it is the same person so more "other" things are the same. But of course over the span of time other things might have changed and there are still those concerns about whether lack of blinding might affect how good the measurements are.

There is no free lunch. So the question of which is better here, parallel or cross-over, is a judgment for a particular study. Neither design has an automatic advantage, which was my main point.

A very interesting post, thanks very much for it - and for the others as well. It's a difficult subject which you need a bit of length for. :-)