Randomized trial versus observational study challenge, II: meta-comments

Just a day into the New Year I was feeling feisty and issued a challenge to readers and the Evidence Based Medicine (EBM) blogosphere in general. I asked for a critique of a fictitious uncontrolled, non-randomized non-blinded small scale clinical study. It was truly a fictitious study. I made it up. But I had a template in mind and I intended to go somewhere with the example and I still do. But it will take longer to get there than I anticipated because it has raised a lot of things worth thinking and talking about in the meantime. I was going to wait a week to give people a chance to read and see the original, but you responded so quickly there is already plenty to talk about.

Before I start with this installment, let me be clearer why I am doing this. At the outset (the first few key taps) the impetus was irritation over the way certain kinds of studies were automatically anointed with credibility if they had the magic words "placebo controlled" or "randomized" sprinkled over them; irritation over a failure to appreciate the value of other kinds of credible scientific evidence that weren't seasoned with the magic words; and most importantly, I was faced with a choice whether to write a post I could do off the top of my head or work on my grant proposal. It was, as they say, a no-brainer. And no-brains is not a way to write a good grant proposal (although I've reviewed many that show abundant evidence they were written that way). But no-brains for a blog post can still work. Evidence: we've been up and running and quite successful for more than 5 years.

However there was -- and is -- an even more important motive. Over a long academic career I've sat on many a committee where a final written product was expected, and I've observed that people fall into two main camps: those who can't write a word until they have a detailed outline, and those who can't make a detailed outline until they see what they have written. For better or worse I am in the latter category, like the writer of fiction who doesn't know what is going to happen to her characters until she reads it. So this exercise is also a challenge to myself -- I hope with your help -- to think through this problem afresh by writing about it here. I hope when we are all done it won't be just me who has a better idea of what this is all about, even if we disagree.

I want to start off the deconstruction of my challenge example with some meta-comments, i.e., comments about the comments (if you want to see the details of the challenge -- there aren't that many, just a couple of paragraphs, but too many to repeat each time -- go to the original or David Rind's excellent summary, here.) Rind has written an extremely interesting post of his own on our challenge at his blog, Evidence in Medicine, which I recommend highly. The first point he makes there is this:

I glanced at the comments on the blog, and interestingly a lot of people are spending time trying to apply a name to what exactly this study is (i.e. "case series", "observational", and the like). I often don't find naming things all that helpful unless everyone has a clear understanding of the naming convention, but this would most commonly either be called an observational study (if we view that the therapy was going to be administered independent of the interest in collecting data about what happened), or an uncontrolled clinical trial (if we view that the primary goal was to find out if the AED works for refractory hypertension).

I think this is an extremely astute observation and it is part of my motive for the example. The desire to name study designs is common and serves a useful purpose for those of us who work with them a lot. It's an organizing device that mentally allows us to go directly to certain issues or modes of analysis without having to go through a long inventory of concerns. If you speak the language, names and phrases carry along with them context and experience. But if you aren't quite so fluent, it can be dangerous, and I deliberately chose an extreme -- an unblinded, uncontrolled non-randomized study of a small convenience sample -- that had none of the magic words and all of the things students and non-specialists are told to watch out for and avoid because they will invalidate the results. I had another end in view, too, but I will keep that to myself for the moment. I agree with Rind that in this context it isn't very helpful. More importantly, it obscures what I want to talk about, the logic of the inquiry and the grounds that might or might not warrant paying attention to it. I want to get to them, with you. But since this post is already long, I will confine it to those meta-comments I mentioned earlier, meta-comments that are in addition to or extend Rind's.

1. I confess I deliberately seduced you into considering this a scientific study by saying it was subsequently published in a peer reviewed journal. But I gave you no reason to believe the investigator was a scientist or that he or she was doing science. I am a physician and a scientist, but most physicians aren't scientists and most scientists aren't physicians. In this particular case what I described is what most of us hope our own doctors are doing: learning from experience and using it -- appropriately -- to help patients. Most good doctors do this. They read something in the scientific literature (while not scientists, they, like most of you, are consumers of the results of science), make some inferences or deductions and incorporate all this, provisionally, into their practice. Now EBM encourages doctors to rely on sound science, it is true, but doctors are, above all else, problem solvers. If they have a patient whose blood pressure won't respond to the usual therapies, they will cast about for something that works. If this weren't so, medicine could be practiced by rote, by a checklist. The only difference here is that this doctor thought it through carefully ahead of time and kept special records so that he or she could harvest the results of experience better. At some point he or she also thought of sharing it with others.

So here's another question for you: Does status as "a study" make any difference? Does the fact that the doctor was not a scientist or necessarily doing "science" in the way many of us understand it affect in any way how reliable the information is or whether she should be using her experience in the treatment of her patients?

2. Another meta-comment. It is interesting how many of you started your comments by saying, "I'm not a doctor . . ." or "I'm not a scientist . . . ," or even, "I'm not an epidemiologist," the implication being that your comments were not as well grounded as they would be if you were. Depending upon what you were opining about, that may or may not be true. In this particular instance I think it's not so relevant. The great mathematical physicist John von Neumann once said, "You never understand mathematics, you just get used to it." While many of us would give our eye teeth to be as used to mathematics as von Neumann was, there is a virtue in trying to understand why we do certain things. Many of us in "the business" have long stopped trying to see things from a naive viewpoint. We are used to it and we substitute that familiarity for understanding. If we ever dare to look under the hood -- I mean really look under the hood -- we see problems that are so deep and daunting we put them in the category of settled issues (settled by others smarter than we are) or time-wasting metaphysical exercises of no practical use. The uninitiated, however, have not yet been blinded by the intensity of the difficulties and will ask naive but quite interesting questions. It is our intention, and we think it a good one, to discuss some of those things (next stop will be causality, but in the context of this example and Rind's post), because those questions go to the heart of the challenge and make it much more interesting than a class exercise in critiquing a study.

That's our new good intention, anyway. Of course we all know where the road paved with good intentions leads. We'll take the next steps shortly (maybe a couple of days). Meanwhile we are eager to hear from you on the subject, should you be so moved.


Another meta-comment I would add is how wedded practicing physicians are to "clinically relevant" information. When I was in medical school, the professors in the clinical years (third and fourth, primarily) drummed into the students the importance of findings "in the clinic," and their own observations of what mattered (or didn't) in the treatment of patients. In my more recent experience of teaching first year medical students about public health, I was struck by how deeply this is still engrained in medical education. The students are loath to consider topics being taught that won't have direct relevance to how they will eventually treat their patients. This is admirable, on one hand, but it often extraordinarily narrows their scope of interest. It's why your example of the astute clinician who wants to observe what happens to patients in his or her practice seems plausible.
The alternative refrain in medical schools these days is "evidence-based medicine," which means randomized controlled (clinical) trials. This is really the only scientific approach that has value to most physicians, especially academic ones such as those based at medical schools. It privileges one particular study design over all others and is why the Cochrane Collaboration is placed on such a pedestal. As those of us in public health know, it also has permeated the language, if not the practice, of our field. This, in spite of the fact that RCTs of interventions at the population level are often not possible or would even be unethical. I suspect that this is where you are eventually going, so I'll stop my comment for now and will weigh in later.

By Sam Dawes (not verified) on 05 Jan 2010 #permalink

The trouble with "clinical data" is that they are based on settings that have their own quirky clientele and associated base rates. The trouble with evidence based medicine is that there are always qualifiers, such as patient history, genetics, etc., which are not readily addressed by RCT designs or by many reviews.

Two comments:

The first is about EBM and guidelines. I would disagree with the characterization that EBM means only RCTs. Such is NOT the case. It is, however, supposed to mean considering the quality of the evidence when drawing conclusions and ranking its value.

I would also like to point out two different sorts of pitfalls that exist. The first is as revere bemoans: narrowing an EBM review to only RCTs leaves out other lines of evidence that may, in aggregate, be meaningful. The other, however, is that some so-called EBM guidelines have no pre-established criteria for what qualifies a study for inclusion and end up selecting those studies that support the pre-existing biases of the reviewers and excluding some of those that support a different conclusion. Many of these so-called EBM-based guidelines end up being opinion pieces in disguise. The rigid criteria approach as exemplified by Cochrane avoids that, albeit at another cost.

Finally, there was a comment in the first thread by someone who described him or herself as a Bayesian -- more likely to believe a set of results that he or she expected ahead of time because it had a higher a priori probability. The downside of that is the fallacy of expectation bias.

Balancing the greater likelihood that something fitting our understanding is true against that fallacy, and balancing the biases that come with opening a review to "all" data (which ends up being a selected set) against the exclusion of potentially important evidence by narrow inclusion criteria, are both hard to get right. How to do that, if it can be done, is worthy of discussion.
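
To make the Bayesian point concrete, here is a minimal sketch; the 22 responders out of 29 patients and the two priors are made-up numbers for illustration, not figures from the challenge. It shows how the same data pull two different priors toward two different posteriors:

```python
# Two observers see the same case series (assumed: 22 responders of 29 patients)
# but start from different priors about the response rate.
from scipy.stats import beta

n, successes = 29, 22
failures = n - successes

# Beta priors: a skeptic centered near 20%, an enthusiast centered near 70%.
priors = {"skeptic": (2, 8), "enthusiast": (7, 3)}

for name, (a, b) in priors.items():
    post_a, post_b = a + successes, b + failures  # conjugate Beta-binomial update
    mean = post_a / (post_a + post_b)
    lo, hi = beta.ppf([0.025, 0.975], post_a, post_b)
    print(f"{name:10s}: posterior mean {mean:.2f}, 95% interval {lo:.2f}-{hi:.2f}")
```

Both observers saw identical data, yet the enthusiast ends up more convinced than the skeptic; expectation bias is what happens when that gap goes unexamined.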

Sam: Thanks for the trenchant observations about medical education and much else. The question of "where I am going" is an interesting one, because as a result of comments like yours I may never get there: they raise important issues along the way. One you raise implicitly is that certain modes of thought are accepted without question as a result of education. There is no question about them because alternatives aren't presented. Some of them are relatively constant and are related to the social context in which medicine is practiced, with the doc (formerly) at the top of the food chain and therefore making judgments that are privileged because they relate to his or her practice and he or she is the boss. But as the social structure of medicine changes, other considerations start to creep in. RCTs are part of an alternative and exterior form of control. EBM tries to square the circle of physician autonomy and external constraint (e.g., cost or coverage control), an idea that hadn't occurred to me before your observations. So we'll have to see where this goes. As Yogi Berra or someone once said, "If you don't know where you are going, it doesn't matter how you get there."

The example of rabies is most pertinent, because it underscores the whole reason that RCTs are needed: variation. It is the variation within populations that motivates the entire enterprise called biostatistics. The more variable the condition, the more the need to take that variability into account. If people were all the same, there would be no need for large controlled clinical trials. Rabies, with practically no variability, would require only a couple of cured cases to entitle the discoverer to a trip to Stockholm.

Therefore, the question to be considered is this: how variable is refractory hypertension? For any condition X, the need to account for the play of chance grows as the variability of X increases. And for most conditions, that variability in human populations is considerable.
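
To put a rough number on that intuition, here is a back-of-the-envelope sketch; the target difference (10 mmHg systolic) and the standard deviations are assumptions for illustration, not figures from the post. It uses the standard two-sample approximation, in which the required sample size grows with the square of the outcome's variability:

```python
# Approximate patients per arm needed to detect a fixed mean difference,
# as the standard deviation of the outcome grows.
from math import ceil
from scipy.stats import norm

def n_per_arm(delta, sd, alpha=0.05, power=0.80):
    """Two-sample comparison of means: n per arm for a given effect and SD."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)

for sd in (5, 10, 20, 40):
    print(f"SD = {sd:2d} mmHg -> about {n_per_arm(delta=10, sd=sd)} patients per arm")
```

With almost no variability a handful of patients settles the question -- the rabies situation -- while a noisy outcome in a heterogeneous population quickly demands hundreds per arm.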

Whether an effect size is great enough to get excited about depends not only on the variability, but on the clinician's judgment about the change in patient prognosis that can be expected from the change in a measured variable. Making these dependencies explicit can help to focus our attention on how to decide whether to adopt this novel treatment for refractory HTN.

By Ed Whitney (not verified) on 05 Jan 2010 #permalink

Does status as "a study" make any difference? Does the fact that the doctor was not a scientist or necessarily doing "science" in the way many of us understand it affect in any way how reliable the information is or whether she should be using her experience in the treatment of her patients?

Based on the original description, it clearly was a study, and the doctor was doing science. (Whether the doctor should be considered a scientist is irrelevant, IMO.) He had a hypothesis with a supporting rationale, a treatment plan, some inclusion/exclusion criteria, a defined endpoint, etc. We can argue about how good a study it was, or whether the doctor was doing "good" science, but I don't think there's any question that it was a scientific study.

More generally, I'd say it does matter whether something is truly a "study." In this context, I think a study is an organized and planned attempt to answer a question. It implies at least a minimal adherence to the scientific method. In contrast, suppose our hypothetical doctor told us the following: "A couple of years ago, based on some animal model and pharmacology data, I started giving this AED to patients with refractory hypertension. Since then, I've treated 29 patients with this AED, and I'm convinced it works." That's not a study; that's an anecdote.

Finally, I think calling something a study also matters. If you tell me something was a study, I'll probably assume it was some kind of organized research effort. I won't necessarily assume it was an RCT, but I'm likely to think there was at least some effort to control bias, to make rigorous observations, etc. After all, it's a study, right?

The problem, of course, is that my assumption may be wrong. If you call it a study, I may give it more benefit of the doubt than it deserves.

Nice exercise, by the way. My general opinion on the original thought experiment was that the evidence was suggestive, but whether it was sufficient for a reasonable practitioner to start using the AED depended much more on clinical factors. For example: Is short term BP drop a reliable surrogate marker for clinical benefit? How severe is refractory hypertension? What are the known side effects and drug interactions of the AED? And so on.

What about this perspective: If several high quality RCTs were done with the same target population, same intervention, same primary and secondary outcomes, etc., would a clinician or researcher, for that matter, be compelled to investigate the same research question via a case series (all else being equal)? Now, turn it around. If there were a plethora of case series on a subject, would researchers be compelled to conduct a high quality RCT on the research question, if one were feasible?

@shark

I think this comes back to Rind's comments about magnitude of effect (where magnitude probably includes both the size of the effect and its clinical importance). If there were a plethora of case series showing 80% survival after 5 years in patients with confirmed late stage pancreatic cancer, that would be pretty compelling without an RCT. In fact, an RCT would likely be unethical in that case.

If multiple case series in the same patient population showed only a modest but (apparently) reproducible impact on median survival (e.g. 7 months instead of 6), an RCT would still be essential.
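
One way to see why those two scenarios are so different is a quick calculation under assumed numbers (a 20-patient series and a historical five-year survival of about 3% -- both illustrative, not from the comment above): how improbable would the dramatic result be if the treatment did nothing?

```python
# Probability of seeing a dramatic case-series result by chance alone,
# if patients were typical of historical experience.
from scipy.stats import binom

n, observed = 20, 16           # 80% of 20 patients alive at five years (assumed)
historical_rate = 0.03         # assumed historical five-year survival

p = binom.sf(observed - 1, n, historical_rate)  # P(X >= observed)
print(f"P(>= {observed}/{n} survivors under the historical rate) = {p:.1e}")
```

The probability is vanishingly small, so chance and modest biases can't plausibly explain it; a 7-versus-6-month difference in median survival, by contrast, is well within what selection or chance could produce, which is why the RCT remains essential there.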

qetzal: Thanks for the response. The reality is that most situations fall somewhere in the middle of those two scenarios. Bariatric surgery for the reduction of weight and resolution of comorbid conditions is a good example of the tremendous improvements reported via case series (for years!!). Medical guidelines/insurance criteria did not change based on the multitude of case series data, even though those results were nothing short of spectacular and very consistent. I think we ought to also consider the contribution $$ politics $$ may play. Some diseases and their chronic management are more lucrative than others, and anything especially novel (cheap/easy/non-pharmacologic/curative) that shows extraordinary promise will probably be held to the highest standard of evidence.

EBM tends to be stacked in favor of (1) common conditions and (2) conditions with lucrative interventions. The thing these have in common is the prospect of being able to recruit enough patients to make a large RCT possible. Rare conditions are less likely to meet strict EBM standards just because it may be hard to recruit the requisite number of patients for an adequately powered study. Lucrative interventions are more likely to be sufficiently funded to recruit and retain enough patients that the effect of an intervention can be measured with precision.

The need for large numbers stems from the variability mentioned above. Human populations in community living situations are enormously variable.

And as they (Gregory Bateson?) used to say, an inbred laboratory rat, under sufficiently controlled experimental conditions, will do as it damn well pleases.

By Ed Whitney (not verified) on 05 Jan 2010 #permalink

shark,

I agree. The extremes are pretty easy. The hard part is deciding what's best when things are a bit more equivocal.

For example, were MDs wrong to recommend HRT for menopausal women prior to the recently completed large RCTs? Sure, the latest data shows that HRT was not actually the right treatment for most of those women(*). But the MDs at the time didn't have access to that data. So, did they make the best decision they could have under the circumstances, and it just turned out to be wrong, or did they make a bad decision? I don't know.

(*)I know there's still at least some controversy about HRT; I'm just using it as an example.

I wanted to respond to the other post, but time and other duties kept me away from posting relevant comments as the discussion progressed.

In my occasional contacts with med students at my academic institution, I have noticed a lack of understanding of the strengths and weaknesses of various types of studies and when to use what type. That is more properly knowledge in the realm of PhDs. While not necessary to the practice of medicine, it is necessary in the pursuit of good data that will lead to a faster and more precise diagnosis and a more effective intervention and treatment.

By MS, MT(ASCP) (not verified) on 05 Jan 2010 #permalink

The other issue of course is: is your patient population comparable to mine?

Specialists by definition have a selected population. I'm a primary care doc, so my patient profile will certainly be different from that of a cardiologist. So the design of the study is not the only issue.

The study is a great starting point, but if said cardiologist or the medical resident has a stake in the success of the medication for this indication (think Wakefield, or even the Zamboni MS treatment), this design has the potential to be - even accidentally - quite biased. There's a short follow-up. There's the possibility of a change in the physician's practices in other ways. A control group could have really helped evaluate this.

I'm in agreement with the folks above who have said it *could* lead to a trial of said AED for refractory hypertension once all other avenues have been investigated, but overall would not change my practise. It certainly wouldn't put the AED to the front line. But then, I'm a late adopter and relatively conservative with medications.

The RCT does have a privileged position insofar as it has been one way to eliminate some of the more obvious biases, which makes it easier for clinicians to take the results at face value (although the recent trend of reporting results per 1000 patient-years makes me pull my hair out).

I actually dislike meta-analyses, on the garbage-in, garbage-out principle: combining a group of studies with unrecognised biases and calling it a higher level of evidence makes me uncomfortable.

By redrabbitslife (not verified) on 05 Jan 2010 #permalink

Terrific discussion. I wanted to suggest a real-time example of how these study definitions can have an impact on how data are used to define evidence based practices. This week JAMA published a study on the use of antidepressants in mild, moderate, severe and very severe depression. The study sorted through over 2100 antidepressant trials and selected 6 (placebo controlled) studies. Through some meta-analysis it concluded (and the NY Times picked this up immediately) that antidepressants don't work for any but the very severe cases. (I am paraphrasing.) I guarantee you this will affect practice immediately because of the MSM publicity. Still, it is only six studies out of over 2100; while well designed in the standard sense, it may contradict a whole bunch of other, less well designed studies with many more subjects. And it may lead to substantial changes in practice, but we shall see. Whether that is good or not remains to be seen. I feel a little duped, as a psychiatrist, by someone -- not sure if it is my profession, the Cochrane database, big pharma, or whom. The recommendations have gone full circle in 20 years.

I think your question is much more important than I first realized. Thank you.

By Dr. Denise (not verified) on 05 Jan 2010 #permalink