I’m an anthropologist and a biologist, so really, I have no problem with the idea of a “placebo effect” in which people become convinced that they are being given an effective treatment and thus, because of that thought in their mind, improve.
Editorial Note: I’m classifying this post as a “falsehood” because it does fit nicely with that series, though this was not on the original list of falsehoods. Also note, this is a hastily dashed off first draft, so please be ruthless in your comments so I may move ever towards the unattainable perfection.
I doubt very much that this works for many ailments. The placebo effect will not cure a broken arm, for instance. And I also believe that it must be understood in a rational, empirical, scientific manner. The power of positive thinking … the secular version of the “prayer” effect … is either based on a marshaling of the person’s immune system, or some other physiological thing that really happens, or it is not real.
How does my being an anthropologist cause me to have no problem with the possibility of a placebo effect? Because as an anthropologist I have studied things, or sometimes just seen things, that I know would confound a western doctor, at least a little. I have observed people enter trances because of song or dance alone. I have observed catalepsy or something similar with no drugs and, I think, no prior or inherited condition or injury. Just plain old shamanism. On one occasion I observed an infant of about two or three months of age do nothing but cry for four or five days, until his father treated both him and his mother with the same treatment: a cutting of the skin in dozens of places followed by the rubbing in of a mixture of leaves that had all been burned. I assume the burned leaves were essentially (semi-sterilized) carbon. I guess that the cuts caused the baby to stop crying because its HPA (hypothalamic–pituitary–adrenal) hormonal axis was overtaxed and he essentially passed out. I suppose that the mother dying a few days later was not caused by the treatment, but was also evidence that the traditional treatment with the cuts and charcoal was not effective over the medium or long term for what must have been mastitis or something like that. The fact that the baby’s mother’s sister simply became the wife of the baby’s father and the (biological) mother of the baby was fascinating. But unrelated to the placebo effect.
I am also interested in the study of variation, and I prefer to look at the so called “Placebo Effect” as part of a system of variation. I think this way of looking at it may make it easier to parse the article recently written by Steve Silberman (“Placebos Are Getting More Effective. Drugmakers Are Desperate to Know Why“) and which is also the subject of discussion on White Coat Underground.
I think Steve’s article is well worth reading, but his piece and much of the other writing I’ve seen on this are somewhat limited by an incorrect understanding of the Placebo Effect. Well, I’m here to explain to you what the Placebo Effect really is. From a particular point of view and using particular analytical tools. You’ll find other writing on this to give a somewhat different perspective, but what I present here, in the form of an examination of Silberman’s article, may be helpful.
This is going to take a while, so I suggest you put on a pot of coffee….
A placebo is a tool used to establish a control in a scientific experiment in which a medical treatment is being tested for both safety and efficacy. The system of control (of which the placebo is only part) is important to determine if some outcome is really the result of the thing that is being tested, which may be a pill, an injection, or some other medical procedure. It is also important to see if side effects that arise during the experiment are likely associated with the thing that is being tested, as opposed to background stuff that happens anyway.
When this is done it is sometimes found that people who receive the placebo treatment have a result that is similar to, but often of a lower magnitude than, the desired effect of the treatment. Some people have incorrectly suggested or assumed that this is the result of something like “positive thinking” (call it what you will) in which the desired effect is obtained not directly from the treatment, but from within the person who takes the placebo (or some spiritual source). That person’s immune system is cranking up, or some other psychological effect is happening (or some spiritual thing is happening).
In a strange, somewhat enigmatic, large-brained species in which individuals can enter a meditative state, sing a mantra for an hour, then suddenly enter a state of epilepsy or catalepsy for several minutes, and so on, one should not be surprised to find such effects (the psychological ones, that is). But the assumption that this “placebo effect” routinely explains what is seen in control groups in medical studies is not valid on its face. There is another explanation that has to be looked at first.
I would divide the control response into two types for the present discussion. All changes in the control group in a medical treatment experiment must fall into one of these two categories, by definition.
1) No-effect outcomes. This consists of random or structural shifts in the measured value that have no link to the desired effect. More on this later.
2) Effect outcomes. This would include an actual shift in the measured variables, one way or another, that result from the psychological (or other) effects of a person thinking they are taking a treatment. Or, in the case of non-drug treatments, these effects may be more subtle. Perhaps opening a person’s hip and digging around for a while has a positive effect on their hip pain, and thus, part of the post-operative relief from a hip-replacement surgery is because of the surgical invasion rather than the new artificial hip joint. That could be psychological or it could be because some powerful molecules rush to the site of the surgery and while there have a beneficial effect on the arthritic joint. For the placebo pills, maybe a tiny dose of sugar has a positive effect, or maybe it’s the gel material the capsule is made out of. Maybe it is the process of drinking more water because you are supposed to take six pills a day with water. It does not matter what you or I think of these possible explanations. My point is simply that if anything like this happens, it is by definition part of this second category of effect outcomes.
In order for the widely touted “placebo” effect to be real, effect type 1, the “no-effect” effect, has to be ruled out first. The problem is, that this no-effect effect can be huge.
Imagine a graph of a measured value drifting randomly up and down over time, with no trend: a random walk. This is an oversimplification, but it serves well as a model. Randomly pick any point in time on that graph, and compare it to any later point in time randomly chosen. Over several tries, you’ll find that the second point is higher than the first point half the time, and lower the other half the time. If the y-axis were how sick you feel, and the starting and ending points along the x-axis were the start and end dates of a medical trial, then the default condition is that half the people feel better at the end. The “placebo” effect is 50% even if 100% of the effect is a type 1 “no-effect” outcome. A 50% improvement (and 50% worsening) in the condition is the null model. The average is no change, but the average is not what is being measured by the people who felt better with the sugar pill!
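A minimal sketch of this null model (my own illustration, with made-up numbers, not from the article): simulate a symptom score that drifts randomly with no treatment effect whatsoever, pick a random start and a random later end point, and count how often the end point is lower (i.e., the person "improved").

```python
import random

random.seed(1)

def random_walk(n=200):
    # A symptom score wandering over time with no treatment effect at all
    x, path = 0.0, []
    for _ in range(n):
        x += random.gauss(0, 1)
        path.append(x)
    return path

improved = 0
trials = 10000
for _ in range(trials):
    path = random_walk()
    start = random.randrange(0, 100)        # random "trial start" day
    end = random.randrange(start + 1, 200)  # random later "trial end" day
    if path[end] < path[start]:             # lower score = feeling better
        improved += 1

print(round(improved / trials, 2))  # hovers around 0.5
```

Roughly half the simulated patients "get better" even though nothing was done to them, which is exactly the type 1 "no-effect" baseline described above.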
Imagine a scenario where people can enter a trial if they have some self-limiting disease, and we are trying to see if the drug hastens the end of the period of illness. Everyone in this trial is going to get better. Everyone in the sugar pill control group will get better. The drug group will all get better. The no-effect outcome is therefore huge.
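The same point as a toy simulation (hypothetical numbers, mine): everyone recovers from the self-limiting illness within a couple of weeks whether they get the drug or the sugar pill, so the raw "response rate" is 100% in both arms.

```python
import random

random.seed(2)

# Hypothetical self-limiting illness: everyone recovers in 7-14 days,
# whether they get the drug or the sugar pill.
def days_to_recover(on_drug):
    base = random.uniform(7, 14)
    return base - (1.0 if on_drug else 0.0)  # drug shaves off ~1 day at best

drug = [days_to_recover(True) for _ in range(100)]
placebo = [days_to_recover(False) for _ in range(100)]

# "Response rate" (recovered by day 21) is 100% in BOTH arms:
print(sum(d <= 21 for d in drug), sum(d <= 21 for d in placebo))  # 100 100
```

The enormous placebo-arm "response" here has nothing to do with sugar pills doing anything; it is the disease running its course.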
None of this is surprising to those who do this sort of research. This is all handled statistically. In fact, the placebo group is typically used as the assumed “no effect” baseline.
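To make "handled statistically" concrete (toy numbers of my own, not from any trial): the estimated treatment effect is the difference between arms, with the placebo arm absorbing the type 1 "no-effect" changes.

```python
# Toy symptom-score improvements (hypothetical, for illustration only).
drug_improvement    = [4, 6, 5, 7, 3, 6, 5, 4]   # drug arm
placebo_improvement = [3, 4, 2, 5, 3, 4, 3, 2]   # placebo arm

def mean(xs):
    return sum(xs) / len(xs)

# The placebo arm serves as the "no effect" baseline; the estimated
# treatment effect is the difference between arms, not the raw change.
effect = mean(drug_improvement) - mean(placebo_improvement)
print(effect)  # 5.0 - 3.25 = 1.75
```

Note that the placebo arm "improved" by 3.25 points on average here without that implying any curative power of the sugar pill.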
So that is the starting point for this discussion. Placebos are part of the control system designed to cover our bases on unknown factors that matter, most of which are probably random or non-meaningful (type 1 “no effect” outcomes).
So, now let’s see what Steve Silberman is saying in his article. He is making the claim that the difference between placebo outcomes and treatment outcomes is growing less and less, which makes it harder for drugs that may have been approved years ago to pass muster today. He is specifically saying that some scientists are claiming that the “placebo effect” has grown stronger, though in subsequent clarification he underscores that he does not mean that the sugar pills are getting stronger, but that something somewhat mysterious is happening.
To me, there are two likely explanations for this. One is that we are seeing nothing other than the outcome of re-examining existing research without understanding how variation works. The other is that something is really happening, but in my view the explanation for that is complex and potentially very interesting.
Let’s look at the article in more detail. First, I’m going to critique Steve for, maybe, leading the story with a bit of undue sensationalism. Putting a good lead on your story is a good thing and I don’t fault him for that, but I do think an outcome of this sort of thing is often the framing of the argument in a way that hinders rather than helps further analysis. So let’s take a hard look at some of Steve’s assertions.
The phrase “many test subjects” [had an apparent effect] in the first part of this paragraph …
Behind the scenes, however, MK-869 was starting to unravel. True, many test subjects treated with the medication felt their hopelessness and anxiety lift. But so did nearly the same number who took a placebo …
… was transmogrified into something powerful in this latter part of the same paragraph:
The fact that taking a faux drug can powerfully improve some people’s health–the so-called placebo effect–has long been considered an embarrassment to the serious practice of pharmacology.
In the following paragraph, we see the logically fatal mistake of linking the failure of a drug to exceed the placebo’s ‘performance’ with the failure of the drug to be effective. The underlying assumption in the following words, and that assumption is made of whole cloth, is that the placebo and the effectiveness are in a kind of horse race:
Ultimately, Merck’s foray into the antidepressant market failed. In subsequent tests, MK-869 turned out to be no more effective than a placebo. In the jargon of the industry, the trials crossed the futility boundary. … MK-869 wasn’t the only highly anticipated medical breakthrough to be undone in recent years by the placebo effect.
These phrases I think set up the situation to be ripe for either misunderstanding or confusion. But enough of that, let’s continue on to the meat of the issue.
The following paragraph provides some of the key information on which the article is built:
From 2001 to 2006, the percentage of new products cut from development after Phase II clinical trials, when drugs are first tested against placebo, rose by 20 percent. The failure rate in more extensive Phase III trials increased by 11 percent, mainly due to surprisingly poor showings against placebo.
I’m not going to second guess these numbers, and I’ll assume they are accurate. But I will note that the numbers on which these percentages are based are very small. Percentages can, in this way, be misleading. Also, the clinical trial data itself is from the hidden, proprietary laboratories of private pharm companies, or the research they contract out. A small change in the way this information is gathered and reported can easily account for these numbers. (Steve points this out in the article as well.)
Having said that, let’s assume that there is an actual decrease in the performance of trial drugs in relation to placebo, and for now, consider those cases where the drug does not do as well compared to the placebo (rather than the placebo getting better).
Over time, in a given sub area of research, we would expect that once “the” breakthrough is made, there will be big results quickly, and over time diminishing returns on research as more and more of the obvious and easier routes are followed. Some time ago there was an argument about how different a new drug had to be to get a new patent. The fact that this is even an issue is a good thing, but what it means is that “new” drugs are better off being more than a little different from older drugs, because of the liability of losing your patent if they are not different enough. Between the aging-subarea effect (less easy to get results over time) and the distance effect (biasing towards drugs that are molecularly different in a non-trivial way), perhaps these numbers could be explained.
Let’s take a look at this paragraph:
… Last November, a new type of gene therapy for Parkinson’s disease, championed by the Michael J. Fox Foundation, was abruptly withdrawn from Phase II trials after unexpectedly tanking against placebo. A stem-cell startup called Osiris Therapeutics got a drubbing on Wall Street in March, when it suspended trials of its pill for Crohn’s disease, an intestinal ailment, citing an “unusually high” response to placebo. Two days later, Eli Lilly broke off testing of a much-touted new drug for schizophrenia when volunteers showed double the expected level of placebo response.
Consider that this is research on new materials. If the unexpected never happened, there would be no research. When an engineer specifies a certain way to build a bridge using standard techniques, there are no unexpected results and nobody does new research. Copies of the bridge with variant engineering are not first constructed and tested. You take the engineering knowledge (which may well have arisen in part from prior experiment, but that is of no consequence to the present argument) and you build the bridge.
When building new drugs, you experiment. Experiments result in the unexpected. So, if we have knowledge of dozens of experiments over a given period of time, some of them are going to have results that are less than expected, some close to expected, some more than expected. Citing the cases that ended up on one end or the other of this spectrum of variable outcomes represents a poor understanding of the system, or an effort to make the argument at the expense of rationality.
Also note that the Parkinson’s drug was withdrawn from a trial after tanking. Well, if it was withdrawn prematurely, then how do we know it tanked? Putting a finer point on this… we are trying to draw scientific conclusions in a murky area from data that are self-destructing before we can use them. This reflects the interplay between decision making for corporate or economic purposes vs. scientific purposes.
I would guess that there is another element to this as well: If you are the corporate entity making decisions and leading investors, you will use phrases like “Holy crap, no one could have expected THAT failure” all the time (translated into the appropriate language for marketing investment opportunities, of course).
But wait, maybe there is something here. Have a look at this part of the article:
…Some products …, like Prozac, are faltering in more recent follow-up tests. … … if [previously tested drugs] vetted now, the FDA might not approve some of them. Two comprehensive analyses of antidepressant trials have uncovered a dramatic increase in placebo response since the 1980s. One estimated that the so-called effect size (a measure of statistical significance) in placebo groups had nearly doubled over that time.
It’s not that the old meds are getting weaker, drug developers say. It’s as if the placebo effect is somehow getting stronger.
Almost all the points I make above could apply here. Take all the drugs tested between, say, 1985 and 1995. Retest them with the same exact tests. Some are going to perform better than expected some less, but the range of expectation should be not too variable and the average outcome should be the same. Same with placebos. The effect of the placebo should be stronger in some cases, weaker in other cases, because there is a random component to the process. Then, you can cite the specific results you like and lull your investors, scare your readers, whatever. How, then, do these data hold up against the null model, not just some cherry picked data?
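A sketch of the point about retesting (invented numbers, mine): re-run the "same" trial many times. The true placebo-arm response never changes, yet each trial's measured response wobbles by chance, so the extremes differ noticeably, and citing only the high end looks like a "growing placebo effect".

```python
import random

random.seed(3)

# Re-run the "same" trial 50 times: the true placebo-arm response is
# fixed, but each trial's measured response wobbles by sampling chance.
TRUE_PLACEBO_RESPONSE = 0.30
measured = [min(max(random.gauss(TRUE_PLACEBO_RESPONSE, 0.05), 0), 1)
            for _ in range(50)]

# The lowest and highest measured responses differ noticeably even
# though nothing about the placebo changed between trials.
print(round(min(measured), 2), round(max(measured), 2))
```

Compare those extremes against the fixed true value and you can "show" the placebo getting stronger or weaker at will; only the whole distribution, tested against the null model, means anything.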
But maybe there is something here. Maybe this is a case of the difference between the trial drug and the placebo getting closer because the type 2 “effective outcome” part of the placebo effect gets stronger. Consider this: Pharmaceuticals might well have a stronger culture-bound self-affective “placebo effect.” In other words, people with certain kinds of depression (for instance) may feel better from interaction with other humans or other entities in their lives in a positive way. Just doing something about depression may itself help with that depression. Are you going to cure severe depression by paying attention to the patient or by giving a sugar pill while you talk nice? No. Are you going to get an effect of some kind? Probably.
Now, here is where the article may be correct, but I am only conjecturing here. The statistical and numerical objections to the argument I make above have to be cleared first. But it is possible, it seems to me, that a shift in self-healing of depression for non-drug related reasons, which would show up in the placebo measure in a given trial, could be stronger in a world in which we “know” as a culture that you can take a pill for depression vs a world in which we did not yet know or “believe” this to be possible.
I am a bit skeptical about the “then” vs. “now” comparison, however. For one thing, the method of measuring depression that would have been used at the time of the early trials is not, as far as I understand it, the same as the method used now. Somewhere along the line somebody had to translate between the two methods. That may (or may not) be a source of difficulty in drawing firm conclusions from these studies (which I hasten to say I have not read).
Beecher’s prescription helped cure the medical establishment of outright quackery, but it had an insidious side effect. By casting placebo as the villain in RCTs, he ended up stigmatizing one of his most important discoveries. The fact that even dummy capsules can kick-start the body’s recovery engine became a problem for drug developers to overcome, rather than a phenomenon that could guide doctors toward a better understanding of the healing process and how to drive it most effectively.
I am afraid that this paragraph makes the mistake of going from the idea that our experimental method has to account for non-medicinal effects of being in a study to it being established fact through rigorous scientific research that sugar pills affect disease.
Silberman’s description of the metastudy done by Potter and DeBrota supports some of what I say above: The data are often secret and strangely managed, so I’m a bit worried about that. But have a look at this:
Assumption number one was that if a trial were managed correctly, a medication would perform as well or badly in a Phoenix hospital as in a Bangalore clinic. Potter discovered, however, that geographic location alone could determine whether a drug bested placebo or crossed the futility boundary. By the late ’90s, for example, the classic antianxiety drug diazepam (also known as Valium) was still beating placebo in France and Belgium. But when the drug was tested in the US, it was likely to fail. Conversely, Prozac performed better in America than it did in western Europe and South Africa. It was an unsettling prospect: FDA approval could hinge on where the company chose to conduct a trial.
The fact that, by the time this study was done, everyone in the US who was likely to show up for a drug trial was already habituated to diazepam probably explains this result. But putting that (bit of snark) aside, this statement is simply wrong. Well, yes, the scientists may have this wrong because they believe in their own mojo too much. There have been a couple of studies that showed various effects being different based on context (like what lab was doing the work) that surprised people, but still, we should have (and in fact do have, by definition) the same two effects across geographical/cultural/national space as across trials. What is happening here is that our all-important variation is being examined, for the first time, across geographical space (as it was earlier examined across time). And we are finding … variation!
It is not at all surprising that there is something about the context … Germany vs. the US for instance … that actually matters (type 2 “effect outcome”), but there is also random variation (type 1 “no-effect outcome”) across space. Think of the random variation as the null model. Only if the variation is greater than that which can be explained by non-meaningful variation can the contextual explanation … that Germans have a stronger self-healing “placebo effect” (self-curing of some kind if you take a sugar pill) than Americans, for instance … be considered. I would not be surprised if Germans had more of that, by the way. But that’s the anthropologist coming out and a little personal history, so we’ll ignore that.
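One standard way to ask "is the across-site variation bigger than the null model allows?" is a permutation test; here is a sketch with hypothetical placebo-arm improvement scores at three made-up sites (all numbers invented for illustration).

```python
import random

random.seed(4)

# Hypothetical placebo-arm improvements at three sites (invented data).
sites = {
    "US":      [2.1, 3.0, 1.8, 2.5, 2.9, 2.2],
    "France":  [2.6, 3.1, 2.4, 2.8, 3.3, 2.7],
    "Germany": [3.0, 2.2, 2.9, 2.4, 3.2, 2.6],
}

def between_site_spread(groups):
    means = [sum(g) / len(g) for g in groups]
    return max(means) - min(means)

observed = between_site_spread(list(sites.values()))

# Null model: site labels are meaningless. Shuffle the labels and see
# how big the between-site spread gets by chance alone.
pooled = [x for g in sites.values() for x in g]
count_as_big = 0
reps = 5000
for _ in range(reps):
    random.shuffle(pooled)
    shuffled = [pooled[0:6], pooled[6:12], pooled[12:18]]
    if between_site_spread(shuffled) >= observed:
        count_as_big += 1

p = count_as_big / reps
print(round(p, 2))  # a large p means the site "effect" is just type 1 variation
```

Only when the observed spread is rarely matched by shuffled data (small p) is there any license to start talking about real contextual, culture-bound differences between sites.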
Also, Silberman points this out:
Mistaken assumption number two was that the standard tests used to gauge volunteers’ improvement in trials yielded consistent results. Potter and his colleagues discovered that ratings by trial observers varied significantly from one testing site to another. It was like finding out that the judges in a tight race each had a different idea about the placement of the finish line.
But again, this is not really a mistaken assumption. It is a bad assumption made post hoc, maybe, but it is not possible that people doing earlier studies did not realize that self-report and subjective ranking varied from site to site. This has been known for decades in all areas of medical and psychological research. I think maybe Potter and colleagues are overstating the case in this regard.
We have less of a handle on variation than we would like, especially in reporting by subjects but also in other areas. There’s piles of variation out there. And as long as there is lots of variation, the ends of the distributions of effects are always there to be pointed to and held up as an effect.
Potter and DeBrota’s data-mining also revealed that even superbly managed trials were subject to runaway placebo effects.
Now we are calling “more variation than we were hoping was happening in our science” a runaway placebo effect. By now I think you can easily see why I’m not buying that.
I’m not going to touch Benedetti’s research in this essay. I don’t have the resources available to me right now to evaluate this fairly complex set of claims. Ted Kaptchuk’s work also requires a closer look. In both cases, I simply ask: Is the no-effect expectation exceeded?
I think that the difference between treatment and control may be diminishing. Well, if the people doing this research say it is, then it is. I think the reasons for that may be many, including the two I’ve described above: Diminishing returns on like research and changes for certain (but not most) drugs in the context-side of the self-healing side of the placebo effect. I strongly suspect that the first of these causes is real, and is large, while the second of these causes is possible and needs to be demonstrated because it is a little vague and we do not have a good handle on the mechanism. Having biological sense behind your statistical arguments is so critically important, let’s never forget that.
Silberman points out additional difficulties that modern pharm is having with testing, including finding people who are not already drugged up in one way or another. He also discusses proposed new research designs. You should look at the article.
And as you read Silberman’s article, please do this. Imagine the whole thing … the descriptions of the studies, the discussions of the “placebo effect” … redone with the idea that there are two separate result-changing outcomes from placebo controls: those that are artifacts of the process or system and do not in fact have any curative properties at all, ever (type 1 … “no effect”), and those that do. The first type are strong, ubiquitous, well known, but not fully understood; the second type are probably rare, weaker, probably often don’t exist, and are very poorly understood. The problem is that many people assume that the whole “placebo” effect is of this second type, while it is certainly mainly or wholly the first type.
Oh, and if you are thinking that the first type of effect is still a good thing, then stop right where you are. Get a cup of coffee (not decaf) and start at the beginning of this essay and read it again. Type one effects, by my definition used here, are the effects that are pure statistical outcomes and will not make you feel better ever. The burden of proof is on those who claim the second type of effect. The second type of effect is probably real, sometimes, but most of the times not, and it is poorly understood, and it probably varies enormously across time, space, culture, and so on.
Ironically, Big Pharma’s attempt to dominate the central nervous system has ended up revealing how powerful the brain really is. The placebo response doesn’t care if the catalyst for healing is a triumph of pharmacology, a compassionate therapist, or a syringe of salt water. All it requires is a reasonable expectation of getting better. That’s potent medicine.
Maybe potent. But only a small percentage of what is happening to people taking treatments, and only a small percentage of the cause of the incredible shrinking difference between treatment and placebo that may be happening in some areas of research.
More Falsehoods !!!
This post is one of a series on the topic of falsehoods. The following is a list of falsehoods posts in order:
- The Falsehoods
- “False Pearls before Real Swine”
- Falsehood: A baby is not the biological offspring of its adoptive mother
- Falsehoods: Has evolution stopped for humans?
- Natural Selection is Survival Of the Fittest (A Falsehood)
- Falsehood: Nature maintains balance.
- Is it a Falsehood that Humans Evolve from Apes?
- The poor and the dark skinned have more babies than the rich and the light skinned
- Acting for the survival of the species (a falsehood)
- Culture Overrides Biology (Another falsehood)