Surgery: Past performance is no guarantee of future results

In recent years, there has been a lot of interest in improving surgical outcomes. One strain of research examines the "volume-outcome" relationship, which in essence asks whether the volume of cases that a surgeon or hospital does bears any relationship to outcome. In other words, are mortality rates lower, survival rates better, or the correction of symptoms more reliable for a given surgical procedure in the hands of surgeons who do more of them per year, or in hospitals in which more of them per year are done? On the surface, it would seem self-evident that the answer must be yes, but the situation is not as simple as you might think, and the volume-outcome relationship doesn't hold up for all procedures. Basically, for certain complex procedures there is a correlation between volume and better outcomes, but it is not always consistent, and it often has at least as much to do with the volume of the hospital as with that of the surgeon.

One consequence of this emphasis on outcomes is that, more and more, databases are being developed and maintained that track mortality and complications by hospital, often mandated by state and local governments. Increasingly, these results are being reported to the public, sometimes even showing up in newspapers as tables of mortality rates for various procedures. If a hospital shows up on these lists with a mortality rate significantly higher than that of surrounding hospitals, the consequences can be severe, ranging from the loss of patients to outright investigations. But if you're a patient who needs a certain procedure, how useful are these measures? In other words, how predictive of your outcome is a hospital's mortality rate? Intuitively, it would seem that hospitals that report zero mortality for a procedure in a given year would be a good bet, but is that correct?

Justin B. Dimick and H. Gilbert Welch, in an article hot off the presses in the most recent Journal of the American College of Surgeons, examined just that question, and the answer is not what you would predict, so much so that their article is entitled "The Zero Mortality Paradox in Surgery."

This article is short and sweet. It asks, and tries to answer, a very simple question: do hospitals that have had zero mortality for several index procedures over the preceding three-year period have lower-than-average mortality for those same procedures in the subsequent year? The rationale was as follows:

Policymakers and health care payers believe public disclosure of provider performance will help patients choose the best hospitals. Beginning in 1990, New York State began publicly reporting risk-adjusted mortality rates for cardiac surgery at every hospital. Pennsylvania followed shortly thereafter, and now several states track and publicly report cardiac surgery mortality. More recently, the Agency for Healthcare Research and Quality proposed the use of mortality rates as quality indicators for a broader group of six other noncardiac operations. The underlying idea is appealing; by choosing a low mortality hospital, patients can improve their chances of surviving their operation.

The problem with this approach, however, as the authors point out, is that in hospitals with a low volume of a given procedure, it is quite possible, by the vagaries of chance alone, for a hospital that may not be so great to have zero mortality. Unless an operation has a very high mortality in all hands, it is quite possible for a low-volume hospital to go without a death for a considerable period of time for reasons that have little or nothing to do with its true mortality rate. Most "high mortality" operations, however, have mortality rates under 10%.
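To get a feel for how easily chance alone can produce a run of zero mortality at a low-volume hospital, here is a quick back-of-the-envelope calculation in Python (my own illustration with made-up numbers, not anything from the paper):

```python
# Probability that a hospital with true mortality rate p records zero
# deaths in n independent cases: simply (1 - p) ** n.
def prob_zero_deaths(p, n):
    """Chance of observing no deaths in n cases at true mortality p."""
    return (1 - p) ** n

# Hypothetical example: an entirely average hospital with 5% true mortality
# that does only 10 such cases per year.
p, cases_per_year = 0.05, 10
print(f"P(clean year)        = {prob_zero_deaths(p, cases_per_year):.2f}")      # ~0.60
print(f"P(three clean years) = {prob_zero_deaths(p, 3 * cases_per_year):.2f}")  # ~0.21
```

In other words, an utterly average low-volume hospital has roughly a one-in-five chance of a spotless three-year record for a 5% mortality operation.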

The study design was quite straightforward. The investigators used Medicare records to identify all hospitals with zero mortality for several procedures from 1997 to 1999. The procedures examined included coronary artery bypass grafting, elective abdominal aortic aneurysm repair, carotid endarterectomy, colon cancer resection, pulmonary lobectomy, and pancreatic resection. To be included, a hospital had to have performed at least one of a given procedure during the study period. The investigators then examined the mortality rates in these same hospitals for these same procedures in the year 2000. Mortality rates were adjusted for risk using a multivariable logistic regression model accounting for patient demographics (age, gender, and race), urgency of admission (elective, urgent, emergent), socioeconomic status, and comorbid diseases. The results were just as straightforward. In the year after a three-year stretch of no mortality for these procedures, the zero mortality hospitals had mortality rates that were no different from those of any other hospital. Indeed, for pancreatic resection, the results were worse in the subsequent year.
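For readers curious what "adjusted for risk using a multivariable logistic regression model" means in practice, here is a rough sketch of the general approach on simulated data (my own toy version, not the authors' actual model or variables):

```python
# Toy risk adjustment: fit a logistic regression of death on patient-level
# risk factors, then compare each hospital's observed deaths to the deaths
# "expected" from its case mix (an observed/expected ratio).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "age": rng.integers(65, 95, n),          # Medicare-aged patients
    "emergent": rng.integers(0, 2, n),       # urgency of admission
    "comorbidities": rng.integers(0, 4, n),  # crude comorbidity count
    "hospital": rng.integers(0, 50, n),      # 50 hypothetical hospitals
})
# Simulate deaths driven by the risk factors alone (every hospital is average).
logit = -7 + 0.04 * df["age"] + 0.8 * df["emergent"] + 0.3 * df["comorbidities"]
df["died"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Fit the risk model on patient factors, ignoring which hospital it was.
X = sm.add_constant(df[["age", "emergent", "comorbidities"]])
risk_model = sm.Logit(df["died"], X).fit(disp=False)
df["expected"] = risk_model.predict(X)

# Risk-adjusted performance: observed deaths / expected deaths per hospital.
per_hospital = df.groupby("hospital").agg(observed=("died", "sum"),
                                          expected=("expected", "sum"))
per_hospital["oe_ratio"] = per_hospital["observed"] / per_hospital["expected"]
print(per_hospital.sort_values("oe_ratio").head())  # ratios near 1 = as expected
```

The real model obviously uses many more covariates and proper comorbidity coding, but the observed-versus-expected logic is the same.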

Truly, past performance is no guarantee of future results.

There are several possible explanations for these results highlighted in the discussion. One is that the zero mortality hospitals are truly better, such that their average mortality over the entire period is still better than average. This may be true for coronary artery bypass, for which there was a trend toward lower mortality in the year after the zero mortality period. Another possible explanation is that hospitals with zero mortality were actually average but did lower case volumes for the operations studied, with random chance alone making it more likely that they would go longer periods of time without a death. This possibility is supported by the second result of the study, namely that the zero mortality hospitals did far fewer of each procedure than the other hospitals. Indeed, for one operation (colon cancer resection), the zero mortality hospitals did only one quarter as many as the other hospitals. A third possibility is that zero mortality hospitals actually have worse performance, which may have been the case for pancreatic resection, for which mortality was higher in the period following the zero mortality period. Finally, it is possible that performance at the zero mortality hospitals deteriorated during the period of observation, although such deterioration would be unlikely to happen in enough of the zero mortality hospitals at the same time to account for this result.
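The second explanation, that zero mortality hospitals are mostly just small, is easy to see in a simulation. Here is a quick Monte Carlo sketch (my own, with invented numbers) in which every hospital has identical true mortality and differs only in volume:

```python
import numpy as np

rng = np.random.default_rng(42)
true_rate = 0.05                        # identical true mortality everywhere
volumes = rng.integers(5, 200, 2000)    # annual case volume per hospital

deaths_3yr = rng.binomial(3 * volumes, true_rate)   # three "baseline" years
deaths_next = rng.binomial(volumes, true_rate)      # the follow-up year

zero = deaths_3yr == 0
print("zero-mortality hospitals:", zero.sum())
print("median volume, zero-mortality group:", np.median(volumes[zero]))
print("median volume, everyone else:       ", np.median(volumes[~zero]))
print(f"next-year mortality, zero group: {deaths_next[zero].sum() / volumes[zero].sum():.3f}")
print(f"next-year mortality, the rest:   {deaths_next[~zero].sum() / volumes[~zero].sum():.3f}")
```

The "zero mortality" group comes out dominated by the lowest-volume hospitals, and its next-year mortality is indistinguishable from everyone else's, even though nothing in the simulation distinguishes any hospital from another.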

Overall, this was a provocative study, but my sense is that it's probably close to correct. Its strengths include consistency across all hospitals over four years and the use of a well-known database. However, the Medicare database does, as has been pointed out, cover a subset of mainly older patients, and, unfortunately, an all-payer database does not always cover the same hospitals from year to year. Another problem is that mortality rates are at best a very rough measure of quality when numbers are low; however, mortality is an unequivocal result that is easy to measure and is recorded in most databases, which is why it's so often used. All of these issues complicate the interpretation of the study's results, but not enough to invalidate them. The authors conclude:

Patients and families may reasonably be attracted to a hospital with zero deaths. A reported finding of zero events has a qualitative impact exceeding its quantitative meaning, as others have noted; people tend to focus on numerators and ignore the size of the denominator. Although the problems with small samples are widely known among the statisticians and epidemiologists, they may not be readily apparent to patients. We concluded that patients considering where to have surgery should not choose hospitals just because they have reported mortality rates of zero. Otherwise, they may miss out on the potential benefits of going to a hospital with better performance.

This is exactly right. If you don't know the denominator, just seeing a report of 0% mortality for a procedure is almost meaningless. There's a huge difference between 0% mortality for one procedure performed and 0% mortality for two hundred procedures performed (and, I might add, 0% mortality for 1,000 procedures performed). There's also a lot more to determining a patient's risk than just looking at mortality rates. Choosing a hospital based on a spuriously low mortality rate may actually be riskier than choosing a hospital with a higher mortality rate that does a much higher number of a given operation per year.
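To put some numbers on that, here's what the denominator does to the uncertainty, using an exact (Clopper-Pearson) 95% upper confidence bound on the true mortality rate when zero deaths are observed (my own calculation, not one from the paper):

```python
from scipy.stats import beta

def upper_bound_zero_deaths(n, alpha=0.05):
    """Exact 95% upper confidence bound on mortality given 0 deaths in n cases."""
    return beta.ppf(1 - alpha / 2, 1, n)  # Clopper-Pearson with 0 observed events

for n in (1, 200, 1000):
    print(f"0 deaths in {n:>4} cases: true mortality could still be as high as "
          f"{upper_bound_zero_deaths(n):.1%}")
```

A clean record over a single case is compatible with a true mortality rate as high as roughly 97%; over 1,000 cases, it pins the rate below about 0.4%.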


Orac, it is pesky things like mathematics and statistics that keep half the hospitals below average. No matter how much improvement there is, half of them are always still below average. If ever there was a conspiracy, this must be it!

Could it be some form of "regression to the mean"?

PS: no free fulltext - what a shame!

Martin - it is simply regression towards the mean.

David Spiegelhalter made a similar point a few years ago. He was interested in the usefulness of the "league tables" that the UK government was fond of producing. So he took the mortality records from 12 hospitals and looked at the uncertainty in the ranks with a simple binomial model. Basically, if you're top ranked, you stay near the top, and if you're at the bottom, you stay there, but everyone in the middle could be ranked almost anywhere (a toy version of this sort of exercise is sketched just after this comment).

Hmmm. In our BUGS practicals, we could add in the ranking for a prediction for the following year. I must try and remember that.

Bob
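A toy version of the kind of league-table exercise Bob describes (a sketch of the general idea, not Spiegelhalter's actual analysis) shows why the middle of the table is so unstable:

```python
import numpy as np

rng = np.random.default_rng(1)
n_hospitals = 12
cases = np.full(n_hospitals, 100)                  # 100 cases per hospital
true_rates = np.linspace(0.02, 0.08, n_hospitals)  # modest real differences

sims = 10_000
ranks = np.empty((sims, n_hospitals), dtype=int)
for i in range(sims):
    observed = rng.binomial(cases, true_rates) / cases
    ranks[i] = observed.argsort().argsort() + 1    # rank 1 = lowest observed mortality

for h in range(n_hospitals):
    lo, hi = np.percentile(ranks[:, h], [2.5, 97.5])
    print(f"hospital with true rank {h + 1:2d}: 95% interval for observed rank "
          f"{int(lo)}-{int(hi)}")
```

Even with genuinely different underlying rates, a single year of 100 cases per hospital leaves most hospitals compatible with a wide range of table positions.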

Yes, if a hospital is performing badly one year, on average it will do better next year. I work with the same issues in manager assessment.

However, past performance is still probably the "best" guide to future results. I think the important point is a bit more subtle: for reasons of statistical validity, a rate reported without any consideration of sample size is virtually meaningless.

Sounded to me like you want a hospital with lower mortality rates demonstrated year after year.

Then it's not just one super team that gets better jobs and moves on, but the way the hospital manages its patients.

I'm an avowed space geek, and I've noticed a similar problem with rocket failure rate. The Space Shuttle was lauded for its failure rate of 0.9% back in 2002 -- a value so low as to constitute statistical noise in many systems. That was 1 failure in 112 launches. After the next launch, STS-107, it doubled to 1.8%. Only with a low flight rate can a single failure affect the stats so much. Meanwhile, the Shuttle has enjoyed a time-based failure rate of only two catastrophic failures in 26 years -- virtually unheard of in rocketry. So which perspective is more significant? (Note: it's debatable whether STS-107 counts as a catastrophic failure of the rocket, since Shuttle uniquely straddles the definitions of "rocket" and "payload". One could legitimately describe it as a non-catastrophic rocket failure leading to catastrophic failure of the payload. This is one of the things that complicates comparisons of Shuttle to other launch vehicles.)

Saturn V had an astonishing 0% rate of catastrophic failure, but only had 12 launches (three of which were unmanned). The Shuttle's catastrophic failure rate was identical after 13 launches, yet most lay rocket geeks will confidently say that the Saturn V was safer and more reliable than Shuttle. Maybe it was, maybe it wasn't, but the sample size is so much smaller that we don't really know. This uncertainty is endemic to orbital rocketry, because the enormous expense prohibits shooting up dozens just to see how well they perform.

The problem is even bigger with surgical statistics, partly because there are so many more variables contributing to mortality, but also because there is even less motivation to do the surgery just to see how well it works. Sure, there are clinical trials, but it can't be equated to the kind of testing that is performed on a rocket. You don't do surgery without more reason than "because". So the sample size may be extremely limited, especially for uncommon surgeries. The public doesn't appreciate that for rocket launches, and they don't appreciate it for surgery either. It's also limited in that one should be interested in more than just mortality rate. Sure, it's nice to know how often people died on the table, but that'll be a very small percentage anyway. What about the number for whom the procedure failed? Or who came away maimed, or with some other non-fatal complication? Mortality rates may mask that, because perhaps a hospital with lower mortality rates is just better at saving the lives of those for whom the surgery was bungled, even though it bungles more surgeries, or is more likely to abort the procedure if things get difficult.

Saturn V's much vaunted 100% success rate is only 100% success if we look at catastrophic failure alone; if non-catastrophic failures are included we find that there were a number of failures to varying degrees. Two flights failed to achieve all objectives, giving a much less impressive failure rate of 17%. The last flight nearly resulted in the loss of Skylab, due to problems in the system that had existed since day one but had never really been analyzed due to the low launch rate. So there's more than just mortality alone to consider.

By Calli Arcale (not verified) on 10 Jan 2008 #permalink
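Putting confidence intervals on the launch numbers in the comment above makes the same point about denominators (my own quick calculation using scipy's exact binomial interval, not figures from any of the cited sources):

```python
from scipy.stats import binomtest

records = {
    "Shuttle, 1 failure in 112 launches":       (1, 112),
    "Shuttle, 2 failures in 113 launches":      (2, 113),
    "Saturn V, 0 failures in 12 launches":      (0, 12),
    "Shuttle, 0 failures in first 13 launches": (0, 13),
}
for name, (failures, launches) in records.items():
    ci = binomtest(failures, launches).proportion_ci(confidence_level=0.95)
    print(f"{name}: rate {failures / launches:.1%}, "
          f"95% CI {ci.low:.1%}-{ci.high:.1%}")
```

The intervals for Saturn V and for the early Shuttle overlap almost completely, which is the statistical way of saying that a dozen launches can't really tell us which vehicle was safer.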

I'd also offer that hospital mortality rate is pretty crude, when, presumably, many doctors are providing care. One bad surgeon, for example, would affect the rates for all the others. On the other hand, a given surgeon's mortality rates --for the reasons already mentioned -- can be very misleading as well. I wrote something recently (before I dried up) about the silliness of the designation "center of excellence." The goal of providing meaningful data for consumers is a worthy one. It's also, at least for now and by any current measure, fiction.

"Intuitively, it would seem that hospitals that report zero mortality for a procedure in a given year would be a good bet, but is that correct?"

This is a pretty stupid article that proposes a dumb idea, and then shows this to be wrong. The only purpose of this kind of nonsense is to mislead people into thinking that rating medical providers is not useful. Common sense would indicate that zero mortality with a small number of procedures is of little significance. This study shows the obvious, insignificant events are insignificant.

The data reported by New York State, and probably by all the others, include 95% confidence intervals around the observed rates. These reports clearly indicate which results are statistically significant and which are not.

A study that evaluated whether statistically significant low mortality rates are predictive would be interesting.