The paradox of screening mammography and breast cancer

If there's one thing that lay people (and, indeed, many physicians) don't understand about screening for cancer, it is that screening is anything but a simple matter. Intuitively, it seems that earlier detection should always be better, and it can be. However, as I explained in two lengthy posts last year, such is not always the case. Understanding why requires an understanding of cancer biology, in particular the extreme heterogeneity of tumor behavior and prognosis. This variability was well described in a study from about a month ago, in which it was observed that the doubling time of breast cancers of approximately the same size can range from 1.2 months to 6.3 years. I've said before how much it irritates me when cancer is treated as just one disease when in fact it is many, particularly because this misconception is so frequently used by purveyors of quackery to suggest a "cure for cancer." Indeed, based on its extreme variability in behavior, I could even argue that breast cancer itself is not one disease, and there is evidence to support such a view.

One of the key questions when developing a screening program for any cancer is whether earlier detection actually improves prognosis and survival for that cancer. It's not as easy as it might seem to design and carry out studies that demonstrate whether a given screening test can in fact produce this desired outcome. When that test is mammography, there is no doubt that tumors picked up on mammography have a better prognosis than those detected by symptoms (for example, feeling a lump). The question that has not been so clear is how much of this benefit is due to the screening itself and how much is due to other factors. Indeed, some critics of mammography even argue that there is no survival benefit at all from mammography and that the reported benefit is instead entirely due to lead time bias. A recent study out of the U.K. published in the British Journal of Cancer (1) seeks to answer this very question, i.e., whether detection of a cancer by mammographic screening confers an additional survival benefit over and above that caused by the downward shift in tumor stage due to earlier detection.

This particular study examined women between the ages of 50 and 70 in eastern England. Most likely the reason for this age range is that, unlike the case in the U.S., in most European countries mammographic screening does not begin before the age of 50. In any case, this study examined the records of 5,604 women diagnosed with invasive breast cancer between 1998 and 2003 and identified by the Eastern Cancer Registration and Information Centre (ECRIC). Using multivariate analysis, the investigators examined the effect of age and mammographic screening status (whether the tumor was detected by mammography or by symptoms), along with standard clinical parameters, such as tumor stage and the presence or absence of positive lymph nodes. They also examined the Nottingham prognostic index (NPI) of each tumor. This is an index based on the size of the primary tumor, the presence or absence of positive lymph nodes, and the grade of the tumor, which is a measure that pathologists make by looking at the tumor cells under the microscope and estimating how "bad" or undifferentiated they look. High grade is bad, and low grade is better. Similarly, the NPI is divided into five different prognostic groups, ranging from excellent (the lowest NPI scores) to poor (NPI > 5.4). The NPI was used in this study as a surrogate for the intrinsic biological aggressiveness of each tumor, regardless of how it was detected. All of this was then compared to survival data to determine which factors were associated with better or worse prognosis.
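For readers who want to see how an index like this is put together, here is a minimal sketch. It assumes the commonly cited form of the index, NPI = 0.2 × tumor size in cm + lymph node score + grade, with a node score of 1 for no positive nodes, 2 for one to three, and 3 for four or more. That formula, the node scoring, and the function name are assumptions brought in for illustration; they are not spelled out in the study summary above.

```python
def nottingham_prognostic_index(size_cm: float, positive_nodes: int, grade: int) -> float:
    """Illustrative NPI calculation (assumed standard form, not taken from the post).

    NPI = 0.2 * tumor size (cm) + node score (1-3) + histologic grade (1-3)
    """
    if size_cm < 0:
        raise ValueError("tumor size must be non-negative")
    if positive_nodes < 0:
        raise ValueError("number of positive nodes must be non-negative")
    if grade not in (1, 2, 3):
        raise ValueError("grade must be 1, 2, or 3")

    # Assumed node scoring: 0 nodes -> 1, 1-3 nodes -> 2, 4 or more -> 3
    if positive_nodes == 0:
        node_score = 1
    elif positive_nodes <= 3:
        node_score = 2
    else:
        node_score = 3

    return 0.2 * size_cm + node_score + grade


if __name__ == "__main__":
    # A large, high-grade tumor with many positive nodes lands in the
    # "poor" group mentioned above (NPI > 5.4).
    npi = nottingham_prognostic_index(size_cm=3.0, positive_nodes=5, grade=3)
    print(f"NPI = {npi:.1f}, poor prognosis group: {npi > 5.4}")
```

Note that size, nodal status, and grade all push the score in the same direction, which is why the index can stand in for how aggressive a tumor is likely to be.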

In the initial analysis, consistent with previous studies, there was a strong effect of screening on prognosis overall, and the effect was more pronounced in the worst NPI classes. Next, the investigators adjusted for NPI and other prognostic factors. When this was done, screening status remained an independent prognostic factor for survival. However, when the investigators quantified the relative contributions to this difference, they found that adjusting for size, nodal status, age, and NPI took away 72% of this effect but left a real effect, with a tumor detected by mammographic screening being only 79% as likely to result in death as a tumor detected by symptoms. When the authors graphed survival as a function of NPI, they obtained this graph:

[Figure: survival as a function of NPI for screen-detected versus symptomatic tumors]

The natural first question on seeing this graph is: Why does detection by screening mammography apparently confer such a small advantage in survival? What this likely means is that most of the benefits of screening come from factors I've discussed before: stage migration and lead time bias. In lead time bias, screening detects the biological process at an earlier point in its evolution, leading to an apparent and artificial increase in survival time from diagnosis even if treatments of the disease in question have zero effect. In essence, the time between when the tumor was detected by screening and when it would have been detected by physical examination is added to the apparent survival. So, let's say that screening allows a tumor to be detected two years before it would have become symptomatic. Even if treatment has no effect whatsoever on patient survival, such early detection would produce an apparent increase in survival of two years. That's of course the "Cliff Notes" version; I've discussed the concept of lead time bias in great detail before. The other relevant concept is length bias. Put in its "Cliff Notes" version, length bias is a phenomenon in which slower growing cancers remain in a preclinical detectable phase for a longer period of time and thus are more likely to be detected by screening programs. What this means is that by their very nature screening programs tend to detect proportionally more slow-growing, good prognosis tumors. Most likely, therefore, the majority of the difference in prognosis between tumors detected by mammography and tumors detected by symptoms comes down to lead time bias and length bias.

Most, but clearly not all.

One thing that the data show is that there is definitely a small but very real survival benefit when a tumor is detected mammographically. Indeed, one point that the authors make is that perhaps the mode of detection (mammography versus symptoms) should be treated as a prognostic factor in treatment decisions:

These data confirm the known survival advantage for patients with screen-detected cancers. They show that although most of this advantage is due to a shift in NPI, the mode of detection does impact on survival in patients with equivalent NPI scores. This residual survival benefit is small but significant, and is likely to be due to differences in tumour biology between screen-detected and symptomatic cancers. Current prognostication tools that do not include known biological markers may overestimate the benefit of systemic treatments in screen-detected cancers and lead to overtreatment of these patients. A prognostic tool combining clinical, pathological and biological factors might allow more accurate prognostication, and more appropriate systemic therapy, for all patients with breast cancer regardless of their mode of detection.

The point that the mode of detection of a breast cancer does indeed appear to have prognostic significance, aside from differences in NPI or differences solely due to lead time bias or length bias, was also driven home in an accompanying editorial by Dr. Berry of the M.D. Anderson Cancer Center (2), which further explains the issue of lead time and length bias:

Length bias is more important than lead-time bias, at least in breast cancer. But neither its importance nor the concept itself is easy to understand. 'Length' refers to the tumour's presymptomatic period when the tumour is mammographically detectable. The length of this period is the tumour's sojourn time. Sojourn time varies from one tumour to another. (There is an obvious relationship between lead time and sojourn time; lead time is shorter because it requires actually finding the tumour during the presymptomatic period.) Sojourn time is typically positive, but it is negative for tumours that become symptomatic without being detectable on a mammogram. Breast tumours are heterogeneous, even after accounting for stage and other known clinical and biological characteristics. Aggressive tumours have shorter sojourn times because they grow faster. Indolent tumours have longer sojourn times. Screening finds tumours in proportion to their sojourn times, and therefore longer times and slower growing tumours are preferentially selected. This is length bias. (There are many analogues: when you look in the sky and see a shooting star, it is more likely to be one with a longer arc; when you reach into a newly opened bag of potato chips and select one, it is more likely to be big.) A special case of length bias is overdiagnosis, when screening finds a tumour with a sojourn time so long that the tumour would not kill the woman even if it was never found.
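The relationship between sojourn time, lead time, and length bias described in this passage can be made concrete with a toy simulation. This is only a sketch under loudly stated assumptions that come from me and not from the editorial: sojourn times follow an exponential distribution with a mean of three years, tumors become mammographically detectable at a uniformly random time, there is a single screening exam, and that exam has perfect sensitivity. All names and parameter values are hypothetical.

```python
import random
from statistics import mean


def simulate_single_screen(n_tumours=200_000, screen_year=30.0,
                           horizon_years=40.0, mean_sojourn_years=3.0, seed=1):
    """Toy model of one screening exam (all parameters are illustrative assumptions)."""
    rng = random.Random(seed)
    all_sojourns = []
    detected_sojourns = []
    lead_times = []

    for _ in range(n_tumours):
        # Sojourn time: how long the tumour is detectable before symptoms appear
        sojourn = rng.expovariate(1.0 / mean_sojourn_years)
        detectable_from = rng.uniform(0.0, horizon_years)
        symptomatic_at = detectable_from + sojourn
        all_sojourns.append(sojourn)

        # The screen finds the tumour only if it falls inside the sojourn window
        if detectable_from <= screen_year < symptomatic_at:
            detected_sojourns.append(sojourn)
            # Lead time: how much earlier than symptoms the tumour was found
            lead_times.append(symptomatic_at - screen_year)

    return {
        "mean_sojourn_all": mean(all_sojourns),
        "mean_sojourn_detected": mean(detected_sojourns),
        "mean_lead_time": mean(lead_times),
        "n_detected": len(detected_sojourns),
    }


if __name__ == "__main__":
    for name, value in simulate_single_screen().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the screen-detected tumors should show a noticeably longer average sojourn time than tumors overall, which is length bias, and the average lead time should be shorter than the average sojourn time of the detected tumors, because the screen has to land somewhere inside the presymptomatic window. Nothing in the model involves treatment, so any apparent survival advantage it produced would be an artifact of detection alone.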

As Dr. Berry further explains, accounting for NPI partially removes lead time bias but does not remove length bias. Some tumors will naturally grow faster than others, and by their nature such tumors will have worse prognoses than tumors that grow slowly. One consequence of not taking mode of diagnosis into account could be overtreatment if tumors detected mammographically are indeed less aggressive than those detected by symptoms. My take on this issue is that there is a lot of poorly understood biology underlying the differences in aggressiveness of different tumors. New technologies that can examine the gene expression profile of tumors or profile the levels of large numbers of proteins will lead us to understand what gene "signatures" control aggressiveness and metastasis in tumors. However, as Dr. Berry explains, there is a paradox:

Therefore, although the authors are correct in worrying that screen-detected cancers may be overtreated, a greater concern is that some screen-detected cancers should never have been detected in the first place! The rub is that just as with treatment, we do not yet have a good understanding regarding which cancers we do not want to detect. Mammography is too crude a tool to make this distinction.

Indeed it is. The paradox of breast cancer screening is that there are indeed some tumors whose sojourn time is so long that they will never harm the patient, and it is these tumors that we tend to detect more with intense screening. The price of detecting tumors early and realizing the benefit of screening is that some tumors that would never progress to threaten the patient will be detected and, because we do not have reliable tests to differentiate highly aggressive from indolent tumors, treated. Despite its proven ability to decrease mortality from breast cancer in women over 50, mammography does remain a pretty crude tool. The reason it persists is that it is inexpensive, at least compared to newer modalities. Unfortunately, the major problem that was not mentioned in either the article or the editorial is that newer, more sensitive modalities like MRI suffer from the same problem in spades, as I've discussed before. Indeed, because of the sensitivity of MRI, it is even less able to distinguish between aggressive and indolent tumors. That's why I tend to believe that ever more sensitive detection modalities are not the answer. Rather, the development of better molecular diagnostic tests that more accurately distinguish between aggressive tumors and tumors that are unlikely ever to trouble the patient will be far more likely to improve the "signal-to-noise" ratio and decrease the unwanted phenomenon of overtreatment.

REFERENCES:

1. Wishart, G.C., Greenberg, D.C., Britton, P.D., Chou, P., Brown, C.H., Purushotham, A.D., Duffy, S.W. (2008). Screen-detected vs symptomatic breast cancer: is improved survival due to stage migration alone? British Journal of Cancer, 98(11), 1741-1744. DOI: 10.1038/sj.bjc.6604368

2. Berry, D.A. (2008). The screening mammography paradox: better when found, perhaps better not to find. British Journal of Cancer, 98(11), 1729-1730. DOI: 10.1038/sj.bjc.6604349


"That's why I tend to believe that ever more sensitive detection modalities are not the answer. Rather, the development of better molecular diagnostic tests that more accurately distinguish between aggressive tumors and tumors that are unlikely ever to trouble the patient will be far more likely to improve the "signal-to-noise" ratio and decrease the unwanted phenomenon of overtreatment."

Ooh, yes! ANYTHING to supplant the (EXTREMELY PAINFUL) mammogram! Could you expand upon what these better molecular diagnostic tests might entail?

By Melissa G (not verified) on 23 Jun 2008 #permalink

The problem seems to be that it's easier/faster to produce screening machines than it is to understand cell biology and produce a measure/test that would tell you whether your DCIS will be a problem or not.

Is this just another manifestation of the "thinking" vs "doing" bias in American medicine?

Top quality post on an actual clinical science problem with obvious real world interest. And of course you will get almost no comments on it.

So, well done and thanks.

Orac, I think you should submit this excellent piece to PLoS, so that other scientists can ponder the valuable information. We cannot arrive at good answers if we only look at one corner of the puzzle. Thank you. Dr. Bob Gleeson

By Dr. Bob Gleeson (not verified) on 23 Jun 2008 #permalink

Fascinating and useful. I gather that Professor Michael Baum has made a vigorous argument that (in the UK) the breast cancer screening programme is expensive for what it achieves and that the funds would be better spent in researching tests to distinguish progressive and non-progressive tumours and in improving treatment upon diagnosis. (I chose this link as less intemperate than some of his other writing; I should say that I find his argument to be interesting.)

For too long the mantra of screening - 'catch it early and we will save your life and save your breast' - has been allowed to go unchallenged. Breast cancer is too complex a problem for such a facile solution. For too long women have been patronised and coerced into screening. And now the government wants to extend screening for breast cancer to women below the age of 50, in the face of the recently published UK trial that showed no significant advantage for this age group...
I therefore propose that the uncritical promotion of screening is unethical by modern ethical standards and reflects a paternalistic attitude that would be unacceptable for treatment aimed at curing established disease...In the area of established disease, focusing on prevention can be dangerous, and very often cure is better than prevention.

By Mary Parsons (not verified) on 24 Jun 2008 #permalink

Maybe you can answer a question - way back when the allele for family-related breast cancer was isolated, the original researchers suggested that the gene was particularly sensitive to radiational changes - detection with x-rays might not be recommended. However, before long, that caveat kind of disappeared, and I could never find out whether other investigators found otherwise or that it was suggested that such a recommendation just wasn't politically viable - they didn't want anything that might frighten women away from screening out there. Do you know what happened?

"The problem seems to be that it's easier/faster to produce screening machines than it is to understand cell biology and produce a measure/test that would tell you whether your DCIS will be a problem or not."
But is it just DCIS? One of the more recent studies that tried to estimate overdiagnosis - and the only one that was actually based on real data and not on modeling - the one from Malmo trial, estimated overdiagnosis at 10%. However, their math had a major flaw - their estimates were sensitive to follow up period i.e. they would've decreased the longer this period is even if all extra cases had been due to overdiagnosis. This is clearly explained in rapid responses (http://bmj.bmjjournals.com/cgi/eletters/332/7543/689) especially from Welch.

When this flaw is corrected, the estimate gets closer to 25% of all screen-detected cancers. Given the improvements in technology, this number can be even higher. Could DCIS alone be blamed for such a large number?

There have been other studies - some put estimate as high as 40% as some as low as 5%. But this was the only one based on real data and not just math exercises based on assumptions one may argue with.

@Mary Parsons - Dr. H. Gilbert Welch said more or less the same thing in his book "Should I be tested for cancer. Maybe not and here is why" and in a number of papers:

I object to the emerging mindset that patients should be persuaded, frightened, and coerced into undergoing [mammography, PSA tests, fecal occult blood testing]. There is today a certain "medical correctness" about screening - making patients feel guilty if they choose not to pursue testing. This is wrong.

IMHO - we, the patients, need accurate information about both the benefits and the risks of screening. If the public has a better understanding of the limitations of tests with proven benefit, it will also cool off public enthusiasm for non-recommended tests a bit, which would result in fewer demands for the latest and greatest test one saw on TV.

What about those of us already diagnosed with cancer? Does the need for follow-up screening increase for us? What about our sisters?

Although insured with "good" PPO insurance, I can't afford the co-pays for the yearly MRI of my one remaining "healthy" breast (young age = dense breasts = maybe mammography not enough). Should I just skip it? My onc says OK to skip MRI as long as I do yearly mammogram. What would you advise someone like myself?

Very enlightening post. Thanks so much for it.

A few questions:
do you think the gene assays like the oncotype DX have addressed some of these issues regarding over treatment? What do you see as their current limitations?

A question about the nottingham prognostic indicator and other prognostic indications, how would you stratify risk for the three following patients:
1. only DCIS found, but also a single micro met to the sentinel node;
2. multi focal disease with more than one primary (do you ignore the smaller primaries and only concentrate on the larger one, as current staging practice dictates, or do you think that multi focal disease is any kind of prognostic indicator?);
3. patient with a high grade (9/9 BR) tumor, over 2cm, but lymph node negative? Does tumor grade and size trump lymph node status, or does the lack of cancer in the nodes point to a tumor that probably isn't programmed to spread?

Thanks.

I seem to recall a similar discussion here in Canada about the merits of widespread prostate testing - many prostate tumours were being diagnosed that would never have harmed the man in his natural lifetime, but they were being treated anyway, leading to the subsequent drop in quality of life resulting from post operative problems.

http://www.cbc.ca/news/background/cancer/psa.html

I agree that being able to determine the level of aggressiveness of the tumour is the way to go.

Interesting post. I know of so many young women diagnosed way below 50 years of age, that I feel a baseline should be done way before 40 even. I like your idea for a different diagnostic tool than the one used at present. I saw on Kevin's site that someone objected somewhat to the term "crude tool," I thought - you've obviously never had one. You'd love one post op then.

To laurasf, I was young when I was diagnosed, and I continue only to have the mammogram each year. Make sure, you make yourself do a breast self exam. You need to know for yourself. You will know yourself better than any clinician.

I hope you don't mind Orac. I understand how it is to be in laurasf's shoes. We need to take some responsibility to know ourselves too. Even when my tumors were palpable, I had to place my surgeons fingers right on it, he couldn't find it on his own, nor could my Gyn. Just trying to help a fellow survivor.