Effect Measure

The Director of Loyola University Medical Center’s clinical microbiology laboratory is reported as saying that rapid flu tests are a public health risk. Here’s some of what he said and then my explanation as to why it is misleading or just plain wrong:

Rapid influenza diagnostic tests used in doctors’ offices, hospitals and medical laboratories to detect H1N1 are virtually useless and could pose a significant danger to public health, according to a Loyola University Medical Center researcher.

“At Loyola, we determined four years ago that the rapid tests for influenza detected only 50 percent of the patients who were positive,” said Paul Schreckenberger, Ph.D., director of Loyola’s clinical microbiology laboratory. “I can flip a coin and get the same results as I could with those tests. So what’s the value of the tests? I can flip a coin for free.” (Medical-News)

We’ve discussed this before, but since the underlying concepts seem poorly understood (even by clinical laboratory directors), it’s time to discuss it again.

First, let’s review how we evaluate diagnostic tests. There are two dimensions: reliability and accuracy. These terms have technical definitions in epidemiology. Reliability is what most people call repeatability. If you perform the same test twice on the same specimen, will you get the same result? Extremely reliable tests may also be extremely inaccurate. If you measure blood pressure with a broken device so that it gives the same (wrong) value each time, the device is extremely reliable (in the technical sense) but very inaccurate. Accuracy is a measure of how close you come to the true value. With blood pressure that can be measured quantitatively and inaccuracies can be of various degrees, but for qualitative measures like “flu or no-flu” it is either right or wrong. There are two kinds of ways to be wrong: you can say someone has the flu when they don’t, or that they don’t have the flu when they do. There are also two kinds of ways to be right: if the person has the flu your test can correctly say they do; and if they don’t have the flu, your test can correctly say they don’t.

It is the latter two measures that are used to express accuracy of a test, because the former two, although convertible to the latter, are measuring error, not accuracy. These two measures (how good is the test in picking up flu and how good is it in picking up non-flu) are called sensitivity and specificity. It is the relatively poor to moderate sensitivity of the rapid antigen test (50%) that Dr. Schreckenberger is complaining about and likening it to a “flip of a coin.” Unfortunately that is not the right way to think about 50% sensitivity because it ignores two other vital pieces of information, the specificity of the test (an independent measure of accuracy) and the prevalence of flu in the community (a feature independent of the test’s accuracy but important for evaluating its use).
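In symbols, the two accuracy measures can be sketched as follows. The notation is an assumption introduced here, not the post's: D+ and D- stand for having or not having flu, T+ and T- for a positive or negative test, and TP, FN, TN, FP for the counts of true positives, false negatives, true negatives and false positives.

```latex
\text{sensitivity} = P(T^{+} \mid D^{+}) = \frac{TP}{TP + FN},
\qquad
\text{specificity} = P(T^{-} \mid D^{-}) = \frac{TN}{TN + FP}
```

Both quantities condition on the true disease state, which is exactly what the clinician does not know when the patient walks in.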

Note that the test is not being applied to everyone in the population, only those who have influenza-like illness (ILI). There are many causes of ILI besides influenza and there are more kinds of influenza than swine flu, but it turns out that just about the only type and strain of influenza currently circulating in the community is swine flu, so that if you have influenza at this time it is almost certainly swine flu. Thus the relevant question is not whether you have a positive test if you have swine flu (that’s sensitivity and the 50% figure Dr. Schreckenberger quotes), but whether you have swine flu if you have a positive test. That’s something he doesn’t discuss, and it’s not clear he understands the difference because he likens the outcome of the test to the flip of a coin. But it isn’t. Given 50% sensitivity, if someone comes in that I know has swine flu, I could flip a coin and get the same result as the rapid test. But I don’t know if they have swine flu or not. If I did I wouldn’t have to do any test. I only know that they have an ILI and I want to know if that ILI is swine flu. In mathematical terms I am conditioning on a positive test, not conditioning on having swine flu, but you can get an understanding of this technical point with some examples.
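The difference between the two kinds of conditioning can be written out with Bayes' theorem. This is a sketch in notation assumed for illustration: D+ and D- are flu and no flu, T+ is a positive test, p is the proportion of flu among ILIs, and sens and spec are the test's sensitivity and specificity.

```latex
P(D^{+} \mid T^{+})
  = \frac{P(T^{+} \mid D^{+})\,P(D^{+})}
         {P(T^{+} \mid D^{+})\,P(D^{+}) + P(T^{+} \mid D^{-})\,P(D^{-})}
  = \frac{\text{sens}\cdot p}{\text{sens}\cdot p + (1-\text{spec})\,(1-p)}
```

Sensitivity is only one of the three inputs on the right-hand side, so quoting it alone says nothing about how much a positive result can be trusted.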

Suppose it’s a non-pandemic year and summertime, when most ILIs are caused by viruses other than influenza virus. Let’s say that out of every 100 ILIs, only 6% are caused by a flu virus. That’s 6 people. The rapid test will pick up 50% of them (that’s what 50% sensitivity means), or 3 actual cases. Three of the 6 will go undetected by this test. But there may be others with positive tests who don’t have flu. That’s a reflection of a lack of specificity. For the rapid test specificity is pretty high, often 90% – 100%. That sounds pretty good, but in the instance where flu isn’t common among ILIs, as in a non-pandemic summertime, you will see what happens. There are 94 people out of 100 in the example who don’t have flu. If the specificity is 100%, then there are no false positives and the predictive value of a positive test is 100%: everyone with a positive test also had the flu (even though 3 were missed). If the specificity is 90%, then 10% of the 94 will test falsely positive, which is about 10 cases (rounding up), so of 13 positive tests (3 real cases and 10 positive tests that were other kinds of viruses), the chance of being a true case if you had a positive test is only 23%, much worse than Schreckenberger’s “flip of a coin.”
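The bookkeeping for this example can be sketched in a few lines of Python. The inputs are the illustrative figures from the paragraph above (6% flu among ILIs, 50% sensitivity, 90% specificity), not measured data, and the variable names are made up for this sketch. It keeps fractional expected counts instead of rounding to whole people, so it gives 9.4 false positives and a predictive value of about 24% where the text rounds to 10 and 23%.

```python
# Expected counts per 100 ILI patients for the summertime example.
# All inputs are the illustrative figures from the text, not measured data.
n_ili = 100
prevalence = 0.06    # share of ILIs that really are flu
sensitivity = 0.50   # share of true flu cases the test flags
specificity = 0.90   # share of non-flu cases the test correctly clears

flu = n_ili * prevalence
not_flu = n_ili - flu

true_pos = flu * sensitivity       # flu, test positive
false_neg = flu - true_pos         # flu, test negative (missed)
true_neg = not_flu * specificity   # not flu, test negative
false_pos = not_flu - true_neg     # not flu, test positive

# Chance that a positive test is a real flu case.
ppv = true_pos / (true_pos + false_pos)

print(f"true positives:  {true_pos:.1f}")
print(f"false positives: {false_pos:.1f}")
print(f"P(flu | positive test) = {ppv:.1%}")
```

Setting `specificity = 1.0` drives the false positives to zero and the predictive value to 100%, which is the other case described above.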

Now consider what happens if the proportion of flu among the ILIs is 30% or 50%, something that can easily happen in a pandemic or just a regular flu season when a lot of the ILIs are being caused by flu. For 30% flu among the ILIs, 15 are being picked up by the rapid test, while of the 70 ILIs that are not flu, either 100% (70) or 90% (63) are negative by the test. That means that either 0 or 7 positive tests are wrong. In the first case, every positive rapid test will be correct, while in the second 15 of 22 (15 + 7) will be correct, or 68%. If 50% of ILIs are flu, the situation is worse for the Schreckenberger claim: a positive test will correctly identify swine flu in 25 out of 30 positive tests (for 90% specificity), or 83% of the time.
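The same calculation can be done directly on proportions with Bayes' theorem instead of counting people. The function below is a minimal sketch; its name and signature are invented for illustration, and it assumes 50% sensitivity throughout, as in the text.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """P(flu | positive test), computed with Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0.0:
        raise ValueError("no positive tests are possible with these inputs")
    return true_pos / (true_pos + false_pos)


# Proportion of flu among ILIs, and the two specificities used in the text.
for prevalence in (0.30, 0.50):
    for specificity in (0.90, 1.00):
        ppv = positive_predictive_value(prevalence, 0.50, specificity)
        print(f"flu share {prevalence:.0%}, specificity {specificity:.0%}: "
              f"P(flu | positive) = {ppv:.0%}")
```

With 90% specificity this gives 15/22 at a 30% flu share and 25/30 at a 50% flu share.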

So much for the claim that a 50% sensitivity means the test is no better than a toss of a coin. That would be true only if you used it in a circumstance where you already knew the answer (that the person had flu already). But it’s useful to also consider the flip side. What should you conclude if the test says you don’t have flu? With a 6% prevalence of flu among the ILIs, 90% specificity means that 84 of the 94 people without flu will have negative tests and 3 of the 6 who do have flu will have negative tests. That means that a negative rapid test for flu will be correct on average 84/87 = 96.6% of the time (for 100% specificity it is 97%), while for the worst case, 50% prevalence, a negative test will be right 64% of the time, still better than the flip of the coin Schreckenberger alleges.
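The flip side, how far to trust a negative result, follows the same pattern with the roles of the two error types swapped. Again this is only a sketch with a hypothetical function name, using the illustrative 50% sensitivity from the text.

```python
def negative_predictive_value(prevalence, sensitivity, specificity):
    """P(no flu | negative test), computed with Bayes' theorem."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    if true_neg + false_neg == 0.0:
        raise ValueError("no negative tests are possible with these inputs")
    return true_neg / (true_neg + false_neg)


# (flu share among ILIs, specificity) pairs discussed in the text.
for prevalence, specificity in ((0.06, 0.90), (0.06, 1.00), (0.50, 0.90)):
    npv = negative_predictive_value(prevalence, 0.50, specificity)
    print(f"flu share {prevalence:.0%}, specificity {specificity:.0%}: "
          f"P(no flu | negative) = {npv:.1%}")
```

Unlike a positive result, a negative result becomes less trustworthy as flu becomes more common, because the missed half of the true cases makes up a growing share of the negatives.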

What’s the lesson here? When there’s hardly any flu around, a positive rapid test isn’t very good at predicting whether you have flu, but a negative test almost certainly means you don’t. When there’s a lot of flu around (half of ILI being flu), a positive test can be quite accurate (depending on the specificity; with 100% specificity it is 100% correct, but even for 90% specificity it is correct more than 80% of the time). A negative test will be correct only about 2/3 of the time, but better than a flip of the coin. With an intermediate case of 30% ILI being flu (quite conceivable during the flu season; remember this isn’t 30% of the population, but 30% of those with flu-like symptoms), a positive test will be right more than 2/3 of the time and a negative test will be right 81% of the time, both better than a “flip of the coin” (these are both for 90% specificity; with greater than 90% specificity the test will do better).
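One way to make the coin comparison concrete is to treat a literal coin flip as a "test" and run it through the same arithmetic. A fair coin calls half the flu cases positive and half the non-flu cases negative, so it has 50% sensitivity and also only 50% specificity. The sketch below compares it with the rapid test at the three flu shares used above; the function and constant names are made up for illustration, and the 50%/90% figures are the illustrative ones from the text.

```python
def predictive_values(prevalence, sensitivity, specificity):
    """Return (PPV, NPV) for a yes/no test at the given prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


RAPID_TEST = (0.50, 0.90)  # (sensitivity, specificity) assumed in the text
COIN_FLIP = (0.50, 0.50)   # a fair coin is wrong half the time either way

print("flu share  rapid PPV  rapid NPV  coin PPV  coin NPV")
for prevalence in (0.06, 0.30, 0.50):
    rapid_ppv, rapid_npv = predictive_values(prevalence, *RAPID_TEST)
    coin_ppv, coin_npv = predictive_values(prevalence, *COIN_FLIP)
    print(f"{prevalence:9.0%}  {rapid_ppv:9.0%}  {rapid_npv:9.0%}"
          f"  {coin_ppv:8.0%}  {coin_npv:8.0%}")
```

The coin's predictive value for a positive result always equals the flu share itself, meaning the flip tells you nothing you did not already know, while the rapid test beats it on both positive and negative results at every flu share shown.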

CDC has been quite clear that clinicians should not rely on negative tests to say a patient doesn’t have flu, since in current circumstances that could be wrong about a third of the time. That’s not a flip of the coin but it’s still a lot of missed cases. But with a lot of flu around and virtually all of it swine flu, a positive test is a very strong indication you have swine flu. Not a flip of the coin.

The point of the post, though, is not to advocate for rapid tests (their value depends on any particular test’s sensitivity and specificity and the prevalence of flu in the population, and all three vary), but to discuss some aspects of testing that seem to be poorly understood. Even by clinical laboratory directors.

Comments

  1. #1 Don S
    November 19, 2009

    It also depends on the circumstance in which we clinicians are using the test. Let me illustrate –

    One, the classic CDC don’t bother to use the test one – I have a high risk child, say a kid with CP and asthma, who has 103 for a day cough sore throat and achiness. Sure a positive test is highly likely to be a true positive but the only clinical utility would be a very high negative predictive value – bottom line is that no matter what the test says I am going to treat – why bother testing?

    But what about the 18 month old kid with 101 and mild croup? That COULD be influenza and the kid is by definition “high risk” with Tamiflu advised if it is. Influenza looks like a host of other viral infections in this crowd. I still can’t believe a negative test. Should a positive test be enough to convince me to treat if I would not otherwise? Or a negative be enough to convince that I shouldn’t?

    That Loyola director’s point is that no, it isn’t much use to you. We are stuck, with a high risk individual, treating if we think it might be an ILI. And with the not high risk one, why do we care?

  2. #2 revere
    November 19, 2009

    Don: I don’t think that was his point because he also touted their PCR test to see if it was swine flu. What he said was that the test was no better than flipping a coin because its sensitivity was 50%.

  3. #3 Don S
    November 19, 2009

    Sorry to doublepost, but it is important to underline why the test may be, as claimed by that doctor, “a significant danger to public health”, if clinicians fail to understand how poor a test it is.

    A negative test MIGHT falsely convince a clinician that they do not need to treat someone high risk with ILI, a circumstance in which the negative predictive value may be less than 50%, depending on the rate of illness in population of those with those symptoms at that time.

    A positive test in the second case MIGHT falsely convince a clinician or a patient that they have already had the disease and that they do not need to be vaccinated, needlessly exposing them to true illness later in the season.

    It may or may not be exactly true that the odds of a true positive are 50% in all circumstances, and very likely not in the background of being in the middle of a pandemic and given an individual with clear ILI, but the harm of the test to public health is real.

  4. #4 revere
    November 19, 2009

    Don: Since the sensitivities and specificities of many tests used clinically are as bad or worse, does that mean that much of what clinicians do is a danger to public health? But the main point of the post is that the claim that a sensitivity of 50% is equivalent to flipping a coin when you test someone just isn’t true. It is a confusion that is common but still a confusion. There are three parameters here: two for the test, one for the tested population. He only considered one of the three and got it wrong. Note the last paragraph of the post.

  5. #5 Don S
    November 19, 2009

    Oh I get that he doesn’t get the whole Bayes Theorem thing. But let’s face it, few do. Our failure to understand it in real life creates all sorts of what has been called “cognitive illusions”. But his bigger point is real and valid.

    And yes, much of how clinicians use tests is a harm to public health for the exact same reason – a failure to appreciate that even a test of high specificity is likely a false positive if the “prior probability” is low – many medical misadventures begin with that misunderstanding, with the false belief that screening is always better. A highly visible current events example is the controversy over screening mammography in the 40 to 50 year old woman crowd … Let’s just leave it that we clinicians as a group often misuse and overuse tests.

    Testing can and often does more harm than good and should be used selectively. In my first case I would guess that my clinical assessment of influenza is likely much higher than 50%, maybe 90% plus. In the second? I don’t know. I’d like to know if a positive test is likely believable so I don’t have to treat every not very ill appearing under two kid with a febrile URI or febrile croup who I would otherwise have a less than 50% suspicion of H1N1. Is this test good enough for that?

  6. #6 revere
    November 19, 2009

    Don: An old friend once joked about the SMAC-20 and its ilk: if you order an unnecessary test you’ll get an abnormal result. Then everyone is sucked into the medical vortex.

  7. #7 da
    November 19, 2009

    I’m aware of at least one study that looked at the sensitivity and positive predictive value of a clinical diagnosis of influenza among children tested by viral culture: 45% and 41% respectively during peak flu activity, or worse than if you just “flip a coin” ;)
    Accuracy of Clinical Diagnosis of Influenza in Outpatient Children

  8. #8 Don S
    November 19, 2009

    Interesting article but I’d still suspect that the first hypothetical would be 90% plus! (Obviously most are not so clear cut.) But for the rest, this bit is especially my point:

    “The clinical diagnosis of influenza was least accurate in children <3 years of age. This finding was discouraging, especially because the burden of influenza is greatest in this age group [1, 2], but our findings are in accordance with the clinical experience of diagnosis of respiratory infections in infants and young children. Most viral infections in these children present with varying degrees of fever [18], and infants and young children are unable to verbally describe their subjective symptoms such as headache, myalgias, or pharyngitis”

    So what is a pediatrician to do?

    The test sucks. My ability to tell clinically apparently sucks. Which high risker (especially in the under two crowd) do I put on Tamiflu? (Which is not risk free candy corn.)

  9. #9 Don S
    November 19, 2009

    That was supposed to be “in children less than 3 years old. This finding was discouraging, especially because the burden of influenza is greatest in this age group [1, 2], but our findings are in accordance with the clinical experience of diagnosis of respiratory infections in infants and young children. Most viral infections in these children present with varying degrees of fever [18], and infants and young children are unable to verbally describe their subjective symptoms such as headache, myalgias, or pharyngitis.” (Somehow cut off by the less than sign?)

  10. #10 tymbuktu
    November 19, 2009

    Early this AM the 58-year-old mother of my step-son’s fiancee died in an Ohio Hospital of H1N1. She had been in ICU/Isolation for two weeks on a vent, medically induced coma and medically induced paralysis for about a third of that time. The first two rapid tests were negative although she clearly had the classic symptoms plus double pneumonia (a biopsy of the lungs did not show any antibiotic resistant bacterial infection). She was “in” a rotating bed that looked like something out of Alien. She never regained consciousness. Yesterday the PCR test came back from the state as positive for H1N1.

    Not like I’m a physician but I texted them over a week ago about trying an ECMO and found where they were near her. The doctor called Ohio State but told the family it would be pointless “because everyone dies right after they come off the machine.”

    I texted them the CDC page for emergency use of Peramivir (sp?) but this wasn’t tried either. I obviously can’t second guess the physician but in light of your article on testing, here was an example of someone who “obviously” had H1N1 but whose correct test results took almost two weeks. She “died” of liver and kidney failure.

    She had never had pneumonia before but had been diagnosed with possible mild asthma a few years ago and given a rescue inhaler, which she never needed.

  11. #11 Paula
    November 19, 2009

    It seems as if a crucial issue here is how the limitations of these tests are/not understood in actual, ordinary medical offices. Many such offices–and I don’t mean only in rural areas–will have recent med school graduates or new nurses, not necessarily well trained, telling patients not to worry hey the test came out just fine–negative. This isn’t going to happen much in “good” practices in the neighborhoods of teaching hospitals, but it probably is happening in many other places. In this respect, Dr. Schreckenberger’s point is definitely true. Perhaps the issue is training, not test limitations, but lives are just as affected.

  12. #12 revere
    November 19, 2009

    Paula: Don is correct that the subtleties of the conditional probabilities involved (Bayes Theorem being the basis) are not understood by most doctors or even bench scientists, although they are critical to interpretation. It’s a fact of life.

  13. #13 ginger
    November 19, 2009

    “With blood pressure that can be measured quantitatively and inaccuracies can be of various degrees, but for qualitative measures like ‘flu or no-flu’ it is either right or wrong.”

    I don’t want to be a pedantic dick, but I guess I can’t help myself. You don’t mean quantitative and qualitative; both these measures are quantitative. You mean continuous and binary.

  14. #14 revere
    November 19, 2009

    ginger: You raise an interesting scaling issue. Some of us use random variable in its generalized sense, a map from the sample space to a set, not necessarily ordered and certainly not necessarily having the structure of a field (in your example, Z2). It could map into an arbitrary set. And of course the contrast with continuous isn’t binary but discrete. While you might conceive of blood pressure as a continuous variable (and can model it that way), in reality all measurements are discrete. This is just to prove I can be even more of a pedantic dick than most anyone. It is a sad fact, though, that I do adhere to a more general, and increasingly common, notion of a random variable because of my own research, the mathematical foundations of epidemiology.

  15. #15 betty
    November 20, 2009

    Fortunes are made from perceived risk… but when someone you love is dying you just ignore the math. On a lighter note, does post #14 make everyone’s heart beat faster?

  16. #16 Student (former)
    November 20, 2009

    for 50% ILIs being flu, PPV = 83% (assuming 90% specificity). positive test correctly identifies swine flu in 25 of 30 (as opposed to the 50 of 55 as written).

    (keep going, Revere! we’re still out here and paying attention)

  17. #17 revere
    November 20, 2009

    If you are a former student then you know I make a lot of arithmetic mistakes. I prefer Greek letters to numbers. But glad someone is paying attention!

  18. #18 Don S
    November 20, 2009

    tymbuktu,

    Please accept my condolences and best wishes to you and your family on your loss.

    Don

  19. #19 Paula
    November 20, 2009

    Tymbuktu,
    My condolences too, and I was angered to read of the physicians’ refusals to look into ECMO or even peramivir.
