The Center for Infectious Disease Research and Policy (CIDRAP) is a resource for all manner of information on infectious diseases and especially avian influenza. At their website one can find a technical overview which compiles a lot of bird flu information scattered over many sources. But it is a technical overview (although not overly specialized). Some of the entries may not be self evident even to physicians. We’ve selected one example because it interests us and we think it might interest others. It’s just a couple of sentences but will seem counterintuitive to many.
Laboratory tests do not need to be conducted on all patients with suspected influenza. Factors that influence the decision to test or not test patients with signs and symptoms of influenza include:
[snip]
Level of influenza activity in the community: The positive predictive value of influenza tests, especially rapid assays, increases with prevalence of influenza in the community; therefore, if the prevalence of influenza is low, the utility of the tests decreases. As influenza prevalence increases, the predictive value of clinical diagnosis without laboratory testing also increases and laboratory confirmation may not be necessary. (cite omitted)
Translated, this says the positive predictive value (PPV) of influenza tests is better when there is more flu in the community than when there is less flu. If the disease is rare in the community (as is H5N1 in the absence of a pandemic, even in hard hit countries like Indonesia), then the predictive value of the test is greatly decreased. In an outbreak setting, however, you probably don’t need the test. Seems like a paradox. How does this work? It is really an elementary application of Bayes’ Theorem in probability theory but it doesn’t take advanced mathematics to understand it. It’s basically arithmetic.
Test performance has two dimensions, accuracy and reliability, common English words with technical definitions in epidemiology. Accuracy measures how well your test reflects the true value of whatever it is you are measuring. Reliability, on the other hand, is a measure of repeatability. If you perform exactly the same measurement again, will you get the same result? This is sometimes called precision.
Accuracy and reliability are different, as you can easily see. A broken meter is quite reliable because it gives the same reading each time you measure something, but it isn’t accurate, since it isn’t related to what it is measuring. It is reliably wrong. On the other hand, a meter whose readings aren’t very repeatable (its measurements bounce around a lot) may still be sufficiently accurate for some purposes if on average it gives the right answer. If you think of shooting arrows at a target, accurate shots are grouped around the bullseye, although they may be scattered, while precise or reliable shots are grouped closely together (but not necessarily on the bullseye). You would like both accuracy and reliability, of course, but how much of each you need depends on the purpose.
Let’s make this easier and talk only about accuracy and make it easier still by considering a test that has only two readouts, one positive and one negative, like a flu test that tells us if the person has influenza or not. Accuracy now means that when a person has the flu the test correctly tells us so, and when they don’t have the flu, the test correctly tells us they don’t. While this is a simple case on the surface, you can also see there are two ways for a test to fail. It can say someone has the flu when they don’t, or say they don’t have the flu when they do. So there are two corresponding measures of accuracy, one called sensitivity, the proportion of those who have the flu the test correctly identifies, and the other called specificity, which is the proportion of those who don’t have the flu the test correctly records as not having the flu. Note that sensitivity and specificity are related to, but not identical with, “false positives” and “false negatives.” The false positive rate is (1 – specificity) while the false negative rate is (1 – sensitivity) [NB: Correct versions, as per correction, bottom of post.]. We will stick with sensitivity and specificity in this analysis, although they can be directly converted to false negatives and false positives. One way to remember what “sensitivity” refers to is to think of a sensitive test as one that is sensitive at picking up a disease when it’s there.
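For readers who like to see definitions as arithmetic, here is a minimal sketch in Python. The counts are hypothetical, invented only to illustrate the definitions, and the function names are ours rather than anything from the CIDRAP overview.

```python
def sensitivity(true_pos, false_neg):
    """Proportion of people WITH the condition that the test calls positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Proportion of people WITHOUT the condition that the test calls negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts: 100 people with flu, 200 without.
true_pos, false_neg = 90, 10    # the 100 who have flu
true_neg, false_pos = 190, 10   # the 200 who do not

sens = sensitivity(true_pos, false_neg)
spec = specificity(true_neg, false_pos)

# The two error rates are just the complements.
false_negative_rate = 1 - sens
false_positive_rate = 1 - spec

print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
print(f"false negative rate = {false_negative_rate:.2f}, "
      f"false positive rate = {false_positive_rate:.2f}")
```

Notice that each measure uses only one group of people: sensitivity looks only at those who have the flu, specificity only at those who don’t.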
We need one more concept and term, predictive value. There is both a positive and negative predictive value but we will only consider positive predictive value, or PPV. The PPV of a test is the proportion of people the test says have the flu who really do have the flu. You need to stop and think about this for a moment. It sounds like sensitivity, but it isn’t. PPV is the thing most people want to know. Here’s why. Sensitivity asks the question, if you have the flu, how likely will my test be able to tell me so. PPV, on the other hand, answers this question: if I have a positive test, how likely am I to have the flu? These are drastically different questions and provide the clue to why the amount of circulating flu in the community affects the performance of a test as measured by the PPV.
Let’s recap, using a cancer screening test as an example instead of flu (it is easier to visualize). The sensitivity of the test (say some new blood test) is the probability the test will pick up a true case of cancer. Fine. Important question. But if a patient gets the test and it is positive, he or she wants to know what that means for them, i.e., does it mean they are likely to have cancer? The PPV is the probability that you actually have cancer if the test says you do.
Here’s an example. You go to the doctor and she gives you a highly sensitive and highly specific new cancer test. Let’s say it’s 99% sensitive and 99% specific. In other words, highly accurate, a lot better than most rapid flu tests. The test comes back positive. Oh, oh, you think. I have cancer. Better make my will.
Not so fast. For most cancers your chance of actually having cancer if this very accurate test says you do is typically less than 10%, often much less. The proportion of people in the general population with any particular kind of cancer (e.g., lung cancer) is very small, typically less than one in 10,000. Let’s work this out using one in 10,000. If you give the test to one million people, 100 of them will have lung cancer (one in ten thousand times one million people). Your test is highly sensitive so it will correctly pick out 99 out of these 100 cancers. So far, so good. Your test is also highly specific, so it will correctly identify 99% of those without cancer as free of the disease. But since most people don’t have cancer, the remaining 1% of a large population is a lot of people, i.e., it will also misidentify many as well. In this example, 999,900 people out of a million don’t have lung cancer and of these, the test correctly labels 99% of them as not having cancer, or .99 times 999,900. But 1% will be misidentified as having cancer when they don’t. That’s 9999 people the test said had cancer when they didn’t. In all, the test identified 99 + 9999 = 10,098 people as having cancer, of whom only 99 actually did, or just under 1%. What this means is that even with an extraordinarily accurate test (99% sensitive and 99% specific), if the doctor tells you you have a positive test, your risk of actually having lung cancer is still only about 1%. The next step would be to run more expensive or invasive tests to confirm or disconfirm the initial screening result.
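The arithmetic above can be checked in a few lines of Python. This is only a sketch of the same calculation, using the numbers from the example (one million people, one case per 10,000, a test that is 99% sensitive and 99% specific); the variable names are ours.

```python
population = 1_000_000

# One case of lung cancer per 10,000 people.
with_cancer = population // 10_000            # 100 people
without_cancer = population - with_cancer     # 999,900 people

# 99% sensitive: 99 of every 100 true cases test positive.
true_positives = with_cancer * 99 // 100      # 99 people

# 99% specific: 1 of every 100 healthy people still tests positive.
false_positives = without_cancer * 1 // 100   # 9,999 people

all_positives = true_positives + false_positives

# PPV: of everyone who tested positive, what fraction really has cancer?
ppv = true_positives / all_positives

print(f"{true_positives} true positives, {false_positives} false positives")
print(f"PPV = {true_positives}/{all_positives} = {ppv:.2%}")
```

The false positives outnumber the true positives about a hundred to one, which is the whole story in one line.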
The culprit here is easy to identify. Sensitivity and specificity are features of the test, but the PPV also involves how common the condition is in the population. If 50% of the population had the condition, then things change drastically. The PPV is now 99%, not 1%. Thus the PPV is much higher (and thus more informative) with a higher proportion of the population affected than if the condition is rare in the population. This is the source of the innocent sounding statement in the CIDRAP overview:
Level of influenza activity in the community: The positive predictive value of influenza tests, especially rapid assays, increases with prevalence of influenza in the community; therefore, if the prevalence of influenza is low, the utility of the tests decreases. As influenza prevalence increases, the predictive value of clinical diagnosis without laboratory testing also increases and laboratory confirmation may not be necessary.
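To see the dependence on prevalence directly, here is a small sketch that writes the PPV as a function of sensitivity, specificity and prevalence. This is just Bayes’ Theorem spelled out: PPV = (sensitivity × prevalence) / (sensitivity × prevalence + (1 – specificity) × (1 – prevalence)). The function name and the particular prevalences tried are our own choices for illustration.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' Theorem.

    All three arguments are proportions between 0 and 1.
    """
    # Fraction of the whole population that is sick AND tests positive.
    true_pos = sensitivity * prevalence
    # Fraction of the whole population that is healthy AND tests positive.
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Same 99% sensitive, 99% specific test, three different communities.
for prevalence in (0.0001, 0.01, 0.50):
    value = ppv(0.99, 0.99, prevalence)
    print(f"prevalence {prevalence:.2%} -> PPV {value:.1%}")
```

The test never changes in this loop. Only the community does, and the PPV climbs from about 1% to about 50% to about 99% as the condition becomes more common.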
This is one of the main reasons confirmatory tests are needed when testing is done before an outbreak is underway, and why testing is often not done at all when there is an outbreak. While quick and cheap tests are usually the least accurate, they are useful to reduce things to a higher yield subpopulation for more expensive and time consuming tests. Taking the example above, the initial screening test reduced the original population of 1,000,000 to 10,098 with a cancer prevalence of 1% instead of .01%. Now the PPV for a 99% sensitive and specific test (now a different one) is about 50%. Notice however that unless your test is 100% sensitive you will miss some cases, i.e., you will have some false negatives.
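The two-step idea can be sketched the same way. We assume here, for illustration only, that the second test is also 99% sensitive and 99% specific and that its errors are independent of the first test’s errors, which real confirmatory tests may not satisfy. Counts are expected values, so they need not be whole numbers.

```python
def apply_test(with_disease, without_disease, sensitivity, specificity):
    """Expected numbers of true and false positives after one round of testing."""
    true_pos = with_disease * sensitivity
    false_pos = without_disease * (1 - specificity)
    return true_pos, false_pos


# Stage 1: screen one million people, 100 of whom have the cancer.
tp1, fp1 = apply_test(100, 999_900, 0.99, 0.99)
ppv_stage1 = tp1 / (tp1 + fp1)

# Stage 2: retest only those who screened positive.
# Assumption: the second test's errors are independent of the first's.
tp2, fp2 = apply_test(tp1, fp1, 0.99, 0.99)
ppv_stage2 = tp2 / (tp2 + fp2)

# Cases lost along the way (false negatives at either stage).
missed = 100 - tp2

print(f"after stage 1: {tp1 + fp1:.0f} positives, PPV {ppv_stage1:.1%}")
print(f"after stage 2: {tp2 + fp2:.0f} positives, PPV {ppv_stage2:.1%}")
print(f"expected true cases missed overall: {missed:.2f} of 100")
```

Each round of testing raises the PPV, but each round also drops a few real cases, which is the price of anything less than 100% sensitivity.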
In practice the sensitivity and specificity of a test can be adjusted up or down by changing the threshold for what is called a positive test, but when you do so you usually trade one off for the other, i.e., if you lower the threshold for what is called a positive you increase your sensitivity but you will likely decrease your specificity. How you balance the two is one of the arts of diagnosis and screening and will depend on the costs in money and public health terms of false positives versus false negatives. Remember you can always devise a test that is 100% sensitive (just say every tested subject is positive) or 100% specific (just say every subject is negative), but usually not both at the same time, although in some cases there are means for a definitive diagnosis (e.g., an autopsy). Such tests are 100% sensitive and 100% specific. For routine diagnostic testing that is unusual, however.
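The trade-off is easy to see with a toy example. The readings below are made up (arbitrary units, not real assay data), and calling a reading “positive” when it is at or above the threshold is our assumption about how such a test would be scored.

```python
# Hypothetical test readings in arbitrary units; NOT real assay data.
sick = [4, 6, 7, 8, 9]       # people who truly have the condition
healthy = [1, 2, 3, 5, 6]    # people who truly do not


def sens_spec(threshold):
    """Sensitivity and specificity when readings >= threshold count as positive."""
    sens = sum(reading >= threshold for reading in sick) / len(sick)
    spec = sum(reading < threshold for reading in healthy) / len(healthy)
    return sens, spec


for threshold in (0, 4, 6, 7, 10):
    sens, spec = sens_spec(threshold)
    print(f"threshold {threshold:>2}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

At the lowest threshold everyone is called positive (100% sensitive, 0% specific) and at the highest everyone is called negative (0% sensitive, 100% specific). Because the sick and healthy readings overlap, no threshold in between gets both to 100%.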
This is probably more than most of you wanted to know about this subject, although it will be just right for some of you. We only hope there are enough in the latter category to have made this long post worthwhile.
Correction: Amico, in the comments, offers two corrections, the first of which I have now made to the text so it is correct. I inadvertently reversed the false positive and false negative expressions. The former is (1-specificity) and false negatives are (1-sensitivity).
The other correction Amico characterized as a quibble. He would prefer I say that sensitivities and specificities are probabilities instead of proportions. Technically this is correct. In practice we use the proportion as a way to estimate the underlying probability. I wrote it the way I did because it is difficult enough to keep these terms straight for most students without using the word probability and I think for these purposes the easiest way to think of it is as a proportion, although strictly speaking Amico is correct. However nothing relies on this distinction here, which is why I assume he said it was a quibble.