One frequently hears claims that the current swine flu pandemic has been exaggerated because there are “only” 1000 or so deaths, while seasonal flu is estimated to contribute to tens of thousands of deaths a year. There are two reasons why this is not an apt comparison. We’ve discussed both here fairly often. The first is that the epidemiology of a pandemic and seasonal flu are very different. Epidemiology studies the patterns of disease in the population and swine flu is hitting — and killing — a very different demographic from seasonal flu. Its victims are young and many are vigorous and healthy. The second is that it compares apples to oranges. The 1000 deaths figure is for laboratory confirmed swine flu deaths (as are the various case counts), while the seasonal flu figure is an estimate, not a count of laboratory confirmed influenza deaths (see our post here if you want to know more about how the sausage is made). CDC and the states stopped counting cases early in the pandemic (here for some commentary from us), so we don’t know how many cases there have really been. CDC keeps track of the general trends and patterns through a multi-part surveillance system. But for planning and resource allocation it would still be nice to know how much flu there is. Now a paper has appeared in Emerging Infectious Diseases that provides us with some rough and ready estimates. It also explains why this number is so hard to get.
Here’s the set-up:
Human cases of influenza A pandemic (H1N1) 2009 were first identified in the United States in April 2009. By the end of July, >40,000 laboratory-confirmed infections had been reported, representing only a fraction of total cases. Persons with influenza may not be included in reported counts for a variety of reasons, including the following: not all ill persons seek medical care and have a specimen collected, not all specimens are sent to a public health laboratory for confirmatory testing with reverse transcription-PCR (rapid point-of-care testing can not differentiate pandemic (H1N1) 2009 from other strains), and not all specimens will give positive results because of the timing of collection or the quality of the specimen. (Reed et al., Emerging Infectious Diseases [Epub ahead of print; DOI: 10.3201/eid1512.091413], cites omitted)
In order to make an estimate, the researchers (from CDC and Harvard School of Public Health) used a very simple multiplier method coupled with a Monte Carlo simulation. We’ll explain both. First the multiplier method.
The multiplier method sequentially adjusts for the loss in each of the five steps a case has to climb in order to be counted: seeking care for ILI (A), specimen collection (B), submission of specimens for confirmation (C), laboratory detection of pandemic (H1N1) 2009 (D), and reporting of confirmed cases (E). Take the first step (A), care-seeking behavior. What proportion of all people with influenza-like illness (ILI) seek medical care? If you don’t go to a health care facility or doctor for your illness you will never have a specimen taken and won’t be counted. So in order to figure out how many ILI cases will pass this first barrier to being counted, the authors checked published and unpublished studies where such information could be estimated. For example, in 2007 CDC did a community survey in 10 states on ILI and health-seeking behavior as part of its ongoing Behavioral Risk Factor Surveillance System (BRFSS). BRFSS is done in all states but each state can add items to a core set. Ten states added items on ILI and care seeking. In May 2009, at the start of the pandemic, the 10 states and CDC repeated these surveys, supplementing them with special field studies done in Chicago and Delaware. That provided some basic information from a random sample in 10 states of how likely someone was to seek care if they had ILI, both in a “normal” flu season (2007) and during the early stages of the swine flu pandemic (May 2009). Early in the pandemic doctors and health departments were urged to collect clinical specimens from all suspect ILI cases, but the testing burden rapidly overwhelmed the states. On May 12, CDC guidance was revised to concentrate on hospitalized patients. The authors therefore used different estimates for the proportion seeking care and getting tested (steps A and B) after that date.
And because they suspected hospitalized patients presented with more serious illness and were more likely to be tested, they estimated prevalences from hospitalized and non-hospitalized lab confirmed cases separately, using higher multipliers for the non-hospitalized patients.
To give you some idea of the multiplier data and their source, here are steps (A) and (B) as given in Table 1:
(A) Proportion of persons with influenza who seek medical care: 42% (2007 BRFSS), 52 – 55% (2009 BRFSS), 49 – 58% (Delaware University survey), 52% (Chicago community survey).
(B) Proportion of persons seeking care with a specimen collected: 25% (2007 BRFSS), 22 – 28% (2009 BRFSS), 19 – 34% (Delaware survey).
And so on for the rest of the steps. The multipliers are 1.0 divided by the proportions. For example, if the proportion seeking care was 50%, the multiplier would be 2 (1/0.5).
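As a minimal sketch of that arithmetic, here is the proportion-to-multiplier conversion in Python. The function name and the dictionary are made up for illustration; the two proportions are the single-valued step (A) entries quoted from Table 1 above.

```python
def multiplier(proportion: float) -> float:
    """Return the multiplier for one step: 1 divided by the proportion that passes it."""
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be greater than 0 and at most 1")
    return 1.0 / proportion


# Single-valued step (A) entries quoted from Table 1 above.
seek_care = {"2007 BRFSS": 0.42, "Chicago community survey": 0.52}

# One multiplier per data source.
step_a_multipliers = {source: multiplier(p) for source, p in seek_care.items()}

for source, m in step_a_multipliers.items():
    print(f"{source}: multiplier {m:.2f}")
```

The smaller the proportion that makes it past a step, the larger the multiplier, which is why a step that only a quarter of cases pass inflates the count fourfold.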
Although I illustrated the process at steps A and B, the authors had to work backwards from step E, the number of reported laboratory confirmed cases. That’s the number CDC gives out to the public. On July 23, 2009, there were a total of 43,677 lab confirmed cases reported to CDC by the states, including 5009 hospitalizations and 302 deaths. These are the “hard data” they have for this. But if you have a multiplier for that step you can get a number for step D, and by applying each step’s multiplier in turn you can get the numbers for steps C, B and A. Finally step A’s multiplier gives you the estimated number of swine flu cases over the period covered. This is the estimated prevalence for swine flu in the US from April to July 2009 (it is thus a period prevalence, not a point prevalence).
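The backing-out chain can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' spreadsheet: it uses one proportion per step, whereas the paper treats hospitalized and non-hospitalized cases, and the periods before and after May 12, separately. The proportions for steps A, B and C are taken from figures in this post (the 50% example, the 2007 BRFSS value, and the midpoint of the 20 – 30% range mentioned later). The proportions for steps D and E are hypothetical placeholders, since the post does not quote them.

```python
REPORTED_CONFIRMED = 43_677  # lab confirmed cases reported to CDC as of July 23, 2009

# (step, proportion of cases that pass the step), listed from E back to A.
STEPS_E_TO_A = [
    ("E: confirmed case reported", 1.00),  # hypothetical placeholder
    ("D: detected by the lab", 0.95),      # hypothetical placeholder
    ("C: specimen sent for PCR", 0.25),    # midpoint of the 20-30% range in the post
    ("B: specimen collected", 0.25),       # 2007 BRFSS value from Table 1
    ("A: sought care", 0.50),              # the post's 50% example
]


def back_out(reported, steps_e_to_a):
    """Undo each step in turn by multiplying the running count by 1/proportion."""
    running = float(reported)
    trail = []
    for name, proportion in steps_e_to_a:
        running *= 1.0 / proportion
        trail.append((name, running))
    return trail


trail = back_out(REPORTED_CONFIRMED, STEPS_E_TO_A)
for name, count in trail:
    print(f"after undoing step {name}: {count:,.0f}")

estimated_total = trail[-1][1]
overall_multiplier = estimated_total / REPORTED_CONFIRMED
print(f"overall multiplier: {overall_multiplier:.1f}")
```

Because of the placeholder values and the single-stratum simplification, the number this prints should not be read as the paper's estimate. It only shows the mechanics.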
Now these data give ranges for the proportions and are not precise, so there are uncertainties in the underlying number. That’s where the Monte Carlo method comes in. As the name suggests, this is a probabilistic method, but it isn’t difficult to explain. If you are familiar with spreadsheets, you can easily see how you might use the proportions in formulas to “back out” from the lab confirmed flu number to the number you want. Of course you could easily do this once by hand, but the Monte Carlo method does it over and over again, thousands of times, so a computer is needed. The Monte Carlo method here actually does use a spreadsheet, but it randomly picks a proportion for each step from a range suggested by the data. For example, in step B (proportion of specimens taken for non-hospitalized patients) the proportion was randomly picked (uniform distribution) from the range 19 – 34% and for step C (proportion of specimens collected sent for confirmatory PCR) it was randomly selected from the interval 20 – 30% before May 12 and 5 – 15% after May 12. If you do this over and over again you will get a different set of 5 multipliers each time (because they are randomly selected) and you can run the spreadsheet to get a final number (step A’s multiplier times the number of cases at step A). The authors did this 10,000 times and got a range of numbers for the final result of backing out all the multiplier steps. These are the ranges given by the paper. The same multiplier method was applied to specific age groups but there wasn’t enough information to use the Monte Carlo method.
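For readers who would rather see this in code than in a spreadsheet, here is a rough Python sketch of the sampling loop. It is not the authors' tool. The ranges for steps B and C are the ones quoted above (C for the period before May 12). The range for step A is an assumption that simply spans the Table 1 values, and the ranges for steps D and E are hypothetical placeholders. The function name and the seed are likewise made up for illustration.

```python
import random
import statistics

# Proportion range for each step; every draw samples uniformly within the range.
STEP_RANGES = {
    "A: seek care": (0.42, 0.58),           # assumption: span of the Table 1 values
    "B: specimen collected": (0.19, 0.34),  # range quoted in the post
    "C: sent for PCR": (0.20, 0.30),        # range quoted for before May 12
    "D: detected by lab": (0.90, 1.00),     # hypothetical placeholder
    "E: reported": (0.95, 1.00),            # hypothetical placeholder
}


def simulate_total_multipliers(step_ranges, n_draws=10_000, seed=2009):
    """Draw one proportion per step, n_draws times, and return the overall multipliers."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        total = 1.0
        for low, high in step_ranges.values():
            total /= rng.uniform(low, high)  # same as multiplying by 1/proportion
        draws.append(total)
    return draws


draws = simulate_total_multipliers(STEP_RANGES)
median = statistics.median(draws)
# With n=20, quantiles() returns 19 cut points; the first and last are the
# 5th and 95th percentiles, which bound the central 90% of the draws.
cuts = statistics.quantiles(draws, n=20)
low_90, high_90 = cuts[0], cuts[-1]
print(f"median multiplier {median:.0f}, 90% range {low_90:.0f} to {high_90:.0f}")
```

Multiplying the reported count by the median and by the two ends of the 90% range turns these multipliers into case estimates. With the placeholder ranges used here the output will not match the paper's numbers.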
So what were the results?
We demonstrate that the reported cases of laboratory confirmed pandemic (H1N1) 2009 are likely a substantial underestimation of the total number of actual illnesses that occurred in the community during the spring of 2009. We estimate that through July 23, 2009, from 1.8 million to 5.7 million symptomatic cases of pandemic (H1N1) 2009 occurred in the United States, resulting in 9,000-21,000 hospitalizations. We did not estimate the number of deaths directly from our model, but among reports of laboratory-confirmed cases through July 23, the ratio of deaths to hospitalizations was 6%. When applying this fraction to the number of hospitalizations calculated from the model (that is, by assuming that deaths and hospitalizations are underreported to the same extent) we obtain a median estimate of 800 deaths (90% range 550 – 1,300) during this same period. (Reed et al., EID [cites omitted])
The median multiplier for reported to estimated cases was 79. In other words, for every reported, lab confirmed case there were 79 total cases. That median estimate gives about 3 million cases. The 90% range was 47 – 148, meaning that each reported case could represent somewhere between 47 and 148 total cases. That gives the 1.8 million – 5.7 million figure.
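The death estimate in the quoted passage is easy to sanity-check. The sketch below takes the reported hospitalization and death counts given earlier in this post, computes their ratio, and applies it to the two ends of the paper's hospitalization range. Applying the ratio only to the endpoints is a simplification on my part (the paper's range comes out of the simulation), so expect approximate agreement, not an exact match.

```python
REPORTED_HOSPITALIZATIONS = 5009  # lab confirmed, reported as of July 23, 2009
REPORTED_DEATHS = 302             # lab confirmed, reported as of July 23, 2009

# Ratio of deaths to hospitalizations among reported cases (the paper's "6%").
death_ratio = REPORTED_DEATHS / REPORTED_HOSPITALIZATIONS

# Ends of the hospitalization range estimated in the paper.
EST_HOSP_LOW, EST_HOSP_HIGH = 9_000, 21_000

# Assume deaths are underreported to the same extent as hospitalizations.
deaths_low = death_ratio * EST_HOSP_LOW
deaths_high = death_ratio * EST_HOSP_HIGH

print(f"death-to-hospitalization ratio: {death_ratio:.1%}")
print(f"implied deaths: roughly {deaths_low:.0f} to {deaths_high:.0f}")
```

The result lands close to the 550 – 1,300 range in the paper.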
There are also age-specific incidence estimates, using the multiplier method and census data on underlying populations for each age group. These give a median of 107 cases per 100,000 persons in the 65+ age group, compared with 2,196 per 100,000 in the 5 – 24 year old age group. Hospitalization rates were highest in young children (median 13/100,000 children under age 5). Once again, the extraordinary susceptibility of the younger age groups is evident.
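To put the age contrast in one number, the short sketch below divides the two median rates. It also shows how a rate per 100,000 is formed from a case estimate and a census population. The helper function is my own, and the counts in its example call are hypothetical, chosen only to show the formula.

```python
def rate_per_100k(cases: float, population: float) -> float:
    """Incidence expressed as cases per 100,000 persons."""
    return cases / population * 100_000


# Median rates per 100,000 quoted from the paper.
MEDIAN_RATE_65_PLUS = 107
MEDIAN_RATE_5_TO_24 = 2196

# How many times higher the estimated rate is in the 5-24 group.
rate_ratio = MEDIAN_RATE_5_TO_24 / MEDIAN_RATE_65_PLUS
print(f"5-24 rate is about {rate_ratio:.0f} times the 65+ rate")

# Hypothetical numbers, only to show the formula: 50 cases among 200,000 people.
example_rate = rate_per_100k(50, 200_000)
print(f"example rate: {example_rate:.0f} per 100,000")
```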
There are soft spots in this back-of-the-spreadsheet type of analysis, of course, which the authors are explicit about. For example, the multipliers are derived from people with ILI (fever with cough or sore throat without any other known cause). Not everyone with flu has fever or respiratory symptoms like that, so that would underestimate the true number. But it’s a reasonable way to get a handle on something that at first seems impossibly elusive.
The tool is up on CDC’s website at http://www.cdc.gov/h1n1flu/tools