How to decide when to close a school because of swine flu

This is about a paper published Friday. The post is long because the paper provides practical advice on a problem of importance, the issue of school closures. The advice is based on data and could be implemented at the level of a school district or even a single school without requiring a lot of money or effort. It's not cost-free, but it could probably be done with existing personnel and resources. The fact that it is grounded in empirical evidence makes a decision to keep a school open or to close it much easier to defend: it is neither arbitrary nor a guess. Improvements in the technology of rapid flu testing would make it even better, but we know the data collection and policies can be implemented on a routine basis because they are being implemented right now -- in Japan. And they aren't doing anything in Japan we couldn't do here. The method should be in the toolkit of every school system. Although the authors don't mention it, it seems clear that, implemented widely, it would also be an extremely valuable additional component of the influenza surveillance system. The length of the post is due to my wish to spell out the details of this fairly short but readable paper, so that parents or school officials who would like to do something like what is being suggested can follow the technical details in the original and defend it to others.

A characteristic of a pandemic virus strain, whatever its virulence, is that it tends to hit younger age groups than seasonal flu does. School-age children and adults under 60 are already experiencing, in the first weeks of the season, probabilities of coming down with flu that we normally associate with the end of a flu season (see this post for more background). This means that school-age children and their teachers are in the cross hairs of this virus. Since social distancing is one of the major tools in our armamentarium, this immediately raises the question of school closures. In the early days of the pandemic CDC first advised closing a school for a week if a case appeared in the school population, then for two weeks, and then, as the picture became clearer, backed off and suggested a school needn't be closed at all unless absenteeism was so great that instruction was affected. That left it up to local school districts or public health authorities to come up with their own criteria for when to close, with little solid evidence on what the best strategy was, or even whether there was a good strategy. The paper we are discussing provides an evidence-based and easy-to-apply criterion for school closures. So let's get to it.

The paper by Sasaki et al. is entitled "Evidence-based tool for triggering school closures during influenza outbreaks, Japan" and you can get a .pdf here. The researchers are from Niigata University Graduate School of Medical and Dental Sciences in Japan and the Harvard and Boston University Schools of Public Health in Boston. Boston is fast becoming the leading locus of surveillance science, and Japan has a system of daily influenza-related absentee surveillance in its elementary schools. The combination of resources and talents has borne real fruit.

Here's the set-up:

Current public health policy prevents influenza-infected children from attending school until 2 days after fever has disappeared. An illness requires 2 physician visits: 1 for the initial diagnosis and 1 to obtain written permission from the treating physician to return to school. Diagnoses are usually made by using a rapid antigen test and patients are treated with the antiviral drugs, oseltamivir [trade name Tamiflu] or zanamivir [trade name Relenza]. (Sasaki et al., Emerging Infectious Diseases)

Using school absentee surveillance data for 54 elementary schools in Joetsu City, Niigata Prefecture, Japan, during four flu seasons (2005-2008), the researchers tested various combinations of daily absenteeism thresholds and durations for their ability to predict an outbreak of influenza in the school within 7 days. To explain this we first have to deal with two issues: how they defined a combination of absenteeism threshold and duration, and how they defined an "outbreak." Let's take the absenteeism/duration question first.

Every school day data are generated on influenza absenteeism in these Japanese schools. That number could be 1% or 5% or 8%, etc. Sasaki et al. asked if there was a threshold of absenteeism that predicted whether an outbreak would happen within 7 days of the threshold being reached (we'll define an outbreak below). But absenteeism data can be noisy (bounce around a bit), so they considered three scenarios:

  1. a single-day scenario, in which daily influenza-related absentee rates are observed for the first time above a given threshold for 1 day;
  2. a double-day scenario, in which rates reached a given threshold for the first time for 2 consecutive days; the rate for the second day was the same as or higher than the rate for the first day;
  3. a triple-day scenario, in which rates reached a given threshold for the first time for 3 consecutive days; rates for the second and third days were the same as or higher than the rate for the first day. The double-day and triple-day scenarios did not include weekends.

A threshold/duration combination might be something like, "Absenteeism reached 3% or more for 2 consecutive days" (scenario 2 with a 3% threshold).
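To make the trigger logic concrete, here is a minimal sketch in Python (my own illustration, not the authors' code; the function name and the assumption that weekends are already removed from the series are mine):

```python
# Hypothetical sketch of the three trigger scenarios (not from the paper's code).
# `rates` is a list of daily influenza-related absentee rates (as fractions),
# ordered by school day, with weekends already dropped.

def first_signal_day(rates, threshold, consecutive_days):
    """Return the index of the first day the scenario fires, or None.

    The scenario fires when the rate reaches `threshold` and stays the
    same as or higher than the first day's rate for `consecutive_days`
    school days in a row (consecutive_days = 1, 2, or 3 corresponds to
    the single-, double-, and triple-day scenarios).
    """
    for start in range(len(rates) - consecutive_days + 1):
        window = rates[start:start + consecutive_days]
        if window[0] >= threshold and all(r >= window[0] for r in window[1:]):
            return start
    return None

# Example: the double-day scenario at a 3% threshold fires on day 2.
rates = [0.01, 0.02, 0.035, 0.04, 0.06, 0.12]
print(first_signal_day(rates, threshold=0.03, consecutive_days=2))  # -> 2
```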

We'll show you a diagram (from the Technical Appendix of the paper) of how this works in a moment, but first we have to discuss what the authors meant by an outbreak. Since some absenteeism is "normal" in any school, they chose as a conservative definition of an outbreak the cases where the absenteeism was really high during flu season. "Really high" in this case meant in the upper 5% of all absenteeism days. Here's how they determined that. They made a bar graph (a histogram) of the number of days during the four flu seasons that had 1% absenteeism, 2% absenteeism, 3% absenteeism, etc., and then found the point where only 5% of the total days had absenteeism that bad (the upper 5%). Here's a figure from the Technical Appendix to the paper showing this:

[Figure: histogram of daily influenza-related absentee rates over the four flu seasons, showing the cutoff above which only 5% of school days fall -- the outbreak definition (from the Technical Appendix).]
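Before getting to the numbers, here's what that computation looks like in code. A sketch, using simulated stand-in data since I obviously don't have the Joetsu City dataset:

```python
# Sketch: derive the outbreak definition (upper 5% of all school-days)
# from pooled daily absentee rates. The data here are simulated stand-ins.
import numpy as np

rng = np.random.default_rng(0)
# Pretend these are daily influenza-related absentee rates, one value per
# school-day, pooled over 54 schools and four flu seasons.
daily_rates = rng.gamma(shape=1.5, scale=0.02, size=20000)

# 95% of school-days fall at or below this value; Sasaki et al. found
# the corresponding figure for their district was about 10%.
outbreak_cutoff = np.percentile(daily_rates, 95)
print(f"outbreak threshold: {outbreak_cutoff:.1%}")
```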

Using this method, they determined that 95% of all school days during the four flu seasons in this school district (54 schools, average school size 221 pupils) had absenteeism of 10% or less. Only a small proportion (5%) of days during these four flu seasons had more than 10% of the children out sick with flu, as determined by the physician diagnoses the policy requires. These events (greater than 10% absenteeism in a given school) were considered a school "outbreak." The strategy then was to see if there was any particular combination of absenteeism threshold and scenario that was a good predictor of whether an outbreak event occurred within 7 days of the combination giving a "signal." To make this clearer, here's a diagram from the Technical Appendix showing the three scenarios in an example school ("School 1"):

[Figure: example time series for one school showing how the single-, double-, and triple-day scenarios fire at given thresholds, and whether the 10% outbreak level is reached within 7 days (from the Technical Appendix).]

Excerpt from the caption to the figure:

Each scenario was evaluated at 9 different absentee threshold points: 1%, 2%, ... 9%. The example illustrated above shows how we evaluated the algorithm at 1 school during 1 influenza season under 3 arbitrarily chosen scenario-threshold combinations. A) For the single-day scenario evaluated at the 2% threshold, we calculated the date that absenteeism due to confirmed influenza reached at least 2% and noted whether the outbreak threshold of 10% was reached in the following 7 days. B) For the double-day scenario evaluated at the 3% threshold level, we calculated the date that absenteeism due to confirmed influenza reached at least 3% and was sustained at ≥3% for at least 2 consecutive days (excluding weekends), and then noted whether the outbreak threshold of 10% was reached within the 7 days after the first day. C) For the triple-day scenario evaluated at the 2% threshold level, we calculated the date that absenteeism due to confirmed influenza reached at least 2% and was sustained at ≥2% for at least 3 consecutive days (excluding weekends), and then noted whether the outbreak threshold of 10% was reached in the 7 days after the first day.

In other words, if you set the threshold at (for example) 3% or more absenteeism for 3 days, you would be predicting that the school would have an "outbreak" (absenteeism climbing above 10%) within 7 days after the first day of the three-day exceedance. That might or might not be a good predictor. The technical problem for this paper was to determine the optimal threshold/scenario combination so that you'd get the prediction right most of the time an outbreak happened while not calling too many false alarms (predicting an outbreak within 7 days that didn't happen).
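The bookkeeping for a single prediction is then simple. A sketch (again my own, and note the paper may count the 7-day window slightly differently than I assume here):

```python
def outbreak_within_window(rates, signal_day, cutoff=0.10, window=7):
    """Given the day a threshold/scenario signal fired, check whether
    absenteeism climbed to the outbreak cutoff within the next `window`
    school days (my assumed convention: the 7 days after the first day
    of the exceedance)."""
    following = rates[signal_day + 1:signal_day + 1 + window]
    return max(following, default=0.0) >= cutoff

# Example: the signal fires on day 1; absenteeism hits 12% five days later.
rates = [0.02, 0.03, 0.04, 0.05, 0.06, 0.08, 0.12, 0.07]
print(outbreak_within_window(rates, signal_day=1))  # -> True
```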

What the authors did next is a little harder to explain in simple terms, although worth explaining. It is a very conventional and accepted methodology, first developed in the Second World War to evaluate radar technology. In that instance the problem was setting a sensitivity threshold to decide whether a blip on the screen was real or noise, which corresponded to turning the brightness knob on the radar screen. If the brightness were too low, real enemy bombers would be missed. If it were too high, extraneous blips would look like enemy bombers and fighters would be sent up unnecessarily. The method's name, Receiver Operating Characteristic (ROC), comes from that history, although it has since been used for similar problems involving medical diagnostic instrumentation (where to set the sensitivity level when you are looking for disease) and screening (where to set the threshold to get the most true diagnoses while minimizing false positives). For any threshold setting and particular method there is a trade-off between false positives and false negatives. You can "turn the knob" in ways that give you various combinations of these two undesirable outcomes, and you want the one that is optimal in some sense you have to define. In this case the authors are looking for the combination of threshold level (e.g., 3% absenteeism) and scenario (e.g., that level for two days in a row) that best predicts an outbreak within 7 days without over-predicting one.
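Putting the pieces together, here's a sketch of how the ROC points could be generated by sweeping the nine thresholds for one scenario over many school-seasons. The scoring of school-seasons as true/false positives and negatives is my reading of the paper's method, not a verbatim reconstruction:

```python
# Sketch: build ROC points (false positive rate, true positive rate) for
# one scenario by sweeping thresholds 1%..9%. `seasons` is a list of
# per-school-season daily absentee rate series (assumed data layout,
# weekends already removed).

def fires(rates, threshold, days):
    """First day the scenario fires, or None (same logic as the earlier sketch)."""
    for i in range(len(rates) - days + 1):
        w = rates[i:i + days]
        if w[0] >= threshold and all(r >= w[0] for r in w[1:]):
            return i
    return None

def roc_points(seasons, days, cutoff=0.10, window=7):
    points = []
    for t in range(1, 10):                      # thresholds 1% .. 9%
        threshold = t / 100.0
        tp = fp = fn = tn = 0
        for rates in seasons:
            signal = fires(rates, threshold, days)
            if signal is not None:
                # Signal fired: was it followed by a real outbreak?
                hit = max(rates[signal + 1:signal + 1 + window],
                          default=0.0) >= cutoff
                tp, fp = tp + hit, fp + (not hit)
            else:
                # No signal: did an outbreak happen anyway (a miss)?
                missed = max(rates, default=0.0) >= cutoff
                fn, tn = fn + missed, tn + (not missed)
        tpr = tp / (tp + fn) if (tp + fn) else 0.0
        fpr = fp / (fp + tn) if (fp + tn) else 0.0
        points.append((threshold, fpr, tpr))
    return points
```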

The ROC curve is Figure 2 of the paper. Here it is:

[Figure 2 of the paper: ROC curves for the single-, double-, and triple-day scenarios.]

Here's how to read this diagram. There are three curves on it, one for each scenario (check the legend to see which color is which). The vertical axis is labeled sensitivity and goes from zero to one (read it as the true positive rate). The other axis is mislabeled specificity; it should really read (1 - specificity) and can be read as the false positive rate. Thus the ROC curve plots true positives against false positives. That makes the diagram a little hard to wrap your head around, but what makes it even harder, if you aren't familiar with ROC curves, is that each of the three curves also has an unmarked scale that runs along its length, starting at 9% at the lower left and running through to 1% by the upper right. You have to imagine the numbers 9 to 1 written along each curve, with spacing that is neither regular nor the same on each curve. Once you see that, you will understand that each point along a curve represents an absenteeism threshold, with the scenario given by which of the three curves it's on.

The curves represent the predictive ability of thresholds derived from the actual absenteeism data for all four flu seasons, one curve per scenario. If you decided you could tolerate a 40% false positive rate (horizontal axis) for your prediction, you would send a vertical line up from 0.40; where it intersects each of the three curves (at about 86% for scenario 1 and over 90% for scenarios 2 and 3) you would read off the true positive rate. Unfortunately, the way the ROC curves are (not) labeled, you can't see which threshold value each intersection represents, since the thresholds aren't marked, but the authors would have had that information from their calculations and could have put it on the graph (the trouble with doing so is that it makes the graphic look cluttered, but it still would have been better to at least place tick marks along each curve). To take an example: if the vertical line from 0.40 (i.e., 40% false positives) intersected scenario 2 at the 3% mark along the red curve's length, and the true positive rate at that intersection (read off the vertical axis) were 94%, this would say that if you waited until the absenteeism rate was 3% or more for at least two consecutive days (scenario 2) and used that to predict an outbreak (absenteeism climbing above 10% within 7 days), you'd capture 94% of all outbreaks that really occur (true positive rate) but you'd also call an outbreak 40% of the time when one didn't occur (false positive rate).

Now each curve represents 9 thresholds and there are three curves, so there are 27 pairs of true and false positive rates on this diagram. Which one is "optimal"? That's a very tricky question. Let's ask a slightly easier question first: which scenario is best? A perfect scenario would be one whose curve climbed straight up the left-hand axis to the top left corner and then shot across the top to the top right corner (none of the empirically derived curves in the figure do that, of course). On this hypothetical perfect ROC curve, the optimal threshold (sometimes called a cutpoint) would be the place on the scale along the curve where it turned the corner. That's the place where everything before it was a true positive with no false positives, and everything after it (along the top of the diagram) only adds false positives, because you've already got 100% of the true positives. You wouldn't want to put the threshold anywhere before the corner (along the vertical part of the curve) because then, while you wouldn't have any false positives, you wouldn't have captured all the true positives either. On the other hand, if you put it anywhere along the horizontal part of the perfect curve other than the top left corner, you'd be adding false positives with no benefit to the true positives. So the corner is the optimal placement for this hypothetical perfect ROC curve. It corresponds to some threshold value that isn't marked along the length of the curve (remember, the individual thresholds like 9% and 6% correspond to positions along the curve, but they aren't regularly spaced, so without markings or tick marks you don't know where they are in the figure). None of the three curves is of this "perfect" variety, but the scenario 2 and 3 curves are closer to it than the scenario 1 curve (the lowest in the figure). A rough measure of how close you get to the perfect curve is the Area Under the Curve (AUC), and you will see this measure mentioned in the paper. It's not a great measure, for technical reasons, but you can see clearly in this case that scenarios 2 and 3 are better than scenario 1.

The much trickier question is where along any of the curves the "optimal" point is. For the perfect curve there was one obvious choice: the corner. For these curves it's not obvious. The authors use what appears to be an objective measure, the Youden Index, which is the true positive rate minus the false positive rate. Since each point along a curve gives both a true positive rate (the vertical axis) and a false positive rate (the horizontal axis), you can calculate a Youden Index for each point. The best possible Youden Index would be 100% true positives and 0% false positives, giving an index of 1.0. The worst would be where the true and false positive rates are equal (a Youden Index of 0.0; we needn't worry about the technicality of a negative Youden Index here because all of the curves have positive indices over their whole length). The authors chose the point along each curve with the highest Youden Index:

For the single-day scenario, the optimal threshold was 5%, with a sensitivity of 0.77 and specificity of 0.73. For the double-day scenario, the optimal threshold was 4%, with a sensitivity of 0.84 and specificity of 0.77. For the triple-day scenario, the optimal threshold was 3%, with a sensitivity of 0.90 and specificity of 0.72. [NB: This may be a bit confusing because specificity is (1 - false positive rate); thus you want both high sensitivities and high specificities. A specificity 0.72 in scenario 3 is a false positive rate of 28%]
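In code the selection is a one-liner per point. A quick check using the numbers from the pull quote (Youden Index J = sensitivity + specificity - 1, i.e., true positive rate minus false positive rate):

```python
# Youden Index for the three optimal points quoted above.
candidates = [
    ("single-day, 5%", 0.77, 0.73),
    ("double-day, 4%", 0.84, 0.77),
    ("triple-day, 3%", 0.90, 0.72),
]
for name, sensitivity, specificity in candidates:
    j = sensitivity + specificity - 1          # equivalently TPR - FPR
    print(f"{name}: J = {j:.2f}")
# single-day: 0.50, double-day: 0.61, triple-day: 0.62
```

By this measure the triple-day scenario at 3% comes out slightly on top, consistent with the scenario 2 and 3 curves hugging the perfect corner more closely than scenario 1.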

This sounds very objective, but it hides a nasty complication. Calculating the Youden Index in this way makes an implicit assumption about what we call the loss function. There are two kinds of mistakes you can make when trying to predict whether an outbreak will happen within 7 days. The first is that you predict there will be one and one doesn't occur: you may have closed the school when it didn't need to be closed. The second is the reverse: you don't close the school and an outbreak happens. Each of these mistakes has a cost, and the costs can be difficult to compare. In one case the cost is lost instruction and work time, etc., from needlessly closing a school. In the other it is the cost of sickness from disease transmission in the school and from the school to the community. The Youden Index as used in this paper assumes those costs are the same. If you had some estimate ahead of time of the relative costs, you could incorporate the different weightings into the Youden Index.
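To illustrate what such a weighting might look like (the weight below is a made-up value for illustration, not anything from the paper):

```python
# Sketch: a cost-weighted Youden Index, J_w = sensitivity - w * (1 - specificity),
# where w is the cost of a false alarm (needless closure) relative to the
# cost of a missed outbreak. w = 1 recovers the ordinary Youden Index.
def weighted_youden(sensitivity, specificity, w):
    return sensitivity - w * (1.0 - specificity)

points = [("single-day, 5%", 0.77, 0.73),
          ("double-day, 4%", 0.84, 0.77),
          ("triple-day, 3%", 0.90, 0.72)]
for name, sens, spec in points:
    # w = 0.5: a missed outbreak is taken to cost twice a false alarm
    # (purely illustrative).
    print(f"{name}: J_w = {weighted_youden(sens, spec, 0.5):.2f}")
# If false alarms are judged cheaper than missed outbreaks (w < 1), the
# more sensitive triple-day 3% combination is favored even more strongly.
```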

There are a few other limitations to the study, two of which the authors mention. One is that there was no information on either the vaccination status of the absentees or their treatment with antivirals. To see how this might affect things, imagine two schools, one with nobody vaccinated and one with everyone vaccinated with a vaccine that is 97% effective. If the first school had a 3% absenteeism rate from flu, that could sensibly trigger a closure, because 97% of the remaining students are still susceptible and could be part of an outbreak. But the second school could never have more than 3% absenteeism: all the other students are immune, so the school wouldn't have to be closed, even though the absenteeism rates are identical. Another limitation is that the ROC curves in this study were calculated from a single school district in one prefecture in Japan, a different school and health care system from ours. The same method could be used in other localities, but it might produce different results. The paper is a recipe for how this could be done, and until it is done elsewhere the Japanese data provide a rough guide to what might be reasonable in the US or other places that could use such a system (which is almost everywhere).

As valuable as it is, the paper is quite short and would have benefited from more details. I'm guessing the authors would have liked to provide them but the EID editors didn't allow a longer manuscript. That was an error in this case. One of the things I am unclear about is exactly how and when influenza-associated absenteeism was determined in this dataset. If you want to do this on a real-time basis, you can't wait for a doctor to confirm the diagnosis with a rapid antigen test, because even if this can be done the same day as the first day of absence for an influenza-like illness (ILI), it would take at least a day or two to get into the surveillance system. Given the poor health care coverage in the US, the problem is worse here. So the most feasible method would be to monitor daily absences with a phone call from the school asking the parents the reason for the absence, probably with a checklist of questions (does your child have a fever or seem warm? is he/she coughing? etc.). In this way the Japanese system could be replicated for a single school or even a single grade level or class. It would require some data collection or review of past absentee data, but if no past data were available, the Japanese thresholds could be used as an initial default (they are at least based on concrete data about influenza).

There is another set of questions, though, that the paper doesn't address and that needs to be thought through. As noted at the outset, CDC went through three phases in its recommendations for school closure. Initially it was very conservative, because no one knew how transmissible or virulent the virus was: if a case appeared in the school, close it. This was cautious but prudent advice at a stage when we didn't know what we were dealing with, but as things became clearer CDC moved in the opposite direction and essentially said keeping schools open was the default unless the operation of the school was being compromised by extensive absenteeism. The criterion used in this paper -- predicting (and hence trying to avoid) an outbreak defined as the upper 5% of what one usually sees for absenteeism during a flu season -- is not related to CDC's most recent criterion. It may be that 10% absenteeism (or whatever the upper 5% of absenteeism is for a particular school or locality) bears little relation to school operations. Maybe those operations are hampered at 6% absenteeism, or not until 16%. Choosing 10% absenteeism because it represents an outlier figure for absentee percentages during flu season doesn't make it less arbitrary. If there were some more objective measure of when school operations were being compromised, that would be a better choice as far as corresponding to current CDC guidelines.

There is another issue that needs some thinking through. One of the things the authors did was check whether school absences correlated with other components of the Japanese surveillance system, specifically reports of ILI from sentinel providers in the community. It turns out the correlation was good, which means the school absences were tracking or mirroring what was going on in the community. That means school absenteeism could be used as another early warning system for the community, but it also means it isn't telling us more about transmission in the schools than other surveillance systems are telling us about transmission in the community. However, assuming that schools are one of the link points in community transmission (a parent gives it to a child, who passes it to a classmate, who passes it to their family, thus providing a transmission route between two otherwise unrelated families), then closing a school in a timely (proactive) fashion is one element in interrupting transmission and thus flattening out the epidemic curve, the main objective of social distancing. But quantifying that effect calls for some additional modeling.

This has been a very long post, but it is potentially a very practical one. Parents and school administrators can use this method to frame their thinking about school closure, explore the possibility of collecting or reviewing the kind of data that might allow something similar in their schools or districts, and have some concrete evidence to inform and defend decisions they will face shortly. This can be done without needing the cooperation of other agencies like health departments and would not require great expenditures. It could even be done with the help of volunteers.

Here are the links to the paper, once again:

Sasaki A, Gatewood A, Ozonoff A, Suzuki H, Tanabe N, Seki N, et al. Evidence-based tool for triggering school closures during influenza outbreaks, Japan. Emerg Infect Dis. 2009 Nov; [Epub ahead of print]

Technical Appendix to the paper


I would be much more curious about what's gained and what's lost. If 85% of the kids are still coming to school, that still keeps 85% of those households running normally. But if you close the school to prevent more kids from getting sick, every parent with a kid in school is going to have to figure something out, and the healthy kids aren't getting to go to school!

How much good is actually being done by closing the school, and does it offset how much it costs people to have their kids stay home from school? How many extra sick kids is that worth?

Ethan: Yes, that's the problem of incorporating the loss function. But it's not just a matter of curiosity. There are both empirical issues (what is the economic cost, for example) and value issues. But before you apply a loss function (or a utility function) you have to have something to apply it to. This paper gives a practical way to bootstrap that process. Without something like this, you are reduced to curiosity.

Revere, to follow up on Ethan's point, not only is there an economic cost for parents to stay home (or use daycare, opening another can of worms) when the school is closed, but in many instances school is the main (only?) source of meals for a significant number of the kids.

Further, while the average absentee rate in Japan may be in the low single digits, in some places here it is much higher. Typically, school districts get their state funding based on attendance, and it is when their reimbursement rate drops below the cost of staying open that they close (provided that they have enough adults to at least babysit the kids). My observation is that level is around 25-30% absent. That probably works for staff, too. I suspect, with no empirical evidence whatsoever, that administration can manage to cover 20-30% absences in staff.

At least the administration can tell you what their attendance averages are historically. What they can't tell you is why the kids were absent. In my part of the world, the first day of deer season results in lots of absences, all ages, both sexes (faculty included). At least that is known historically and could be factored in when making the determination.

I'm sure most parents would rather have the inconvenience of staying home with a healthy child than with an ill or hospitalized child.

MoM, Ethan: I know it's a long post, but the issues you raised are taken up in the paragraph fifth from the end (not counting the cites); it starts "This sounds very objective . . ." and is just under the last pull quote. The actual paper also has a cite on the economic costs of keeping children home (ref. 14, Sadique; I haven't had a chance to read it yet). We've written here about the school nutrition issue, too. But in principle you start with a calculation like this one and then factor in the loss functions, if you can make some reasonable estimates.

First off, this article shows what happens, in Japan, when certain levels of absenteeism have been reached. I think that is it, and that it would be a mistake to imply that it says anything more.

It does not demonstrate that closing the schools in Japan at any point would affect the subsequent disease rate. To reach the conclusion that school closure would stop disease spread you would have to assume that school closure is an extremely effective means of producing social distancing. That is a big assumption (as the article itself notes), and one likely to have a different relationship with reality depending on cultural norms and the virulence of the particular bug, I think. The variations are not only in the cost of a parent staying home (and food, etc.) but in whether a parent does in fact stay home, and whether children kept out of school are still exposing each other. Those behaviors are likely to vary greatly just among different American communities, let alone between American groups and behaviors within Japan.

So while the study is interesting, its utility seems somewhat limited.

A nearby county school system had an easy time calling the decision when a 14-year-old student died from swine flu. Another is in critical condition. I don't know how many kids are sick, but they closed for a week.

Don S brings up an excellent point. While I don't have data to back this up, it would seem that school closure would have different effects on social distancing in different districts. For example, a school closing in a rural, low-population-density district would seem to have a much greater chance of achieving effective social distancing than a school closing in an inner city.

And it's not just a simple matter of population density - socio-economic factors come into play, too. A school district that serves a population consisting of either a) family units with one working parent and one stay-at-home parent, or b) family units with working parents that can afford to take extended leave, would seem to have a better chance of achieving effective social distancing from a closure than a school district that serves single parent family units, low-income family units, etc. Kids left at home alone in these types of units may choose to congregate in other settings - thus negating any social distancing benefit.

In other words, just because you close the school doesn't necessarily mean the kids are going to stay at home.

Don: I think you misunderstand the point. It is to provide a way to think about what to do. As I point out in an admittedly long post, the question of what threshold to use is still open (is it outliers, as here, or something to do with the operation of the school, or some "value" trigger like a death), and it leaves the loss function open as well. But it is also a method that can be tailored to individual schools or even grade levels or classes. Because it is based on a readily available indicator or surrogate for what you are interested in, it can be done daily (in real time) and is meant to be predictive, not reactive. So if you think 10% or 30% or whatever is the right level for an outbreak, you can try to do something before it happens. If done on a widespread basis it is also a prediction system for a community outbreak, regardless of what you do when the threshold is met. So, yes, it's interesting, but it's also more than that. It's practical, implementable, flexible and may have more surveillance uses than indicated in the paper.

Oh, I have no doubt that I do not completely understand this study. (Honestly, I am still confused by those graphs even after several reads.) But I am not so sure I misunderstand the limited point I am addressing. My point is that this may indeed be usable to predict a certain future percent infected within a school absent intervention. I just question how much we understand the efficacy of our intervention options, and the analysis offered seems to implicitly accept that we do. Without that understanding, the cost/benefit analysis is random guesswork, and the ability to predict a future 20% or 30% infection rate does not guide us much. The comment within the paper -- "School closure could be 1 effective method of social distancing, although evidence supporting its effectiveness is incomplete. Some studies suggest that though child-to-child transmission might decrease, transmission might increase in other age groups" -- is worthy of greater note, is all.

In New York City public schools, the contractual coverage obligation is one extra per day. Going from 5 periods to 6 yields an extra twenty percent of class periods covered by the attending staff. If you have a staff of 100 and (typical) staff absentee rates of 5% (this is New York City), that means you can cover absences from within without resorting to subs unless your staff attendance rate drops below 83%. It's difficult to combine under-attended classes for purposes of maximizing coverage. Each sub covers one extra missing teacher, and each large high school has between six and ten full time subs on call. So you can drop another few percent before emergency out-of-contract coverages start getting handed out - that would be right around 70 - 72 percent. That's if everybody is optimally cross-scheduled. If you drop lower than that, you start having to stick multiple uncovered classes in large spaces like the auditorium, gymnasium, lunchroom, and library, which buys another five to ten percent - because of regulatory and statutory limits on adult/student supervision ratios.

Once a school drops below 60% staff attendance, the building becomes ungovernable. You can't cover the classes, and if you combine them you're violating state law on teaching time. The lunchroom doesn't run if non-instructional staff are absent at the same rate as instructional staff. This problem is magnified in smaller schools, which lack the teacher power to make up for absent staff in flexibility and bulk.


It looks like schools in CT are opting to start watching the situation when they hit about 15% absent, and are closing at between 30% and 40%. The closures have ranged from 2 days to 5 (including a weekend) so far. The concerns don't seem to be about spreading the flu but rather about operations.

I would say to the parent who wants to keep the schools open for the healthy kids: that's great... as long as your child isn't one of the ones who dies in this outbreak, leaving you to sue the school district for not closing earlier. Blaming the schools is not the answer. If you want something done in your school, GET INVOLVED, proactively, with planning.