David Kane has asked me to post his argument that
Roberts et al. (2004) claim that the risk of death increased by 2.5-fold (95% CI 1.6-4.2) in Iraq after the US-led invasion. I provide evidence that, given the other data presented in their paper, this confidence interval must be wrong. Comments and corrections are welcome.
Let me kick things off with my comment:
His argument turns on the CI for the post-invasion mortality rate (including Falluja) of 1.4-23.2. I would suggest that he has proven that this CI is wrong (as it obviously is, since there is no way the mortality rate could be below 4) rather than that the risk ratio CI is wrong.

Comments
I thought that the CI for the post invasion mortality rate with Fallujah included was bizarre and said so in the previous round of posts. You have 32 clusters and your post invasion mortality CI is 5.6 to 10.2. Throw in Fallujah with its huge excess death toll and your post-invasion CI is 1.4-23.2? So one cluster with a huge death toll increases the probability of a lower mortality rate? It's the relative risk CI's that make intuitive sense. 1.1 to 2.3 without Fallujah and 1.6 to 4.2 with it. Robert's graphs were like the relative risk CI's--I mean, they behaved in the intuitively expected way. I thought of emailing the Lancet1 authors to ask if there's an error in their CI's, but they probably get enough email from nuts without me adding to their spam list.
Posted by: Donald Johnson | July 24, 2007 7:06 PM
David Kane seems to be asking how the variance of a ratio can be small, when both the variance of the numerator and the denominator are big. Say we have r = X(post)/X(pre). Can r be estimated precisely, even if X(pre) and X(post) are estimated with considerable error?
Kane assumes that the answer is no, but this assumption is wrong. When X(pre) and X(post) are highly correlated, the ratio of their means can be estimated precisely, even if the individual means can't be.
Posted by: Ragout | July 24, 2007 10:34 PM
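Ragout's point about correlated estimates can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the means and standard deviations below are hypothetical (10% relative error in each estimate), not values from Roberts et al., and the function name `ratio_spread` is invented for the illustration.

```python
import math
import random
import statistics

def ratio_spread(rho, n=20000, seed=1):
    """Standard deviation of post/pre when the two estimates have correlation rho."""
    rng = random.Random(seed)
    mu_pre, sd_pre = 5.0, 0.5      # hypothetical pre-war estimate, 10% relative error
    mu_post, sd_post = 12.0, 1.2   # hypothetical post-war estimate, 10% relative error
    ratios = []
    for _ in range(n):
        z1 = rng.gauss(0, 1)
        # z2 is standard normal with correlation rho to z1
        z2 = rho * z1 + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        pre = mu_pre + sd_pre * z1
        post = mu_post + sd_post * z2
        ratios.append(post / pre)
    return statistics.pstdev(ratios)

for rho in (0.0, 0.9, 1.0):
    print(rho, ratio_spread(rho))
```

With these hypothetical numbers the spread of the ratio shrinks as the correlation rises, and at a correlation of 1 the ratio is pinned at 12/5 even though both inputs are noisy. Whether the Lancet estimates are in fact correlated is a separate, empirical question.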
1) Thanks to Tim for posting this and for providing such a useful forum for Lancet discussion. If it were not for Deltoid, I would never have gotten involved in this dispute.
2) Tim is wrong to suspect that the confidence interval for post-war mortality is incorrect. As my paper highlights, Falluja is such an outlier that the range given in the Lancet is almost certainly correct. Moreover, I have sent a copy of the paper to the authors and Les Roberts insists that they stand by all their results. If Tim has reason to believe that the post-war mortality confidence interval is wrong, he should mention it to Roberts.
3) Ragout is confused on three issues. First, as the paper demonstrates, the correlation between estimates of pre-war and post-war mortality is irrelevant (assuming normal distribution). The simulation proves this. You can assume a correlation of anything from 1 to -1 and you get, more or less, the same answer. Second, even if it were true that a high positive correlation affected the results, that would not matter in this case since there is no evidence of this in the data. If anything (although this is somewhat a side point), there is a slight negative correlation (I think) between pre/post mortality estimates among clusters. Third, there is no point in talking about whether or not we can estimate the relative risk "precisely." What is the definition of "precisely?" Maybe the correct RR confidence interval (probably something like 0.6 to 5) is "precise." Maybe it isn't.
The point is that, I think, it is a mathematical fact that, if the other information in the paper is correct, the central result (RR 2.5 with confidence interval 1.6 -- 4.2) must be wrong.
2 + 2 != 5, however much the defenders of Roberts et al (2004) might wish it to be the case.
Posted by: David Kane | July 24, 2007 11:07 PM
Kane,
It seems that you are mistaken regarding some basic points of statistics. Before we clarify those points it would be hard to get into the specifics of your argument.
The original analysis was done in a frequentist framework. When carrying out such an analysis, there is no distribution of an unknown parameter (such as CMRpost) to talk about, since the parameter is not assumed to be a random variable.
Therefore, statements such as P(CMRpost < 3.7) = this-or-that, which are liberally strewn about your paper, are completely meaningless.
Posted by: Sortition | July 24, 2007 11:44 PM
1) Donald Johnson is a knowledgeable writer on this topic, so I don't want to be rude. But, really, look up the definition of variance. A single outlier data point will significantly increase the estimated variance and, therefore, the confidence intervals. That is simply the formula.
2) As much fun as it is to argue with anonymoids like Sortition, I don't see his point. Although I prefer a Bayesian framework (and goodness knows that virtually every public appearance by any of the Lancet authors features a Bayesian version of their results), the exact same problem arises if you are a Frequentist. If the confidence intervals for pre- and post-war mortality given in the paper are correct, then the confidence interval for the relative risk must be wrong. It does not matter if you are Bayesian or Frequentist. The math is the same. 2 + 2 != 5
Posted by: David Kane | July 24, 2007 11:57 PM
David,
Your simulations prove nothing because they assume the data are normally distributed. Given the inclusion of Falluja, the data are wildly nonnormal (nor are they unimodal).
The annual mortality rate per 1000 in post-war Fallujah was 100-200 above the Iraq-wide mean of 12.6. If the data were normal, mortality rates of -100 and more should be equally likely. But of course, negative mortality rates are not possible, and hugely negative mortality rates are a serious problem with your method.
To say this another way, if the data were really distributed normally with mean 12.3 and SD (12.3-1.4)/1.96, then the odds of observing a Fallujah in a sample of 33 clusters should be infinitesimal. Since we do observe Fallujah, and we think that this is a typical sample (as you argue), it follows that the assumption of normality is wildly wrong.
Posted by: Ragout | July 25, 2007 1:06 AM
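Ragout's arithmetic can be checked in a few lines. The sketch below takes his numbers at face value: a normal distribution with mean 12.3 and SD (12.3-1.4)/1.96, and a Fallujah value 100 above the mean (the low end of his "100-200"). Note the assumption: that SD is really the standard error implied by the published interval, so this follows his framing and is not an independent estimate of between-cluster spread.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

mean = 12.3
sd = (12.3 - 1.4) / 1.96           # SD as Ragout derives it from the published CI

z_fallujah = 100 / sd              # Fallujah taken as 100 above the mean
p_fallujah = upper_tail(z_fallujah)

# By symmetry, the mass the same normal puts on negative mortality rates.
p_negative = upper_tail(mean / sd)

print(p_fallujah, p_negative)
```

Under these assumptions the Fallujah tail probability is vanishingly small, while a little over 1% of the mass sits below zero, which is the tension Ragout describes.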
It is hard to take seriously a paper that purports to prove "mathematical facts" and yet is based on arguments that are mathematically false.
Your "it doesn't matter much" is not a credible response. If it doesn't matter, let's see you make the correct argument and then we can address it.
Posted by: Sortition | July 25, 2007 1:09 AM
1) Sortition, I don't know how to say this any more clearly: The exact same proof applies if you are a Frequentist. Since the authors themselves never describe their results this way (see Burnham's quote), I don't see much point in using the mind game of repeated samples. If there is a particular part of the proof that you don't understand, be specific. I am eager for feedback.
2) Ragout. It is not I who assume that the confidence intervals for pre- and post-war mortality are normally distributed. It is the authors. Now, they may be right to make this assumption. They may be wrong. But it is their assumption or, rather, the assumption of the statistical software that they use.
3) But even if you wanted to use the data from Falluja to reject a normal distribution (a reasonable thing to do), any fatter-tailed distribution (say, t) makes the problem worse, not better. The intuition should be obvious. The fatter the tails of the distribution for post-war mortality, the more likely it is that post-war mortality is below 3.7.
4) None of this is to say that there isn't a way to change the model to get the answers that the authors want to get. If you assume that the distribution of post-war mortality is right-skewed (i.e., that only big increases in death are possible, not big decreases), then you ought to be able to somehow get the 1.6 -- 4.2 confidence interval for the relative risk. But that is not the model that the authors used. (See their paper for the exact description.) Therefore, they must publish a correction, or withdraw the paper.
5) Note that I would not spend too much time on that exercise. If you try to create a new model, you would need the raw data to estimate it. The authors now report that the data is no longer "available." I am trying to get clarification on just what they mean by that, but my sense is that the data is gone. (The individual level data, not the aggregate data that Tim kindly posted.)
6) By the way, that thread from Tim's posting of the data brings back lots of fond memories, eh? I miss Seixon! And hope that dsquared chimes in on this topic. But note that BrendanH was the first (?) to suggest that the relative risk confidence interval must be wrong if Falluja is included. See his full results here. I am not sure how to translate between what he has done and what I did, but it seems like we agree that, if you include Falluja, you can't reject the null hypothesis that mortality in Iraq is unchanged.
Posted by: David Kane | July 25, 2007 7:58 AM
David, it looks from a cursory scan of your paper that you're assuming that the ratio of two normally distributed variables is itself normally distributed (or at least, this seems to be the implication of your multiplying the confidence interval by the risk ratio). This is not the case for anything other than independent variables, and even small departures from independence can mean that the true distribution of the ratio is very very different from normal (this is why the weak instruments problem in IV estimation is so serious). I'm busy today so I can't be sure but could you confirm whether I'm right in identifying this assumption?
Posted by: dsquared | July 25, 2007 8:22 AM
Erratum to the above after checking my own posts on IV estimation - the ratio of two normal distributions is Cauchy distributed, and there aren't any non-trivial cases where the ratio is exactly normal.
Posted by: dsquared | July 25, 2007 8:25 AM
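The Cauchy claim in the erratum holds for the ratio of two independent normals with mean zero; with non-zero means the ratio is heavy-tailed but not exactly Cauchy. A quick simulation sketch of the zero-mean case (the function name `ratio_quartiles` is made up for the illustration): a standard Cauchy has quartiles at -1, 0 and 1, so the sample quartiles of the simulated ratio should land near those values.

```python
import random

def ratio_quartiles(n=200_000, seed=0):
    """Sample quartiles of X/Y for independent standard normals X and Y."""
    rng = random.Random(seed)
    ratios = sorted(rng.gauss(0, 1) / rng.gauss(0, 1) for _ in range(n))
    return ratios[n // 4], ratios[n // 2], ratios[(3 * n) // 4]

q1, q2, q3 = ratio_quartiles()
print(q1, q2, q3)
```

Quartiles are used here rather than the mean and variance because a Cauchy distribution has neither, so sample moments would not settle down.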
David says he misses comments from Seixon. For me that's like missing a bad hangover or worse still, a serious case of Montezuma's revenge.
Given all of your hand waving, David, I just wonder how many Iraqi civilians you think have died in what was and is an illegal, catastrophic war conducted on the basis of lies and deceit. Moreover, since the U.S. State Department called the mineral resources of the region "One of the greatest material prizes in history" and a source of "Stupendous strategic power" (in 1950), and senior planners like Kennan and Brzezinski said that any country controlling this 'material prize' had "Veto power over the global economy", I'd like to know your rationale for the efforts you appear to be making to legitimize the invasion. How many corpses is enough in defense of naked aggression?
Posted by: Jeff Harvey | July 25, 2007 8:26 AM
Just to restate what others have said (basically, so if it comes to vote counting, I've expressed a vote): if we start with a population of Iraqi districts post-invasion, we can estimate death rates (although almost nobody seems interested in doing so). We can, if we choose, say they are all in the same population (located inside Iraq), and note there is a huge variation in death rates. Or, we can note that data correlates with political fact, and that a single district, involved in high-intensity war, has a higher death rate. If we restrict our population to districts that are not involved in high-intensity war, we get a more homogeneous population, covering most of the country at that point, and have sampled one site which is politically and statistically distinct. I come across the same work in my field. I think it's much more useful to have estimates of a specified population (say, weights of adults) than to have data that's not so easily categorized (weights of humans, including fetuses, 3 week-olds, microcars filled with midgets, etc.). Kane's argument is made. Personally, I think it's a lousy one, with a poorly specified universe.
Posted by: stewart | July 25, 2007 8:27 AM
also to note the standard textbook caveats about extrapolation of statistical estimates a long way outside the range of data. To use the Fallujah datapoint as evidence for the existence of significant probability mass below RR=1 is to postulate the existence of an unobserved "anti-Fallujah", where the crude mortality rate fell as much as it rose in Fallujah. The fact that CMR is bounded below by zero (unless the Rapture occurred during the sample period, which it didn't) means that an "anti-Fallujah" is actually impossible. To get the same effect on the confidence interval as an "anti-Fallujah" without resurrecting the dead would require lots and lots of unobserved clusters where the mortality rate fell substantially, which raises the question of why weren't more of them sampled?
Posted by: dsquared | July 25, 2007 8:42 AM
Thanks to dsquared for his comments. High quality discussion like this is precisely why I asked Tim to post the paper at Deltoid. To substance:
1) I am not assuming anything about the distribution of the ratio of two normal variables. Although the paper has details, the basic trick is that, if pre- and post-invasion CMR is normally distributed, then the difference between the two is normally distributed. I show that the distribution of this difference overlaps significantly with zero. If that is true, then the lower bound of the relative risk confidence interval is too high.
2) Yet dsquared's comment raises another possible approach. Assuming zero correlation between the estimates, we can easily simulate the RR risk using the given CMRs. For R users, this would be:
Note that the mean is what the paper gives us. The upper bound is high, but not by much. The inconsistency is, as in my paper, with the lower bound.
Now, obviously, this approach is not how the original paper estimates the RR. But it is further circumstantial evidence that, if the CMRs are correct (and I am pretty sure that they are), the RR risk must be wrong.
3) As to Jeff Harvey's charming question, the more that I study this topic, the more convinced I am to trust Jon Pedersen's judgment: 100,000 excess violent deaths.
Posted by: David Kane | July 25, 2007 9:41 AM
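The R snippet from point 2 of the comment above is not reproduced in this copy of the thread. The following Python sketch is not Kane's code; it only illustrates the kind of simulation he describes, drawing independent normal pre- and post-invasion CMRs and looking at the quantiles of their ratio. The post-invasion interval (12.3, 95% CI 1.4-23.2) is quoted in the thread. The pre-invasion figures (5.0, 95% CI 3.7-6.3) are an assumption here: only the 3.7 lower bound appears in the thread, and the rest is taken from the published paper as best recalled.

```python
import random

def simulate_rr(n=100_000, seed=0):
    """Quantiles of post/pre for independent normal CMR estimates (zero correlation)."""
    rng = random.Random(seed)
    # Standard errors backed out of the 95% intervals (half-width / 1.96).
    pre_mean, pre_sd = 5.0, (6.3 - 3.7) / (2 * 1.96)      # assumed pre-invasion CMR
    post_mean, post_sd = 12.3, (23.2 - 1.4) / (2 * 1.96)  # post-invasion CMR incl. Falluja
    ratios = sorted(
        rng.gauss(post_mean, post_sd) / rng.gauss(pre_mean, pre_sd)
        for _ in range(n)
    )

    def quantile(p):
        return ratios[int(p * (n - 1))]

    return quantile(0.025), quantile(0.5), quantile(0.975)

rr_low, rr_mid, rr_high = simulate_rr()
print(rr_low, rr_mid, rr_high)
```

Under these normality and independence assumptions the simulated lower bound falls below 1, which is the inconsistency Kane points to. The sketch says nothing about whether those assumptions match what the authors actually did.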
David, you're just window-dressing a problem of post-hoc covariate analysis. There are clearly two types of town in Iraq - those being blown to shit by Americans, and those being only slightly blown to shit by Americans. This was only discovered after the fact in the Lancet study, so a post-hoc analysis of deaths by high- and low-risk areas was necessary. Unfortunately, there was only one observation of a high risk area. Had the cluster survey been 10 times the size, maybe there would have been 10 clusters from "towns being blown to shit by Americans" and we'd be able to do a post-hoc high- vs. low-risk analysis.
Given we have only one such cluster, i.e. insufficient data, the natural thing to do is to exclude the high-risk set and do only a very qualitative discussion of how it might affect the data.
This happens all the time in Epidemiology. For example you do a study of sexual behaviour and HIV incidence in gay men, but discover at the end of the study that 5 of your 100 subjects are also injecting drug users, and they all got HIV. Obviously you know these people are at higher risk of HIV, but you don't have enough data to compare them with the rest sensibly. Had you 10 times as many of these people you could do a sensible covariate analysis but you can't, so you just put them aside so as not to bias your main findings. I don't really see the difference.
Also David, consider this thought experiment: randomized cluster survey of civilian mortality in England in 1940. You accidentally get a single borough in London (i.e. focus of the blitz), and are forced to conclude no change in mortality because your sample variance is too high. Do you really believe that in 1940 in England such a study would be correct? Of course, had you sampled 10 times as many clusters you would have got a borough from Plymouth, Yarmouth, ... and you would be able to conclude that in the countryside mortality was unchanged, but in the Southern cities it was much higher. Can you see any difference here?
Posted by: SG | July 25, 2007 10:00 AM
David, thanks for your answer. In other words, mass carnage and slaughter. A vast crime against humanity, for which the occupiers are obliged to pay reparations. Moreover, I suppose this also means that you support the idea of convening war crimes trials for the various leaders and their cronies involved in this debacle e.g. Bush, Cheney, Rumsfeld, Blair, Straw, Hoon, Berlusconi, Aznar etc.
Posted by: Jeff Harvey | July 25, 2007 10:20 AM
"Of course, had you sampled 10 times as many clusters you would have got a borough from Plymouth, Yarmouth, ... and you would be able to conclude that in the countryside mortality was unchanged, but in the Southern cities it was much higher. Can you see any difference here?"
I disagree.
Had you taken a larger sample in the area of focus, you may have been blown to bits and there'd be no number, hence it would not have happened because we wouldn't know about it.
This is a common problem, collecting data in a war zone. Quibblers should travel to Fallujah and do their own study, just like the Intrepid Auditors are traveling to the surface temperature stations to overturn the very fundament of science.
Best,
D
Posted by: Dano | July 25, 2007 10:53 AM
David, you don't explain why you think the CMR's are correct. Yeah, outliers increase variance (I'm so well read I actually knew that). Robert (who I hope shows up here) did his own analysis of the data and adding Fallujah increased the CI, but it also shifted the whole thing to the right, as intuition would suggest and the probability distribution he got for the excess deaths has several peaks, which I assume has something to do with how often the Fallujah cluster pops up in resamplings. Speaking as the ignorant layperson here, if I saw the result the L1 team got with the CMR including Fallujah, I'd suspect the software package was employing an inappropriate mathematical model. Like dsquared, it's not clear to me that finding a bombed-out place like Fallujah suggests the existence of other places where mass resurrections occurred.
SG, I tried that argument myself, in modified form. What if Bush had dropped a nuke on Fallujah, killing 200 in that cluster? Think of the variance and just imagine how much this would increase the probability that the invasion of Iraq lowered death rates. It didn't seem to strike David as a reductio ad absurdum like I hoped.
Anyway, in the interest of furthering my statistical education, can someone concoct a nice simple example we could do by hand which would show how adding a very high mortality cluster to a sample of low mortality clusters would widen the probability distribution so much that there is now a greater chance of lower death rates than before? I'd like to see this if it can be done. I hate computer statistical packages which give wildly nonintuitive results if you can't follow the details. Please type slowly, with lots of explanation, so I can follow. Merely telling me that outliers increase variance is unsatisfying. Or ignore me--I'm just trying to get free tutoring lessons.
I'll start. Suppose you have two clusters with mortality rates of 0 and 1. The average mortality rate is 0.5 and if you resample you get 1 run of 0,0, two runs of 0,1 and 1,0 and one run of 1,1. One out of the four runs gives an average mortality less than 0.5. Now suppose you add a third cluster with a mortality rate of 100. If I understand bootstrapping and how you'd use it to get average mortality rates, you resample with replacement 3 times because you have 3 clusters, average the mortality rate for the three clusters, then do it again over and over. In this case there's only 27 possible resamplings. And only 4 of the 27 give you an average mortality rate below 0.5. Only 1/27 gives you an average mortality rate of 0. The distribution of average mortality rates has shifted to the right. It's a lot wider, but it behaves the way I intuitively expect. I don't find an increased chance of getting an average mortality rate less than 0.5 just because I added an enormous outlier.
So can someone concoct a simple example that behaves in a surprising way, so that I could understand how adding a Fallujah-like outlier would make a rational person think "Huh, I guess the invasion might have lowered death rates after all."
Posted by: Donald Johnson | July 25, 2007 11:18 AM
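Donald's hand-worked bootstrap example is small enough to enumerate exhaustively. A minimal sketch (the helper name `resample_means_below` is made up for the illustration): it lists every equally likely resample with replacement and counts how many have a mean below the threshold.

```python
from itertools import product

def resample_means_below(rates, threshold):
    """Count equally likely bootstrap resamples whose mean is below threshold.

    Returns (number below threshold, total number of resamples).
    """
    n = len(rates)
    resamples = list(product(rates, repeat=n))   # every ordered draw with replacement
    below = sum(1 for r in resamples if sum(r) / n < threshold)
    return below, len(resamples)

print(resample_means_below([0, 1], 0.5))        # two clusters
print(resample_means_below([0, 1, 100], 0.5))   # after adding the outlier
```

The counts agree with the comment: 1 of 4 resamples falls below 0.5 for the two-cluster case, and 4 of 27 once the outlier is added, so the share of low means goes down rather than up.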
I used "average mortality rate" in two different senses in my example above, if anyone cares. That was confusing. What I meant except in its first usage was the average mortality rate in a given sample--so if you picked 0, 1, and 1 out of the 3 clusters with mortality rates of 0, 1, and 100, the average mortality rate for that particular resampling is 2/3.
Sorry for the clutter--I'm looking for simple examples to clarify things in my own mind, not intending to supply confusing examples with my own idiosyncratic terminology.
Posted by: Donald Johnson | July 25, 2007 11:24 AM
Ragout's point is the correct one by the way - once you have assumed that a massively bimodal dataset has a unimodal distribution, you are never going to get a meaningful answer. The Figure 2 diagram actually has 1.34% of its probability mass represented by states in which Iraqis were being resurrected from the dead, and the idea that you can correct for this problem by simply truncating the distribution at zero is clearly wrong; the very large outlier on the positive side isn't a reason to believe in a lot of non-outlier cases on the negative side.
Hang on ...
David, what are you going on about here talking about the authors assuming normal distributions? A quick read back of Roberts et al (2004) reveals, at the bottom of p3, that
"As a check, we also used bootstrapping to obtain a non-parametric confidence interval under the assumption that the clusters were exchangeable. The confidence intervals reported are those obtained by bootstrapping. The numbers of excess deaths (attributable rates) were estimated by the same method, using linear rather than log-linear regression."
So basically your analysis is completely tangential. Roberts et al got their confidence intervals by bootstrapping from the empirical distribution of their data. The empirical distribution of the data wasn't normal and it wasn't nearly normal.
David, what's your source for your remark to Ragout above that:
"It is not I who assume that the confidence intervals for pre- and post-war mortality are normally distributed. It is the authors. Now, they may be right to make this assumption. They may be wrong. But it is their assumption or, rather, the assumption of the statistical software that they use."
It seems to be flatly contradicted by the passage from p3 quoted above and I really can't find any other places in the text where they talk about a normal distribution of the noise (in fact, the terms "Normal" and "Gaussian" don't appear in the paper at all). Is this from private correspondence?
At present it seems to me that when you say:
"If you assume that the distribution of post-war mortality is right-skewed (i.e., that only big increases in death are possible, not big decreases), then you ought to be able to somehow get the 1.6 -- 4.2 confidence interval for the relative risk. But that is not the model that the authors used. (See their paper for the exact description.) Therefore, they must publish a correction, or withdraw the paper."
then you're wrong - they used a model which only had big increases, because the "model" was a resampling of the actual observations, which of course contained Fallujah but did not contain "anti-Fallujah".
Posted by: dsquared | July 25, 2007 11:32 AM
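The contrast dsquared draws between a normal-theory interval and a bootstrap from the empirical distribution can be sketched as follows. The cluster rates are entirely hypothetical (32 ordinary clusters plus one extreme outlier), not the Lancet data, and the simple percentile bootstrap of the mean is an assumption for illustration, not the authors' procedure.

```python
import random
import statistics

def bootstrap_ci(data, n_boot=5000, seed=0):
    """Percentile bootstrap 95% interval for the mean, resampling whole clusters."""
    rng = random.Random(seed)
    n = len(data)
    means = sorted(statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot))
    return means[int(0.025 * (n_boot - 1))], means[int(0.975 * (n_boot - 1))]

def normal_ci(data):
    """Symmetric 95% interval from the sample mean and standard error."""
    m = statistics.fmean(data)
    se = statistics.stdev(data) / len(data) ** 0.5
    return m - 1.96 * se, m + 1.96 * se

# Hypothetical cluster mortality rates: 32 ordinary clusters and one outlier.
rates = [4, 5, 6, 7, 8, 9, 10, 11] * 4 + [200]

boot_low, boot_high = bootstrap_ci(rates)
norm_low, norm_high = normal_ci(rates)
print((boot_low, boot_high), (norm_low, norm_high))
```

Because every resample is built from observed values, the bootstrap mean can never fall below the smallest observed rate. The outlier stretches the bootstrap interval upward only, whereas the symmetric normal interval is stretched in both directions.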
Donald - you're not confused. There is no way that you can get the sort of result you describe simply by simple bootstrapping from the empirical distribution. You need further assumptions to get there - either fitting a parametric distribution (like David's normal assumption) or by messing around with the dataset - sometimes for financial applications it can be sensible to double the size of the dataset by appending a copy with the signs changed, but you need to be very careful when you're doing this about what your underlying model is, which is usually something parametric.
Posted by: dsquared | July 25, 2007 11:38 AM
Thanks to dsquared for these comments. Initial thoughts:
1) It seems clear to me that the authors use two totally different methods for calculating confidence intervals. The passage that dsquared quotes from applies only to calculations for the relative risk. The paper is not completely clear on this point, but the full paragraph implies this.
2) It seems certain to me that the confidence intervals for the estimates of crude mortality rates assume a normal distribution. I quote the full paragraph in the paper. Besides the software clues, the key fact is that the authors provide an estimate of the design effect. This is a natural product of the usual normal distribution approach. How would you get a design effect if you were calculating the confidence intervals for CMR using a bootstrap? I don't think you can, but counter-examples are welcome.
3) Goodness knows that this discussion would be a lot more productive if the authors were to release their code. Elizabeth Johnson, the graduate student who actually did the calculations, does not respond to my e-mails or phone calls.
4) The other way that I can be certain (?) that dsquared is wrong in his inference that the authors used a bootstrap or other empirical method to calculate the confidence intervals for post-invasion CMR is that, if they had, the confidence interval would not have been symmetric. But it is. So, almost certainly, we are dealing with a normal distribution.
5) All of which does not prevent SG (or anyone else) from using a skewed distribution for modeling post-war mortality. Go ahead. But that is not what the authors did.
Further comments sought.
Posted by: David Kane | July 25, 2007 11:50 AM
"How would you get a design effect if you were calculating the confidence intervals for CMR using a bootstrap?"
the design effect is just the ratio of between-cluster variance to within-cluster variance, isn't it?
Posted by: dsquared | July 25, 2007 12:00 PM
Kane,
You made a fundamental mistake - you are using meaningless terms, symbols and arguments (namely, all those that refer to distributions of CMRpre, CMRpost and RR).
You do not attempt to dispute that, yet you still expect your "proofs" to be accepted as valid. This is absurd. Either come up with valid proofs or abandon your position.
Posted by: Sortition | July 25, 2007 12:09 PM
Donald and Tim are concerned that the confidence intervals for the CMRs are incorrect. Without access to the full data, this is tough to know for sure, but the cluster level data is enough to provide plenty of consistent evidence on this point. R users will find my R package handy.
If anyone wants, I can paste in the whole R session, but the main points are easy. Looking at just post-war CMR, we can take simple cluster averages. This is probably not what the authors did. Clusters with more people should get more weight. But it is close enough. With Falluja, the mean is 14 and the standard deviation 32. Without Falluja, the mean is 8.2 and the standard deviation is 6.6. Those means are pretty close to what the paper reports. The key is that the standard deviation is almost 5 times bigger with Falluja than without.
We can't translate these standard deviations directly into standard errors since it depends on the clumping of the data. (Actually, given that we know the design effects, this might be possible, but I haven't done it.) But a rough guess might be to just divide the standard deviation with Falluja (32) by the square root of 33 (for the number of clusters). This gives about 5.6. Double that, and you have a one-sided confidence interval of about 11, just about spot on to the 10.9 reported by the paper.
Again, there is a lot of hand-waving going on and we ought to divide by a bigger number since the sample size is bigger than 32 once you consider all the people in the clusters, but, big picture, there is no reason to doubt the CMR estimates and confidence intervals which the authors present. If anyone has reasons for doubting them, please present them.
Posted by: David Kane | July 25, 2007 12:11 PM
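The back-of-envelope check in this comment can be reproduced directly. The sketch assumes the simplest possible standard error, the cluster-level standard deviation divided by the square root of the number of clusters, with no weighting or design-effect adjustment. The 10.9 half-width matches the with-Falluja figures (SD 32, 33 clusters). The without-Falluja figures (SD 6.6, 32 clusters) are shown for comparison against the 5.6-10.2 interval quoted earlier in the thread.

```python
import math

def half_width(sd, n_clusters, z=1.96):
    """Half-width of a normal 95% interval from cluster-level SD and cluster count."""
    return z * sd / math.sqrt(n_clusters)

with_falluja = half_width(32, 33)       # compare with 23.2 - 12.3 = 10.9
without_falluja = half_width(6.6, 32)   # compare with (10.2 - 5.6) / 2 = 2.3

print(with_falluja, without_falluja)
```

Both rough half-widths land close to the published intervals, which is consistent with Kane's reading that the CMR intervals come from a normal approximation, though it does not prove it.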
Let me summarize again. Roberts et al estimate pre- and post- CMR using the mean, and calculate confidence intervals under the assumption that the means are approximately normally distributed. David notes that the confidence interval of the difference in these estimators includes zero, assuming normality. Then, Roberts et al estimate the relative risk, using some other method, and get a confidence interval that does not include one.
David claims that these findings are inconsistent, but this is true only in a trivial sense. Roberts et al calculated the CIs for the means under one assumption, and the CIs for the RRs under another assumption. In particular, in calculating the RRs, they modeled the log of the data, while when calculating the mean CMRs, they used the levels of the data. In addition, they took into account the correlation of the data when calculating the RRs. Finally, they reported bootstrapped SEs for the RRs, rather than assuming normality. So it is not at all surprising that CIs for the RRs and the CIs for the difference in means would give different answers.
By the way, one problem here might be that the description of RR estimator in the Roberts et al paper is totally opaque. They mention overdispersion, so I assume they used a negative binomial regression. It does seem clear, though, that they modeled log deaths, took into account the correlation over time in deaths, and reported bootstrapped SEs for the RR.
Posted by: Ragout | July 25, 2007 12:19 PM
It is perhaps the original authors' fault, not Kane's, but normality is a crazy assumption here. Re "any fatter tailed distribution (say, t), makes the problem worse not better": well, that's assuming we're talking about a symmetric distribution. But death rates spike up not down. So the issue is more one of skew than kurtosis, i.e. the right tail needs to be fatter, not both tails (indeed the left tail needs to be thinner than a normal since death rates can't be negative). What happens if you instead use, say, the Poisson distribution?
David, I see you're sort of acknowledging this in part 5 of comment 22, but I'd be interested to hear: Independent of whether the authors used it or not, do you agree that normality is a really bad assumption (especially when you include the Fallujah cluster)? And don't you agree that intuitively a more reasonable model would increase -- not decrease -- the lower confidence bound relative to a normal?
Posted by: David Kane's friend | July 25, 2007 12:58 PM
Thanks for this very helpful feedback.
1) Ragout is exactly correct when he writes:
Correct! Part of what we are arguing about above (like the normal assumption) would refute this summary, but I believe that ragout has it right. This is certainly what I mean to argue. Ragout goes on:
Correct! Although I find ragout's guess about what they did reasonable, neither he nor I (nor any of you) know the exact procedure used for estimating the RR, although it seems reasonable to assume that a bootstrap was used.
All of which raises an interesting point. You don't need to use a bootstrap to calculate an RR in this (very standard) cluster set-up. So, why did they? What would the result have looked like if they just used the usual approach, as BrendanH did? My guess is that they would get the same (not statistically significant) answers that Brendan got. Since they didn't like those answers, they went searching for a model that would produce the answers that they wanted. A bootstrap did. Once they found that, they did not bother to check that the RR results were inconsistent with the CMR results.
Of course, this is all pure speculation, but there is no reason to use a bootstrap for the RR and not for the CMR.
ragout claims that my claims are true only in a "trivial sense." Perhaps! No author can fairly judge the importance of his own work. My preferred analogy is to scales and weighing. Imagine that the Lancet authors had reported that using scale A, each of two bags of apples weighs 2 pounds. Using scale B, those same two bags of apples weigh five pounds together. I then assert that, since 2 + 2 != 5, the conclusion is wrong. ragout says, "Well, they told you that they used two different scales. Your claim is trivial." Perhaps. But a scientific paper needs some minimal amount of internal consistency. You can't assert that 2 + 2 = 5 and then blame your scales. The scales are your responsibility. Why do these scales give mathematically inconsistent answers?
Posted by: David Kane | July 25, 2007 2:06 PM
My friend asks:
No. Normality was not a "really bad" assumption. Now, anytime you are estimating something, like crude mortality, that must be non-negative, a normal assumption will be "wrong" since it allows for negative numbers. But as long as most of the mass is greater than zero (as here), it probably doesn't matter much at all. Some simulations that we did with a truncated normal did not affect our results much if at all. So, normal is fine in this case. If the CMR estimates were screwy, you can be sure that I would critique them. In fact, my sense of the literature is that this approach is totally standard.
There is nothing wrong with assuming normality when estimating CMR as long as almost all of the posterior distribution is greater than zero.
Posted by: David Kane | July 25, 2007 2:41 PM
David, there are two problems with a normality assumption. One is the mass at negative mortality as you say.
The other -- and bigger -- problem I see with normality is the Falluja outlier which is wildly improbable under a normal assumption. And this outlier doesn't come out of left field. You expect to see positive spikes in violent mortality in a war zone. (See also SG's comments about WWII above.)
So it's good to hear you did work with truncated normals to address the thin left tail, but I would expect the thick right tail to be the far bigger issue.
Posted by: David Kane's friend | July 25, 2007 2:55 PM
Couple points:
1) Note that the debate does not even turn on the normality assumption for the post-invasion CMR. Even if this is a nice skewed distribution, as long as its lower tail is at 1.4 (and it is unimodal), the last proof in the paper still holds. So, I do not need for it to be normal (although it almost certainly is). Even if the authors are using something else, as long as they are reporting the lower confidence intervals correctly, the last version of my proof (really, Mike Spagat's proof) works. The normal distribution stuff is a bit of a red herring.
2) Perhaps someone more patient than I can explain to Sortition what is going on in the paper. I have tried to make it as clear as possible. Also, Sortition, can you at least see that other commentators here seem to understand the math?
3) This issue of what distribution one ought to use to model post-invasion CMR is a tricky one. If not normal, then what? The tricky part is how much of a skew you assume/require. If you assume that CMRs below 4 (or whatever) are impossible, then you have essentially assumed your conclusion (that mortality has increased). My guess is that any distribution which gave reasonable prior mass to the possibility that CMR has gone down would produce similar results to mine. But, again, that is a non-trivial technical problem, independent of my point.
4) dsquared asks:
Correct. But the reason, I think, that this is an interesting number, the reason that people report it in papers (like L1) is that it is almost always used in conjunction with normal models. I have never seen a paper which reported something like:
"The crude mortality rate during the period of war and occupation was 12·3 per 1000 people per year (95% CI 1·4-23·2; design effect=29·3)"
in which the estimation procedure was not based on a normal model. Has anyone?
Now, if I were really clever, I would use the information about the design effect to re-engineer what some of the other key numbers are. But, for now, I will leave that as an exercise for the reader.
Posted by: David Kane | July 25, 2007 3:09 PM
My friend notes:
Not for my purposes. It does not matter to me what the mean post-war CMR is, nor how fat the tail to the right. All that I need to show concerns the lower confidence bound. I am only calculating the probability that CMR_post < CMR_pre. For that calculation, all that (really) matters is the bottom percentiles for CMR_post (its 2.5th percentile and its mass below CMR = 3.7). The right hand tail (whether short or long or somewhere in between) does not matter for my proof. It is irrelevant.
Now, of course, the bigger a tail you place on the right hand side, the less mass there is to go elsewhere, including the left. I do require that the 1.4 lower CI is correct, as presented in the paper. Again, no one has shown any reason (meaning a specific alternate model) to doubt that. Roberts et al made a mistake, but not in estimating CMR.
Posted by: David Kane | July 25, 2007 3:17 PM
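The lower-tail numbers in this exchange can be checked with a few lines. This is only a sketch: it assumes, as David does, that the post-invasion CMR estimate is normal and that the reported 95% CI of 1.4-23.2 around 12.3 is a symmetric normal interval. The variable names are made up for illustration, and nothing here is the authors' documented procedure.

```python
from statistics import NormalDist

# Post-invasion CMR with Falluja (per 1000 per year), as quoted in the thread.
cmr_post_mean = 12.3
ci_low, ci_high = 1.4, 23.2

# ASSUMPTION: symmetric normal interval, so half-width = z * standard error.
z95 = NormalDist().inv_cdf(0.975)
se = (ci_high - ci_low) / (2 * z95)

post = NormalDist(mu=cmr_post_mean, sigma=se)

# Mass below zero (the "impossible" region) and below 3.7 (the threshold
# David mentions for his lower-tail calculation).
p_below_zero = post.cdf(0.0)
p_below_3_7 = post.cdf(3.7)

print(f"implied standard error: {se:.2f}")
print(f"P(CMR_post < 0)   = {p_below_zero:.3f}")
print(f"P(CMR_post < 3.7) = {p_below_3_7:.3f}")
```

Under these assumptions a bit over 1% of the mass falls below zero and about 6% falls below 3.7. A skewed model would give different figures, which is exactly what the two sides are arguing about.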
David, I'm not sure I understand your comment #32. Isn't the point of your analysis that under normal assumptions adding the Falluja cluster dramatically changes the lower CI? If instead of a normal, you use a distribution with a fat right tail such that the Falluja cluster is no longer that improbable, its inclusion or exclusion will have much less effect on the CI. The point is -- because of the Falluja data point -- the far right tail is very important in determining the lower CI.
Posted by: David Kane's friend | July 25, 2007 3:35 PM
My entire paper assumes Falluja is included and only addresses the claims made by Roberts et al which also include Falluja. Now, the reason that the CI for post-war (versus pre-war) is so much larger is because the estimate for Falluja post-war is so different (bigger) than those for other clusters. Falluja pre-war, on the other hand, looks a lot like other clusters. You write:
I agree that this is possible in theory, but I have to see a concrete demonstration. Pick any right skewed non-negative distribution that you like which a) Has a mean around 12 (which will be needed to match the mean of the data) and b) has lots of mass well between 5-12 (because lots of clusters are not nearly that violent) and c) has a tail which stretches out to easily include Falluja and d) does not assume the conclusion by having zero mass below 5.
I do not think that there is a distribution which does this very well, or any better (in aggregate) than the normal distribution that the authors use. Suggestions welcome.
Once we have such a distribution, we can estimate pre and post CMRs along with their confidence intervals. We can then check to see if these are consistent with the reported RR. Perhaps it will all make sense! I doubt it. And, anyway, my point is just to demand a correction/retraction of L1 as published. It is wrong. The authors are welcome (if they can find their data) to create a new model and then publish that one instead. But they first need to admit their mistake. Or someone needs to demonstrate mine.
Posted by: David Kane | July 25, 2007 3:57 PM
You guys are driving me nuts. I'm over here on vacation trying to enjoy the Tour de France (the results of which, btw, appear to have more credibility than David's results) and you guys can't do a simple mortality calculation. Months (years?) ago I posted a figure showing the bootstrap distribution for excess mortality. Here's the bootstrap distribution for the odds ratio of post- to pre-invasion mortality.
As before, I don't know what their random seed was, I don't know how many replicates they used, I don't know which bootstrap CI they used, I don't use the same software--but even with all of those caveats I come pretty darn close to Roberts' (or probably, Garfield's) results: they claim a RR of 2.5 (1.6 - 4.2) with Falluja and 1.5 (1.1 - 2.3) without it. I get 2.5 (1.2 - 5.2) with Falluja and 1.5 (1.1 - 2.2) without it.
Posted by: Robert | July 25, 2007 4:06 PM
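For readers who have not seen one, a cluster bootstrap of a rate ratio looks roughly like the sketch below. Everything in it is an assumption: the cluster numbers are made-up toy values (NOT the survey data), the function names are hypothetical, and the percentile interval is only one of several bootstrap CIs. As Robert notes, nobody in the thread knows which variant the authors used.

```python
import random

# HYPOTHETICAL toy clusters, not the survey data:
# (pre_deaths, pre_person_years, post_deaths, post_person_years)
clusters = [
    (1, 300, 2, 380), (2, 310, 3, 400), (1, 290, 1, 370),
    (3, 320, 5, 410), (1, 305, 2, 390), (2, 300, 4, 385),
    (1, 295, 2, 375), (2, 315, 30, 395),  # last cluster is an extreme outlier
]

def rate_ratio(sample):
    """Post-period death rate divided by pre-period death rate, pooled over clusters."""
    pre_d = sum(c[0] for c in sample)
    pre_py = sum(c[1] for c in sample)
    post_d = sum(c[2] for c in sample)
    post_py = sum(c[3] for c in sample)
    return (post_d / post_py) / (pre_d / pre_py)

def bootstrap_ratio_ci(data, reps=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample whole clusters with replacement."""
    rng = random.Random(seed)
    ratios = sorted(
        rate_ratio(rng.choices(data, k=len(data))) for _ in range(reps)
    )
    lo = ratios[int((alpha / 2) * reps)]
    hi = ratios[int((1 - alpha / 2) * reps) - 1]
    return lo, hi

point = rate_ratio(clusters)
lo, hi = bootstrap_ratio_ci(clusters)
print(f"ratio {point:.2f}, bootstrap CI ({lo:.2f}, {hi:.2f})")
```

Because each resample keeps a cluster's pre and post figures together, the ratio is computed on paired data, so its interval need not behave like the intervals for the two rates taken separately.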
Dang it, Robert, I was about to post a link to your old excess mortality graph and claim I'd done it.
Posted by: Donald Johnson | July 25, 2007 4:21 PM
David, this is an attempt at a constructive suggestion but it will probably come out poorly.
Why don't you try redoing your analysis of P(CMRPost) etc. using the assumption of an unobserved covariate? This unobserved covariate is a categorical variable, takes values 0 or 1, and its regression coefficient is about - what - 7? So in any cluster where it takes the value 1, it shifts the mean of the mortality to the right by (estimated fallujah CMR - CMR in remainder of iraq). The unobserved covariate is, obviously, measuring the presence or absence of a major military conflict. You can model various assumptions about the distribution of this variable, but the best assumptions are obvious: before the war, it has a value of 0 with constant probability 1. After the war, it has a binomial distribution with p=(number of clusters in fallujah)/(number of clusters in the country). I bet this will make p=1/32, approx.
Further, assume homogeneity, so the clusters with this unobserved covariate value of 1 have the same variance as the clusters with the value of 0. You'll see that these clusters have essentially 0 probability mass in regions of mortality which lie below the pre-war estimate. You can even, I'm sure, do a calculation of exactly the variance which would be needed (assuming non-homogeneity) for these observations to have any significant overlap of probability mass with the pre-war estimates. You can rejig your figures accordingly, and you'll see visually what is happening.
Obviously, if p=1/32, the chance of observing a large number of these clusters is very small, so unless you do a sample in the order of 320 clusters you are unlikely to get enough high-risk clusters to get a sensible estimate of excess mortality in the different levels of the covariate. But you don't want to - your interest is in showing how the low-risk clusters have changed, since everyone knows that shitloads of people die in high-risk clusters, and we don't need to throw away the lives of Iraqi doctors to find out.
I think while you are claiming that Roberts et al have to assume the answer of "higher deaths", it's pretty obvious that you are assuming the answer that there is no unobserved covariate corresponding to a massive bombing campaign and military intervention in one town. Given what we know of war, Iraq, and the sample, which answer do you think it is better to assume?
Posted by: SG | July 25, 2007 8:25 PM
Robert makes a fair point which I hope to address soon. SG suggests an interesting model which I encourage him to work out himself.
For now, I want to step back and try to explain why this arcane dispute matters. Recall that the most quoted result from the article was the estimate of 98,000 excess deaths (with confidence interval 8,000 to 194,000). Now we all realize that this estimate excluded Falluja. Now, here is a question for Lancet defenders, especially serious folks like Tim, dsquared, Robert, BrendanH and others:
Take the exact same computer code that produces the 98,000 (CI 8,000 - 194,000) estimate. Now, don't exclude Falluja. What does the code produce?
Now, without the exact code, it is tough to know the answer to this. I argue that the authors do a lot to try to imply that the mean estimate would rise to 300,000 or so and that the confidence interval would get wider but still safely exclude zero, something like 150,000 to 600,000.
But this is, I am fairly certain, false. Using the exact same code, I bet that the lower bound would be well below -100,000. See the paper for details. Moreover, it is almost certain that the authors knew this and that they purposely organized and wrote the paper in such a way as to hide this fact, to purposely mislead readers (including smart readers like Tim and dsquared) into thinking that including Falluja would move the lower bound of the confidence interval for excess deaths up. After all, excluding Falluja is "conservative!"
Now, before diving into the weeds on this topic, I would first love to establish what smart Lancet defenders believe. So, what is the answer to my question above?
Posted by: David Kane | July 25, 2007 9:28 PM
Tossing in a big outlier can drive the lower CI limit down. Imagine 5 points; 3, 3, 3, 3, and 3. Now toss in one more point, 4. The upper confidence limit will now be higher, but the lower confidence limit will be lower.
Posted by: z | July 25, 2007 9:48 PM
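z's example is easy to check. A minimal sketch, assuming the usual normal-approximation interval for a mean (mean plus or minus 1.96 standard errors); the helper name normal_ci is made up for illustration:

```python
from statistics import NormalDist, mean, stdev

def normal_ci(xs, level=0.95):
    """Normal-approximation CI for the mean: mean +/- z * (sd / sqrt(n))."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    m = mean(xs)
    se = stdev(xs) / len(xs) ** 0.5
    return m - z * se, m + z * se

before = normal_ci([3, 3, 3, 3, 3])     # no spread at all
after = normal_ci([3, 3, 3, 3, 3, 4])   # one larger point added

print("five 3s:        ", before)
print("five 3s and a 4:", after)
```

With five identical points the interval collapses to a single value. Adding the 4 raises the mean but also introduces a nonzero standard error, so under this symmetric model the lower limit drops below 3 even though no observation is below 3.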
Very cool discussion. I had classmates over there, although most of my year group is out of the field by now. Dave, you seem smart...for a Marine. ;)
Posted by: TCO | July 25, 2007 9:51 PM
David, it's not an "interesting model" which I should go away and work out - it's the answer to your conundrum. What you are doing is claiming that a very very unusual event observed in a sample of 32 observations is a true random draw from the same distribution as the other 31 observations, when we have STRONG evidence to suspect that it is a random draw from a different distribution. Fallujah was a planned event, in which the US government moved a lot of resources to ensure that the mean of the observations drawn from that area would be shifted significantly to the right. You can't claim that it is simply another observation from the same sample as the other 31.
Let me give you another example. You do a study of fitness in boxers, so you sample 32 boxers randomly from a list provided to you by the Nevada state boxing association. 31 of these yield a body fat percentage of 12, CI 11-13. One has a body fat percentage of 16. Study author DK takes this as evidence of a wide CI, and concludes that commonly boxers in Nevada have a body fat percentage as low as 8, since his revised CI ranges from 5 - 20. Study author SG, on the other hand, investigates the records and finds that last year the Nevada boxing association admitted women for the first time, and didn't tell you when they gave you the list. The 32nd observation is a woman. You cannot claim that she is just an unlikely observation from the same distribution as the rest of the sample, because she is a structurally different observation... there is a missing covariate.
In order to argue Fallujah is an unusual observation from the same distribution, you need to give a reason. The authors gave a very valid reason - a big fat war - for thinking it wasn't. The onus lies with you to explain otherwise.
Posted by: SG | July 25, 2007 10:05 PM
SG writes:
No. This is what the authors do. See their paper. They measure the post-invasion CMR using a normal distribution with the Falluja cluster treated identically to all the others. The clusters are exchangeable. If you think that this is nuts, you should bring it up with Roberts et al, or with the Lancet's peer reviewers.
Posted by: David Kane | July 25, 2007 10:10 PM
With regard to Robert's excellent post (and graphics) above, I have some comments.
1) Although Robert gets close, this is still not a replication. Is anyone else annoyed that the authors refuse to provide the details of their methodology?
2) Assume that Robert has replicated their results. Does that invalidate my claims? No! (I think.) Go back to the two scale analogy. Robert has demonstrated that there is a scale B which acts just as the Lancet authors claim it acts. I don't deny this. I just think that you can't simultaneously believe both the post-invasion CMR and the RR numbers. Robert may be correct that such-and-such procedure produces the 1.6 -- 4.2. But that result is inconsistent with a CMR_post of 1.4 -- 23.2. You can believe one. You can believe the other. But you can't, mathematically, simultaneously believe both.
3) Now, the potentially suspect part (Warning: Speculation alert!) is why the authors insisted on using the bootstrap. They only had 33 observations. The bootstrap is rarely used with so few observations. Why didn't they report the non-bootstrap results? They report standard confidence intervals in the 2006 paper (although they check them with a bootstrap).
So, another question for Lancet defenders. Why report bootstrap RR confidence intervals in 2004 but not in 2006?
I think that standard results for the RR would have yielded a lower bound for the confidence intervals well below 1. The authors did not like that result, so they hid it. They looked around for a method which would give the result that they wanted, found one, and then reported it (without noticing that it contradicted their other results).
Fortunately, it is easy for the authors to prove me wrong. Just tell us what the standard calculation for the relative risk produces when Falluja is included. As long as it is something similar to 1.6 -- 4.2, there is no problem. But if it is significantly different (as the quotes from Burnham and Roberts suggest), I think that they are guilty of purposely misleading their readers.
Posted by: David Kane | July 25, 2007 10:55 PM
"Tossing in a big outlier can drive the lower CI limit down. Imagine 5 points; 3, 3, 3, 3, and 3. Now toss in one more point, 4. The upper confidence limit will now be higher, but the lower confidence limit will be lower."
Oh, goody, someone at my level maybe. From what I glean in this thread, it seems to depend on what sort of distribution you fit to the data. If you assume a normal distribution previously you had no variance and your bell curve was a delta function spike at 3. Add the 4 outlier and there's a nonzero variance and so there's a chance of a result less than 3.
But if you do a simple bootstrap (as I understand it anyway), that won't happen. Finding a 4 doesn't give you any reason to think there's a 2 in the vicinity. You resample from 3,3,3,3,3, and 4 and there's no way any collection of clusters you get will have an average value of less than 3. Because, you know, they are all 3 or bigger.
This whole business seems vaguely arbitrary to me, probably from not understanding it enough. But the bootstrap approach (as I understand it) seems to stick closer to the data you actually have.
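The resampling argument above can be sketched as follows, assuming a plain percentile bootstrap of the mean on the toy data 3, 3, 3, 3, 3, 4. The seed and the number of replicates are arbitrary choices made for illustration.

```python
import random
from statistics import mean

data = [3, 3, 3, 3, 3, 4]
rng = random.Random(0)  # fixed seed so the run is repeatable

# Resample the six points with replacement and record each resample's mean.
boot_means = sorted(
    mean(rng.choices(data, k=len(data))) for _ in range(10000)
)

# Percentile interval: the 2.5th and 97.5th percentiles of the resampled means.
lo = boot_means[int(0.025 * len(boot_means))]
hi = boot_means[int(0.975 * len(boot_means)) - 1]

print("smallest resampled mean:", boot_means[0])
print("percentile interval:", (lo, hi))
```

Every resample is built from values that are 3 or larger, so no resampled mean, and therefore no percentile limit, can fall below 3. That is the sense in which this kind of bootstrap sticks to the data actually observed.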
And David Kane, yeah, when you brought this up a few weeks ago and I looked at the CMR with Fallujah I thought the result was nuts. Maybe someone should have objected. BTW, I thought it was 32 non-Fallujah, with Fallujah being 33. And that brings up something I once thought of. The prewar and postwar time periods are of different lengths, which complicates things, but there was 1 violent death prewar, which means one violent cluster. There were 15 clusters with violent deaths postwar, or 14 violent clusters and 1 superviolent cluster if one wants to treat Fallujah differently but I'm lazy and don't. I don't know how to correct for the time duration problem, but this alone kinda suggests an increase in violent deaths due to the invasion. Maybe you could do simple binomial statistics on clusters, classifying them as violent or nonviolent. This is about my speed. Lumping Fallujah in I get a variance of Npq = 33(15/33)(18/33) = 8.2. Sqrt 8.2 is 2.8 and I'll double that for my CI half width and you have 5.6, so my informal CI for violent cluster number out of 33 would be 9.4 to 20.6. I'm not sure what to do with the prewar number since the time duration is different, but say it counts as 2 violent clusters instead of 1, which seems generous to me. The variance would be about the same, so if that's 2 violent prewar clusters with a variance of 2, then that's 2 plus or minus 2.8 and with a maximum value of 4.8 there's no overlap with the postwar CI. I did a bad thing there--spreading my CI for the prewar violent cluster number into the negative range, when really the distribution skews right and could be approximated by the Poisson distribution, but hell, it's not going to do too much harm in this case. I think the upper end of the CI goes up slightly if you do it right. Someone who took this seriously could do it more carefully, but it seems clear the war caused an increase in the number of violent clusters. What a surprise.
Posted by: Donald Johnson | July 25, 2007 11:30 PM
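The back-of-envelope binomial calculation in this comment can be redone without rounding. A sketch using the same crude recipe (count plus or minus two standard deviations, with variance Npq); the function name is made up, and treating the prewar count as 2 rather than 1 follows Donald's "generous" adjustment:

```python
from math import sqrt

def rough_interval(k, n):
    """Count +/- 2 SD for k 'violent' clusters out of n, with variance n*p*q."""
    p = k / n
    sd = sqrt(n * p * (1 - p))
    return k - 2 * sd, k + 2 * sd

post_lo, post_hi = rough_interval(15, 33)  # 15 violent clusters postwar
pre_lo, pre_hi = rough_interval(2, 33)     # prewar count generously taken as 2

print(f"postwar: {post_lo:.1f} to {post_hi:.1f}")
print(f"prewar:  {pre_lo:.1f} to {pre_hi:.1f}")
print("intervals overlap:", pre_hi >= post_lo)
```

Unrounded, the postwar interval is about 9.3 to 20.7 rather than 9.4 to 20.6, and the prewar upper end is about 4.7, so the conclusion that the two intervals do not overlap is unchanged. The negative prewar lower end is the artifact Donald acknowledges.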
As I read it David, that's not what they do. They present the RR and CI with Fallujah excluded as their main result, and then do a bootstrap estimate of the confidence interval with Fallujah included.
The bootstrapping simply assumes that the probability of the "high-risk town" covariate occurring in the sample reflects its true population proportion. So in any sample of 32 clusters, more than one such cluster occurring is very unlikely, but a much more likely effect is that in most samples it won't occur, biasing the confidence interval calculation towards a higher lower end. You can do the same thing with any data set where an unexpected covariate has introduced a highly unusual data point - in the absence of enough data to treat the unexpected covariate as a confounder, bootstrapping is a way to get an estimated confidence interval without throwing away any data. There are only 32 points, after all.
Posted by: SG | July 25, 2007 11:36 PM
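How often a single special cluster shows up in a resample can be worked out exactly. This sketch assumes a plain nonparametric bootstrap that draws n clusters with replacement, each equally likely, and it takes n = 33 (the thread variously counts 32 or 33 clusters). Both are assumptions, not the authors' documented procedure.

```python
from math import comb

def copies_pmf(k, n):
    """P(one given cluster appears exactly k times in a size-n resample with replacement)."""
    p = 1 / n
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 33  # ASSUMPTION: total number of clusters, Falluja included

p0 = copies_pmf(0, n)   # the special cluster is absent from the resample
p1 = copies_pmf(1, n)   # it appears exactly once
p2plus = 1 - p0 - p1    # it appears two or more times

print(f"absent: {p0:.3f}, once: {p1:.3f}, twice or more: {p2plus:.3f}")
```

Under these assumptions the special cluster is absent from roughly 36% of resamples, appears once in roughly 37%, and appears two or more times in roughly 26%. The exact figures shift slightly with n = 32.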
For example, David:
So they have restricted their report of extra deaths to the low-risk towns outside of Fallujah, and have explicitly given a figure for the death rate in 97% of Iraq. Does this sound like it might possibly represent a covariate analysis to you? With insufficient data for the remaining 3%?
Posted by: SG | July 25, 2007 11:39 PM
I got carried away with my sloppy statistics for dummies approach, but there's a serious point underneath all this. David, do you really think there's the slightest chance the invasion of Iraq lowered death rates? I can still remember those days of late 2004 when it was still considered quite daring in mainstream political circles in the US to suggest that the Iraqis might be suffering more violence and death under occupation than they had during the late years of Saddam's rule, but nowadays I think you'd have to be blind not to realize this is exactly the case, and it was the case in 2003-2004 as well, though the violence has gotten worse since. Looking at L1 in the simplest most naive way possible, one sees one violent death in the (somewhat shorter) period before the war and 21 violent deaths (excluding Fallujah) after the war. 1 violent cluster (in a shorter period of time) prewar and 15 postwar. Any sophisticated statistical analysis which takes this data and somehow "shows" that there is a chance the death rate went down following the war is clearly