Deaths in Iraq updated

James Wimberly updates estimates of deaths in Iraq. If you extrapolate from Lancet 2, the death toll is now over a million. Which sort of explains why the coalition won't do counts of their own.

Elsewhere, David Kane has put up an R package for the Lancet data, along with his own discussion of the data. One interesting thing is that 24 of the 38 deaths from car bombs occurred in a single incident.

But Roberts et al have known for some time there would be plenty of motivation for a third study. I suspect they are working on one right now, and we will see it some time in mid-2008.

It's only a million dead because Rachel Carson banned DDT ...

Doing a third study would be EXTREMELY dangerous for the survey team. To put it mildly, neither the Iraqi government nor coalition forces will put themselves in harm's way to protect the team members as they go about their survey. And the team themselves are going to come under deep suspicion in the districts they enter. A new study is not realistic until some semblance of peace arrives.

Kane's main point is around the Falluja figures. If these are included, the confidence intervals become so wide that acceptance of the null hypothesis (no increase in death rates) is possible. However, it seems to me that exclusion of the Falluja data is a reasonable step for any practical statistician - why include a massive outlying cluster in the dataset? Falluja had clearly been the recipient of a pummeling way in excess of any other region of Iraq.

It should be added that it is good to see the Lancet finding subjected at last to rational scientific criticism, not the negative, politically motivated attacks of Bush, Blair, Howard and their ignorant troops.

One caveat that has to be made on extrapolated numbers these days is that the refugee population of Iraqis is now quite material to the total, and therefore some adjustment needs to be made to reflect this when scaling up death rates.

It is not true that including Falluja puts "no change" in the confidence interval. I quote:

>The risk of death was estimated to be 2·5-fold (95% CI 1·6-4·2) higher after the invasion when compared with the preinvasion period.

The number 1 (no change in the risk) is not contained in the interval 1.6-4.2.

Tim,

You are, of course, quoting the paper correctly. They imply that the lower bound for the confidence interval when Falluja is included is a relative risk of 1.6, i.e., an increase in mortality of 60%.

Yet, I argue in the paper that you kindly link to that this is highly misleading, that it is a result of using a bootstrap procedure that is inappropriate. I claim that if they included the Falluja data (in whatever statistical routine produced the 8,000 to 194,000 confidence interval) you would get an interval that goes from -700,000 to 1 million. (I now think that this interval is wrong because it ignores the fact that mortality can't be less than zero, but that will get fixed in the next draft.)

In any event, if you (and some of your well-informed readers) could address the argument made in the paper in more detail, I would appreciate it. I don't want to look (too) stupid when I present at JSM next month.

I understood that the purpose of posting packaged datasets on CRAN was to provide nice clean data with well-understood properties for use in teaching and in testing software.

The ethics of using CRAN to pursue a debate over whether a group of researchers have correctly analysed a contentious dataset seem iffy at best. The presence of the Lancet package on CRAN gives David Kane's vignette an aura of authority that it might not deserve. I feel that it sets a precedent that may be regretted in the future.

By The Feral Abacus (not verified) on 26 Jun 2007 #permalink

Has the data from the 2006 survey been released to everyone who requests it or are there still limitations?

As to David Kane's question: it appears to me (naively, as it were) that for a large outlier on one side to symmetrically widen the confidence limits requires using a symmetric error distribution. In this case, since the outlier is from an area which is known to have experienced heavy fighting, there is a good argument against using such a form. If you think about it, that was the real argument against including it in the original survey: that it would require widening the confidence limit on both sides. So then the issue becomes whether there is another survey, of something else, that would provide an appropriate proxy for the error distribution. This could include public services, hospital admissions, etc.
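To make the point above concrete, here is a toy illustration (made-up numbers, not the survey data, and a plain t-interval rather than the study's actual bootstrap): a single high outlier inflates the estimated variance, so a symmetric interval gets wider on *both* sides.

```r
# Toy numbers only: one high outlier widens a symmetric
# (normal-theory) confidence interval on BOTH sides.
rates <- c(5.1, 4.8, 5.6, 6.0, 5.3)   # five similar cluster rates
with_outlier <- c(rates, 25)          # add one extreme cluster

t_ci <- function(x) {
  m <- mean(x)
  se <- sd(x) / sqrt(length(x))
  m + c(-1, 1) * qt(0.975, df = length(x) - 1) * se
}

t_ci(rates)         # roughly (4.8, 5.9)
t_ci(with_outlier)  # roughly (0.2, 17.1): the lower bound falls too
```

An asymmetric error model, as suggested above, would not force the lower bound down in this way.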

1) The Feral Abacus does not understand the purpose of CRAN or, perhaps, open source software in general. My package is open source (and has been used by other researchers involved in this debate). If you don't like it, don't use it. If you have suggestions on how to make it better, tell us.

2) If you think that my paper is wrong, please point out my errors. Such is the way science moves forward.

3) The data is still restricted. AFAIK, most requests have been granted, but at least one that I know of has not been. The authors should listen to Tim on this one!

David, I fail to see what strident criticism of other people's research methodology has to do with open source software. Please elaborate.

As for your paper - 'If clusters in violent regions had more interviews than those in less violent regions, the calculated mortality rate might be too high.' (from page 14)

Is this correct? Would it not simply mean that mortality in violent regions was known with greater precision than mortality in less violent regions?

It's also unclear why you are so 'troubled' (p 14) by small differences in the numbers of interviews (between 38 and 41 households were interviewed in 44 of the 47 clusters). Can you quantify the effect that this modest imbalance in design has had on the results?

BTW what's the source for Johnson et al (2006)?

By The Feral Abacus (not verified) on 26 Jun 2007 #permalink

1) So, your claim is that, while the package itself is a fine addition to CRAN, the vignette I have chosen to include does not belong there? Because it is too "strident"? I disagree. If you are a member of the R community, then you should bring up the topic on one of the R mailing lists. I am fairly certain that a majority of R users would disagree with you.

2) It depends on the precise methodology. If one cluster sampled 1,000 people and reported 10 deaths while another sampled 10 and reported 5 deaths, then you can estimate overall mortality in (at least) one of two ways. First, estimate mortality within each cluster separately (1% and 50%) and then average those estimates (25.5% mortality). Second, combine the total sample (15 deaths out of 1,010) for an estimate of 1.5%. It is still not totally clear to me which approach the paper used. The second approach would, obviously, be an underestimate.
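In code, the two approaches above can be written out with the hypothetical numbers from the comment (illustrative figures only, not the survey data); the pooled estimate implicitly weights each cluster by its sample size, which is why it comes out so much lower here:

```r
# Hypothetical numbers from the comment above, not the survey data.
n      <- c(1000, 10)   # people sampled in each of two clusters
deaths <- c(10, 5)      # deaths reported in each cluster

# Approach 1: estimate mortality per cluster, then average the estimates
mean(deaths / n)        # (0.01 + 0.50) / 2 = 0.255, i.e. 25.5%

# Approach 2: pool the samples, then estimate
sum(deaths) / sum(n)    # 15 / 1010 = 0.0149..., i.e. about 1.5%
```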

3) The effect on the results of sampling different numbers of households would be trivial. But that is not the issue! This is merely a symptom of the larger concern about the quality of the underlying data. See previous discussion on this site about response rates.

4) See here:

http://www.rhul.ac.uk/economics/Research/conflict%2Danalysis/iraq%2Dmor…

for links on Johnson et al (2006)

By David Kane (not verified) on 26 Jun 2007 #permalink

David, bootstrapping strikes me as the correct method, and a method that assumes that a Falluja could have happened in the year before the invasion will give misleading results.

I'm just a folk Bayesian rather than someone who knows anything about statistics, but there's something weird about looking at a cluster with a huge excess of deaths and concluding that this increases the probability that the death rate has drastically decreased.

By Donald Johnson (not verified) on 26 Jun 2007 #permalink

I read the explanation for why including Fallujah would widen the CI symmetrically--maybe there's a town where Saddam's last year was incredibly violent. I recall someone saying that a year or two ago when I asked in one of the earlier threads.

The problem is (speaking as a layperson) that even leaving aside whatever prior knowledge we have about Iraq, when you find one cluster with a gigantic increase in mortality after the invasion, it doesn't suggest in any logical way that I can see that this means there might have been a cluster that suffered equally dramatic losses before the invasion. Maybe there were, but we already knew that if you have 32-33 clusters in your sample, you're most likely going to miss the rare cluster that suffered huge numbers of deaths before or after the invasion (or maybe both). So why should finding one cluster that suffered massive excess death post-invasion increase the odds that you've missed clusters that had suffered massive death pre-invasion?

By Donald Johnson (not verified) on 26 Jun 2007 #permalink

I go along with the tenor of the previous posts. On a quick reading of the paper, it seems to be special pleading to drag in the Falluja data. Naturally, doing that makes the estimates much less robust and subject to wild swings.

Having watched Prof. Burnham's internet presentation, I also think it is inaccurate to say that the authors "get around this problem by ignoring it". In fact, they have been very open in communicating how they handled the Falluja data.

The paper makes a large claim that it is "correcting" the estimates by including Falluja, appealing to the "formulas for cluster modeling", rather than the realistic assumption that no city in Iraq would balance Falluja in the opposite sense (extremely violent pre-invasion, quiet afterwards). I was always taught in situations like that to trust the realistic scenario rather than the formulas.

Let me repeat that I am glad to see David's paper - the more constructive criticism the survey results receive, the stronger the confidence in the conclusions (assuming they withstand the criticism, as so far I think they will).

David Kane doesn't appear to have actually analysed the data in any way, just written a long snark with some calculations of means, while (still) implying fraud.

For example he says that the confidence interval when including Falluja "may be" as high as 1,000,000. But he quotes one of the authors saying that if you include Fallujah the confidence interval includes infinity (Burnham, page 4). As far as I can tell there is no evidence in David Kane's paper of an attempt to reconstruct even a basic cluster-sampled model with and without Fallujah. Surely doing so would confirm the general details more effectively than giving an inaccurate guess at what the authors have stated?

So I'm left wondering: did David Kane write this so he can have a more "official" vehicle for his claim that the authors committed fraud; or did he write it to push his new theory of how to handle outliers (include them no matter what)?

My advice to David is to revise the text and remove the "Gotcha!" note, which does not sit well. Fine, if he wants to make a name for himself in certain quarters, but if that is so, these contributions are pointless anyway.

Incidentally, the British Government privately conceded from the start that the original study was sound. Just follow this link:

http://www.bbc.co.uk/pressoffice/pressreleases/stories/2007/03_march/26…

This means that most of the rubbishing of the Lancet study was misinformation designed to confuse the public. But that should not surprise anyone.

1) Thanks for the suggestion about the snark and Gotcha tone. I am guilty as charged! I hope to remove that tone in the next draft, but the temptations are hard to resist.

2) SG notes:

David Kane doesn't appear to have actually analysed the data in any way, just written a long snark with some calculations of means, while (still) implying fraud.

Well, you missed the first draft, in which I pointed out a series of "problems" with the data, most of which the authors have corrected. Even now, the version that they distribute is wrong. Unless you use my R package, you aren't getting the right answer. If you don't think that this sort of nitty-gritty checking of the data constitutes analysis, then we will have to agree to disagree. In all applied situations that I have seen, getting the raw data correct is at least 90% of the work.

As far as I can tell there is no evidence in David Kane's paper of an attempt to reconstruct even a basic cluster-sampled model with and without Fallujah. Surely doing so would confirm the general details more effectively than giving an inaccurate guess at what the authors have stated?

Sure. I have not done so. But, guess what? No one has! I am in contact with many/most of the folks working on this and, AFAICT, not a single outside team has been able to replicate the basic cluster model. The authors have (so far) declined to make available the underlying model code to anyone.

3) Donald Johnson writes:

So why should finding one cluster that suffered massive excess death post-invasion increase the odds that you've missed clusters that had suffered massive death pre-invasion?

I do my best to provide intuition in the paper. If you don't like my intuition, please try any text in survey methodology. I like Survey Methodology by Groves et al. Or, try this intuition. Imagine any real valued variable like changes in the mortality rate. I show you the results from 5 clusters. They are similar. I ask you what are the odds of a cluster with dramatically lower mortality. You say X. If I then show you a cluster with dramatically higher mortality, you ought to then revise X upwards. (All the standard models will do so.) Seeing one outlier increases the odds that other outliers are in the population.
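A tiny numerical version of this intuition, with made-up numbers and a plain normal model (nothing from the paper): adding one high outlier inflates the fitted spread so much that the estimated odds of a dramatically *low* value also jump. Whether this is a fair model of the survey's bootstrap is exactly what is in dispute below.

```r
# Made-up numbers: one high outlier raises the estimated probability
# of an extreme LOW observation under a fitted normal model.
x5 <- c(2.4, 2.6, 2.5, 2.3, 2.7)   # five similar clusters
x6 <- c(x5, 12)                    # plus one extreme cluster

p_low <- function(x) pnorm(0, mean = mean(x), sd = sd(x))  # P(value < 0)
p_low(x5)   # essentially zero
p_low(x6)   # around 0.15: the fitted sd has exploded
```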

4) Toby argues:

On a quick reading of the paper, it seems to be special pleading to drag in the Falluja data. Naturally, doing that makes the estimates much less robust and subject to wild swings.

That's ridiculous. In any data analysis, the default is to include all the data you gather. "Special pleading" is required to ignore data, not to use it. Now, one might argue that the methodology plan for the research (a copy of which the authors refuse to release) specified that any "outlier" cluster would be ignored. But, first, I bet that that wasn't true in this case. (If it were, the authors would have told us.) Second, it would make no sense to do so. Wars result in clustered violence and mortality. In fact, it would be unusual not to see some outlier clusters. No decent methodology would exclude them by default. Third, according to Burnham et al (2006), the Falluja cluster is not an outlier at all. They claim to confirm those results when returning to Falluja. (Not sure if the paper itself claims that or if the claim is made in other venues.)

5) SG wonders:

did David Kane write this so he can have a more "official" vehicle for his claim that the authors committed fraud; or did he write it to push his new theory of how to handle outliers (include them no matter what)?

Not sure what to make of this. I am writing a paper (and presenting at JSM) for much the same set of reasons that other people write papers and present at JSM.

By David Kane (not verified) on 27 Jun 2007 #permalink

"...In any data analysis, the default is to include all the data you gather. "Special pleading" is required to ignore data not to use it..."

Yes, of course, and the Johns Hopkins/Lancet team did so. Then they decided how their analysis was to proceed, which was not to ignore data, but to follow a rational approach towards a reliable estimate. I think they have been transparent enough.

To then jump in and say "Gotcha, you left out the Falluja data, and when I apply my correct formulas, the estimates swing around..." is, well, special pleading, i.e. pretending the original paper's explanation is inadequate. It could be called by unkinder names.

Pointing out the way the estimates change drastically could be used to demonstrate the correctness of the Lancet/ Johns Hopkins approach rather than the reverse. If David has a concern, then all well and good, but so far that is all it is - a concern.

"Imagine any real valued variable like changes in the mortality rate. I show you the results from 5 clusters. They are similar. I ask you what are the odds of a cluster with dramatically lower mortality. You say X. If I then show you a cluster with dramatically higher mortality, you ought to then revise X upwards. (All the standard models will do so.) Seeing one outlier increases the odds that other outliers are in the population"

That is textbook talk; Iraq is not a textbook. One would expect common sense to have a say as well. As Prof Burnham does point out (in his internet talk), investigators will consult other sources of information - he specifically mentions the Iraq Body Count study as valuable in this regard. Now any research on Falluja would have found (to take one random example):

"According to Mike Marqusee of Iraq Occupation Focus writing in the Guardian, 'Fallujah's compensation commissioner has reported that 36,000 of the city's 50,000 homes were destroyed, along with 60 schools and 65 mosques and shrines'". Civilian casualties were not reported by the coalition. I think it is safe to say no other city in Iraq ensured this level of attrition. If it did it is virtually impossible that it went unreported. The case for Falluja as a unique outlier, which would distort the estimates by unrealistic proportions, seems to me to be a good one.

If David's paper amounts to "They didn't do it according to the book", then the Johns Hopkins/ Lancet team have little to worry about.

David Kane claimed:

But, guess what? No one has! I am in contact with many/most of the folks working on this and, AFAICT, not a single outside team has been able to replicate the basic cluster model.

Hmmm. How many folks are working on this, how many of those have you been in contact with, and how would you know whether they have or have not been able to replicate the basic cluster model?

How many folks are working on this, how many of those have you been in contact with, and how would you know whether they have or have not been able to replicate the basic cluster model?

Excellent questions. According to the authors, about a dozen people/teams have requested access to the data. All but one have been granted it. I have been in touch with at least half of these, either directly (via e-mail and phone) or indirectly (friend of friends). I believe that I have reason to claim that nothing has been replicated. This is all the more true since the data as distributed by the authors is still wrong! (Unless you use my R package (which corrects things), you are looking at bad data.)

Is it possible that someone/somewhere has replicated things? Sure! But, if they have, they haven't told anyone, including the Lancet authors (who I am in contact with).

By David Kane (not verified) on 27 Jun 2007 #permalink

I agree with Toby about the reasons why one would expect an outlier in post-invasion Iraq, particularly in Fallujah, while we have no reason to expect such a thing in pre-invasion Iraq (in 2002, that is. The late 80's during the Anfal campaign or 1991 would be a different story.)

There's something very strange-sounding to the layman about the textbook methodology. I mean, yeah, I could calculate average number of excess deaths (positive or negative) per cluster and Fallujah greatly increases the variance. The question to me is so what? Why would a discovery of a huge increase in violent deaths in Fallujah post-invasion make the slightest difference in one's thinking of how likely it is that one missed a cluster with a huge number of violent deaths pre-invasion? Suppose the actual fraction of Lancet-size clusters where Saddam murdered a large number of people in the Lancet's pre-invasion period was X. Then the expected number of such outliers in a sample of 33 would be 33X. We don't know what "X" is, but the discovery that there were a large number of violent deaths in Fallujah in the post-invasion environment adds precisely nothing to our knowledge of X. All we continue to know is that out of 33 clusters, none of them showed a huge number of violent deaths in the pre-invasion period.

Suppose Bush had dropped a nuclear weapon on Fallujah and wiped out 200 people in the cluster. I suppose the textbook method would widen the CI so much we'd have to consider the possibility that the invasion saved 5 million lives.

By Donald Johnson (not verified) on 27 Jun 2007 #permalink

Epidemiology is full of instances where people exclude correct observations that are outliers. For example, in a study of passive smoking effects on lung cancer one might find a single subject with asbestosis, and exclude them as a confounder. Obviously a city which has suffered two wars (Fallujah) is going to be radically different to a city which has suffered one war (Baghdad). There are even diagnostic tools for regression models to assist us in choosing outliers to exclude. There is no problem with doing this, but a big problem with not doing this, as evidenced by the twisted logic you ultimately have to employ:

"We know that Fallujah suffered a massive amount of additional damage and war than the rest of Iraq. In any study of war damage we must include such a site. By so doing, we have massively inflated our confidence interval, so we can no longer draw any conclusion as to whether anywhere in Iraq suffered any extra damage as a result of the war."

David Kane wrote:

about a dozen people/teams have requested access to the data. [...] I believe that I have reason to claim that nothing has been replicated. Is it possible that someone/somewhere has replicated things? Sure! But, if they have, they haven't told anyone, including the Lancet authors (who I am in contact with).

Seems to me there's some reporting bias in your belief. In any event, the real question typically isn't "able to replicate exactly or unable to replicate exactly?" Far more frequently, it's "by how much, and why?" So, of the teams you've been in contact with, how close did they come, and what did they find out?

As an aside, I wonder how much practical use your R package will receive. There's an odd Catch-22: the people truly qualified to evaluate the data probably wouldn't need it, and the people who would need it probably aren't truly qualified. At this point, the best use for the data is heuristic and perhaps it would have been a good exercise for students to find data inconsistencies by themselves. I'm not sure whether the extremely off-putting non-redistribution limitation allows that.

"I believe that I have reason to claim that nothing has been replicated"

David Kane has made his other "beliefs" known as well.

But that David Kane happens to believe something does not make it so.

I suspect he may learn this when he (finally) presents his much-(self)touted paper.

JB wrote:

I suspect he may learn this when he (finally) presents his much-(self)touted paper.

Oh, I don't know about that. At most academic meetings where you only have a few minutes to present, there simply isn't enough time for fireworks to get going. Besides, to the extent that David's point is that data ought to be shared, I support that. To the extent that "the results of published studies which fail to meet the replication standard should be disregarded," well, I don't think a contributed paper at 8:30 in the morning to the section on estimating mortality and migration is really the place to hammer that out.

What I meant was that he just may find out at the upcoming conference that, unbeknownst to him, someone has been able to "replicate the basic cluster model".

Kane's past speculations regarding possible fraud (based as they were on sketchy information at the time) are a separate issue from the data sharing one.

Besides, his arguments about data sharing in this case are a red herring, as far as I can see (the proverbial "mountain made out of a mole hill").

JB wrote:

What I meant was that he just may find out at the upcoming conference that, unbeknownst to him, someone has been able to "replicate the basic cluster model".

Well, the chance of that happening is pretty low -- people who are obsessed with the Roberts and Burnham papers probably have difficulty believing that the rest of the world isn't equally obsessed, but I'm guessing not too many people are planning on flying to SLC to ambush David with PowerPoint slides at ten paces (though it's not unheard of: Mary Rosh's favorite professor reportedly did that to Levitt). Basically, I don't think too many people really care that the study said each cluster had 40 households while David has uncovered the fact that some clusters had 39 HH's while others had 41.

"I'm guessing not too many people are planning on flying to SLC to ambush David with PowerPoint slides at ten paces"

Who ever said anything about "ambushing" and the like?

I'd agree, that's not very likely (absurd, really, and also beside the point).

I don't know whether it is the case or not, but I'd say that it is at least possible (which is why I said "may") that someone else at that conference may be presenting on the Lancet results.

Robert, while the idea of sharing data is nice in principle, in epidemiology it often doesn't happen in practice. Organisations which provide data to secondary research institutes do so through special agreements which almost always stipulate that you cannot share the data. So others who want to check the data have to contact the original organisation and replicate the data request. Sad but true.

And I don't think this is so sad when one is dealing with data which might be capable of being used to identify individuals. In any cluster-survey where this can occur, people always refuse to share the data until they have deidentified the clusters. Sometimes the data is sensitive enough that people have to give special undertakings to funders and/or subjects that they won't share the information.

I haven't read anything about it, but I have a suspicion that when Burnham et al got funding to go and ask Iraqis if their family members were executed for being militiamen, they had to give some pretty strict undertakings about how the data would be shared. And I say that's only reasonable.

David Kane has been insinuating fraud all along and a large component of his insinuations is based on suggesting that their unwillingness to share is unusual. It's not, he knows it's not, and it's disingenuous at best for him to talk this way.

1) SG writes:

David Kane has been insinuating fraud all along and a large component of his insinuations is based on suggesting that their unwillingness to share is unusual. It's not, he knows it's not, and it's disingenuous at best for him to talk this way.

My insinuations of fraud have been mostly based on the survey response rates. Let me repeat my claim: There has never been a nation-wide single contact face-to-face survey with a response rate as high as Roberts et al (2006). If you disagree, please provide a counter-example. Note that many (attempted) counter-examples (ILCS, Gallup) are off-point because they featured multiple contact attempts. If someone wasn't home, the interviewers came back the next day/week.

2) Sharing data is not "unusual." Consider this list of dozens of journals which require authors to share data/code. The Lancet articles could not have been published in any of them. See also all the funding sources (NIH, CDC, et cetera) which require sharing of data/code. Because the Lancet authors did not get funding from these sources, they do not need to share. But every one else who does get such funding does need to share. (Of course, no one asks to see the data, in most cases.)

3) JB accuses me of "(self)-touting" my paper. Why? Tim linked to it, not me. In fact, I am pleased he learned something about the distribution of car bomb deaths that he did not know before. Such are the small steps by which science advances.

4) Donald writes, "There's something very strange-sounding to the layman about the textbook methodology." Well, talk to the textbook authors or, I guess, to God. The formulas are what they are. In my next draft, I'll try to give better intuition. Suggestions welcome. Another way to think about it is that the existence of the Falluja outlier means it is more likely that there are, not just a single large outlier on the other side, but a small group of medium ones.

5) SG claims:

Epidemiology is full of instances where people exclude correct observations that are outliers. For example, in a study of passive smoking effects on lung cancer one might find a single subject with asbestosis, and exclude them as a confounder.

Perhaps. But in every case I know of, such data-removal is specified in the methodology before the project begins. The plan calls for the removal of asbestosis victims when/if any are in the sample. It would be inappropriate to collect a bunch of data, look at it and then, on the basis of what you see, decide what to include and exclude in the analysis.

So you stand by your accusations of fraud?

As for sharing data, your list is instructive. Critical Public Health, for example, only requires disclosure to the extent "consistent with protecting the identity of individual participants". Children's Research Digest states that data should be shared provided that "the confidentiality of participants can be protected" and "legal rights concerning proprietary data" do not preclude it. I note that Children's Research Digest is one of a long list of journals sharing the same statement - I can't even be bothered counting them all.

As for your response to 5), it's nitpicking. I mean really, is your argument that because it wasn't in the plan, if they stumble on a completely empty city with no people in it, they have to use the data "population 0, deaths 0" because it wasn't in the plan that they could exclude it? If they had included a bunch of crazy data points that biased up the death rate and lowered the variance, and not excluded them because the plan said not to, I have a strong suspicion that you would have argued for excluding data regardless of the plan.

Donald: "There's something very strange-sounding to the layman about the textbook methodology."

David: Well, talk to the textbook authors or, I guess, to God. The formulas are what they are. In my next draft, I'll try to give better intuition. Suggestions welcome. Another way to think about is that the existence of the Falluja outlier means it is more likely that there are, not just a single large outlier on the other side, but a small group of medium ones.

What on earth are the assumptions underlying this bizarre claim? You can't suppose that the distribution of violent deaths is unchanged by the outbreak of a war; so what "formulas" are you talking about?

Since you invite suggestions, here's mine: forget about trying to bring the reader's intuition into line with your own; just provide a proof. Many of us are quite willing to accept counter-intuitive results if the calculus and matrix algebra are up to scratch. If nothing else, it shows that the author has at least made the effort to audit his own reasoning. Requests for counter-examples give precisely the opposite impression. If you are presented with "a nation-wide single contact face-to-face survey" with a similar response rate, is your next demand going to be for a nation-wide, single contact, face-to-face, low-budget survey carried out by doctors in a war zone in the 21st century?

By Kevin Donoghue (not verified) on 28 Jun 2007 #permalink

Robert, "that's a sure thing. Les Roberts is presenting in a special session."
In my previous comment (which you responded to) I said this:
"he just may find out at the upcoming conference that, unbeknownst to him, someone has been able to "replicate the basic cluster model".

It should have been obvious (to you at least) from that comment that I was talking about someone other than the Lancet authors.

Putting two and two together is really not that hard. Try it some time.

JB insisted:

I don't know whether it is the case or not, but I'd say that it is at least possible (which is why I said "may") that someone else at that conference may be presenting on the Lancet results.

In that case, rather than speculate, you could have looked at the [preliminary program](http://www.amstat.org/meetings/jsm/2007/onlineprogram/index.cfm?fuseact…) yourself. I do not see any other relevant presentations than the ones I noted (nor, given that this is the JSM, would I have expected to).

"Another way to think about is that the existence of the Falluja outlier means it is more likely that there are, not just a single large outlier on the other side, but a small group of medium ones"

Looking at the chart in David's paper, there do not seem to be grounds for this, any more than for presuming that taking more clusters will "fill in" the large gap between Falluja and the main mass of the cluster distribution.

Intuition tells me that if there were "islands of high security" within Iraq in which civilian mortality rates had decreased significantly, then (a) refugees would have tended to flee to those areas, in which case we would know about them, and (b) Al-Qaida, as it has tended to do, would have started to target those areas so they would not have remained "security islands" for long.

I think the assumption of more outliers so that the mortality rate in Iraq remains stable, or possibly even decreases(!), throughout a genocidal conflict is pretty thin.

Anyone who supports his analysis with "talk to the textbook authors or, I guess, to God. The formulas are what they are," does not have a firm grasp of the analysis.

Like in any other mathematical analysis of data, the formulas reflect the model assumptions. If the practitioner cannot trace back the connection between the conclusions and the assumptions, he is being sloppy. As Kevin Donoghue points out, it is clearly the *assumptions* in Kane's model (not the "formulas", "God", or nature) that imply that the parameters of the distribution of pre-invasion deaths are in some way connected to those of the distribution of post-invasion deaths.

Keeping in mind my disclaimer (I'm a layperson who'd never heard of cluster sampling before Lancet 1), maybe this is an example of the entertaining dispute between Bayesian and orthodox statistics that I've read a little about. A Bayesian, as I understand it, says that orthodox formulas sometimes give absurd results when applied to situations where we have prior information those formulas don't take into account.

So anyway, David, I think whatever formula you are using must presuppose that information about violent deaths in post-invasion Iraq tells us something about violent death rates for pre-invasion Iraq. But that's nonsense. What the Americans did to Fallujah in 2004 tells us nothing about the rate at which Saddam murdered people in 2002. Maybe a bigger survey would have picked up on some really bad Saddam clusters (though I doubt it), but that's not in the data we have. Also, iirc, there were 14 or 15 clusters which had violent deaths post invasion, and only 1, I think, pre-invasion, so that alone tells you things were very different pre and post invasion. I don't think you can get around this problem by pointing at textbook formulas. Maybe you need some clever Bayesian to help you analyze the data in a way that makes sense.

By Donald Johnson (not verified) on 29 Jun 2007 #permalink

To David Kane:

As a layman reading your paper I didn't think much of the implication that those doing the surveys were fraudulent (the mention of them 'forgetting' to ask for death certs, for example).

As far as I can see these people risked their necks to get the data and at least one of them is dead now. So you better have some very good evidence and reasons to rubbish their efforts and question their integrity. From what I can make of this thread here you do not.

I endorse Donald Johnson's suggestion of a Bayesian analysis.

My understanding is that most Bayesians do not recognise the existence of outliers or the methods that highlight them. David, then, could hardly object, since he is insisting that all the data be used.

Based on the Bayesian courses I have done, a uniform prior distribution for the mortality rate and a Markov chain Monte Carlo method (see http://en.wikipedia.org/wiki/Metropolis-Hastings_algorithm ) to sample points from the posterior distribution would neatly side-step all the agonizing about the Falluja data being/not being an outlier.

My "gut" tells me the posterior would be bimodal, but I am unsure if I have the level of expertise needed to carry out the analysis (not being an epidemiologist). Any Bayesians willing to take this on .... ?

Thanks as always to Tim for providing such a useful forum for discussion of this topic.
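For what it's worth, here is a minimal sketch of the kind of Metropolis-Hastings run Toby describes. Everything in it is an assumption made for illustration: invented per-cluster death counts, a Poisson likelihood with a single common rate, and a flat prior. A serious analysis would need the real person-months at risk and a hierarchical model (a pooled single-rate model like this one is too crude to show the bimodality Toby expects).

```r
# Minimal Metropolis-Hastings sketch. All data and model choices are
# illustrative assumptions, not the Lancet analysis.
set.seed(1)
deaths <- c(1, 0, 2, 1, 1, 24)   # invented cluster counts, one outlier

log_post <- function(lambda) {
  if (lambda <= 0) return(-Inf)            # flat prior on lambda > 0
  sum(dpois(deaths, lambda, log = TRUE))   # Poisson likelihood
}

n_iter <- 20000
chain  <- numeric(n_iter)
lambda <- 1
for (i in seq_len(n_iter)) {
  prop <- lambda + rnorm(1, sd = 0.5)      # symmetric random-walk proposal
  if (log(runif(1)) < log_post(prop) - log_post(lambda)) lambda <- prop
  chain[i] <- lambda
}

quantile(chain[-(1:2000)], c(0.025, 0.5, 0.975))   # posterior summary
```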

1) Thanks to all for the suggestion/demand for something more than my intuition. The next draft of the paper will feature a mathematical proof, simulation study and pretty graphics, all making the same point.

2) The formulas that are being used here are not "my" formulas. They are the formulas that the authors report using. I do not think that the numbers that they present are consistent with the formulas that they claimed to have used.

3) There is no doubt that a Bayesian analysis would give a different answer. I am a Bayesian myself! But that is not what the authors claimed to have done.

4) Note that one problem with any Bayesian approach can be that you assume the conclusions. There is a lot of (reasonable!) discussion above about how, even before any survey data has been collected, we can be pretty sure that no place in Iraq is safer now than it was before the war. Fine. But, if you stick that assumption in the model, guess what? You can be sure that mortality has increased, even before you collect any data! Now, obviously, one could try to perform a sophisticated Bayesian analysis on this data. I am in favor of that. But it is not my project. (Also, note that the authors will probably refuse to share their data with most of the readers of this blog.)

5) I applaud the bravery of the Iraqi interviewers. But, as I demonstrate, it seems highly unlikely that they "forgot" --- Gilbert Burnham's word, not mine --- to ask for death certificates. You have a better explanation?

6) Normally, I like to leave the explanation of basic statistics to Tim, but I'll give it a shot. Confidence intervals are derived from some estimate of the variance, and the formula for variance is what it is. You can look it up! Obviously, Falluja is such an outlier that, if you include it, the variance is huge and, hence, so are the confidence intervals.

7) If readers are interested, I will "tout" my next draft to Tim. Perhaps he would post a link. I certainly appreciate the feedback.

By David Kane (not verified) on 30 Jun 2007 #permalink

If people are getting the Bayesian urge it may be productive to conduct an analysis that synthesizes the results of all of the various Iraq mortality surveys. That should resolve some of the issues around the large confidence intervals.

By The Feral Abacus (not verified) on 30 Jun 2007 #permalink

David Kane,

"I applaud the bravery of the Iraqi interviews. But, as I demonstrate, it seems highly unlikely that they "forgot" --- Gilbert Burnham's word, not mine --- to ask for death certificates. You have a better explanation?"

A better explanation for what, and what is your explanation?

If I understand you correctly you are insinuating that the interviewers made up some deaths, and they then covered up the lack of the death certificate for the made up deaths by writing that they 'forgot' to ask for one.

If so then your explanation flies in the face of common sense. The very first question that springs to mind is why should they go to the trouble of making up the deaths and then not make up death certificates to go with them. For that matter why bother with interviews and why risk getting killed. Just make up the deaths, the interview, and the death certificate and you're done. But that's not what your data show, is it?

David Kane: Normally, I like to leave the explanation of basic statistics to Tim, but I'll give it a shot. Confidence intervals are derived from some estimate of the variance, and the formula for variance is what it is.

If Tim has any idea what you are trying to say, I would certainly appreciate his explanation. The one you give makes no sense at all.

A bootstrapped confidence interval is not derived from an estimate of the variance. Of course you know that; and probably everyone reading this thread knows it, but on the off-chance that some reader doesn't, here is a very good introduction to the bootstrap in a 70-page PDF file. (Seixon, that most entertaining of Lancet-bashers, contributed this link in an earlier thread; BTW does anyone know why he erased his blog?).
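To spell out the distinction with a minimal sketch (made-up numbers, base R only): the percentile bootstrap interval is read straight off the quantiles of the resampled means, while the "normal" bootstrap interval is the one built from a variance estimate.

```r
# Made-up cluster rates with one high outlier.
set.seed(2)
rates <- c(5.1, 4.8, 5.6, 6.0, 5.3, 25)

# Percentile bootstrap CI: quantiles of the resampled statistic,
# no variance formula involved.
boot_means <- replicate(10000, mean(sample(rates, replace = TRUE)))
quantile(boot_means, c(0.025, 0.975))   # asymmetric, follows the skew

# The "normal" bootstrap CI, by contrast, IS derived from a variance
# (the bootstrap standard error) and is forced to be symmetric:
mean(rates) + c(-1, 1) * qnorm(0.975) * sd(boot_means)
```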

As for "the formula for variance", your own link makes clear that the variance of a random variable is defined by reference to an integral. So you need to know what distribution you are dealing with before you can set about calculating the variance.

I don't doubt that you can come up with some distributional assumptions which will lead to the conclusion you have in mind. My suggestion is that you should state your assumptions, rather than just waffling about some unspecified formula being what it is. I second your request to Tim to alert us to your next draft.

By Kevin Donoghue (not verified) on 30 Jun 2007 #permalink

David, if you did try a Bayesian approach, you don't have to assume that Iraq was more dangerous post-invasion than pre-invasion. You only have to recognize that the background changed dramatically beginning in March 2003, so that there is no logical reason to think data about violent death after the invasion tells you anything about violent death before the invasion. Presumably you'd analyze the pre and post invasion data separately.

When you look at what the Lancet authors did, the CI for the "relative risk" when Fallujah is included is 1.6 to 4.2 and when it is excluded it is 1.1 to 2.3. So in that calculation whatever method they're using shows that including Fallujah skews the whole CI upwards in the direction of greater risk post-invasion, as common sense would suggest. On the other hand, I think what you did somewhere was compare the CI for mortality rates before the invasion (3.7-6.3) with the two versions of the CI for mortality after the invasion (1.4-23.2 with Fallujah and 5.6-10.2 without). I'm guessing the relative risk CI is the more relevant one to use if you're trying to determine if death rates went down, but I admit I don't understand that low end number for the "with Fallujah" mortality rate. Adding a cluster with very high mortality rates to the data set shouldn't (by my intuition) lower the low end of the CI down to 1.4 when previously it was at 5.6. This is why they tossed the Fallujah data--to me, having read some pro-Bayesian polemics, it sounds like an argument for being skeptical about the statistical method being used, but that's a layperson speaking. I would like to know how this happened, and then I'd like to hear a Bayesian explain whether this is a rational result, or an artifact of the methodology. I'm guessing it's the latter.

By Donald Johnson (not verified) on 30 Jun 2007 #permalink

Donald Johnson wrote:

When you look at what the Lancet authors did, the CI for the "relative risk" when Fallujah is included is 1.6 to 4.2 and when it is excluded it is 1.1 to 2.3.

I haven't been paying too much attention to you guys, but the Bayesian/Frequentist thing is a red herring. The CI's you're quoting are from the Roberts report (i.e., Lancet 1, not the Burnham report), and they're bootstrap CI's. In an earlier R package that one of David's assistants (??) put together, he assumed a parametric form for the sampling distribution, and then performed a hypothesis test based on the presumed distribution. [The bootstrap distribution is clearly not well-behaved when Falluja is included](http://anonymous.coward.free.fr/misc/roberts-iraq-bootstrap.png), which is a clue that the standard parametric approach is inappropriate as the basis for statistical inference. That's the reason for running the analysis with and without Falluja--it's the conservative thing to do, not because anyone was trying to hide a statistically insignificant result: including Falluja still doesn't make the CI include 0 excess deaths. If that graph looks familiar, it's cuz I went through all of this a long time ago.

Thanks, Robert. Not that I completely follow you, but I wouldn't expect to. And I'll shut up about Bayes. I knew we were talking about Lancet 1, btw--I dug out my copy to find those CI's.

I vaguely recalled your graphs from some earlier threads. What has confused me when I went back to look at Lancet 1 is that the relative risk CI that includes Fallujah shifts the whole distribution over to the right (increased excess deaths), and your two graphs show the same effect when Fallujah is included, but the mortality rate for the post invasion period as cited in Lancet 1 doesn't behave quite like that. The endpoints are 5.6 to 10.2 without Fallujah and 1.4 (???) to 23.2 with it. That drop from 5.6 to 1.4 is what I find confusing. How'd that happen? I would have thought adding Fallujah would do the same to the post-invasion mortality rate that it did to the relative risk CI--shift the whole CI to the right.

By Donald Johnson (not verified) on 30 Jun 2007 #permalink

Donald Johnson asked:

but the mortality rate for the post invasion period as cited in Lancet 1 doesn't behave quite like that. The endpoints are 5.6 to 10.2 without Fallujah and 1.4 (???) to 23.2 with it. That drop from 5.6 to 1.4 is what I find confusing. How'd that happen?

Ugh. This is a tad technical, but the bottom line is that because the bootstrap distribution including Falluja is so ill-behaved, not only is it not particularly well-suited to standard approaches to hypothesis testing, but in addition the endpoints of the CI's are themselves poorly estimated. The technical detail is that there are several ways to calculate bootstrap CI's. For the sample excluding Falluja, all of them agree pretty closely so it doesn't matter which one you use. OTOH, for the sample including Falluja, they vary quite a bit and Roberts (or, probably, Garfield, who was the biostatistician on the team) quoted what's called the "normal" or "standard" bootstrap CI. In these situations, generally I would have used one of the more robust bootstrap CIs which are designed to correct (somewhat) for oddball distributions and that behave more like what your intuition would have predicted, but that's probably overkill. In any event, I view the real lesson as providing more support for treating Falluja differently.
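The different interval types Robert mentions can be compared directly on toy data with the boot package (the numbers are made up; this is only meant to reproduce the qualitative pattern, not the Lancet figures):

```r
# Comparing bootstrap CI flavours on made-up, skewed cluster data.
library(boot)
set.seed(3)
rates <- c(5.1, 4.8, 5.6, 6.0, 5.3, 25)   # one extreme cluster

b <- boot(rates, statistic = function(x, i) mean(x[i]), R = 10000)
boot.ci(b, type = c("norm", "perc", "bca"))
# With well-behaved data the three intervals nearly coincide. With the
# outlier included, the "normal" interval's lower endpoint is dragged
# well below the percentile and BCa endpoints -- the same qualitative
# pattern as the puzzling 1.4 lower bound discussed above.
```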