Last year AP-IPSOS surveyed Americans and asked them to estimate how many Iraqi civilians had died in the war. They grossly underestimated the number, with the median estimate being just 9,890. The Atlantic has now published Megan McArdle's latest anti-Lancet screed, where she argues that it would be better if the Lancet studies had not been published at all because they make people more willing to accept higher estimates of Iraqi deaths. Yes, for war-advocate McArdle, the big problem is that people's estimates of Iraqi deaths are too high.
McArdle's piece reminds me of Neil Munro's hatchet job in the National Journal -- they both pretend to be objective observers, dispassionately recording the arguments between the pro-Lancet and anti-Lancet people, when in reality they are anti-Lancet partisans and the reason why they wrote their pieces was to try to knock down the Lancet studies.
McArdle, you might recall, came up with the macaroni and cheese argument against the Lancet study, and her latest piece isn't any better. She writes:
How many Iraqis have died because of the American invasion? It would be nice to know the local price of Saddam Hussein's ouster, five years on. Many researchers have produced estimates. Unfortunately, these range from 81,020 to 1 million.
This is wrong. The 81,020 number is not an estimate of the number of Iraqi deaths. It's the Iraq Body Count number: the number of deaths reported in the media, which is guaranteed to be significantly less than the total number of deaths.
Research by the World Health Organization, published in January in The New England Journal of Medicine, has cast further doubt. It covered basically the same time period and used similar statistical techniques, but with a much larger sample and more-rigorous interview methods.
While the raw sample size of the NEJM study was larger, the researchers were unable to visit 11% of the clusters and had to extrapolate the estimate to those places, so the effective sample size was not larger. And the Lancet study asked about death certificates while the NEJM study did not, so it is wrong for McArdle to describe the interview methods as more rigorous.
It found that the Lancet study's violent-death count was roughly four times too high. This has a familiar ring to it. A smaller study, released by the Johns Hopkins team in 2004, had been quickly contradicted by a larger UN survey suggesting that it had overstated excess mortality by, yes, about a factor of four.
This is wrong. The UN survey did not measure excess mortality, just war-related deaths, and it covered a different time period. If you compare like with like, the UN survey gets a similar result to the Lancet 1. Furthermore, when you look at deaths over the same time period, the NEJM has a similar number of violent deaths and a larger number of excess deaths.
All casualty studies have problems. But the Johns Hopkins study's methodology was particularly troublesome.
In other words, McArdle is not going to mention any problems with any other studies.
The number of neighborhoods the team sampled was just above the minimum needed for statistical significance, and the field interviewers rushed through their work.
Neither of these statements is true. The minimum size for significance is generally taken to be 30 clusters. Lancet 2 had 47. McArdle has made the claim about insufficient time for the interviews before. It was wrong then. And it is wrong now.
The interviewers were also given some discretion over which households they surveyed, a practice generally regarded as unwise.
This isn't true either.
Cluster sampling was developed for studying vaccination; it has never been validated for mortality.
Cluster sampling was not developed for studying vaccination. And there isn't anything special about mortality that means that it wouldn't work. Does the Atlantic even care whether the stuff it publishes is accurate or not?
Yet though its compromises made it particularly unreliable, the Lancet study remains the most widely known. Its conclusions were the earliest and most shocking of the scientific estimates and thus generated enormous media attention. The more-careful counts that followed prompted fewer, and less prominent, articles.
In fact, the IBC number is the one most widely known. Even though it's not an estimate of total deaths, it is usually presented as such, effectively downplaying the number of deaths. This is not a problem for McArdle. In fact she does it in her article. And do you like the way that she works her false claim that the other counts were "more careful" in at every opportunity?
All of this calls into question the idea that even a flawed study is better than no study. Like most people, I believe that more information is usually better; when facts or theories conflict, air the differences and let the facts fight it out. But not every number is a fact. And when the data fall below some threshold of quality, it's better to have no numbers at all.
When articles fall below some threshold of quality, it's better that they not be published. Like McArdle's article here.
Witness the Johns Hopkins team's critics, who triumphantly waved the WHO results at their opponents. But even if "only" 150,000 people have been killed by violence in Iraq, that's a damn high price. Conversely, few of the study's supporters expressed much pleasure at the news that an extra 450,000 people might be walking around in Iraq. After a year and a half of bitter argument, all that anyone seemed interested in was proving they had been right.
McArdle does not disclose that she was one of the more strident critics of the Lancet study and that her article was written to prove that she was right.
Daniel Davies has more criticism here and here, while McArdle threatens to write more anti-Lancet stuff on her blog.
Update: McArdle repeats her threat using more words.
Very good stuff Tim. More to come from me (particularly on the question of McArdle's past views on point estimates versus hypothesis tests of the proposition that excess deaths > 0).
In the meantime, I think it should be noted that Jon Pedersen (whose view on the Lancet results is that the point estimate is too high because the lookback period for prewar deaths is too long) also doesn't regard the ILCS study which he was responsible for as the gold standard in measurement, because the questions about family deaths came right at the end of a long questionnaire.
The reasoning here is quite simple.
Proposition One: America is good.
Proposition Two: Killing is bad.
Conclusion: since good and bad are different, America must not be killing people.
Or they're bad people who deserve to be killed.
You need to demarcate your first quote of McArdle:
the one which says "How many Iraqis have died because of the American invasion? It would be nice to know the local price of Saddam Hussein's ouster, five years on. Many researchers have produced estimates. Unfortunately, these range from 81,020 to 1 million."
I've seen that guff about cluster sampling not being validated for mortality studies trotted out quite frequently. Perhaps this is where the meme arose: "In practice, cluster sampling is the most commonly used methodology for mortality surveys. This method was initially designed and validated to estimate vaccination coverage - it was not developed to measure mortality rates." Is there any truth in the claim that it was designed to estimate vaccination coverage? Any historians of statistics around?
In other words, McArdle is not going to mention any problems with any other studies.
very good sum up.
i would call this the "David Kane method of analysis".
the number of people who find enormous and numerous problems with the Lancet study, but typically mention ZERO problems with the IBC numbers, is unbelievably high...
As a statistician, I can say:
1) I don't know the history of cluster sampling.
2) In the Lancet articles, and in Tim's blogs, there should be references to previous uses of cluster sampling to estimate wartime mortality. IIRC, there are at least a few cases where there was postwar follow-up, which would allow validation.
3) The history of statistical methods is one of them being developed for one purpose, in one field, and then being used and further developed, in multiple fields. For example linear regression, a favorite of economists, social scientists, and, well, everybody, was developed for astronomical uses.
4) The Lancet surveys were extensively reviewed and critiqued by expert statisticians (Tim, could you help with the references - you've posted on them). The conclusions were (a) not perfect, like everything else, (b) a pretty good effort under the circumstances and (c) the best estimates available.
Speaking non-historically, but as one who doesn't drop things down the memory hole, I recall a certain Megan McArdle recently justifying her support of the Iraq War on the basis of (a) nothing's perfect, and (b) the overall principles of her decision (i.e., the decision rules involved) were sound. By those standards, she should STFU about the Lancet articles.
Barry: The history of statistical methods is one of them being developed for one purpose, in one field, and then being used and further developed, in multiple fields. For example linear regression, a favorite of economists, social scientists, and, well, everybody, was developed for astronomical uses.
Quite so - that's what pisses me off about this "Ooh, cluster sampling hasn't been validated for mortality studies" shite. If these clowns had their way we couldn't use the Poisson distribution for anything except studies of dead horsemen.
I've been trying to locate the source of this notion that cluster sampling is appropriate for studies of vaccination coverage, but not for mortality studies. Megan McArdle didn't come up with this idea all on her own, it's been floating around out there for some time. It seems to have arisen from the fact that numerous papers cite a 1982 study by Henderson and Sundaresan. This paper validated the Expanded Program on Immunization (EPI) simplified cluster sampling method as applied to vaccination coverage. Evidently some readers, when they learn this, get the impression that cluster sampling in general (not just the EPI method) is somehow suspect when applied to other kinds of data.
Here is a link to the paper in case anyone is interested. I should mention that it's a PDF file since it seems to be the done thing to warn readers about that. (Why? Do they cause older computers to overheat?)
Let's apply McArdle's contention to her own argument. She says that irrelevant numbers exert an anchoring effect. Fine: then wouldn't the earliest estimate of deaths -- the Iraq Body Count numbers -- exert an anchoring effect? Never mind that even the IBC admits that their numbers are incomplete. I think it's safe to assume that the weight of that particular anchor is pulling McArdle's concept of a plausible toll down rather drastically.
It's similar, I suspect, to the downward pressure McArdle exerts on the Atlantic Monthly's reputation.
Thanks, Kevin. Depending on the size of the .PDF file, it can take a while to load; it can stall the browser (that is, if you have IE), and may crash your browser (that is, as you guessed, if you have IE).
And of course some .PDF's are honkin' huge intertubes chokers, if they have a zillion graphs or pictures.
I'm moderately sure that Galton developed the ideas of regression and correlation while studying genetics and inheritance in sweet peas. I'm less sure about cluster sampling but the main theory of it was certainly known by the 1930's and much of the early work was done at various national statistical offices and census bureaus (including but not limited to those of the US, Great Britain, and France).
See the Wikipedia article on the Gauss-Markov theorem:
Gauss was playing with this long before Galton. IIRC, Galton was using regression in the bivariate normal case: E[(Y − μY)/σY | X = x] = ρ(X,Y) · (x − μX)/σX,
which is certainly important, an extension/generalization, but the concept of a linear model was there. Galton basically invented(?) the concept of correlation and applied it to a previously existing technique.
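The bivariate-normal relation in the comment above is easy to check numerically: for standardized variables, the least-squares slope of Y on X equals the correlation ρ, and because ρ < 1 the fitted value is pulled toward the mean. A minimal sketch (the choice ρ = 0.6 and the sample size are arbitrary, just for illustration):

```python
import random

random.seed(0)
rho, n = 0.6, 200_000

# Draw (X, Y) from a standardized bivariate normal with correlation rho.
xs, ys = [], []
for _ in range(n):
    x = random.gauss(0, 1)
    y = rho * x + (1 - rho**2) ** 0.5 * random.gauss(0, 1)
    xs.append(x)
    ys.append(y)

# Least-squares slope of Y on X; for standardized variables it equals rho.
mx = sum(xs) / n
my = sum(ys) / n
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
var = sum((x - mx) ** 2 for x in xs) / n
slope = cov / var

# slope < 1: an extreme X predicts a less extreme Y -- regression to the mean.
print(f"fitted slope {slope:.3f} vs rho {rho}")
```

With 200,000 draws the fitted slope lands within a few thousandths of ρ, which is exactly Galton's "reversion to mediocrity" in modern dress.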
1) I was sloppy on the term 'regression'; I meant to include any and all linear models.
2) Galton's use of regression for two related biological measurements is still an example of something which was invented in one context (parent vs offspring relationships) and extended like crazy.
I was sloppy on the term 'regression'; I meant to include any and all linear models.
Gauss developed squared loss as an objective function which extends beyond linear models. He was a pretty sharp guy. Galton, although only an amateur mathematician (Pearson had to help him with the hard parts) was also a pretty sharp guy. Speaking of things invented in one context and used elsewhere, another thing Galton did was re-invent stochastic branching processes while studying the extinction of family lines (again with the genetics and heredity thing). Bienayme had figured that stuff out earlier but published in an obscure journal. We now know stochastic branching processes better as the mathematics that describes nuclear fission.
Mathematics was originally developed to calculate and record the details of Sumerian trade deals and tax receipts.
It's clearly inappropriate to use it any other purpose.
And, of course, Sumer was in Mesopotamia, otherwise known as (drumroll) Iraq.
Therefore, all mathematics is objectively pro-Saddam.
Galton was a bright guy, but let's not forget that he was also an early advocate of eugenics. Doesn't detract from his statistical work, of course.
Barton Paul Levenson wrote:
Galton was a bright guy, but let's not forget that he was also an early advocate of eugenics.
Yeah. In fact, I think he was the guy who coined the term. He also coined "regression to the mean," which was version 2.0 of his original phrase "reversion to mediocrity." Nonetheless, it appears that we view the term "eugenics" in, um, a slightly more colored light than he did when the ideas first came to him, deeply influenced as he was by "The Origin of Species," written by his cousin. Bright family.
Speaking of the history of statistics, much of the work in formal stat toward the end of the 19th C. and beginning of the 20th resulted from "eugenicists" working on "eugenics" problems, loosely defined. Fisher's early work on biometry led to ANOVA, and work on gene frequencies led to the method of maximum likelihood. That much of the early work on the other half of the field, probability, was prompted by investigations into another area with some moral baggage, i.e., gambling, is kinda amusing.
Are there any studies that link refugee numbers to violence? It's broadly accepted that there are 2 million internally displaced Iraqis and 2 million refugees who have fled the country. Isn't it logical that they are fleeing something real and tangible in numbers roughly equivalent to other crisis situations around the world? A 44-to-1 ratio of refugees to deaths (using IBC numbers) seems awfully high. If something bad happened to one person on a block does it make sense for the entire block to get up and leave?
Tim: great post. For those of us who don't keep the differences between the various studies at our fingertips, what would be really helpful is a short table that summarizes the key elements and findings of each study all in one place.
the number of people who find enormous and numerous problems with the Lancet study, but typically mention ZERO problems with the IBC numbers, is unbelievably high...
Posted by: sod | March 28, 2008 5:06 PM
What is this other than a desire for a favoured straw man to enter the debate ?
Wouldn't you assume any discussion on Iraq death estimates would exclude examination of shortfalls in IBC numbers for the same reason Tim cites more than once in the article you are commenting on ?
Namely that they ain't an estimate.
If your desire in debating death estimates is to argue the merits of something that isn't one, why bother pretending this is anyone's shortcoming but your own.
Isn't it logical that they are fleeing something real and tangible in numbers roughly equivalent to other crisis situations around the world?
Posted by: joejoejoe | March 29, 2008 5:36 PM
No, it's not. There's quite a few ways this makes no sense.
In fact the only short way to describe it would be "every single thing these people don't have in common, which is practically everything".
"Wouldn't you assume any discussion on Iraq death estimates would exclude examination of shortfalls in IBC numbers for the same reason Tim cites more than once in the article you are commenting on ? Namely that they ain't an estimate."
Kilo, as you surely read in the post above, McArdle did cite the IBC numbers as if they were an estimate.
Megan has a new pasta recipe up on her blog. Somehow death and pasta seem to have become linked in her mind.
"Kilo, as you surely read in the post above, McArdle did cite the IBC numbers as if they were an estimate."
Posted by: nigel holmes | March 30, 2008 4:14 AM
And the criticism of her doing so was what ? The answer can be found in a re-reading of what I wrote.
"Wouldn't you assume any discussion on Iraq death estimates would exclude examination of shortfalls in IBC numbers for the same reason Tim cites more than once in the article you are commenting on ? Namely that they ain't an estimate.
If your desire in debating death estimates is to argue the merits of something that isn't one, why bother pretending this is anyone's shortcoming but your own."
But Megan's article, which is the subject of this post, used the IBC number as an estimate, so it is appropriate to point out why it isn't and why that number is bound to be too low if used as an estimate.
This is the first section of a piece I wrote early on in this debate.
'Iraqi Death Estimates Called Too High; Methods Faulted' 2
The recently published study from a Johns Hopkins team of epidemiologists regarding mortality in Iraq 1 met with several widely publicised responses, one of which was this:
'George W. Bush immediately dismissed the study, characterizing its methodology as "pretty well discredited."' p1 li 10 2
I think it is important to dwell somewhat longer than Bohannon had space to on the implications of Mr Bush's statement.
The Centers for Disease Control, the World Health Organisation (WHO), the Training in Epidemiology and Public Health Network, and others, recently convened to examine training of public health professionals in the context of 'Violence and Health'. A review of the role of epidemiologists was the result 3.
Of interest, after an admittedly poorly specified literature review, the authors noted that 'rapid survey methods are most often used to describe the current needs of a population in conflict', and furthermore 'The most common approaches for selecting participants in rapid population surveys during conflict are simple random sampling and cluster sampling.'
Cluster sampling for rapid surveys represents a trade off between precision and the cost of data acquisition. It has a long pedigree in the non-conflict field, and is perhaps most widely known for its use within the WHO's Expanded Program of Immunization (EPI) 4,5. For those so interested, I provide a few references, by no means an exhaustive catalogue, of its use in the immunisation setting 6,7. A glance at the titles will reveal that this method has been of importance in some of the major successes of modern medicine: the eradication of smallpox, and near eradication of poliomyelitis. The method has since been extended to fields such as assessment of bed net distribution for malaria prevention 8, use of drugs to prevent malaria during pregnancy 9 and is providing more accurate insights into the dynamics of the current Sub-Saharan HIV epidemic 10.
It is important to realise that the classic WHO 30 cluster, 7 household survey is not much good for rare events 4, at least if the confidence interval is to be narrow. Mortality, thankfully, is a comparatively rare event. The method however is standard and its weaknesses and modifications well discussed.
This has not provided a barrier to epidemiologists using it to study mortality related to conflict, often with a modification of the cluster or household numbers, for example in the Democratic Republic of Congo 11, Mozambique 12, Kosovo 13 and Sudan 14. Once again I do not present an exhaustive literature search.
Recently there has been a comparison of systematic and cluster methods for assessing mortality in the same population, near contemporaneously 15. Although there were differences in some indices, none were statistically significant, and distributions were well preserved. Not surprisingly the confidence intervals for cluster sampling were wider and some estimates were not possible. As stated above, this represents the compromise made for expensive data. In the case of Iraq, where those thought to be associated with the invading forces have been systematically threatened, targeted, and sometimes killed 16, the compromise would seem justifiable.
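The precision trade-off described above can be illustrated with a small simulation: sampling whole villages ("clusters") instead of individual households gives an unbiased estimate of the death rate, but with a larger spread, because deaths cluster geographically. This is a minimal sketch with entirely made-up numbers (the village count, death rates, and sample sizes are hypothetical, chosen only to make the design effect visible):

```python
import random
import statistics

random.seed(42)

# Hypothetical population: 1,000 villages of 100 households each.
# Between-village variation in death rates is what inflates the
# variance of cluster samples relative to simple random samples.
villages = []
for _ in range(1000):
    rate = max(0.0, random.gauss(0.05, 0.03))  # village-level death rate
    villages.append([1 if random.random() < rate else 0 for _ in range(100)])

population = [hh for v in villages for hh in v]
true_rate = sum(population) / len(population)

def srs_estimate(n=900):
    """Simple random sample of n households from the whole population."""
    sample = random.sample(population, n)
    return sum(sample) / n

def cluster_estimate(n_clusters=30, per_cluster=30):
    """Sample whole villages, then households within each (same total n)."""
    sample = []
    for v in random.sample(villages, n_clusters):
        sample.extend(random.sample(v, per_cluster))
    return sum(sample) / len(sample)

srs = [srs_estimate() for _ in range(2000)]
clus = [cluster_estimate() for _ in range(2000)]

# Both designs are roughly unbiased; the cluster estimates vary more.
print(f"true rate     : {true_rate:.4f}")
print(f"SRS sd        : {statistics.stdev(srs):.4f}")
print(f"cluster sd    : {statistics.stdev(clus):.4f}")
print(f"design effect : {statistics.variance(clus) / statistics.variance(srs):.2f}")
```

The cluster design needs only 30 site visits instead of 900, which is the whole point in a conflict zone; the price is a design effect greater than one, i.e. a wider confidence interval for the same number of interviews.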
I hope this goes some way towards explaining just how the methodology chosen by Burnham et al has been 'pretty well discredited'.
1Burnham et al, 'Mortality after the 2003 Invasion of Iraq: a Cross Sectional Cluster Sample Survey', Lancet 2006, 368, 9545, 1421
2Bohannon J, 'Iraqi Death Estimates Called Too High; Methods Faulted', Science 2006, 314, 496
3McDonnell S et al, 'The role of the Applied Epidemiologist in Armed Conflict', Emerging Themes in Epidemiology 2004, 1,4
4Bennett S et al, 'A Simplified General Method for Cluster Sample Surveys of Health in Developing Countries', World Health Statistics Quarterly, 1991, 44, 98
5Hoshaw-Woodard S et al, 'Description and Comparison of the Methods of Cluster Sampling and Lot Quality Assurance Sampling to Assess Immunization Coverage' WHO/V&B/01.26, 2001
6Henderson RH, 'Assessment of Vaccination Coverage, Vaccination Scar Rates and Smallpox Scarring in Five Areas of West Africa', Bulletin of the World Health Organisation, 1973, 48, 173
7Balraj V & John TJ, 'Evaluation of a Poliomyelitis Immunisation Campaign in Madras City', Bulletin of the World Health Organisation 1986, 64, 6, 861
8Grabowsky M et al, 'Distributing Insecticide Treated Bed Nets During Measles Vaccination: A Low Cost Means of Achieving High and Equitable Coverage', Bulletin of the World Health Organisation, 2005, 83, 195
9Holtz TH et al, 'Use of Antenatal Care Services and Intermittent Preventive Treatment for Malaria Among Pregnant Women in Blantyre District, Malawi', Tropical Medicine and International Health, 2004, 9, 1, 77
10Fylkesnes K et al, 'Studying Dynamics of the HIV Epidemic: Population Based Data Compared With Sentinel Surveillance in Zambia', AIDS, 1998, 12, 10, 1227
11Coghlan B, 'Mortality in the Democratic Republic of Congo: a Nationwide Survey' Lancet 2006, 367, 9504, 44
12Cutts FT et al, 'Child and Maternal Mortality During a Period of Conflict in Beria City, Mozambique', International Journal of Epidemiology, 1996, 25, 2, 349
13Spiegel P & Salama P, 'War and Mortality in Kosovo, 1998-1999: an epidemiological testimony', Lancet 2000, 355, 9222, 2204
14Get the Sudan Reference
I say this every time the topic comes up, but as nobody else has picked it up I'll say it again:
If a party in a dispute doesn't produce figures that they have, or can get, then the standard - the reasonable, the bog-standard legal - assumption is that they aren't producing the figures because the data would undercut their position, and in most cases we can proceed on that basis.
We're perfectly entitled to operate on the assumption that if the US won't release its estimate of Iraqi deaths, that's because it's high enough to make McArdle choke on her macaroni; close to, or higher than, Lancet. If it was substantially lower they'd release it.
We're perfectly entitled to operate on the assumption that if the US won't release its estimate of Iraqi deaths, that's because it's high...
Posted by: Chris | March 30, 2008 7:14 PM
Huh ? After 5 years you've got around the same number of estimates.
As someone who fails this test themselves, wouldn't it be more reasonable for you to assume that a party not producing their own estimate -- particularly one that's on record saying they don't track this data -- simply doesn't have an estimate of their own ?
jodyaberdein, you don't seem to come to any conclusion. I can't tell if you're honestly agreeing with GWB or being sarcastic. I hope the latter.
what are the references 15 and 16 you cite in your text?
Oops, chopped the article too soon.
14Depoortere E et al, 'Violence and Mortality in West Darfur, Sudan (2003-04): Epidemiological Evidence from Four Surveys', Lancet 2004, 364, 9442, 1315
15Rose AMC et al, 'A Comparison of Cluster and Systematic Sampling Methods For Measuring Crude Mortality', Bulletin of the World Health Organisation, 2006, 84, 4, 290
16Sands P, 'Interpreters Used By British Army 'Hunted Down' By Iraqi Death Squads', UK Independent, World News, 2006, Nov 17th