Andrew Bolt: making it up?

One of the few things that Andrew Bolt got correct in his original criticism of the Lancet study was the sample size, 988 households:

Its researchers interviewed 7868 Iraqis in 988 households in 33 neighbourhoods around Iraq, allegedly chosen randomly, and asked who in the house had died in the 14 months before the invasion and who in the 18 months after.

In a later article, Bolt got the number wrong:

Lancet surveyed 788 Iraqi households.

Since the two numbers differ in just a single digit, Bolt's erroneous 788 number looks like a simple typo, but when the mistake was pointed out to him, this is what he wrote:

Just to point out one of the false claims you make (that I claimed the Lancet study involved just 778 households, not 988), here is a direct quote from my original article which analysed the Lancet survey: "Its researchers interviewed 7868 Iraqis in 988 households in 33 neighbourhoods around Iraq, allegedly chosen randomly, and asked who in the house had died in the 14 months before the invasion and who in the 18 months after." In fact, on closer inspection of the survey, you will find that not all those households were considered when Lancet's researchers worked out their final death toll. Excluded were households which refused to answer, were absent, and were in the highly atypical city of Fallujah. That explains the 778 figure. The ILCS survey asked respondents for any "war-related deaths" in their households, which is a far broader definition than you claim (and fairer than the one you'd prefer). The survey took into account not just deaths since the war, but during it as well - another mis-statement of yours. Just accept it, please. The claim that the US has 100,000 dead on its hands is preposterous. Or even a wicked lie.

(He says 778 rather than 788 in the passage above because the person he was responding to wrote 778 rather than 788 and Bolt did not pick up on the typo.)

But here is what the study said:

Five (0.5%) of the 988 households refused to be interviewed. In the 27 clusters with proper absentee records, we visited 872 households and 64 were absent (7%). No households were identified in which all the household members were dead or gone away, except in Falluja, where there were 23.

There were five households that refused to answer. There were 30 households in the Falluja cluster. The 64 absent households were not included in the 988 households. (This isn't perfectly clear from the description above, but there were 872-64=808 households with someone home in the 27 clusters with absentee records and 6x30=180 households in the other six clusters. 808+180=988.) Subtracting 5+30 from 988 doesn't get you even close to 788. Wrongly subtracting the 64 absent households as well doesn't get you 788. I suppose it is possible that Bolt made a mistake in his arithmetic, but what are the odds that such a mistake would produce a number exactly 200 less than 988? It looks like his explanation may have been made up on the spot. To echo Bolt's own language, his claim about the origin of the 788 number "is preposterous. Or even a wicked lie."
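For anyone who wants to check, the whole arithmetic fits in a few lines of Python. It uses only figures quoted from the study; the candidate subtractions are guesses at what Bolt might have done.

```python
# Household arithmetic from the study: 872 visited minus 64 absent in the
# 27 clusters with absentee records, plus 30 households in each of the
# other 6 clusters.
with_someone_home = 872 - 64                # 808
other_clusters = 6 * 30                     # 180
print(with_someone_home + other_clusters)   # 988, the study's total

# Candidate subtractions that might explain Bolt's 788 figure:
print(988 - 5 - 30)                         # 953: refusals and Falluja removed
print(988 - 5 - 30 - 64)                    # 889: wrongly removing absentees too
```

Neither candidate subtraction lands anywhere near 788.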


Did the survey actually ask for "war-related" deaths? Wasn't it just deaths, period?

Right. I missed that the later (war-related) reference was to the ILCS. Thanks.

Neatly skewered, Tim. The man is a disgrace to journalism. Amazing how far bravado and bluster have taken him, really.

It doesn't affect your conclusions, but the number of absent households in the quoted passage is 64, not 67.

*[Thanks. Fixed. Tim]*

Tim,

You seem to place a lot of importance on this survey and go to considerable length to defend it.
How do you explain that the survey can take 20 excess deaths due to violence over 17.8 months in 14 of 33 clusters and turn them into 98,000 deaths across all of Iraq?
I am not going to argue that the survey was poorly designed; on the contrary, the Economist article you cite suggests it is a well-devised survey.
The problem I have is turning 20 into 98,000.
I would like to show you how I come up with this figure:

If we look at the raw data, pre-invasion there were 45 non-violent deaths; post-invasion there were 69.
Because the two recall periods were of different lengths and covered different numbers of interviewees, the survey reports the following exposure figures:
Pre-invasion: 110,538 person-months of residency
Post-invasion: 138,439 person-months of residency
If we use these figures we get the following approximate death rates:
Pre-invasion: 5 per 1000 per year
Post-invasion: 6 per 1000 per year
Given the size and difficulty of the survey, there is effectively little difference between the pre- and post-invasion non-violent death rates.
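The rate arithmetic above can be reproduced directly from the quoted person-month figures. This is a back-of-envelope annualisation, not the study's own cluster-adjusted calculation:

```python
# Crude annualised death rate per 1000 people, from deaths and
# person-months of residency as quoted from the survey.
def annual_rate_per_1000(deaths, person_months):
    return deaths / person_months * 12 * 1000

print(round(annual_rate_per_1000(45, 110538), 1))   # pre-invasion, ~4.9
print(round(annual_rate_per_1000(69, 138439), 1))   # post-invasion, ~6.0
```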
This is confirmed by the Economist article which states the following:
"The study can be both lauded and criticised for the fact that it takes into account a general rise in deaths, and not just that directly caused by violence. Of the increase in deaths (omitting Fallujah) reported by the study, roughly 60% is due directly to violence, while the rest is due to a slight increase in accidents, disease and infant mortality. However, these numbers should be taken with a grain of salt because the more detailed the data on causes of death, for instance, rather than death as a whole the less statistical significance can be ascribed to them"
So if we now look at just violent deaths we see the following:
Pre-invasion 1, post-invasion 73, therefore an excess of 72; of these, 52 are in one cluster, Falluja.
The survey team state the following in their findings section:
"If we exclude the Falluja data, the risk of death is 1.5-fold (1.1-2.3) higher after the invasion. We estimate that 98000 more deaths than expected (8000-194000) happened after the invasion outside of Falluja and far more if the outlier Falluja cluster is included"
This means that the 98000 figure has been extrapolated out of just 20 reported deaths.
Even if we factor in non-violent deaths, based on the pre-invasion death rates there was an excess of 13.
That is a total of 33 deaths, or 1 death in each of the 33 clusters, in 17.8 months.
These are tiny figures on which to base massive estimated deaths.
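For what it's worth, the naive scaling being objected to looks roughly like this. It is a sketch only: the study actually modelled cluster-level mortality rates and bootstrapped its interval, and the ~24.4 million population figure is an assumption for illustration.

```python
# Naive back-of-envelope scaling, NOT the study's actual method: spread
# the sample's excess death rate over the whole population for the same
# period. Population of ~24.4 million is an illustrative assumption.
excess_sample_deaths = 33        # ~20 violent (ex-Falluja) + ~13 non-violent
post_person_months = 138439      # person-months of residency in the sample
population = 24_400_000
period_months = 17.8

excess_rate = excess_sample_deaths / post_person_months   # per person-month
national = excess_rate * population * period_months
print(round(national))           # on the order of 100,000
```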
The Economist article concluded with the following:
"The study is not perfect. But then it does not claim to be. The way forward is to duplicate the Lancet study independently, and at a larger scale"
Well, this was done by the much larger UN-backed survey, which came back with a much smaller figure of 24,000.
Given your continued attacks on Lott and his misuse of statistics (FYI, I think Lott's work is total rubbish and you have done a great job exposing it), defending an outdated study and attacking those who criticise it suggests that you are not interested in defending good statistical analysis, but instead defend only the analyses that suit your prejudices. This does not do you or your cause justice.

Roscoe, I went into the differences between the ILCS and the Lancet study [here](http://scienceblogs.com/deltoid/2005/05/lancet34.php). The numbers are different because the Lancet number includes the increases in murder and disease that followed the war, while the ILCS number is for directly war-related deaths.

I defend the Lancet study because it is good statistical practice and the people attacking it tend to be ignorant of statistics and politically motivated. Like Bolt.

Rosco:

There have been three surveys which have had wide press reporting:

* Lancet - measuring mortality, comparing two approximately 16 month periods, reporting 98,000 excess deaths in the latter period.
* UNDP - measuring war-related mortality, over a 12 month period, reporting 24,000 war-related deaths.
* Iraq Body Count - measuring press-reported violent civilian deaths since January 2003, currently 25,000

They're all measuring different things. In particular "mortality" and "war-related mortality" are very different.

By Patrick Caldon (not verified) on 07 Sep 2005 #permalink

Guys,

I realise the other surveys were counting different things.
But if you look at the data in the Lancet survey, the excess of non-violent deaths post-invasion over pre-invasion is 13! That is 13 excess deaths in 33 clusters over 17.8 months.
They were able to convert that 13 into 28,000.
Look at it: 2 of those deaths are attributed to children under 15 having a heart attack or stroke. That translates to 4,300 children having heart attacks or strokes. These are diseases of the elderly, not children.
Pre-invasion there were 3 reported heart attack/stroke deaths in women aged 15-59, but 1 post-invasion. Can we infer from this that the invasion dramatically reduced women's likelihood of death from heart disease/stroke? No, because the figures are so small.
If Lott claimed that, based on 13 deaths in a limited number of areas where handgun restrictions apply, you could infer a death toll of 28,000 if the laws were applied nationally, you would, rightly, reject this.
It doesn't matter how well designed the survey was; the reality is that it was extremely limited, done in a very difficult and emotive environment, with too little real data to support its claims.

Rosco, if you break things down enough you get samples too small to be meaningfully generalised. But the roughly 100,000 number wasn't based on two or 13 excess deaths, but on 30-odd. That is enough for a rough estimate, and the uncertainty due to the sample size is captured by the confidence interval.
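To put a rough number on that uncertainty: under a naive Poisson model, 33 events already pin down the order of magnitude. This ignores the cluster design, which is why the study's real interval (8,000-194,000) is considerably wider.

```python
import math

# Naive Poisson sketch: treat ~33 excess deaths as a count with standard
# error sqrt(33), then scale by the same crude multiplier that maps 33
# sample deaths to ~100,000 nationally. Illustration only; the study's
# cluster design widens the real interval substantially.
observed = 33
se = math.sqrt(observed)
lo, hi = observed - 1.96 * se, observed + 1.96 * se

scale = 100_000 / observed                    # crude sample-to-nation multiplier
print(round(lo * scale), round(hi * scale))   # roughly 66,000 to 134,000
```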

Roscoe,

First, can I say that it's good to see someone who can disagree over the specific issue at hand while remaining polite and reasonable.

I haven't done statistics since my first year of university over 20 years ago, but the confidence interval is designed to measure how reliable the end result is.

The sample size in this case was relatively small (although, relative to the size of the Iraqi population, no smaller than the average political opinion poll in the US or Australia).

As a result, the 95% confidence interval was wide - but the 98,000 figure was the most likely value within that range.

There's a small chance that the survey could have gotten the results it did if there were fewer than 8,000 additional deaths, and an equally small chance that the actual figure was over 194,000.

Another way of looking at it - the violent death estimate from the Lancet survey was around 25,000. That's actually fairly close to the equivalent figures from the other two sources. That tends to confirm that the Lancet survey was fairly accurate in its violent deaths estimate. That makes me think it likely that the total figure is also pretty accurate.

Of course, it only covered the first 18 months (IIRC) after the start of the war, so the total excess death figure is almost certainly significantly higher now, because there's little evidence of a dramatic improvement in living standards in Iraq in the last year.

By Ian Gould (not verified) on 07 Sep 2005 #permalink

Tim,
Can I give you this quote by you in the Lott/Mustard debate:
"there were 12 defensive gun uses by CCW holders against persons known to the police over the six year period. That's two per year.

It does not seem highly plausible that those two uses prevented 2,000 crimes."

If a ratio of 1/1000 (2/2000) is not plausible, what does that imply when the ratio is 1/2970 (33/98,000)?

It seems to come down to whether 33 is a small figure. Well, if the figure of 33 were used to come up with a value of 100, then no, it is not small; but in this case the figure of 33 was used to come up with 100,000, and then it is a very small number indeed.

I think I would have to say:

"It does not seem highly plausible that those 33 reported excess deaths can be used to calculate at national death toll of 98,000"

Ian,
You say:
"The sample size in this case was relatively small (although no smaller given the size of the Iraqi population than the average political opinion poll in the US or Australia). "
If you polled 7,500 people and the difference between the two parties was 33, what confidence would you have in calling an election win for one party by 100,000 votes?
Before people start jumping up and down about this analogy: I know that this is not comparing apples with apples; I am just trying to point out that 33 is a very small number.

Can I also state that to come up with an "excess" death toll you need to start with a baseline? In this case that baseline was 46 deaths out of a population of 24 million. Once again I would have to say that 46 reported deaths against 110,538 person-months of residency is not a very large figure. It is equally plausible that the baseline could have been the 69 deaths from the second part of the survey, or anywhere in between.
I am not a statistician so please correct me if I am wrong but the survey states:
"The crude mortality rate was 5.0 per 1000 people per year (95% CI 3.7"6.3; design effect of cluster survey=0.81).
If the Falluja cluster is excluded, the post-attack mortality is 7.9 per 1000 people per year (95% CI 5.6"10.2; design effect=2.0).
"
The confidence intervals show a decent overlap in the death rates, so there could in fact be no excess deaths. As I said above, I could be completely misreading what this is saying, so correct me if I am wrong.

Roscoe, you have it almost exactly right.

The analogy with an Australian electoral poll is completely wrong. Estimation of the ALP versus Coalition vote depends, on a sample of say 1000, on 400 respondents going one way and 450 going the other (the rest are voting for someone else). From this you can make a reasonable estimate, assuming your sampling error is manageable.

A better analogy for the Lancet study is trying to estimate the vote for the "Free Beer Party", in which the difference between two respondents and four respondents is enormous in projected voter numbers, but obviously nonsense at the ballot box.
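The point about rare outcomes can be quantified: the relative uncertainty is huge even with a large sample. Here n = 7868 individuals is taken from the survey; the event counts of 2 and 4 are purely illustrative, per the Free Beer Party analogy.

```python
import math

# For a rare outcome, the relative standard error of the estimated
# proportion is roughly 1/sqrt(events), regardless of how large n is.
n = 7868
for events in (2, 4):
    p = events / n
    se = math.sqrt(p * (1 - p) / n)      # binomial standard error
    print(events, round(se / p, 2))      # relative error: ~71% and ~50%
```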

Ian Gould is also completely wrong in stating:

"As a result, the 95% confidence interval was wide - but the 98,000 figure was the most likely value within that range."

It's completely improper to assert that the midpoint is "the most likely value". What the wide confidence interval tells us is that the sample incidence is ridiculously small, as Roscoe has pointed out.

I conduct surveys for a living, and the statistics cited by the Lancet study would be laughed out in a commercial situation.

The only exception I can think of is TV ratings, in which a relatively small panel is used to extrapolate large numbers. However in that case there is an almost unbelievable effort to establish the representative constitution of the panel, nothing like the sampling effort employed in the Lancet study. Further, in Australia, almost all households watch TV, as opposed to a small number of Iraqis experiencing violent death.

By James Lane (not verified) on 07 Sep 2005 #permalink

Rosco, the quote from my critique of Lott has nothing to do with the present situation. It wasn't about random sampling and in any case the ratio between the size of the sample and the size of the population does not matter -- you need the same sample size if you are sampling New Zealand as when you are sampling the US.

The confidence intervals for the before and after mortality rates overlap, but that doesn't mean anything, since the before and after samples are not independent. You need to look at the CI for excess deaths. This is wide, but does not include 0, so there has been a statistically significant increase in deaths.
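To illustrate why overlapping intervals don't settle the question, take the two quoted rates at face value as symmetric normal 95% intervals. Even under the (false) assumption that the samples are independent, the difference is significant; since the samples are actually paired, the true standard error of the difference is smaller still. A sketch:

```python
import math

# Rates quoted from the study: 5.0 (95% CI 3.7-6.3) pre-invasion and
# 7.9 (95% CI 5.6-10.2) post-invasion, ex-Falluja. Back out each standard
# error from its interval width and test the difference, wrongly assuming
# independence (the paired design only strengthens the result).
def se_from_ci(lo, hi):
    return (hi - lo) / (2 * 1.96)

se_pre = se_from_ci(3.7, 6.3)
se_post = se_from_ci(5.6, 10.2)
z = (7.9 - 5.0) / math.sqrt(se_pre**2 + se_post**2)
print(round(z, 2))    # about 2.15, above the 1.96 cutoff for 5% significance
```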

Why did the Lancet study choose 988 households and then ask the people in those households the same questions?

When I see something which says it was a household survey, I immediately infer that ONE "member" of a house, apartment, or family unit living under a roof etc. (call it what you want) was asked a series of questions pertaining to that household unit.

The Lancet study, as I understand it, interviewed all the members of a household. Why was that? Wouldn't this allow for errors?

On average, Lancet asked 8 people per household the same questions. Why?