Flypaper for innumerates

Yet another person has tried to refute the Lancet article. John Brignell dismisses the study just because:

A relative risk of 1.5 is not acceptable as significant.

Actually the increased risk was statistically significant. You won’t find support for Brignell’s claim in any conventional statistical text or paper. To support his claim he cites a book called Sorry, wrong number!. Trouble is, that book was written by…. John Brignell. Not only that, it was drafted by… John Brignell. Brignell is a crank who dismisses the entire field of modern epidemiology as some sort of plot by scientists to scare people. We encountered him before in this post where, armed with no evidence whatsoever, he insisted that the ozone hole had always been present.

To see how silly Brignell’s “relative risk of 1.5 is not acceptable as significant” claim is, consider this: suppose we had perfect records of every death in Iraq, and there were 200,000 in the year before the invasion and 300,000 in the year after. Then the relative risk would be 1.5, and Brignell would dismiss the increase as not significant even though in this case we would have absolute certainty that there were 100,000 extra deaths.
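
The arithmetic of this thought experiment is easy to check. The sketch below computes the relative risk and the excess deaths for the figures above. As an added illustration that is not part of the original argument, it also compares an approximate 95% interval for the ratio of two counts under an assumed Poisson model. The small counts (3 versus 2) are hypothetical, and the function names are invented for this example.

```python
import math

def relative_risk(after, before):
    """Ratio of deaths after to deaths before (equal populations and periods assumed)."""
    return after / before

def approx_ratio_interval(after, before, z=1.96):
    """Approximate 95% interval for the ratio of two Poisson counts.

    Uses the usual log-scale approximation: se(log ratio) = sqrt(1/after + 1/before).
    The Poisson model is an assumption made for this sketch.
    """
    log_ratio = math.log(after / before)
    se = math.sqrt(1 / after + 1 / before)
    return math.exp(log_ratio - z * se), math.exp(log_ratio + z * se)

# Figures from the thought experiment above.
before, after = 200_000, 300_000
print(relative_risk(after, before))  # 1.5
print(after - before)                # 100000 extra deaths

# Same relative risk, very different evidence (the 3-versus-2 counts are hypothetical).
print(approx_ratio_interval(300_000, 200_000))  # narrow interval, excludes 1
print(approx_ratio_interval(3, 2))              # wide interval, includes 1
```

A relative risk of 1.5 is the same number in both cases; whether it is significant depends on the counts behind it.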


  1. #1 Heiko
    November 7, 2004

    You’ve found a number of downright incompetent criticisms of the Lancet study.

    That, however, doesn’t mean that the headline grabbing conclusions of said study have much validity in my humble opinion.

    Before I go into what I consider valid criticisms, let me join in dismissing silly arguments:

    1. A relative risk of 1.5 can mean next to nothing, say when you have 3 cancer deaths in 100,000 children and were expecting

    2. When we’ve got 90 deaths, while we were expecting 60, in a population of about 7,500 (as, roughly, in the Lancet study), that’s a rather different kettle of fish, as long as the deaths occur randomly. If they were to always occur in clusters of 30 deaths at a time (just as an example), we’d have 3 rather than 2 clusters of deaths and no significant result.

    But, as you say it’s silly to describe a relative risk of 1.5 itself as always insignificant.

    2. It’s beyond me how Fumento managed to look up CIA factbook estimates/projections for 2004 and to then claim they were falsified by Saddam and actually for the year 2002.

    3. You are, of course, entirely correct that the death rate in a country with a young population will be higher than in a country with an older population, all other factors being equal. Again Fumento’s argument is based on completely getting the science wrong.

    4. Tim Worstall has already profusely apologised for his own poor understanding of statistics.

    Now, maybe you could explain some criticisms I haven’t yet seen a valid counterargument for:

    1. Unicef’s number for infant mortality in 2002 is 102, which is massively higher than the rate of 29 found in the Lancet study. The authors write: “First, the preconflict infant mortality rate (29 deaths per 1000 livebirths) we recorded is similar to estimates from neighbouring countries.” While this is true, their source is a WHO webpage, who themselves reference Unicef.

    a) One counterclaim is that the accuracy of the infant mortality estimate doesn’t matter. Only the relative increase does. However, as the authors themselves state self-reporting tends to lead to undercounting of infant deaths further in the past. Or in other words, the whole increase may be a reporting artefact.

    Before the invasion they’ve got 46 reported deaths (45 non-violent deaths), afterwards, they’ve got 142. Out of those 142, 73 are from violence and 52 from violence in Fallujah. Non-violent deaths after the invasion amount therefore to 69. However, the time periods covered differ (14.6 and 17.8 months respectively) and so do the populations (7438 and 7868 respectively). Adjusting for those two factors (and assuming the same age and gender distribution) we’d expect 58 non-violent deaths post-invasion (45 multiplied by 17.8 divided by 14.6, multiplied by 7868 and divided by 7438). The actual reported number is 69, or 11 more. Which then leads to the following correct (based on their data anyway) conclusion by the authors that: “beyond the elevation in infant mortality and the rate of violent death, mortality in Iraq seems otherwise to be similar to the period preceding the invasion.”

    The CIA estimate/projection of infant mortality for 2004 is of similar magnitude as that calculated by the Lancet study. It is also roughly half the pre-liberation estimate by Unicef, and in line with the CPA goal of halving infant mortality (a historic review of CPA achievements, see their website, it’s referenced in the Lancet study – note that the halving is aimed for by the end of 2005, rather than a projection for 2004).

    b) I’ve also seen one or two people attempt to discredit the Unicef number for 2002. It’s claimed their number is based on old studies. But in their report on the situation of the child in Iraq (see their website, it’s an extensive report), they claim a massive cluster study (much bigger than the Lancet study) in the year 2000. While I am not sure how they arrived at their number for 2002, it’s hard for me to see how they would be able to get it massively wrong. At the very least they’d have a very recent study at their disposal and plenty of on the ground information from Iraq.

    2. The issue of self-reporting and death certificates is a thorny one. They say they tried to confirm at least two non-infant deaths per cluster. Fallujah is responsible for 52 reported violent deaths (the Lancet study does not reveal how many non-violent deaths are attributable to Fallujah). How many of those did the interviewers get confirmed? Any?

    How about the children in particular? Note that 24 children are reported killed in Fallujah, but only 3 women. That implies either that the killing was non-random (or an extraordinary chance) or that respondents were lying. Non-randomness could, for example, be explained by clustering (e.g. a local school got hit, and a single event would then be responsible for nearly all of those 28 deaths).

    3. It’s not just infant mortality rate pre and post-war that might suffer from a faulty methodology. The pre-war death rate reported in the WHO web page referenced by the Lancet study is 8, which is in line with the Lancet’s own post-war estimate (ex Fallujah), and substantially above the 2004 estimate in the CIA factbook (of 5.66), which, however is similar to the pre-war Lancet estimate.

    The authors point out that their study might underrepresent groups such as military personnel, because of the requirement that the deceased must have lived at least 2 months in the household before their death. That might also exclude prisoners held by Saddam I should note.

    After the invasion, most combatants killed by coalition forces would not have lived in military barracks (which weren’t visited by the researchers) but in family homes.

    4. Violent deaths ex Fallujah are not broken down in sufficient detail. Out of 21 deaths, 4 are children, 13 men, 2 women and 2 elderly. 39% of the population was less than 15 years old, 57% between 15 and 59, under half of which was male. What the violent deaths ex Fallujah aren’t broken down into is cause of the death (crime, coalition caused etc.), that’s only done for all violent deaths, ie including Fallujah.

    That’s unfortunate, because it leaves me guessing as to what caused the deaths I consider more reliably reported. However, based on the population breakdown, men of fighting age are heavily overrepresented.

    And ex Fallujah and for women and children the sample size becomes a real issue. 4 children (out of which one might conceivably be a combatant) and 2 women is a very low number for statistically significant conclusions. We are talking a maximum of six incidents here (6 individual deaths, it’s also possible that one incident would have involved 2 children and one woman dying, cutting down the number of incidents to 4).

    In order to support their contention that most excess mortality was due to coalition bombing, and most victims were women and children, the authors have to rely on Fallujah.

    They don’t say that clearly in their abstract. In fact, they phrase it in such a fashion that you might feel that they exclude Fallujah. However, they only do that for the headline grabbing fatality figures (about 100,000), NOT for their conclusions as to the causes of that mortality.

    SUMMARY: I don’t think one can conclude very much at all from the study.
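
The adjustment for non-violent deaths in the comment above can be reproduced with a short sketch. The figures are the ones quoted in that comment (45 pre-invasion non-violent deaths, periods of 14.6 and 17.8 months, populations of 7438 and 7868). The function name is hypothetical, and the same age and gender distribution is assumed, as the comment states.

```python
def expected_deaths(pre_deaths, pre_months, post_months, pre_pop, post_pop):
    """Scale a pre-invasion death count to the post-invasion period and population."""
    return pre_deaths * (post_months / pre_months) * (post_pop / pre_pop)

post_total, post_violent = 142, 73
post_nonviolent = post_total - post_violent  # reported non-violent deaths after the invasion

expected = expected_deaths(45, 14.6, 17.8, 7438, 7868)
print(post_nonviolent)                    # 69
print(round(expected))                    # 58
print(post_nonviolent - round(expected))  # 11
```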

  2. #2 Tim Lambert
    November 7, 2004

    As far as I can tell, the UNICEF child mortality number is from a study conducted in 2000, as is the WHO mortality rate of 8. It seems quite possible that the oil-for-food program had brought these down to the levels in neighbouring countries by 2002.

    What is quite clear is that the violent death rate has vastly increased since before the war. Yes, once you break things down further, estimates get less reliable, but these are still our best guesses as to what is causing those excess deaths.

  3. #3 Heiko
    November 7, 2004

    And re-read the Unicef report on Iraq.

    The values in appendix 1 of the WHO report linked above are estimates for 2002, and they report values at the 2.5th percentile and the 97.5th percentile (for under 5 mortality, separately for males and females, eg for males 101 to 139).

    From what I can tell, yes, the survey in 2000 is one source, but Unicef has collated all the available information at its disposal, put it into their models, and come up with a number for 2002, which they believe to be reasonably accurate.

    The Unicef report also makes very clear that between 2000 and 2002 not enough changed to explain the kind of brutal fall in infant mortality implied by the Lancet study.

    Note in particular this paragraph:

    “The attempts to ameliorate the impact of economic sanctions has not addressed the basic or underlying causes for infant and child mortality and under-development. Only sufficient and reliable resources and comprehensive management and planning will promote children’s rights to life and survival. This would include not just repair and overhaul of the physical systems providing services to the population, but also the necessary capacity building and staff remuneration.”

    Your second paragraph ignores most of my criticisms. Yes, I agree that the incidence of violent death for people living in their homes (rather than army barracks or Saddam’s prisons) has likely increased substantially. There is also a good case that violent crime has skyrocketed.

    But the study claims that the main culprit is coalition bombing of women and children, and for that claim they do need to include Fallujah.

    Ex Fallujah, the data they present leaves ample room for coming to the conclusion that most of those killed were combatants, not to mention that the sample is too small, when trying to attach blame. One or two suicide bombings could easily have turned up 21 violent deaths caused by terrorists, for example. The number of violent deaths caused by anti coalition forces in the study is 2. The number of violent deaths of women and children ex Fallujah is 6. These numbers are far, far too small to extrapolate to all of Iraq.

    Finally, I’ve got a statistics question for you. What is the median estimate of violent deaths ex Fallujah (assuming the 21 cases to be real and representative)? Is it closer to 60,000 or to 100,000?

  4. #4 Tim Lambert
    November 7, 2004

    As far as I can tell, the high estimates of infant mortality are based on a survey similar to the Lancet one that was conducted in 2000. They didn’t expect much of a decrease by 2002, but I don’t see how that would trump actually measuring it. Furthermore, there is a large uncertainty in the Lancet measured infant mortality. It could have missed the main hot spots for infant mortality, leading to an underestimate of the pre-war mortality. That does not invalidate their finding of an increase, since the same clusters were used for the before and after figures.
    The paper also notes:

    In the absence of any surveys, however, they have relied on Ministry of Health records. These data have indicated a decline in young child mortality since February, 2001, but because only a third of all deaths happen in hospitals, these data might not accurately represent trends.10 No surveys or census-based estimates of crude mortality have been undertaken in Iraq in more than a decade, and the last estimate of under-five mortality was from a UNICEF-sponsored demographic survey from 1999.

    I’ve already noted that the sample size is small enough that the further breakdowns have a high degree of uncertainty. I get a rough estimate of 60,000 violent deaths outside Falluja.

  5. #5 Heiko
    November 7, 2004

    Firstly, thanks for the civil discussion, which is much appreciated.

    The authors themselves point out that self-reporting of infant deaths may lead to artificial underreporting of years further removed in time.

    I don’t know exactly (there are some indications from their reports) what data went into the WHO and Unicef models, but if WHO gives a confidence interval, with 101 as the 2.5th percentile lower bound (for under 5 mortality of boys), I for one do think it’s reasonable to suppose that their accuracy could beat the Lancet study.

    Maybe, you could state explicitly that 8000 is not a lower bound for the number of civilians killed by the coalition, and 100,000 is not a best estimate.

    It’s anything between a few thousand and a few ten thousand, and the Lancet study provides insufficient data to say whether 3,000 or 15,000 is a more reasonable figure.

    In fact, if you were as gracious about it as Tim, I’d expect you to denounce the study for the way it handles Fallujah, so as to make it sound, as if they’d found that of the order of 100,000 innocent Iraqi civilians outside of Fallujah got killed by the coalition.

    Wouldn’t you agree that that’s the way the result of this study gets condensed down to as a soundbite? And that that’s rather misleading?

  6. #6 Dano
    November 7, 2004

    “Wouldn’t you agree that that’s the way the result of this study gets condensed down to as a soundbite? And that that’s rather misleading?”

    If I may, the toot-tooting of the results of the study shouldn’t get mixed up with the results of the study. It is not the authors’ fault the editor felt a need to be a pitchman.

    Picture the Lancet study results in the future, after, say, 5 other papers’ results are published. The Lancet paper is another data point, used to ascertain a rough death toll due to the increased violence in Iraq.

    In the future, you’ll note whether the Lancet study is higher or lower than the mean of the other studies. Are all the papers that have higher death tolls than the mean misleading? Should all those papers be denounced? Why or why not?

    Sorry. I’ll withdraw from this thread now.



  7. #7 Tim Lambert
    November 7, 2004

    The Lancet study says there have been no surveys on infant mortality since 1999, so the confidence interval on the WHO number is probably from that study. Again, I don’t see why infant mortality could not have improved since then.

    The 100,000 is an estimate of excess deaths, not coalition-killed civilians. The study is quite clear on this, so I find it strange that you think I should denounce the study if some third party misrepresents their findings.

  8. #8 Heiko
    November 7, 2004

    “We estimate that 98 000 more deaths than expected (8000 – 194 000) happened after the invasion outside of Falluja and far more if the outlier Falluja cluster is included. … Most individuals reportedly killed by coalition forces were women and children…. Making conservative assumptions, we think that about 100 000 excess deaths, or more have happened since the 2003 invasion of Iraq. Violence accounted for most of the excess deaths and air strikes from coalition forces accounted for most violent deaths.”

    All right, “denounce” is a pretty strong word, but I don’t think they are terribly clear here. Far from it, it took me a fair while, and only after looking at their study in detail, to get the fact that their conclusion that “Most individuals reportedly killed by coalition forces were women and children” can only be justified by including Fallujah.

    On the other point, if WHO used the 1999 study as the base point, they extrapolated it to the year 2002 based on other data (see their explanatory notes), and I think it would be misleading for them to use that study’s confidence interval. After all, they claim to be using the same methodology for confidence intervals across all countries in that table.

    To take this further, we’d probably be best served by writing to the authors of the Lancet study and Unicef/WHO for further clarification.

  9. #9 dsquared
    November 8, 2004

    I’d note that infant death is a rare event, and it’s likely to be heterogeneous across Iraq. Because of this, it is not unreasonable to suspect that the cluster sampling methodology has undersampled infant deaths. What is unreasonable, of course, is to claim that the survey simultaneously undersampled infant deaths and oversampled deaths by violence.

  10. #10 Heiko
    November 8, 2004

    Let me add a further clarification by WHO on their methodology:
    “To capture the uncertainty resulting from sampling, indirect estimation technique or projection to 2002, a total of 1000 life tables have been developed for each Member State. Uncertainty bounds are reported in Annex Table 1 by giving key life table values at the 2.5th percentile and the 97.5th percentile. This uncertainty analysis was facilitated by the development of new methods and software tools (5).”

    I don’t see how one could read this as “they must have used the confidence interval of that old study in 99”.

  11. #11 Tim Lambert
    November 8, 2004

    Fine, so they recalculated the confidence interval for the old study. The Lancet study is quite clear that theirs is the first survey since then.

  13. #13 PaulP
    November 8, 2004

    Before criticising Professor Brignell, you might like to read his book. In it he did runs of random number generators and got “risk ratios” on the order of 1.5 (in one case, he “obtained a relative risk of 1.63, when there was no underlying effect at all”). That’s a quote from the post you are criticising. If an absence of an underlying effect can produce a bigger result than the 1.5 quoted, then there is no way anyone can impute anything other than normal statistical variation to that 1.5.
    In other words: you should apologize to the professor for misrepresenting what he wrote. Reading your comment, one would think he was setting himself up as an authority and then invoking that authority to dismiss results he does not like. On the contrary, he has clearly and openly presented his arguments about low risk ratios.
    What we got from the Lancet was the politically inspired publication of nonsense. A confidence interval from 8000 to 194000 is useless at the best of times, even if the study’s data had been collected in a perfect manner. To go from a possible 30 excess deaths (we cannot even say this number is accurate) to the “100,000 killed by Iraq invasion” is not science.

  14. #14 PaulP
    November 8, 2004

    While I’m at it:
    “Suppose we had perfect records of every death in Iraq and there were 200,000 in the year before the invasion, and 300,000 in the year after”.

    Whether the difference is significant depends in the first instance on the variability of the 200,000. If the annual death rate has a standard deviation of 100,000 then the 300,000 is unexceptional.

  15. #15 Rob
    November 8, 2004

    To say a confidence interval of 8000 to 194,000 is useless reflects your own misunderstanding. That interval does not include zero and as such tells us a whole lot. That you don’t like what it says means nothing.

    As you yourself have just shown, ignoring the study because that risk factor is “only 1.5” is foolish. It depends on the variation, and so dismissing it out of hand is just wrong.

  16. #16 PaulP
    November 9, 2004

    When you base a CI of 8000 to 190000 on a (possible) real number of 30 cases you know you are in trouble. A clue comes from the fact that the upper bound is more than an order of magnitude greater than the lower.
    As Professor Brignell has pointed out, when Fisher was inventing confidence intervals he chose 95% purely for computational convenience. If it had been more convenient then 99% would have been used. (And I am wondering why it wasn’t. Perhaps it would have included 0?)

    You might have a think about these numbers:
    30 excess deaths (90-60) in 7500 gives 100,000 in 25 million (hence the headlines). Do the maths backwards from 8000 deaths: you get an excess of 2.5, in other words 62.5 rather than 60, reported by a sample of 7500 people. That’s not a lot when you take into account the difficulties of collecting the data. Now by the statistically correct interpretation of confidence intervals, all the authors could say is that we can be 95% certain that between 8000 and 194000 excess deaths would be reported in this survey had it been carried out on the entire population. And that means the 8000 is as likely to be the true figure as any other in the CI. Which means the true excess deaths for the 7500 might be only 2.5. Well within the possibility it is down to the problems of collecting the data (such as inaccurate or selective recall or even downright lying) in the first place.
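
The scaling in the comment above can be checked directly. This is a minimal sketch using the round figures quoted there (a sample of 7,500 and a population of 25 million); the function names are hypothetical.

```python
SAMPLE = 7_500
POPULATION = 25_000_000

def scale_up(sample_deaths):
    """Extrapolate a count in the sample to the whole population."""
    return sample_deaths * POPULATION / SAMPLE

def scale_down(population_deaths):
    """Convert a population-level count back to its sample equivalent."""
    return population_deaths * SAMPLE / POPULATION

print(scale_up(90 - 60))  # 100000.0
print(scale_down(8_000))  # 2.4
```

With these round figures the lower bound corresponds to about 2.4 excess deaths in the sample, which the comment gives as 2.5.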

  17. #17 Tim Lambert
    November 9, 2004

    Paul, you don’t understand what the “statistically correct interpretation of confidence intervals” is. A 95% CI does not mean that there is a 95% chance (see here for an explanation). And it is not true that all the values in the CI are equally likely — the ones in the middle are more likely than the ones at the ends. And none of this allows for the fact that the high death rate Falluja cluster was excluded.

  18. #18 Per
    November 9, 2004

    Let’s get this right.
    You have a non-validated measurement of deaths (“I think I know somebody who died…”).
    You have a confidence interval of 8- 190; which is not a small confidence interval. In fact, it is notably large.
    So the measurements are both dodgy, and non-verifiable, allied to a massive confidence interval, and there are real problems about heterogeneity of sampling.

    Science is about establishing what reality is. The caveats associated with this study are of sufficient magnitude that you would be unwise to rely on this study for any significant purpose.

  19. #19 Tim Lambert
    November 9, 2004

    Per, how come most critics of the study manage to misrepresent it? More than 80% of those asked could verify with death certificates. And it wasn’t “I think I know someone”, but a household member.

  20. #20 PaulP
    November 9, 2004

    Tim: regarding the interpretation of the CI. Your version is what I used to think. Then a mathematical statistics book I came across gave the version I used. Here’s one online source I found from Google:
    And I started by asking you to withdraw your slur on Professor Brignell, and explaining why. Care to explain why you are refusing ?

  21. #21 dsquared
    November 9, 2004

    “there are real problems about heterogeneity of sampling”

    It’s possible to do one’s own amateur epidemiology here, tracing the vectors of innumeracy round the internet. This is a reference to Shannon Love’s critique, originally posted on the Chicago Boyz weblog. I think I actually won this particular game of whack-a-mole; it’s pretty much been established to the satisfaction of all present (or at least, everyone else who posted on that comments thread has given up arguing with me!) that the problem of heterogeneity in sampling would be more likely to lead to an underestimate of the death rate than an overestimate.

  22. #22 dsquared
    November 9, 2004

    Talking as a Bayesian, one might say something similar to Paul’s reference. But Bayesians don’t talk about confidence intervals as a rule.

    Just clearing up a loose end, Brignell’s Monte Carlo experiment is silly. If you let me select the standard deviation and do enough model runs, then for any relative risk ratio R that you select, I can produce at least one run of a random number generator that gives a relative risk ratio greater than R. This doesn’t prove anything about risk ratios; it just proves that at least one hack has nothing better to do with his time than sit around cherrypicking runs of random numbers.
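
A rough simulation illustrates the point. The sketch below is not Brignell’s experiment, whose details are not given here. It assumes two independent Poisson counts with the same mean (so there is no underlying effect) and counts how often their ratio reaches 1.5. The means of 3 and 60, the number of runs, and the function names are all assumptions chosen for illustration.

```python
import math
import random

def poisson(mean, rng):
    """Draw one Poisson variate using Knuth's multiplication method."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k

def share_of_null_runs_reaching(threshold, mean, runs, rng):
    """Fraction of no-effect runs whose count ratio is at least `threshold`."""
    hits = 0
    for _ in range(runs):
        exposed, control = poisson(mean, rng), poisson(mean, rng)
        # A zero control count with a positive exposed count is an unbounded ratio.
        if (control == 0 and exposed > 0) or (control > 0 and exposed / control >= threshold):
            hits += 1
    return hits / runs

rng = random.Random(0)  # fixed seed so the run is repeatable
small = share_of_null_runs_reaching(1.5, mean=3, runs=2000, rng=rng)
large = share_of_null_runs_reaching(1.5, mean=60, runs=2000, rng=rng)
print(small, large)  # the small-count share is far larger
```

With small expected counts, chance alone often yields ratios of 1.5 or more; with larger counts it rarely does. That is why significance has to be judged from the data behind the ratio and not from the ratio alone.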

  23. #23 PaulP
    November 9, 2004

    Tim: more on the interpretation of CI. This link suggests that the two interpretations come from the two different schools of interpretation of probability, the frequentist and the Bayesian.
    On reflection on this one issue I think we are both arguing away from the point I was making, namely that going from the study’s numbers up to the whole of Iraq means making a statement about what the whole of the Iraqi population would say when faced with the same survey. Which is not the same thing as making a statement about how many people actually died: in other words, the values of the confidence interval relate only to what the responses of the whole Iraqi population would be to the same survey. How “confident” we can be that these responses are the same or close to the actual number of deaths cannot be answered numerically.

  24. #24 Tim Lambert
    November 9, 2004

    Paul, the definition you linked to is wrong. My comments about Brignell are accurate. Just because in some situations a risk factor of 1.5 is not significant, it does not follow that it is never significant. The question of whether the risk factor is statistically significant is answered by looking at the confidence interval. For the Iraq study that interval does not include 1, so the 1.5 factor is significant. This is basic statistical theory which you can find in any elementary text. Or you can read Brignell’s books with his theories about how all the scientists are wrong.

  25. #25 PaulP
    November 9, 2004

    Well Tim, it’s good to know that the whole of the Bayesian interpretation of probability is wrong, just because you say so. (That’s typical crank behaviour, by the way).

    As for 1.5 being “significant”, statistical significance always comes with a level so the statement “a risk ratio of 1.5 is significant” is nonsense in the absence of the level (here 95%).
    And it is always necessary to keep an eye on the variability of the population being sampled. Are you really saying that in a population whose standard deviation is 0.5 the value of the mean, a sample with RR 1.5 is significant?

  26. #26 PaulP
    November 9, 2004

    Prof. Brignell’s point about 1.5 withstands your objection. It is precisely his point that unless you know what’s going on in the underlying population you cannot tell whether 1.5 means anything. In particular you need to know how much the population mean will vary before you can say that one sample mean is significantly out of line.

  27. #27 Tim Lambert
    November 9, 2004

    No, I’m not saying the Bayesians are wrong, but that the CIs reported in the Lancet are not Bayesian. If you agree that the statement “a risk ratio of 1.5 is significant” is nonsense in the absence of a level, then you must agree that Brignell’s “a risk ratio of 1.5 is not significant” is nonsense too.

    Measures of statistical significance account for variation of the population. That’s the whole point.

    And nice attempt to rewrite Brignell’s point, but that’s not what he wrote. Go back and look at it again. He stated that
    “even without the major considerations of confounding factors and biases, real science is obliged to reject such small risk increments as insignificant, due to random variation.”

  28. #28 Carl Jarrett
    November 9, 2004

    I wonder how many of the people using the CI as a way to attack the Lancet study are the same people who support Lott’s DGU numbers?

  30. #30 Per
    November 10, 2004

    when you say “More than 80% of those asked could verify “, presumably you are happy with a survey which attempted validation in less than 10% of households, and which failed to get confirmation in 19% of this tiny minority ? That 19% who couldn’t confirm should have been used to correct the figures- but oops, that would then yield a non-significant result ! Or perhaps you don’t like it when people accurately represent what this study did – and didn’t- do ?

    I am also puzzled by your statistics, and your insistence that this study is “statistically significant”, as if this is some gold standard. What you presumably mean is that there is a P<0.05 that the two sample populations overlap, based on the samples; but since when was 0.05 set in stone ? And yes, there can be a difference between “statistically significant”, and a “significant difference”.

    dsquared; you seem to be arguing that the sampling method is unreliable. This is precisely my point. The Lancet paper also makes clear that there was geographical clustering of sampling sites.

    Also, re: your comments about Brignell and Monte Carlo. I think it is obligatory to know what someone is talking about before you characterise it as “silly”. It seems to me that your comments illuminate your ignorance of his argument.

  31. #31 Rob
    November 10, 2004

    For all those talking about taking a Bayesian approach, please tell me your a priori DGP and how this data then affects it.

  32. #32 Scott Church
    November 10, 2004

    PaulP and Per, Roberts et. al. (the Lancet study) conducted their research over an extended period as a before-war to after-war cohort study. They did not use Baysian statistics in their analysis, so arguments about Frequentist vs. Bayesian methods are irrelevant. As for “attempted validation in less than 10% of households”, you make it sound as though there was no rhyme or reason for their connecting of the subsample of death certificate verified deaths to their larger clusters. In fact, Roberts et. al. chose their “10 percent” group for death certificate verification across a broad base that did, in fact, bootstrap well to their larger cluster set, and their confidence intervals were non-parametric and derived from their bootstrap methodology using standard and robust methods. If their clusters were chosen carefully and were well controlled (they were), then this should work fine. In fact, it is common for statistical conclusions to be generalized to larger populations using methods like this, and there is no problem here. It is not correct that their is no correlation between their subset and their larger clusters. Also, they spent some time discussing the various means they went through to evaluate the credibility of death reports within surveyed housholds independent of death certificates (which were often not available for recent deaths), all of which carry quite a bit of weight and have been used with good results in similar studies.

    Remarks about Brignell’s 1.5 factor are also irrelevant, as your criticisms are based on the assumption that Roberts et al. knew virtually nothing about the sizes and characteristics of their clusters, and there was no way to evaluate whether or not a 1.5 factor was significant. A careful reading of the paper reveals that this is patently false. Tim’s criticisms of Brignell stand. It seems to me that you are investing a lot of passion into arguing ivory tower points that have little relevance to the way the study was actually done, and then generalizing to get out of its conclusions. Perhaps less ad-hominem and a closer reading of the Roberts et al. paper is in order…
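
    For readers unfamiliar with the term, a non-parametric cluster bootstrap of the kind mentioned above can be sketched roughly as follows. This is only an illustration, not the authors' actual procedure: the per-cluster counts are made up, every cluster is assumed to contribute equal person-time before and after, and the names `rate_ratio` and `cluster_bootstrap_ci` are hypothetical.

```python
import random

# Hypothetical per-cluster death counts (made up, NOT the study's data).
# Assumption: each cluster contributes equal person-time before and after.
before = [2, 1, 3, 2, 1, 2, 1, 3, 2, 2]
after = [3, 2, 4, 3, 2, 3, 2, 4, 3, 2]


def rate_ratio(idx):
    """Pooled after/before death ratio for the clusters selected by idx."""
    return sum(after[i] for i in idx) / sum(before[i] for i in idx)


def cluster_bootstrap_ci(n_boot=2000, seed=0):
    """Percentile 95% CI, resampling whole clusters with replacement."""
    rng = random.Random(seed)
    n = len(before)
    ratios = sorted(
        rate_ratio([rng.randrange(n) for _ in range(n)]) for _ in range(n_boot)
    )
    lo = ratios[int(0.025 * n_boot)]
    hi = ratios[int(0.975 * n_boot) - 1]
    return lo, hi


point = rate_ratio(range(len(before)))
lo, hi = cluster_bootstrap_ci()
print(f"ratio = {point:.2f}, 95% bootstrap CI = ({lo:.2f}, {hi:.2f})")
```

    Resampling whole clusters rather than individual households is what lets the interval reflect between-cluster variation.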

  33. #33 Tim Lambert
    November 10, 2004

    Actually, Scott, you are being too generous to Per. The “attempted validation in less than 10% of households” follows from the fact that the great majority of households experienced zero deaths. Yes, they didn’t attempt to verify deaths in households where there were no deaths. Duh.

  34. #34 John Quiggin
    November 10, 2004

    This debate provides one neat way of refuting any study you don’t like. Just say: This study uses classical (Bayesian) statistical techniques and dozens of eminent Bayesian (classical) statisticians have observed that classical (Bayesian) statistical techniques are totally unsound.

  35. #35 Dano
    November 10, 2004

    Great thread, thanks Tim. The Lancet backlash is starting to get some play around my Uni – a bellwether event, perhaps.

    Also, I noticed the crickets chirping got really loud around here after Scott Church’s post. You’ve done that before, sir. My hat is off to you.


  36. #36 Per
    November 10, 2004

    “Interviewers were initially reluctant to ask to see death certificates”… Why spoil a good story ?
    “Death certificates usually did not exist for infant deaths and asking for such certificates would probably inflate the fraction of respondents who could not confirm reported deaths.”
    This is the three wise monkeys school of epidemiology.
    Tim doesn’t address the fact that they couldn’t even confirm the deaths in 19% of the households they looked at, and they only asked in <10% of households; both of these highlight the poor methodology, and the impossibility of relying on such flawed methodology. Scott tells us that the 10% group “bootstrap(s) well”; that would be meaningless gobbledegook, then ?
    Never mind the error of post-hoc data manipulation…

  37. #37 PaulP
    November 10, 2004

    1) I may not have spelled out my point about Bayesian versus frequentist. This point refers solely to the phraseology used to say what a CI is, in general. The second link I gave I read to say there are two different descriptions, one Bayesian (in terms of confidence) and one frequentist (in terms of repeating this exact experiment). In terms of maths we agree.
    1a) My original point was that scaling up to the whole population gives you a CI about how that population would respond to the same survey, not the number of actual deaths.
    2) Your point about CIs having to exclude 0 is well made. In terms of RR it means excluding a value of 1. Now note that the CI for the 1.5 RR is 1.04 to 1.97. The lower bound is very close to 1. Could any of the statistical experts here tell me what changes to the 60-before 90-during deaths would turn that 1.04 into 1? (After all, not all the reported deaths could be verified by a death cert, so it is possible the 90 is too high). Assuming the 60 is correct, then what number of reported deaths would give a CI for the RR including 1? About 87? In which case, given the difficulties reported in the article, the significance of the CI depends on the difference between 90 and 87 deaths reported. And if 19% were unable to produce a death cert, that’s 18 reported deaths without a death cert. Can we be certain that more than 15 out of these 18 were as related to those gathering the data?

    3) Regarding Professor Brignell’s rejection of low RRs. If you read his first book on this subject, you will see why he does so. If an RR is low then the effect producing it, if it exists, must also be very small. Typically the RR is calculated after much mathematical manipulation of the data – when comparing two different populations we try to remove influences due to different age/social class/religion structures etc ( whatever extraneous factors are believed might interfere with the results). The problem is that these manipulations are sometimes so powerful that a slightly different mathematical adjustment, or adjusting for other factors, would produce a different result if the RR is low enough.
    4) Which brings me back to my point 1a. A significance level can only quantify the reliability of the process (data plus mathematical adjustments) used to derive it. The problem is that because of the mathematical manipulations, there is an uncloseable gap between what is measured and what is desired to be measured (in this case reported deaths were measured, not actual deaths). So when interpreting articles such as this, you also have to ask about the reliability of the methodology.
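
    The question in point 2 (how many reported deaths would pull the lower bound down to 1) can be explored with a deliberately naive sketch. It assumes simple Poisson counts over equal person-time and ignores clustering, so it is not the paper's log-linear model and will not reproduce the 1.04 to 1.97 interval quoted above; `rr_ci` is a hypothetical helper.

```python
import math


def rr_ci(deaths_after, deaths_before, z=1.96):
    """Naive 95% CI for a ratio of two Poisson counts (equal person-time assumed)."""
    rr = deaths_after / deaths_before
    se_log = math.sqrt(1 / deaths_after + 1 / deaths_before)
    lo = math.exp(math.log(rr) - z * se_log)
    hi = math.exp(math.log(rr) + z * se_log)
    return rr, lo, hi


# Hold the "before" count at 60 and vary the "after" count.
for d in range(80, 91, 2):
    rr, lo, hi = rr_ci(d, 60)
    flag = "excludes 1" if lo > 1 else "includes 1"
    print(f"{d} deaths: RR = {rr:.2f}, CI = ({lo:.2f}, {hi:.2f}) {flag}")
```

    Under any model of this kind the crossover sits only a few deaths below 90; the exact count depends on how the clustering is handled.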

  38. #38 Tim Lambert
    November 10, 2004

    1. The CIs in question are frequentist.

    2. Yes, the lower end is close to 1, and if there had been a few less deaths reported it would have been below one. That doesn’t stop it from being significant.

    3. Again, there may be reasons for rejecting a RR of 1.5 in other circumstances, but Brignell dismissed it as too small in this case when there wasn’t any fancy multivariate analysis. Brignell propounds a general principle that factors of 1.5 are not significant. You won’t find that in any of the books (except for the ones written and published by Brignell).

    4. Yes you do need to consider the methodology. Brignell didn’t. He just declared that a 1.5 RR was not significant.

  39. #39 per
    November 10, 2004

    Tim wrote:
    “Yes you do need to consider the methodology. Brignell didn’t.”
    Fascinating. I don’t know what Brignell did and didn’t consider, ‘cos I am not telepathic. However, Tim obviously is, since he claims to know what Brignell thinks.
    One of the points Brignell routinely makes is that there are variabilities in the methodology (e.g. sampling heterogeneity, effect estimation, etc.) which contain considerable uncertainty which is not accounted for in the final statistical analysis.
    And there is that other one- that post hoc analysis is appalling statistical practice. Yes, that would be selecting and removing “outliers” because they increase the variability and they don’t then give a statistically significant result.

  40. #40 PaulP
    November 11, 2004


    1) You should read this on Professor Brignell’s page, which was put there before your last post: “For those who did not follow the link on RR above, the statement about the unacceptability of 1.5 applies to observational studies, not necessarily to properly randomised double blind surveys that produce a highly significant result.”
    As a professor of Engineering with a speciality in measurement, Brignell is only too well aware of the problems of methodology – in terms of physical sciences this means the measurement error. Hence his first book on this subject is subtitled “The abuse of measurement”. If you are not going to read his full website or his book, you leave yourself open to misinterpretations like this. Again I ask you to withdraw your slur.
    2) How do we get at the equivalent of the measurement for this study? Look at these quotes from the paper:
    “No attempt was made to adjust these numbers [January 2003 population estimates] for recent displacement or immigration”
    “Within clusters, an attempt was made to confirm at least two reported non-infant deaths”
    “To estimate the relative risk, we assumed a log-linear regression ..”.
    Continuing my earlier point that if 87 rather than 90 deaths had been reported then the CI includes 1, this converts into about a 3% measurement error. Which looks very reasonable in light of the above quotes. (For instance how reasonable is it that there have been no population displacements since Jan 2003?)

  41. #41 Tim Lambert
    November 12, 2004

    Brignell is a crank because 1. there is no support for his position amongst epidemiologists. 2. he is not an epidemiologist. 3. he holds that the whole field of epidemiology is a scam.

    Just because he is published in the electronics of measurement devices it doesn’t make him an expert in epidemiology. In fact, he doesn’t know what he is talking about there. Of course, to know this you would have to look at a genuine epidemiology text….

    I already explained what is wrong with his argument in the second paragraph of my original post. (Even if we had perfect records of all the deaths it is still an observational study.)

    You seemed to focus on the lower end of the 95% CI. You seem to think that if there was enough measurement error to get that down to 0 excess deaths, the study can be dismissed. It just doesn’t work that way. The increase in violent deaths is clearly statistically significant, no matter how you slice it. And because of the exclusion of Falluja, the number is an underestimate of the death toll.

  42. #42 per
    November 12, 2004

    Tim wrote:
    “Brignell is a crank because 1. there is no support for his position amongst epidemiologists.”
    I see you have moved on from claiming to know what Brignell thinks, to knowing the thoughts of all epidemiologists. I sense a pattern here.
    Specifically, what you state is wrong. If there was enough measurement error to get the estimate down to 0 excess deaths, the study can be dismissed, because it couldn’t be statistically significant.
    Even as is, the study is not clearly statistically significant. The authors undertook post-hoc manipulation of their data, to remove the Falluja data. If the Falluja data is in, their data is not significant by one of their own measures.
    and as pointed out by PaulP, your second paragraph of your original post is wrong. Perhaps you should learn some elementary statistics ?

  43. #43 Tim Lambert
    November 12, 2004

    You can find out what the consensus view of epidemiologists is by reading epi texts. Furthermore, we know that they rejected Brignell’s principle because Philip Morris spent a large amount of money secretly lobbying them to adopt it and they didn’t. Details here.

    If the Falluja data is included the 95% confidence interval for the risk of death is 1.6-4.2. That interval does not include 1. Hence it is statistically significant.

  44. #44 PaulP
    November 12, 2004

    Tim: you are still ignoring measurement error.
    All the statistical calculations can produce are statements about what would happen if the same process were repeated for the whole population, based on the results for the sample.
    For instance: suppose you have a manufacturing process which is sampled to measure the weight of the product, and the CI is (90, 110) kg. This does not mean that 90-110 kg is like a CI for the actual weight because the weighing device used on the sample may have a consistent measurement error, say always giving a result 10% too high.
    Nothing in the type of statistical manipulation that produces significance levels and CIs can handle measurement error. It has to be introduced as a sort of deus ex machina at the end. That’s why statistical texts differentiate between “statistical significance” and plain old significance (which takes into account measurement error).
    You should by now understand why I have been concentrating on the low end of the CI. If it is not sufficiently greater than 1, the measurement error will mean the result loses its significance.
    Now you might say this is a long winded way of saying that the result is statistically significant. Unfortunately it still is not because the authors reported values for actual deaths. If they had confined themselves to talking about what would happen if their process had been repeated on the entire Iraqi population then they would have been safe. By going outside their process to talk about actual deaths, the limitations of their measurement process – assuming unchanged population and population spread since an ESTIMATE made in January 2003, “assum[ing] a log-linear regression” to “estimate the relative risk” and so on – come into play. They inevitably assume a zero measurement error. Which means they came to conclusions the statistics can never support.
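
    The weighing example above can be illustrated with a small simulation. It is a sketch under stated assumptions: the weights are simulated rather than real, the scale is assumed to read exactly 10% high, and `mean_ci` is a hypothetical helper using the normal approximation.

```python
import random
import statistics


def mean_ci(values, z=1.96):
    """Normal-approximation 95% CI for the mean of a sample."""
    m = statistics.mean(values)
    se = statistics.stdev(values) / len(values) ** 0.5
    return m - z * se, m + z * se


rng = random.Random(1)
# Simulated true weights in kg (made up for illustration).
true_weights = [rng.gauss(100, 5) for _ in range(200)]
# Assumed systematic error: the scale always reads 10% too high.
measured = [1.10 * w for w in true_weights]

lo_true, hi_true = mean_ci(true_weights)
lo_meas, hi_meas = mean_ci(measured)
print(f"CI from true weights:     ({lo_true:.1f}, {hi_true:.1f})")
print(f"CI from measured weights: ({lo_meas:.1f}, {hi_meas:.1f})")
```

    The interval computed from the biased readings is just as narrow as the one from the true weights, but it is centred in the wrong place. Nothing in the interval itself reveals the bias.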

  45. #45 per
    November 12, 2004

    Tim obviously doesn’t read the epidemiology textbooks which deal with the issues of sampling error, or recall bias, or post-hoc data manipulation, all of which are at issue in this study; but then why should he ? He has telepathy and speaks for all epidemiologists.
    Likewise, there is selectivity in saying that the Falluja data is statistically significant. In fact, if you read the paper, you will find it says:
    (Before the war) “The crude mortality rate was 5.0 per 1000 people per year (95% CI 3.7-6.3)”
    “The crude mortality rate during the period of war and occupation was 12.3 per 1000 people per year (95% CI 1.4-23.2)”
    you will notice the confidence intervals in crude mortality rate overlap enormously, and there is no significant difference. Understandably, the authors didn’t want to emphasise this aspect of their study.

  46. #46 Ian Gould
    November 14, 2004

    Paul, measurement error is more of a problem in measuring, say, temperature than in measuring fatalities. Death is a binary state, there’s no such thing as 99.2% dead with a variance of plus or minus 1.5%.

  47. #47 PaulP
    November 14, 2004


    If you read the paper, they were not working with accurate data: for example, they assumed no change in the total population of Iraq or its regional distribution from an estimate made in January 2003. Also they did not count even death certificates, merely reports of deaths some of which were verified with death certs. Then when analysing the raw data they made assumptions like “To estimate the relative risk, we assumed a log-linear regression in which every cluster was allowed to have a separate baseline rate of mortality that was increased by a cluster-specific relative risk after the war”, even though violent deaths were not reported in 18 out of the 33 clusters.
    In short: what went into the calculation of the relative risk was not free of measurement error.

  48. #48 per
    November 14, 2004

    Ian, even medics get it wrong when they have a body in front of them, though fortunately this is pretty rare.
    However, the issue is not whether Mr or Mrs Iraq can tell whether a body is alive or dead. It is about whether they will inaccurately tell you about how many people have died recently, whether it is exaggeration, or failure to recall.
    If you believe everything you hear in office gossip, you will believe that every one of these Iraq statistics is true. The fact that ~19% of the adult deaths couldn’t be verified won’t worry you, and the amazing statements about infant deaths not getting death certificates anyway certainly won’t bother you.

  49. #49 PaulP
    November 14, 2004

    to make an analogy for the assumption of no population change since an estimate in 2003, and no change in population distribution:
    Suppose you were trying to measure the length of something with a measuring stick 1 metre in length. You would not be sure if the 1 metre was correct (corresponding to an estimate of total population). Then the measuring stick has divisions of 1 centimetre, but you are not sure each division is the same length as the others (corresponding to the assumption of no change in population distribution since the estimate, which translates into an assumption that each of the 33 clusters has the same population). (For the non-metric, replace metre by foot and centimetre by inch). Now imagine trying to measure the length of something with such a device.

  50. #50 per
    November 18, 2004

    well, looks like everyone else has given up and gone home.
    given the appalling nature of the epidemiology presented, that can only be a good thing. I hope some of the punters have learned something.

  51. #51 dsquared
    November 18, 2004

    Per said:

    “you will notice the confidence intervals in crude mortality rate overlap enormously, and there is no significant difference. Understandably, the authors didn’t want to emphasise this aspect of their study.”

    Understandably indeed; this is a statistical howler and a common error among undergraduates and I for one am not surprised that a distinguished team of researchers didn’t make it.

    You are only able to argue from “the confidence intervals overlap” to “there is no significant difference” if you are talking about two independent samples. The same sample, measured at two different times, is not independent. Therefore, instead of comparing confidence intervals on two estimates, you need to model the risk ratio and establish a confidence interval on that. Which is what the researchers did.

    Per, perhaps you would like to apologise for casting this false aspersion on the JHU team? It will be hard to take you seriously on any other matter if you are not prepared to put your hands up to this simple mistake.
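
    The point about dependent samples can be illustrated with made-up numbers. In this sketch the same eight hypothetical clusters are measured before and after; their baseline rates differ widely, but each rises by roughly half. The data are invented, the per-cluster log ratio is a simplification of the paper's log-linear regression, and `t_ci` is a hypothetical helper.

```python
import math
import statistics

T_CRIT = 2.365  # two-sided 95% t critical value for 7 degrees of freedom

# Hypothetical mortality rates for the SAME eight clusters (made-up data).
before = [2, 4, 6, 8, 10, 12, 14, 16]
after = [3.2, 5.8, 9.3, 11.8, 15.4, 17.6, 21.5, 23.8]


def t_ci(values):
    """95% t-based confidence interval for the mean of values."""
    m = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return m - T_CRIT * se, m + T_CRIT * se


# Approach 1: separate intervals, as if the two samples were independent.
ci_before = t_ci(before)
ci_after = t_ci(after)
overlap = ci_after[0] < ci_before[1] and ci_before[0] < ci_after[1]

# Approach 2: model the within-cluster ratio and put an interval on that.
log_ratios = [math.log(a / b) for a, b in zip(after, before)]
ci_log = t_ci(log_ratios)
ci_ratio = (math.exp(ci_log[0]), math.exp(ci_log[1]))

print("separate CIs overlap:", overlap)
print(f"CI for the ratio: ({ci_ratio[0]:.2f}, {ci_ratio[1]:.2f})")
```

    Here the two separate intervals overlap because the clusters differ so much from one another, yet the interval for the within-cluster ratio sits well above 1.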

  52. #52 Stephen
    November 19, 2004

    I’m amused by this discussion most of which centres on rather abstruse statistical points. Clearly most on this list either fervently wish to believe this data or fervently do not. The relative risk ratio 1.5 or 2.5 is so so. Your trust in it really has to depend on your trust in the underlying data not the statistical methodology.

    I am somewhat skeptical for a simple reason that I don’t really trust survey data especially when a contentious issue is at stake. For example in the recent US elections all the polls and even the exit polls were completely wrong about the result. With the exit polls in particular it is clear that a proportion of respondents simply lied about their actual vote. This is a well-known phenomenon here in the UK too where many people vote conservative but don’t like to admit it in public!

    The actual numbers post-war are 17 outside falluja, 49 within Fallujah from a sample of just under 8000. I would have to be skeptical that all of these are truthful accounts. This of course is not a statistical judgement but a value judgement based on my prejudice about surveys. Having said that they do seem to have looked at 63 death certificates, though I can’t see the proportion out of the violent deaths for that.

    The major cause of death is reported as air-strikes which too seems strange. It is noted in the study that Iraq was dangerous for the surveyors to travel around. Indeed this is largely why they chose cluster sampling. Many journalists and correspondents have noted this too. Generally though they are frightened of kidnap, lawlessness, and insurgency and not looking over their back for air strikes. Perhaps this is just the situation if you are based in Baghdad or Basra or if you are a foreigner???

    Then again if I’m reading this right they are calculating 150-200 deaths from air strikes on most days. This seems likely for instance in Fallujah presently but not consistent with press reports over the period in question.

    At the moment I think the jury is out. Indeed it is such a contentious issue that I don’t know when it is ever likely to be in. Personally I think the press figure of 15000 or so is pretty shocking to begin with.

  53. #53 per
    November 20, 2004

    dsquared writes so many long words, so much crap…
    I can take a sample, treat it, and then see that it is different at a subsequent time. Moreover, I can say that there is a significant difference by directly comparing the one sample at two different times. What is this crap you are talking ?
    Your next logical howler is that the samples are not independent, and so cannot be compared. But that you can create a model from these two dependent data points, call it a risk ratio, and all of a sudden the “dependency” issue goes away and you can compare them ?
    you fail to understand the issue. The risk ratio approach that the authors used broke the population down into ~30 groups and averaged the ratio across those ~30 groups, which is where the authors fluked a borderline significance result; but this is in no way superior to a direct comparison of the mortality rate before and after the war to create a risk ratio which is not significant.
    False aspersions ? I do recall you made a variety of comments about Brignell’s experiments. Have you actually read what Brignell wrote yet ? or are you happy with making false aspersions ?

  54. #54 Tim Lambert
    November 20, 2004

    per, if you want me to take your arguments seriously, you need to support them with reference to statistics texts. Citing Brignell’s tripe does you no good. Brignell is not a statistician and does not know what he is talking about.

  55. #55 Pop Trot
    November 20, 2004

    “For example in the recent US elections all the polls and even the exit polls were completely wrong about the result. With the exit polls in particular it is clear that a proportion of respondents simply lied about their actual vote.”

    Stephen, you are correct that the exit polls were wrong, but you can’t say it’s because respondents were lying. The fact is we don’t know why the polls were wrong because they refuse to reveal their methods. If you know of a source for your claim, please post it.

  56. #56 per
    November 21, 2004

    Tim, if you don’t know the fundamental issues of epidemiology which apply to this paper (e.g., sampling error, recall bias, post-hoc data manipulation), then it strikes me you have been making outrageous claims based on little knowledge. These issues are fundamental issues in epidemiology, and any textbooks will point them out.
    Yet again, you pour unsubstantiated bile over Brignell’s reputation. Professor Brignell, ex of the University of Southampton, has a considerable academic reputation based around measurement science in engineering. It strikes me that his knowledge of statistics and measurement in engineering may well be greatly superior to yours.
    The major criticism remains. The authors of this study find an effect of borderline significance (1.5 fold). We already know that there are major, and uncontrolled, variables in this study (sampling error, recall bias, post-hoc data manipulation) that could cause an effect bigger than 1.5 fold. It is a leap of faith to say that the 1.5 fold increase is due to the war; when it could equally be due to the poor methodology.
    and so far, you are agreeing that <19% of adult deaths, and an unknown number of neonatal deaths, were unconfirmed. You are agreeing that there was sampling heterogeneity. You don’t have a leg to stand on.
    yours, per

  57. #57 Dano
    November 21, 2004

    Lordy. Who is this per person? Sheesh.

    Clue: since when is sampling error, recall bias, post-hoc data manipulation the sole provenance of epidemiology? Come on.


  58. #58 per
    November 21, 2004

    Dear Dano
    No-one has said that sampling error, recall bias, etc., are the sole provenance of epidemiology. What you will find, however, is that the issues specific to the discipline of epidemiology are discussed in epidemiology textbooks, and these include the issues you cover.
    By contrast, Tim insults Brignell by saying he “is not a statistician”; but the issues that this paper brings up are not pure statistics; they are to do with epidemiology.
    And I notice you are not addressing any of the issues that damn this paper.

  59. #59 Tim Lambert
    November 21, 2004

    Per, if you have actually looked in an epi text you must know that Brignell’s principle that a 1.5 RR is “not acceptable as significant” does not appear.

  60. #60 dsquared
    November 21, 2004

    Per, you are wrong and throwing abuse at me doesn’t make you right.

    For anyone reading, the lack of independence between the two samples is not exactly difficult to see; while people who didn’t die before the invasion can die after it, people who did die before the invasion can’t be counted as alive after it. This is why it is wrong to simply measure death rates and compare the confidence intervals as if you had two independent samples.

    You now need to make two apologies, Per; one for criticising the study on the basis of your own mistake, and one for becoming abusive when your mistake was pointed out.

  61. #61 per
    November 22, 2004

    Tim, I never said that Brignell’s statement was a principle, and I never said it appeared in Epi texts. Brignell’s statement appears to be a judgement on the study- and on many other equally piss-poor bits of science. By contrast, you will find that Epi texts do have a considerable bit to say on sampling error, recall bias and post-hoc data manipulation; and very little of it is complimentary to this study. I notice you are quite quiet on this issue.
    dsquared, it is death rates, not number of deaths. The only person – so far- who is bringing people back to life is yourself, and I truly do not understand what you are gibbering about.
    You appear to be completely wrong; it is fundamental that you can take the same sample, measure before and after an event, and make statistical comparisons. You seem to be arguing that you can only make the comparisons you want to make !
    perhaps apologies begin at home ?
    a very amused per

  62. #62 per
    November 22, 2004

    “As a general rule of thumb, we are looking for a relative risk of 3 or more before accepting a paper for publication.” – Marcia Angell, editor of the New England Journal of Medicine

    “My basic rule is if the relative risk isn’t at least 3 or 4, forget it.” – Robert Temple, director of drug evaluation at the Food and Drug Administration.

    “Relative risks of less than 2 are considered small and are usually difficult to interpret. Such increases may be due to chance, statistical bias, or the effect of confounding factors that are sometimes not evident.” – The National Cancer Institute

    Tim, I didn’t have an epi textbook to hand. However, I have provided quotes from the New England Journal of Medicine, the FDA, and the NCI.
    are they all talking “tripe” ?
    are you going to say that they are all “not statisticians, and don’t know what they are talking about” ?

  63. #63 Dano
    November 22, 2004

    “What you will find, however, is that the issues specific to the discipline of epidemiology are discussed in epidemiology textbooks, and these include the issues you cover.”

    “I didn’t have an epi textbook to hand.”

    Yes, it’s apparent from the arguments you get from somewhere else that you don’t quite fully understand. Oh, and the obviousness that you’ve never had the occasion to write a paper, not even a basic lab paper in college discussing findings of a simple experiment.

    The sheer spectacle keeps me coming back to these comments.


  64. #64 Scott Church
    November 22, 2004

    Regarding Brignell, here’s something interesting. Apparently he’s a valuable resource for the Science and Environmental Policy Project (SEPP) – Fred Singer’s Moonie funded front group whose sole mission in life is to bring down mainstream climate science. He appears in one of their weekly reports here. In addition, his book “Sorry, Wrong Number” (in which we’re presented with the ludicrous claim that risk ratios of 1.5 are not significant regardless of sample size – Per and PaulP’s rants to the contrary notwithstanding)… is featured at Steven Milloy’s reading list and web store here along with Michael Fumento’s “The Myth of Heterosexual AIDS”, Herr Lott’s “More Guns, Less Crime”, and other towering works by “scholars” such as Julian Simon, Ron Arnold (who coined the term “Wise Use” for American antienvironmental extremists), and Ron Bailey of the Competitive Enterprise Institute.

    And Over here…. we’re treated to his glowing review of a book titled “The Cholesterol Myths” by one Uffe Ravnskov in which we’re told that saturated fats and high cholesterol pose no risk whatsoever to human health. Wow, the hits just keep on comin’ with this guy.

    While none of this specifically refutes any of the glowing claims made about his work by Per and PaulP, it has been my experience that much can be learned about someone by the company they keep. I’ll bet a $100 bill that there’s a reason why Brignell’s work shows up at Steven Milloy’s web store, and not at the National Academy Press’… 🙂

  65. #65 paulP
    November 22, 2004

    Tim et al:

    Prof Brignell says that a low RR is not significant. He is not talking about statistical significance, but practical significance, which has to take into account measurement error.
    No one has addressed my point that you cannot go from the CI to the results claimed. The calculation of a CI is a perfect process when done correctly, but can only produce statements about what was being sampled. Here what went into the calculation of the CI was the end product of another process that included assumptions (such as that the Iraqi population and its distribution had not changed from a mere estimate of Jan 2003) and surveys into reports of deaths only some of which were verified by death certs. The CI is for this second process not for actual deaths.

  66. #66 paulP
    November 22, 2004

    To Scott Church: it is good to be accused of ranting by someone who uses the McCarthyite tactic of guilt by association.

  67. #67 Tim Lambert
    November 22, 2004

    Sorry Paul, but “practical significance” means something already, and this result is most definitely practically significant. (100,000 excess deaths is a practically important difference.)

    In any case, Brignell was not saying that 1.5 doesn’t matter because of measurement error. As I noted in my original post, his argument would still apply if there was no measurement error and no sampling error.

  68. #68 paulP
    November 22, 2004

    1) The figure 100k is only practically significant if the process which produced it has a measurement error whose effect on the CI for the RR is less than the amount by which the CI is different from 1 (among other criteria). Which you are still ignoring. The results of the paper (that 100,000 actually died, with associated RRs and CIs) cannot be justified because what went into calculating the RRs and CIs was not a raw figure for actual deaths from a perfectly accurate sample.

  69. #69 Tim Lambert
    November 22, 2004

    Paul, “practically significant” means something. I explained what it meant. Please do not misuse statistical terms.

  70. #70 paulP
    November 22, 2004

    Tim: I am willing to use any term you like. My point stands: after calculating the RR and its CI it is necessary to look at the measurement error. Which you are still ignoring.
    PS : where did you explain practical significance?

  71. #71 per
    November 22, 2004

    Dano writes that I “never had the occasion to write a paper, not even a basic lab paper in college discussing findings of a simple experiment“.
    the interesting thing is that Dano does not know what he is talking about; yet still he issues a statement which he represents as fact.
    So Dano, I challenge you. Accept that your statement is a complete fantasy, based on no knowledge.
    Or justify your lie.
    It goes without saying that the views of a fantasist and liar will have little merit in any scientific debate.
    yours, per

  72. #72 per
    November 22, 2004

    So Scott, when are you going to start slinging mud at Marcia Angell, Robert Temple and the US National Cancer Institute ?
    They all say that they are very suspicious of RR’s of < 2; which is the same point Brignell is making. Are you afraid that if you throw mud at the NEJM, the FDA and the NCI, you won’t have any credibility ?
    don’t worry; no difference.
    yours, per

  73. #73 per
    November 22, 2004

    Dear Tim,
    it has been pointed out to you twice that you got it wrong in your first paragraph. You would have to know that there was no measurement error, no sampling error and the variability of the samples. Please feel obliged to acknowledge your error.
    It is clear to me that Brignell’s comment is on this study, and does not apply- as he accepts- in more rigorous and controlled studies. This study has a number of possible confounders; and the effects of sampling bias, recall bias, and their post-hoc data manipulation may well have a strong confounding effect.
    this is the definition of “practically significant”.
    yours, per

  74. #74 Dano
    November 22, 2004

    That’s a super post, per. Your hyperbole notwithstanding, it is very hard to write a paper and hard to read one too.

    But I’ll concur – I should have said something like Your little comments seem to indicate you’ve never been through the exercise, judging from the cut-paste quality of your argumentation.

    But, to the task at hand:

    Your argument’s not getting through, you fret? Try what everyone else does – rephrase the argument in a few different ways to elicit perfect understanding in the reader.

    Oh, you can’t? Huh.

    Ah, well. That’s OK. It’s a spectacle, from this perspective.



  75. #75 dsquared
    November 23, 2004

    it is fundamental that you can take the same sample, measure before and after an event, and make statistical comparisons

    It is equally fundamental that you do not carry out this process by taking the confidence interval of your estimate before the intervention, taking the confidence interval of your estimate after the intervention and seeing if the two overlap. That is only a valid significance test for independent samples.
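
The distinction between paired and independent samples can be sketched numerically. The following is a minimal illustration with invented numbers (not data from the Lancet study), using a normal-approximation interval for simplicity: the same eight units are measured before and after, the two separate intervals overlap heavily, yet the interval for the paired differences is nowhere near zero.

```python
import math
import statistics


def mean_ci(values, z=1.96):
    """Normal-approximation 95% interval for the mean (an assumption; a t interval is wider for small n)."""
    m = statistics.mean(values)
    half = z * statistics.stdev(values) / math.sqrt(len(values))
    return (m - half, m + half)


def intervals_overlap(a, b):
    """True if the two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical measurements on the SAME eight units, before and after an event.
before = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
shift = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 1.2, 0.8]
after = [b + s for b, s in zip(before, shift)]

# Paired analysis: work with the per-unit differences.
diffs = [a - b for a, b in zip(after, before)]

ci_before = mean_ci(before)
ci_after = mean_ci(after)
ci_diff = mean_ci(diffs)

print("before:", ci_before)
print("after: ", ci_after)
print("separate intervals overlap:", intervals_overlap(ci_before, ci_after))
print("interval for paired differences:", ci_diff)
```

The spread between units dominates the two separate intervals, while the paired differences remove it. Overlap of the separate intervals therefore says little about whether the change is significant when the same sample is measured twice.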

  76. #76 Scott Church
    November 23, 2004

    Per and PaulP, “McCarthyite” guilt by association refers to merely fingering someone in an environment of hysteria in the hopes that the label will stick apart from evidence. Pointing out that someone has a long, documented history of shoddy work in specific, well understood scientific disciplines is another matter altogether. If the head of the Creation Research Institute tells me that there’s new evidence proving that allopatric speciation does not occur in mammals, for instance, his past work record by itself does not prove him wrong, but it certainly gives me cause for suspicion! Like it or not, we can learn much about what can be expected from someone today by studying their past. If this were not true, courts wouldn’t allow character witnesses. This is all the more true in science, where peer-review is a critical part of the process of establishing credibility in one’s work. If someone like Brignell is seldom seen on the pages of Nature, Science, or the Proceedings of the National Academy of Sciences, but he’s regularly the flavor of the month at highly partisan hack sites like Steven Milloy’s or the SEPP, there’s a reason why. If this passes for “McCarthyite”, then McCarthyism is why we have cures for many cancers, and I can live with that.

    As for Marcia Angell, Robert Temple and the US National Cancer Institute being “very suspicious of RR’s of < 2”, it is difficult to evaluate these claims, as you have not accompanied them with proper citations. I’m willing to bet that there was a context to those remarks that included much that both of you have left out here – such as qualifying remarks about sample size and methodology, etc. Until you provide this information – contextual information regarding specific one-off comments by these people – this is little more than an argument from authority, which really is a logical fallacy.

  77. #77 Scott Church
    November 23, 2004

    BTW Per and PaulP, Speaking of sample sizes and methodology, details about each are what seems to be lacking here. This blog is now reaching record lengths. But from what I’ve seen so far, most of your comments revolve around a) General arguments about the reliability of confidence intervals and risk ratios with few if any references to sample size, and, b) General claims about what you perceive to be unreliability in the Roberts et al. methods of death verification.

    Point a) is flatly incorrect. You cannot speak about this RR or that CI as being meaningless without addressing sample size. Otherwise there would be no difference between a RR of 1.5 calculated from an N of 3 and one from an N of 3,000,000 (his remarks about this point alone are grounds for labelling Brignell as a hack – Steven Milloy made the same bogus claim in regard to second hand smoke studies some time ago, and has long since been discredited for them). Nor can you make generalized statements about sample size without addressing the Roberts et al. methods of cluster bootstrapping which is well documented and widely used in similar circumstances (Efron B., Ann. Stat. 1979; 7: 1-26). There are risks associated with the use of bootstrapping methods, but this is because they tend to underestimate, not overestimate results. If anything, the Roberts et al. death estimates are low.

    As for death verification, from what I’ve seen, all you guys have done so far is make vague claims about how you think the deaths weren’t verified. In fact, Roberts et al. go into great detail as to their verification methods. In addition to death certificates (which were included in 81 percent of their base sample, before bootstrapping using known methods – and no, 19 percent is not a huge failure of confirmation in a study like this), they document their discussions with relatives of the deceased, where under-reporting and over-reporting are expected, and why, how their infant mortality figures were checked against known independent figures from neighboring countries, and more.

    You guys haven’t addressed any of this.

    To make your point, you need to cover these bases… and you need to do it with more than just general remarks about RR’s, CI’s, and death certificates, or comments about what a swell guy Brignell is.
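
The sample-size point can be made concrete with a short sketch. It assumes simple random sampling and the usual log-scale (Katz) interval for a relative risk, which is not the cluster bootstrap used by Roberts et al.; the counts are invented purely to give an RR of 1.5 at two very different sample sizes.

```python
import math


def relative_risk_ci(events_exposed, n_exposed, events_control, n_control, z=1.96):
    """Relative risk with a 95% interval computed on the log scale.

    Assumes independent simple random samples; a clustered design would
    need a different variance estimate.
    """
    rr = (events_exposed / n_exposed) / (events_control / n_control)
    se_log = math.sqrt(
        1 / events_exposed - 1 / n_exposed + 1 / events_control - 1 / n_control
    )
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Hypothetical counts: same RR of 1.5, very different amounts of data.
small = relative_risk_ci(3, 1_000, 2, 1_000)
large = relative_risk_ci(3_000, 1_000_000, 2_000, 1_000_000)

for label, (rr, lower, upper) in (("small study", small), ("large study", large)):
    print(f"{label}: RR = {rr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

With five events in total the interval spans well below and well above 1, while with five thousand events it sits comfortably above 1. The point estimate of 1.5 is identical in both cases, so the RR alone cannot settle whether a result is significant.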

  78. #78 per
    November 23, 2004

    Scott Church, are you really making the claim that Brignell has a “long, documented history of shoddy work” ? Just a guess, but I am reckoning you are pig ignorant of Brignell’s publication record, and still you are prepared to throw mud.
    Likewise, you are incredibly eager to throw mud at Brignell, but when other people with big names (NEJM, FDA) make similar statements, you retreat into “ooh, I can’t possibly comment till I have seen the day and time of the quote, and colour of the moon, to make sure it isn’t out of context”. The only context is that you are well and truly busted.
    your point (A) is wrong. Most of the comment I have seen is perfectly well informed about sample size, and yes, I take the view that it is on the small side. Also you are wrong in that the unreliability of cluster estimates has also been addressed. And specifically, that there are conditions where you can systematically overestimate results due to sampling defects.
    If you haven’t read the stuff above, you won’t have read the specific claims; but then again, ignorance may be a form of bliss for you. You also got your figures wrong. They got 81% of non-infant deaths verified in a sub-sample. Yes, 19% difference would make the difference between a significant and non-significant result. And frankly, if you think you can check figures from one country by comparing them with figures from another country, then you have basic problems with the nature of verification.
    In short, these were addressed, and you were too lazy to read.
    yours, per

  79. #79 per
    November 23, 2004

    Dear Dano,
    I am glad you concur that your previous statement was a fabrication based on no data.
    Maybe, next, you will make an argument which is based on science and reason ?
    I await with interest.

  80. #80 Ken Miles
    November 23, 2004

    One of my favourite websites has the following on quotations:

    To sum up, when evolution deniers provide quotations many questions need to be asked including:

    * Is the quote itself accurate?

    * Do the preceding and following passages change the meaning of the quote?

    * Does the creationist use the key terms in the same way as the quoted person?

    * What is the quoted person’s actual opinion on the point in question?

    * Who was the quoted person addressing?

    * Is the quote out-of-date?

    * Who is the quoted person?

    * Is the quoted person a relevant authority to the issue at hand?

    * What do other relevant authorities think?

    * Is the quote from a popular source or from the primary peer-reviewed literature?

    * Is the quoted person actually correct?

    If a verifiable reference is not provided then consider the quote to be hearsay. Also remember that it is very easy to find statements by qualified scientists strongly supporting evolution and/or objecting to how they have been quoted by evolution deniers.

    From Talk Origins.

    While it refers specifically to creationists, a comparison to Per’s use of quotations seems appropriate.

  81. #81 Scott Church
    November 23, 2004

    Hey Per, That was a lovely little temper tantrum. And yes, those other people with “big names” do get more respect from me… because they’ve earned it, not only by publishing quality work that has survived peer-review, but also by not showing up at places like or pushing theories about how cholesterol and saturated fat pose no health risks. As for your previous posts, I have read them, and you haven’t provided any content on the relevant points. As for your “authoritative” quotes, you still haven’t produced anything in the way of proper, professional citations for them – citations I can check for context. Until you do, you’re blowing smoke. Please take note of Ken Miles’ excellent post prior to this one, as it is most relevant to you on this point. As for the 81 percent, I never said a word about non-infant vs. infant deaths – I was merely referring to your previous, and rather emotional use of the 19 percent figure as being unreliable, which it is not in a study like this one, without properly qualifying that judgment with an examination of the Roberts et al. methodology. I see also that you still haven’t bothered to address the question of bootstrapping or the details of the paper’s death verification methods apart from death certificates, where they were not available. If, and when, I find that you’ve made an attempt to address these things, I’d be happy to consider them. Otherwise, at this point I’ve got better things to do. Now if you’ll excuse me…..

  82. #82 paulP
    November 23, 2004

    You are missing my point: the CI refers only to values put into the calculation of the CI. To go beyond the CI you need to take into account measurement error. Which this paper and Tim do not. As I pointed out a long time ago, a measurement error of about 3% would close the gap between the CI lower bound for the RR and a value of 1.
    In case you do not understand this:
    Suppose you use a weighing device to find the average weight of some manufactured product. You take an appropriate sample and get your CI. The CI gives a range of values, not for the real average weight, but for the average weight as it would be measured by the same weighing device. No statement could be made from this about the average weight as would be measured by another weighing device. That’s because each device has its own measurement error.
    To make a statement about the actual average weight you would need to take the CI and adjust it for the measurement error of the weighing device used, assuming you can quantify it.
    (In case you missed it, there’s a measurement error right at the beginning of this paper: “We obtained January,2003, population estimates for each of Iraq’s 18 Governates…No attempt was made to adjust these numbers for recent displacement or immigration”. )
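
The weighing-device argument can be illustrated with a small simulation. Everything here is an assumption made for illustration: the true mean, the size and direction of the device bias, and the noise levels are invented, and nothing in it says whether a comparable bias exists in the survey or which way it would point.

```python
import math
import random
import statistics


def mean_ci(values, z=1.96):
    """Normal-approximation 95% interval for the mean of the readings."""
    m = statistics.mean(values)
    half = z * statistics.stdev(values) / math.sqrt(len(values))
    return (m - half, m + half)


random.seed(0)  # fixed seed so the run is repeatable

TRUE_MEAN = 100.0   # hypothetical true average weight
DEVICE_BIAS = 2.0   # hypothetical systematic error of the device
N = 10_000

# True item weights vary naturally; the device adds a constant bias plus noise.
true_weights = [random.gauss(TRUE_MEAN, 1.0) for _ in range(N)]
readings = [w + DEVICE_BIAS + random.gauss(0.0, 0.5) for w in true_weights]

ci = mean_ci(readings)
covers_truth = ci[0] <= TRUE_MEAN <= ci[1]

print("mean of readings:", statistics.mean(readings))
print("95% CI from readings:", ci)
print("CI covers the true mean:", covers_truth)
```

The interval is narrow and centred near 102 rather than 100: sampling more items shrinks the interval but does nothing to the constant offset. The same sketch with a negative bias would miss the true mean on the other side, so an unquantified systematic error is not by itself an argument for an overestimate.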

    BTW your defence of your use of “guilt by association” is laughable. Before you attack Professor Brignell, have a look at

  83. #83 Stephen
    November 23, 2004

    I think a few of you who are unfamiliar with Brignell are misrepresenting him. He maybe oft quoted by many unreliable sources, but he himself spends most of his time debunking health scares and dubious surveys. At this he is usually quite prepared to strike at the idiocies of both sides of an argument.

  84. #84 dsquared
    November 23, 2004

    take an appropriate sample and get your CI. The CI gives a range of values, not for the real average weight, but for the average weight as it would be measured by the same weighing device. No statement could be made from this about the average weight as would be measured by another weighing device.

    This is cobblers. If I weigh myself on the “Speak Your Weight” machine at Paddington Station and it says I weigh 90kgs, then I am entirely within my rights to assume that I will also weigh 90kgs on my bathroom scales at home, my doctor’s scales at his surgery and indeed any set of scales within a reasonable distance of sea level on the planet Earth.

    Paul, you are engaged in what I identified as “Kaplan’s Fallacy” – trying to use arguments about measurement errors as if they were arguments specifically for an overestimate. There are very numerous reasons indeed to believe that the method used would be more likely to underestimate the death rate rather than overestimate it, and you haven’t even mentioned them. This above anything else makes it difficult to take you seriously. You don’t want to end up like Per, who I am no longer taking seriously at all, because a) he has refused to admit his mistake and apologise for slandering the JHU research team based on his own mistake and b) he is now trying to resurrect issues (like cluster sampling) which have been thoroughly dealt with, simply by claiming that “he remains unconvinced” without giving any specifics. You don’t want to be like that.

  85. #85 per
    November 23, 2004

    Scott Church
    I suspect you know as much about Marcia Angell, Robert Temple, as you know about Brignell- in short, absolutely nothing.
    As for your claim that the reason you distrust Brignell is because he is cited by a web site you dislike- you know, I believe you are that shallow. Presumably, if “Junkscience” cites einstein or newton, you wouldn’t believe in them either ?
    It is interesting to see your response to quotes you don’t like; you run like a scared rabbit. So for the NCI quote, it is a National Cancer Institute Press Release, October 26, 1994. Is that specific enough for you ?
    the Marcia Angell quote is from Science, July 14, 1995.
    Do you go all weak at the knees when someone says something is published in science ? Will you accept Brignell is right, just because this quote is published in SCIENCE ?
    not that that would be shallow…
    And by the way, you have the death analysis wrong. They asked people in the house, then (in a small number of cases) attempted to verify by death certificate. That was it for verification.
    You said that 19% of deaths were unverified. I was pointing out to you that the 19% did not include the infant deaths which were poorly verified if at all. Apparently you missed the subtlety of my post.

  86. #86 per
    November 23, 2004

    you have the population numbers for death rate before and after invasion with Falluja included. Why don’t you do the calculation on whether these two figures are statistically significant and tell us about it ?
    I have a feeling we will be waiting a long time on this.
    your main argument on heterogeneity of sampling seems to be that you bored everyone to death on another forum, and that cluster sampling can be unreliable. I am not sure quite how you proceed from that to announcing you have won the argument.
    just fascinated, per

  87. #87 per
    November 23, 2004

    Ken Miles
    isn’t it also the case that creationist nut jobs use smear tactics and ad hominem abuse as a debating target ?
    Perhaps I can say that your tactics bear comparison to the creationist nut jobs you so despise.

  88. #88 Dano
    November 24, 2004

    per’s inability to properly cite the quotations speaks volumes.

    Volumes that he likely hasn’t read.


  89. #89 Tim Lambert
    November 24, 2004

    Scott and Ken are quite right about the quotes Per presents as being taken out of context. The first two were from a 1995 Science article by Gary Taubes. Taubes also quoted Harvard epidemiologist Dimitrios Trichopoulos as saying something similar. Trichopoulos wrote a letter in reply (Science 269 p1326):

    Taubes writes that I have expressed the view that only a fourfold risk should be taken seriously. This is correct, but only when the finding stands in a biological vacuum or has little or no biomedical credibility. We all take seriously small relative risks when there is a credible hypothesis in the background. Nobody disputes that the prevalence of boys at birth is higher than of girls (an excess of 3%), that men have 30% higher rate of death compared to women of the same age, or that fatality in a car accident is higher when the car is smaller.

    I guess he is wrong about the “nobody disputes” part, when a crank like Brignell would.

  90. #90 dsquared
    November 24, 2004

    Why don’t you do the calculation on whether these two figures are statistically significant and tell us about it ?

    Because it was done in the original paper, and they are. This is the very definition of madness, Per – repeating the same calculation and expecting a different result. I refuse to join you on this journey into insanity.

  91. #91 per
    November 24, 2004

    dsquared all of a sudden cannot see the difference between analysing numbers based on a cluster approach, and analysing on a total population basis, i.e. are the rates of mortality different before and after the invasion for the whole population. Which is strange, because this is the context he was arguing about.

  92. #92 per
    November 24, 2004

    Tim Lambert writes that the quotes I used are taken out of context, and correctly identifies the science article by gary taubes I referred to. He quotes a point by a Dimitrios Trichopoulos, but this has no relationship to the quotes I used.
    What is it about the context of these quotes that means I have taken them out of context ? I put it to you that your statement is an outright lie.
    The issue is that you don’t like the content of the quotes.
    The quote from Trichopoulos isn’t that impressive. He quotes examples where we believe in small relative risks on the basis of extremely large sample sizes, in direct contrast to the reasons he proposes.

  93. #93 per
    November 24, 2004

    dsquared fails to follow the point of PaulP‘s argument (again). The point is that weighing machines are calibrated- they have some objective check on their function. You can put a 90kg weight on a machine, check it works and compare it to your weight.
    the point made was that the survey measurement has not been reliably calibrated. Only a small subsection of the sample was assessed, and there was a failure to verify ~19% of this small sample.
    or to put it in nice simple terms that you can understand. Here is a balance that has been calibrated for a 100 mg weight; when it was calibrated, the 100 mg weight showed a reading of 81 mgs- but it could have been 100mgs. Now will this balance measure a 90 kg individual reliably ?

  94. #94 dsquared
    November 24, 2004

    Per, as far as I can tell, you’ve now simply become logorrheic; you’re throwing terms around like “cluster” and “population” without any real regard for whether the resulting sentences make sense. Could you state your specific argument clearly (preferably first admitting that your original calculation above, in the post dated 13/11/2004 04:55:27, dealing with confidence intervals in crude death rates, was a mistake)?

  95. #95 per
    November 24, 2004

    dsquared, if you read my post of 13/11/2004 04:55:27, you will see that I did not make any calculation, contrary to your assertion.
    Nonetheless, I assert that there is no significant difference between these population means for death rate.
    I will be delighted if, at long last, you can follow simple english.
    Now: Prove me wrong.

  96. #96 Tim Lambert
    November 24, 2004

    Per, your assertion that there is no significant difference is wrong. You have no clue about even elementary stats. You don’t seem to even know what independence means.

    And I find it remarkable that even though my quote from Trichopoulos was only three sentences long, you failed to understand it. He’s not talking about sample size at all.

  97. #97 paulP
    November 24, 2004

    1)I am not arguing that the measurement error can only go in one way, but that we are completely ignorant of its size and direction.
    2) As for your cobblers remark, I suggest you talk to a physical scientist about a little thing called “calibration”. In the real world we distinguish statistical “error” (which is the product of natural variation, and for the analysis of which statistical tools like CIs were developed) from real errors like measurement errors. The local weighing machine may be good enough for you but in science we have higher standards. Your attitude would be laughed out of a lab in a physical sciences college department.

  98. #98 paulP
    November 24, 2004

    Are you seriously saying we should treat the 3% excess of male births over female births as being on the same level of accuracy and correctness as the sort of work in this paper? On the one hand you have an almost perfect count of every birth, a count which has been carried out for decades, and on the other a once-off survey based on a tiny number of cases with assumptions that could have a greater effect on the result than the war itself? If you really want to do this I suggest you might have a look at the range of the CI for that 3% and compare it with the range of the CI in this paper.

  99. #99 dsquared
    November 24, 2004

    Per: yes you did make a calculation. You said that the two confidence intervals overlapped and that therefore there was no significant effect. That’s a calculation of probabilities (specifically, it’s an erroneous calculation) and the fact that you didn’t realise you were making one doesn’t change the fact that you were.

    Paul: Your misplaced analogy with calibration of an instrument is causing you to say things about confidence intervals which aren’t true. In particular, you appear to be arguing that because the confidence interval is wide, then it is more likely that the confidence interval does not measure the uncertainty surrounding the estimate correctly. This isn’t true.

  100. #100 per
    November 24, 2004

    Tim Lambert, you now have the citation for my quote, and you can see them in context. If you persist in your claim that I took these out of context, that would leave you open to the charge that you are deliberately falsifying.
    I urge you to withdraw your claim that I took these out of context.

    Re: Trichopoulos, I am dumbfounded by your ability to throw mud, and fail to understand the comment.
    There is no credible hypothesis in the background why there should be an excess of male births (~3%), or why males should die quicker than females (~30%), the examples he cites. We take seriously the small magnitude of these findings because of the thoroughness with which these facts are known and verified (e.g. all births and deaths are recorded in the UK), and the extremely large sample size for these figures. This directly contradicts his suggestions that we believe these numbers because of a credible hypothesis.
