Lancet study and cluster sampling

In an earlier post I observed that "Seixon does not understand sampling". Seixon removed any doubt about this with his comments on that post and two more posts. Despite superhuman efforts by several qualified people in the comments to explain sampling to him, Seixon has continued to claim that the sample was biased and therefore "that the study is so fatally flawed that there's no reason to believe it."

I'm going to show, using just pictures rather than numbers, that the sampling was not biased, and what effect the clustering of the governorates had.

Let's look at a simplified example. Suppose we have three samples to allocate between two governorates. Governorate A has twice as many people as Governorate B, so if they are not paired up, A gets two samples and B gets one sample. (This is called stratified sampling.) If they are paired up using the Lancet's scheme, then B has a one in three chance of getting all three samples; otherwise A gets them. (This is called cluster sampling.) Seixon claims that this method introduces a bias, and that what they should have done was allocate the three samples independently, with B having a one in three chance of getting each cluster, so that, for example, B has a (1/3)×(1/3)×(1/3) = 1/27 chance of getting all three. (This is called simple random sampling.)

We can see the difference each of these three procedures makes by
running some simulations. I used a random number between 1 and 13 as
the result of taking a sample in governorate A and one between 1
and 6 for governorate B and ran the simulation a thousand times. The
first graph shows the results for stratified sampling. The horizontal
lines show the distribution of the results. 95% of the values lie
between the top and bottom lines, while the middle one shows the
average.

[Graph: stratified sampling]
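
Here is a minimal Python sketch of the simulation described above. It assumes the quantity plotted is simply the average of the sampled values; the function names and the way the 95% range is summarised are only illustrative.

    import random

    def draw(governorate):
        # A sample from governorate A is a random number between 1 and 13,
        # one from governorate B is between 1 and 6, as in the post.
        return random.randint(1, 13) if governorate == "A" else random.randint(1, 6)

    def stratified():
        # A has twice B's population, so A gets two of the three samples.
        values = [draw("A"), draw("A"), draw("B")]
        return sum(values) / len(values)

    def clustered():
        # The pairing scheme: with probability 1/3 B gets all three samples,
        # otherwise A gets all three.
        g = "B" if random.random() < 1 / 3 else "A"
        values = [draw(g) for _ in range(3)]
        return sum(values) / len(values)

    def simple_random(n=3):
        # Each sample independently lands in B with probability 1/3.
        values = [draw("B" if random.random() < 1 / 3 else "A") for _ in range(n)]
        return sum(values) / len(values)

    def summarise(name, estimates):
        # Report the average and the range containing 95% of the runs,
        # mirroring the middle, top and bottom lines on the graphs.
        estimates = sorted(estimates)
        low = estimates[int(0.025 * len(estimates))]
        high = estimates[int(0.975 * len(estimates))]
        mean = sum(estimates) / len(estimates)
        print(f"{name:20s} average {mean:.2f}, 95% between {low:.2f} and {high:.2f}")

    runs = 1000
    summarise("stratified", [stratified() for _ in range(runs)])
    summarise("clustered", [clustered() for _ in range(runs)])
    summarise("simple random (3)", [simple_random(3) for _ in range(runs)])
    summarise("simple random (2)", [simple_random(2) for _ in range(runs)])

Every scheme shows roughly the same average (about 5.8 with these toy numbers); what changes from one scheme to the next is how far the top and bottom of the 95% range sit from that average, which is what the graphs illustrate.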

The second one shows the result of cluster sampling. Notice that the average is the same as for the first one. This shows that, by definition, the sample is not biased. However, the top and bottom lines are further apart—the effect of using cluster sampling instead of stratified sampling is to increase the variation of the samples.

[Graph: cluster sampling]

The third one shows the result of simple random sampling. The average
is the same as the previous two. There is less variation than for
cluster sampling.

[Graph: simple random sampling, three samples]

The last graph shows simple random sampling but with two samples instead of three. The average is the same as for the others, and the amount of variation is about the same as for cluster sampling. In other words, the result of cluster sampling is just like simple random sampling with a smaller sample size. The ratio of the sample sizes for which cluster sampling and simple random sampling give the same variation is called the design effect. In this case it is roughly 3/2 = 1.5. In our example governorate A was quite different from governorate B (samples from A were on average twice as big). If A and B were more alike, the design effect would be smaller. That is why they paired governorates that they believed were similarly violent. If the governorates that they paired were not similar, that does not bias the results as Seixon believes, but it does reduce the precision of the results, increasing the width of the confidence interval.

[Graph: simple random sampling, two samples]
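
If you would rather put a number on the design effect, it can be estimated from the same toy model by comparing variances: the ratio of the cluster-sampling variance to the simple-random-sampling variance at the same sample size, which comes to the same thing as the sample-size ratio described above. A self-contained sketch, under the same assumptions as the previous snippet:

    import random
    from statistics import pvariance

    def draw(g):
        # Same toy populations as before: 1-13 in governorate A, 1-6 in B.
        return random.randint(1, 13) if g == "A" else random.randint(1, 6)

    def cluster_estimate():
        # B gets all three samples with probability 1/3, otherwise A does.
        g = "B" if random.random() < 1 / 3 else "A"
        return sum(draw(g) for _ in range(3)) / 3

    def srs_estimate(n):
        # Simple random sampling: each sample lands in B with probability 1/3.
        return sum(draw("B" if random.random() < 1 / 3 else "A") for _ in range(n)) / n

    runs = 100_000  # more runs than the 1,000 above, purely to make the ratio stable

    var_cluster = pvariance([cluster_estimate() for _ in range(runs)])
    var_srs3 = pvariance([srs_estimate(3) for _ in range(runs)])
    var_srs2 = pvariance([srs_estimate(2) for _ in range(runs)])

    print(f"variance, cluster sampling, 3 samples:       {var_cluster:.2f}")
    print(f"variance, simple random sampling, 3 samples: {var_srs3:.2f}")
    print(f"variance, simple random sampling, 2 samples: {var_srs2:.2f}")
    print(f"estimated design effect:                     {var_cluster / var_srs3:.2f}")

With these toy numbers the printed ratio comes out around 1.4, close to the 3/2 quoted above, and the variance of cluster sampling with three samples is indeed about the same as that of simple random sampling with two.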

Seixon offers one more argument against clustering: if clustering is valid, why not put everything into just one cluster? The answer is that although that would not bias the result, it would increase the design effect so much that the confidence intervals would be too wide for the results to mean anything.

This article by Checchi and Roberts goes into much more detail on the mechanics of conducting mortality surveys. (Thanks to Tom Doyle for the link.)

Comments

  1. #1 Donald Johnson
    October 20, 2005

    Seixon, I’ll break it down into three categories.

    Males killed by criminals–Statistically, I think it’d be unlikely that these guys would be insurgents, because most males in Iraq aren’t insurgents and I’m also guessing that criminals pick on a random sample of the male population. If anything, they might tend to avoid well-armed males who are insurgents. (I’m leaving aside the possibility of lying about who killed them, which could skew things in various ways.)

    Males killed by accidents, heart attacks, etc…–I think Tim (and I) were only talking about the violent deaths, but sure, it’s possible that some of these others were insurgents. Again, though, think of the odds. Most people in Iraq aren’t insurgents and heart attacks don’t single out insurgents (who might be younger and healthier than the average Iraqi male), so if you list, say, 10 heart attack deaths (a number I invented), it’s possible, but statistically unlikely that any of these were insurgents, unless insurgents make up 10 percent or more of the male population.

    Males killed by insurgents–I don’t think there were that many in the Lancet study, but if there is any insurgent on insurgent violence (and there has been, with shooting between insurgents who want to focus on the occupiers and insurgents who want to kill civilians), then some of these might be insurgents. I’d still guess it’s statistically more likely that these are civilians either deliberately killed or caught in the crossfire, but because of internecine fighting among insurgents this group has the strongest possibility of containing insurgents among its number. (As usual, assuming truthful responses.)

    On Fallujah, you’re making my own favorite point for me–the Lancet cluster there showed a huge death toll from aerial attacks and it makes one wonder what went on there when Western reporters weren’t around. You have air support when ground troops go in, so you’d expect deaths that way back in the spring 2004 assault. If they did things the way I read they did them in Vietnam, you might have helicopters or planes taking out homes where fire came from–that’s safer than sending infantry in to root them out, if you ignore the risk to civilians. But the largest single death toll in the Fallujah cluster occurred in August 2004. The study ended in September 2004 and the final assault occurred in November, so you don’t expect to see deaths from that one. I know from a NYT report that the US was bombing Fallujah between the first and second assaults, and this one cluster suggests (but of course doesn’t prove) that the bombing might have been causing very high casualties. I’ve always thought the Fallujah cluster (if the responses were honest) is the most interesting and possibly revealing thing about the entire study–the rest of it gives us an overall violent death rate which isn’t that far off the UNDP report for the shorter period of time. The Fallujah data might be telling us how bad things were in one part of the country right after the UNDP report ended.

    BTW, I think the one real contribution you’ve made (along with Bruce R) was the idea of troop deaths as violence indicators. That was a good idea, though I think when you combine what you found with Bruce’s point, it suggests that the various sampling flukes (my own private technical term) more or less cancelled out. It’ll be interesting if you find something in the UNDP report that can be used as an indicator. You won’t take my advice, but it would have been good if you’d stuck to these empirical approaches and left the statistics theory alone. (Though again, I’ve personally learned a few things from all this.)

  2. #2 Seixon
    October 20, 2005

    Donald,

    I like you, you’re one of the more honest and intellectually honest persons I have dealt with here.

    Those 11 males, I think I may have found a key to them in the Lancet study:

    Table 2 includes 12 violent deaths not attributed to coalition forces, including 11 men and one woman. Of these, two were attributed to anti-coalition forces, two were of unknown origin, seven were criminal murders, and one was from the previous regime during the invasion.

    There we have at least two more that are unaccounted for as far as why or how they died, of course one of them could have been the woman. Seven criminal murders, as you said, most of these would probably not be insurgents, but some of them could be, especially taking into consideration what you said about lying.

    I find it severely lacking by Lambert to claim as fact that only 2 of those 13 males could have been insurgents, especially given that 1-2 more were never assigned a category, and that some of the criminal murders might also have been insurgents. We just don’t know, so claiming to know is just disingenuous. Roberts said he didn’t know, why isn’t this good enough for Lambert? Oh, right, because he took to heart the fact that the Lancet website overstated the conclusion of the study on the eve of a presidential election…

    As far as heart attacks and strokes, without knowing more specifics on these deaths, I wouldn’t think it unusual for an insurgent who has been fighting for their life, running away, being in an intense state of fear having this happen to them. Once again, I am not claiming that any or all of these were insurgents. I am only soliciting the assurance that it is a possibility.

    You said that you thought possible that some of the ones who got killed by insurgents might be insurgents themselves due to fighting between them, 1-2 of the males were killed in this fashion.

    In other words, claiming that tops 2 males out of the 13 were insurgents is sweeping a lot of possibilities under the rug.

    About Fallujah, the cluster they had there gave a point estimate of 200,000 deaths for that 3% of Iraq (739,000 people). The population of Fallujah has been said to be around 200-300k. As Fallujah was the main focus of the 739,000 people in that area, I think it would be quite unlikely that the Coalition killed almost the entire city of Fallujah…

    Would be great if they at some point did a wider study on Fallujah to get a more representative look of the casualties there.

    I have looked in the UNDP study, and I can’t really find anything that would be a very good indication of Iraqi mortality. I have written down 5 indicators, but I don’t feel any of them are more accurate than the Coalition death rates we already derived. They are: forced change of residence; damage to dwelling from military action; weapons never being shot in neighborhood; weapons being shot everyday in neighborhood; and household member a victim of crime in the past 4 weeks.

    On the first, 3 pairs are similar. On the 2nd, 4-5 are similar. On the 3rd, none are similar. On the 4th, 1-2 are similar. On the last, the differences are too small to discern similarity.

  3. #3 Donald Johnson
    October 20, 2005

    I’m trying to cool down the flame war, Seixon. I got myself banned from a well-known liberal blog (deservedly so) when I got very heated and rather obnoxious with the guy who ran it. I still think I had a good point to make, but when you make people angry that gets lost. (Incidentally, you’d have been on the other guy’s side–I was attacking this liberal from the left.) It’s best not to accuse people of lying and fraud unless you’ve got rock solid evidence. I exclude politicians, because their very job makes them almost certain to be liars, but it’s a good rule for most other people.

    On the number of insurgent deaths, if you put aside the possibility of respondents twisting the truth for various understandable reasons (which would make all surveys in Iraq very hard to interpret, not just the Lancet one), I think Tim’s number is probably right, but not certain. First, the people most likely to be killing insurgents are Americans, so if you’ve got male Iraqi deaths of military age and you know some were killed by Americans, then those are the ones most likely to be insurgents. The victims of violent crime are statistically almost certain to be mostly innocent civilians, simply because insurgents are a small portion of the male population.
    The insurgent-on-insurgent violence does occur, but as far as I know it’s very sporadic–I’d assume most of the insurgents who get killed get killed by coalition forces. If the largest number of people in the survey (taken at face value) who could be insurgents killed by Americans is two, then it’d be a real fluke if you had one or two more in the same survey killed by criminals or enemies in the insurgent movement. Though that said, the murder rate is incredible in Baghdad–Robert Fisk said there were 1000 bodies in the Baghdad morgues in July. So who knows who is killing whom and for what reasons? That murder rate could also include death squad killings by people associated with the Iraqi government, for that matter.

    I doubt Tim was even thinking of heart attacks and car accidents (I wasn’t), but again, with insurgents being a small chunk of Iraq, you’d expect them to make up a small fraction of the heart attacks and car wrecks, though one could argue about whether they’d be over or under-represented in relation to how many insurgents there are.

  4. #4 Donald Johnson
    October 20, 2005

    I forgot to comment about Fallujah. Roughly 25 percent of that neighborhood supposedly died, so I think that gives the point estimate of 200,000 for the province. Nobody believes that, of course. You use common sense–for one thing it wasn’t the entire province that was being bombed, AFAIK. One could argue that maybe it was 25 percent of Fallujah, around 50-75,000 and I saw an interview or a letter by the Lancet authors where they suggested maybe that was true. Still pretty hard for me to believe–I think we’ve had a major news blackout with regard to what goes on when Americans bomb places where Western reporters can’t go, but I have trouble believing that many people could have died without some indication leaking through. Maybe I’m wrong. But cut it down some more and my skepticism drops quite a bit. I did google a bit and found articles claiming there was a Red Crescent official who said 6000 people died in Fallujah and it wasn’t clear if he meant the final assault or the whole period. (How did this unnamed official know? I don’t have a clue.) It was hard to find anything about the period between the assaults, except for a NYT article that I clipped in the fall, talking about civilian casualties in the bombing, but giving no numbers. So I’m guessing many thousands of civilians died there, both in the bombing and in the final assault, and the Fallujah cluster in the Lancet reflects what happened in the hardest hit areas during the bombing. (Since I’m just guessing, maybe many of those Fallujah males were insurgents, presuming that their presence made the neighborhood a target.)

  5. #5 Seixon
    October 20, 2005

    In summary, saying that it could have been at most two is being a bit conservative. I’m not claiming any number, such as two, I am just saying that Lambert is being disingenuous by claiming that there is a certain number to be used here. The only reason he is doing it is because he wants to put up some kind of defense for the Lancet saying it was 100,000 excess civilians, when even the study doesn’t say that a single time in its text.

    As I also said, I’m not sure how Lambert can discount the possibility of insurgents dying in accidents, getting heart attacks, and so on. I also cannot see how he knows that those two Iraqi adults who were not attributed to anything are not insurgents.

    Face it Lambert, the number is not just for civilians, and the study itself never even claims this to be the case.

    Even if it was 95,000 civilians, and 5,000 insurgents… how can you say that saying “100,000 excess civilian deaths” is anything but a false statement, especially when the study never even says this??

    Geez…

    And the lack of comments about the whole sample of size n being equally probable of selection, why is that? Because the selection of sample size n to include households from Basra and Missan was impossible, whereas one that did not was possible? Is that why? Or am I just opening up yet another cupboard in hope of an elephant?

    What about the bootstrapping method, and its inapplicability to detecting the confidence interval to compensate for excluded areas? As far as I understood, that isn’t what bootstrapping is even for…

    What about the assumption that the provinces they paired were similar, without providing any substantiation, and me giving at least circumstantial evidence that this is not the case?

    No?

  6. #6 Seixon
    October 21, 2005

    As is becoming clear to me now, the Lancet study did not, and could not, have figured in the increased variance caused by the exclusion of the 6 provinces. Some at this blog talked about bootstrapping, and that this would solve this problem. From reading about bootstrapping, I do not believe this to be the case as it only mimics the results of resampling your population many times based on your original sample. In other words, if your original sample excluded 6 provinces, your bootstrap is not going to help you determine anything based on those exclusions since it depends on your original sample being a representative sampling of the population.

    In the Lancet study, they say this:

    This clumping of clusters was likely to increase the sum of the variance between mortality estimates of clusters and thus reduce the precision of the national mortality estimate.

    It “was likely” because they couldn’t determine what it was. Thus, the national mortality estimate does not reflect this added imprecision. Thus:

    As a check, we also used bootstrapping to obtain a non-parametric confidence interval under the assumption that the clusters were exchangeable.

    In other words, their bootstrap assumed that the 6 excluded provinces had similar levels of violence to their partner provinces, thus not alleviating the problems with their exclusion and the “likely” increase in variance they talked about earlier.

    Some were asking earlier whether they corrected for this pairing process. I think this just about answers that. What do I know though? I’m just an innumerate.

  7. #7 Seixon
    December 15, 2005

    Iraq Body Count released a new report including a break-down by province. Now we no longer have to go by the coalition death rates to find out if the pairings the Lancet study did were correct. Civilian death rates (deaths per million, to the nearest 10 deaths) by pair from the IBC report:

    1. Ninawa/Dehuk: 560/0
    2. Sulaymaniyah/Arbil: 80/130
    3. Tamim/Salahuddin: 970/960
    4. Karbala/Najaf: 920/750
    5. Qadisiyah/Dhiqar: 90/650
    6. Basra/Missan: 1250/50

    Yup. Those are all “similar” alright. Nothing to see here, Lancet is right, I’m wrong, moving right along….

  8. #8 Scott Church
    December 16, 2005

    Iraq Body Count derives their minimum and maximum estimates using specific data extraction methods that rely almost exclusively on a specific list of pre-approved media sources. Any deaths that went unreported or unmentioned by these media outlets were not counted. Their emphasis is on tabulating only those deaths for which there are tangible and specific records–no unreported deaths of any kind are counted.

    Their FAQ section at their web site contains the following statement:

    We are not a news organization ourselves and like everyone else can only base our information on what has been reported so far. What we are attempting to provide is a credible compilation of civilian deaths that have been reported by recognized sources. Our maximum therefore refers to reported deaths – which can only be a sample of true deaths unless one assumes that every civilian death has been reported. It is likely that many if not most civilian casualties will go unreported by the media. That is the sad nature of war. (my italics)

    In other words, Iraq Body Count themselves state that their count is low, and likely does not even account for a simple majority of civilian casualties. This qualification puts their estimate well within one standard deviation of the Lancet study mean.

    Seixon, it’s not enough to tabulate some convenient numbers–you need to consider where the data came from and what it actually represents. This is doubly true if you’re going to accuse another team of having questionable data and methods.

  9. #9 dsquared
    December 16, 2005

    Interesting … these figures are for the entire period since the war, though, not the period covered by the Lancet survey. I think that makes a massive difference because the Dhi Qar figure is going to be swelled by the fact that Nasriya has been a big centre of the insurgency over the last year while nowhere in Qadisiyah has to the same extent. Also worth noting that IMO Iraq Body Count massively underestimates even what they are trying to count because when they get a report of “a family” having been killed they count it as four deaths when the average size of a nonextended family in Iraq is six. This is a completely ad hoc assumption made “for conservatism”.

    The problem pairings appear to be 1, 5 and 6 on Seixon’s list – the others look OK [ and I suspect that 5 looks bad because of the different sampling periods]. Of these, Ninawa/Dehuk doesn’t really contribute anything to the central estimate; they found a more or less unchanged death rate in this province as you’d expect because it’s in the far North. Missan/Basra was a quite significant contributor to the estimate, but in this case they sampled Missan which is the province with the lower IBC. So Seixon’s case that the grouping process makes a practical difference has to rest on Qadisiyah/Dhi Qar.

  10. #10 Tim Lambert
    December 16, 2005

    I put the numbers in a spreadsheet. In three cases the clusters were moved to a governorate with a higher death rate, in three cases to a lower death rate. Making the dubious assumption that the IBC isn’t a biased measure of the death rate in each governorate, I find that the net effect of the pairing process was to make a small reduction in the estimate of about 4,000 deaths. So Seixon should go with 102,000 instead of 98,000.

  11. #11 z
    December 16, 2005

    “In three cases the clusters were moved to a governorate with a higher death rate, in three cases to a lower death rate. “

    That’s bias! In fully half the cases, the higher numbers were used, whereas the lower numbers were used in only three cases!

    But seriously folks; these are total deaths, not death rates.

  12. #12 Seixon
    December 22, 2005

    As I posted somewhere else, the majority of the deaths cited by IBC have been sourced by mortuaries, medics, Iraqi officials, and police (in that order). Journalists were only the primary source for 8% of the deaths in the IBC database. So it seems like yet again certain people are resorting to convenient arguments that are not based in fact.

    In fact, four of the pairs had the most violent one chosen, with two having the least violent one chosen. Using the IBC numbers from BBC, and using the UNDP figures for the current population of Iraq, I tabulated the death rates on my blog.

    Here’s some numbers:

    Paired provinces (the 12):
    Population – 14.8M
    Civilian deaths – 7602
    Death rate – 510/M

    Sampled provinces (from pairs):
    Population – 8.4M
    Civilian deaths – 4348
    Death rate – 520/M

    Excluded provinces (from pairs):
    Population – 6.4M
    Civilian deaths – 3254
    Death rate – 510/M

    Sampled provinces (all):
    Population – 20.2M
    Civilian deaths – 23995
    Death rate – 1190/M

    Sampled provinces (excluding Anbar):
    Population – 18.8M
    Civilian deaths – 21700
    Death rate – 1150/M

    Unsampled provinces:
    Population – 7M
    Civilian deaths – 3375
    Death rate – 480/M

    All of Iraq:
    Population – 27.1M
    Civilian deaths – 27370
    Death rate – 1010/M

    All of Iraq (sans Anbar):
    Population – 25.8M
    Civilian deaths – 25075
    Death rate – 970/M

    If we go by what we would have expected following the IBC’s numbers, and the methodology of the Lancet study, we would have expected a mortality of 1150/M since Anbar was excluded at the end for being an outlier.

    Had all of Iraq been sampled, we would have expected 1010/M. Excluding Anbar, we would have expected 970/M.

    So, Lambert, if we’re going to play that game (which I think is pretty dumb really since it doesn’t really prove anything), the result is that the Lancet methodology would have overestimated it by about 16,000 deaths.

    (970/1150) × 98,000 ≈ 84% × 98,000 ≈ 82,660

    Of course, this doesn’t really mean anything, since the 100,000 number is too imprecise to use such as this.

    My main point is that the pairs aren’t “similar” as they claimed in making their study, which throws their whole methodology off the wagon. Their methodology is only valid if those provinces were in fact similar, which they unfortunately, by two different data sets, were not.

  13. #13 Seixon
    December 23, 2005

    Have a holly jolly Xmas everyone!

  14. #14 Seixon
    December 28, 2005

    Obviously Lambert keeps ignoring this, which he has proven in recent comments where he continues to claim that the net effect of the pairings was that less violent ones were chosen on average, which isn’t true.

    I would appreciate an elaboration on this claim:

    I find that the net effect of the pairing process was to make a small reduction in the estimate of about 4,000 deaths

    Would you be so kind, Mr. Lambert?

  15. #15 Eli Rabett
    December 28, 2005

    He’s on vacation

  16. #16 Seixon
    January 1, 2006

    Obviously, and mine is just about finished. Meanwhile, I have solicited data from FAFO and the contact for the ILCS. Hopefully I will get it!

  17. #17 Seixon
    January 1, 2006

    In the mean time, I correlated some of the available ILCS numbers with the IBC numbers.

    The ILCS gives their number of 24,000 “war-related” deaths in Iraq, and then carves that number up into regions. I took these same region definitions, applied them to the IBC numbers, and did a cross-check.

    The only problem with this is that the ILCS data is current as of May 2004 (August 2004 for the northern region) and the IBC data is current as of December 2005. Not taking this into consideration, I found the following by adjusting around Baghdad’s rate as it was defined as a sole region…

    South:
    ILCS – 2420
    IBC – 630

    Central:
    ILCS – 990
    IBC – 870

    North:
    ILCS – 250
    IBC – 60

    The ILCS figures are adjusted by multiplying the original ILCS figure by 1.95, which was the factor between the ILCS and IBC numbers for Baghdad.

    This rough evaluation seems to show that the North and South were both underestimated by a factor of approximately 4, while the figure for Central is very close. This is what we would have expected as far as media reporting bias. Here I have attempted to quantify the factor this bias played.

    This matters little in the end… since most of the pairings are in the same region, and thus would be affected by reporting bias similarly (something I hadn’t even thought of until now!).

    If I get more precise data from ILCS, I will be able to determine with more certainty that, in fact, most if not all the pairings conducted by the Lancet study were not correct.

  18. #18 Tim Lambert
    January 2, 2006

    Seixon, I gave details of my calculations in the linked spreadsheet. Your calculations in comment 212 are wrong. You fail to account for the fact that some governorates were oversampled after the pairing process.

  19. #19 Seixon
    January 2, 2006

    Lambert,

    I did not have any software to open your ODS file, but I guess I will find some, hopefully it is freely available.

    You fail to account for the fact that some governorates were oversampled after the pairing process.

    I’m not quite sure what you mean by this. The rate found in each governate was meshed into a national rate, while each governate in a pairing was supposed to represent two governates, not one.

    I will have a look at your numbers if I find software to open it, and get back to you.

  20. #20 Tim Lambert
    January 2, 2006

    You can use Open Office to open ODS files.

  21. #21 Seixon
    January 2, 2006

    From a cursory look at your spreadsheet, it seems like you are trying some hocus pocus here.

    First of all, you use the populations that the Lancet study gives, which are not correct. The UNDP figures for population size are more current than the ones the Lancet study uses.

    Second, you do deaths per cluster, then multiply this by the different numbers of clusters received in each of the rounds.

    The problem with this is that if they only had 2 clusters, they would of course correlate this differently to the entire governate they were sampling than if they had 3 clusters.

    In other words, let’s take an example from your spreadsheet. Ninawa.

    You give deaths per cluster in the initial sampling as 1216 for Ninawa. Then for deaths per cluster after grouping, you give Ninawa 1621 due to it receiving one more cluster from the grouping process.

    The problem with this is that you are not doing death rates, but the number of deaths. In other words, it doesn’t matter if additional clusters are given, because this is made up for by the population of the governate versus the number of clusters sampled.

    In other words, first we have 1216 for 3 clusters. Applied to Ninawa as a whole, each cluster representing 739,000 people… This would give a death rate for Ninawa of 1288 deaths per million.

    Second we have 1621 for 4 clusters. Applied to Ninawa as a whole, this would give a death rate for Ninawa of 1288.

    The death rate is still the same.

    So I might want you to explain how this shows that the study methodology actually underestimates the number by 4,000 deaths.

  22. #22 Donald Johnson
    January 2, 2006

    Seixon, I had a couple of questions about your 217 post. I’m confused about what you’re saying. Were the IBC numbers for Baghdad 1.95 times lower than the ILCS numbers (presumably we’re talking about violent deaths)? Also, what’s this adjustment you made? Were the IBC numbers 4 times smaller in the North and South?

    If I understand you correctly it sounds like you’re presenting evidence that the IBC numbers are a serious undercount. But that may not be what you’re saying.

  23. #23 Seixon
    January 2, 2006

    Donald,

    The ILCS and the IBC data are not directly comparable in their size, since the IBC data is current as of Dec 2005, while the ILCS data is only current as of May 2004. Beyond that, the ILCS numbers are a projection based on a survey, while the IBC data are based on actual reported deaths.

    What I was demonstrating is the differences between the different regions with the two data. It demonstrates that it is possible that the IBC data undercounted the Northern and Southern region by about a factor of 4, while the Central region was undercounted by a factor of a negligible 1.1.

    Lambert and others had bemoaned the reliability of the IBC data, saying that the more rural regions (North and South) would be undercounted. I challenged them to go on the record to say that the discrepancy was so much that the provinces would still be similar. No one has.

    The three most dissimilar pairings according to the IBC data differ by factors of 9, 23, ~500. However, two out of these 3 pairings lie within the same region as defined by the ILCS.

    Thus an underreporting bias with a factor of 4 would still not change the fact that these pairings were incorrect.

    I have solicited the raw numbers for each governate from the UNDP and FAFO to determine more accurately whether the inherent reporting bias in the IBC numbers plays any role in whether or not the pairings were similar or not.

    If this initial finding is any indication, Lambert & Co might want to start scrambling for the next excuse.

  24. #24 Tim Lambert
    January 4, 2006

    Seixon, you are confused again. Column F in my spreadsheet is a death rate. It’s not deaths per million people, but deaths per 739000 people (the cluster size).