Lancet study and cluster sampling

In an earlier post I
observed that "Seixon does not understand sampling". Seixon removed
any doubt about this with his comments on that post and
two
more
posts.
Despite superhuman efforts by several qualified people in the comments to
explain sampling to him, Seixon
has continued to claim that the sample was biased and therefore "that the
study is so fatally flawed that there's no reason to believe it."

I'm going to show, with pictures rather than numbers, that the sampling
was not biased and what the effect of clustering the governorates was.

Let's look at a simplified example. Suppose we have
three samples to allocate between two governorates. Governorate A has
twice as many people as Governorate B, so if they are not paired up, A
gets two samples and B gets one sample. (This is called stratified
sampling.) If they are paired up using the Lancet's
scheme, then B has a one in three chance of getting all three
samples, otherwise A gets them. (This is called clustered sampling.)
Seixon claims that this method introduces a bias and that what they should
have done was allocate the three samples independently, with B having
a one-in-three chance of getting each cluster. (So that, for example, B
has a (1/3)x(1/3)x(1/3) = 1/27 chance of getting all three. This is called
simple random sampling.)

We can see the difference each of these three procedures makes by
running some simulations. I used a random number between 1 and 13 as
the result of taking a sample in governorate A and one between 1
and 6 for governorate B and ran the simulation a thousand times. The
first graph shows the results for stratified sampling. The horizontal
lines show the distribution of the results. 95% of the values lie
between the top and bottom lines, while the middle one shows the
average.

*[Figure: results for stratified sampling]*

The second one shows the result of clustered sampling. Notice that the
average is the same as for the first one. This shows that, by
definition, the
sample is not biased. However, the top and bottom lines are further
apart---the effect of using cluster sampling instead of stratified
sampling is to increase the variation of the samples.

*[Figure: results for cluster sampling]*

The third one shows the result of simple random sampling. The average
is the same as the previous two. There is less variation than for
cluster sampling.

*[Figure: results for simple random sampling]*

The last graph shows simple random sampling but with two samples
instead of three. The average is the same as for the others, and the
amount of variation is about the same as for cluster sampling. In
other words, the result of cluster sampling is just like simple random
sampling with a smaller sample size. The ratio of the sample sizes
for which cluster sampling and simple random sampling give the same
variation is called the design effect. In this
case it is roughly 3/2 = 1.5. In our example governorate A was
quite different from governorate B (samples from A were on average
twice as big). If A and B were more alike then the design effect
would be smaller. That is why they paired governorates that they
believed were similarly violent. If the governorates that they
paired were not similar, that does not bias the results as Seixon
believes, but it does reduce the precision of the results,
increasing the width of the confidence interval.

*[Figure: results for simple random sampling with two samples]*
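The simulation is simple enough to reproduce. Here is a minimal sketch (the function names are mine, and I use more runs than the thousand in the post, so the exact numbers behind the graphs will differ slightly):

```python
import random

random.seed(1)

def draw_A():
    """One sample from governorate A: a random number between 1 and 13 (mean 7)."""
    return random.randint(1, 13)

def draw_B():
    """One sample from governorate B: a random number between 1 and 6 (mean 3.5)."""
    return random.randint(1, 6)

def stratified():
    """A (with twice B's population) always gets two samples, B gets one."""
    return (draw_A() + draw_A() + draw_B()) / 3

def clustered():
    """All three samples go to a single governorate: A with probability 2/3 (PPS)."""
    if random.random() < 2 / 3:
        return (draw_A() + draw_A() + draw_A()) / 3
    return (draw_B() + draw_B() + draw_B()) / 3

def srs(n=3):
    """Each of n samples independently lands in A with probability 2/3."""
    picks = [draw_A() if random.random() < 2 / 3 else draw_B() for _ in range(n)]
    return sum(picks) / n

def mean_and_var(scheme, runs=100_000):
    """Average and variance of a sampling scheme over many simulated surveys."""
    xs = [scheme() for _ in range(runs)]
    m = sum(xs) / runs
    return m, sum((x - m) ** 2 for x in xs) / runs
```

Running `mean_and_var` on `stratified`, `clustered` and `srs` gives the same average (about 5.83) for all of them, so none is biased; but the variance for `clustered` is the largest, roughly 1.4 times that of `srs(3)` and close to that of `srs(2)`, which is the design effect the graphs illustrate.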

Seixon offers one more argument against clustering---if clustering is
valid, why not put everything into just one cluster? The answer
is that although that would not bias the result, it would increase
the design effect so much that the confidence intervals would be
so big that the results would be meaningless.

This article by Checchi and Roberts goes into much more detail on
the mechanics of conducting surveys of mortality. (Thanks to Tom
Doyle for the link.)


Devil's advocate: how was the researchers' estimate of the success of pairing violent and non-violent provinces reflected in the Lancet study's final confidence intervals? Realistically, how could they have factored in whether a subjective assessment of provincial violence levels was accurate or not?

Recall that one of the study's most surprising conclusions was the upshot of the lower confidence interval, pegging a lower limit of net deaths at 8,000, suggesting there was almost zero possibility U.S. actions to that point could have saved more Iraqis than they killed. Many commenters (Dsquared, etc.) made a big deal of this at the time, and rightly so. But a minor adjustment of the variance at that fringe might have had a significant effect on the study's reception.

Not arguing with you (I think the "pairing bias" is just another red herring) but I do think you're begging the question a little, there.

(And, truly, the researchers could have avoided this line of criticism entirely by doing more to document their province-pairing rationale than just attributing their choices to their own "belief:" an unfortunate choice of word, that.)

BruceR,
From what little I've learned from Wikipedia and an old Navy manual, any "adjustment" to the study that would have included zero (or less) in the CI would have made the results wildly inconclusive. The only thing that makes the study meaningful is that the bottom limit of the CI is significantly above zero (aka, our CI was huge, but the bottom limit shows an effect exists). In my ill-informed layman's perspective, this is why the study isn't really useful, as there are a few factors (ambiguous child mortality data, unmeasured "faith" in pairing) that, taken into account, would move the bottom limit below zero. Someone please correct me if I'm wrong.

I'd like to strike my "unmeasured "faith" in pairing" example in my previous post. In the previous Lancet study thread, Kevin Donoghue says that this has been accounted for.

BruceR, they used the results they found to boot-strap a probability distribution, so the more different the pairs are, the more variation in the distribution and the larger the confidence interval you get.
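To see the mechanism, here is a minimal sketch of a percentile bootstrap on cluster means (the numbers are invented for illustration; they are not the study's data):

```python
import random

random.seed(0)

def bootstrap_ci(cluster_means, reps=10_000):
    """Percentile bootstrap 95% CI for the mean, resampling whole clusters."""
    n = len(cluster_means)
    stats = sorted(
        sum(random.choice(cluster_means) for _ in range(n)) / n
        for _ in range(reps)
    )
    return stats[int(0.025 * reps)], stats[int(0.975 * reps)]

# Two invented sets of cluster results with the same overall mean (10):
similar_pairs = [9, 10, 10, 11, 10, 9, 11, 10]      # well-matched pairs
dissimilar_pairs = [2, 18, 5, 15, 1, 19, 6, 14]     # badly-matched pairs
```

The interval for `dissimilar_pairs` comes out several times wider than for `similar_pairs`, which is how badly matched pairs show up as a wider confidence interval rather than as bias.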

jet, you are wrong. Both about what it would mean if the CI just dipped under 0, and about your suggestion that those factors would do so.

I guess Lambert has redefined stratified sampling and cluster sampling.

Stratified sampling is where you pick out randomly from different strata. Say, you pick a total of 600 people from one strata and 300 in a strata that has 50% the population, according to PPS. You pick these people across the whole strata randomly.

Cluster sampling is the same thing, except you would bunch 30 and 30 people together, making 30 clusters of 30 people. Then you would distribute these 30 clusters across the two strata (areas) randomly. When you get into choosing the persons you will sample, you distribute the clusters that each area gets via SRS according to city, commune, whatever according to PPS. So say a cluster lands in a city, you interview the 30 people in that cluster in that same city.

The difference between stratified and clustering is that with stratified, you would choose each person randomly across the area. With cluster sampling, you bunch up 30 people and they are all from the same vicinity just as a single person would be in stratified.

Now, Lambert here has apparently redefined cluster sampling to something that is completely incongruent with every example of cluster sampling out there.

In cluster sampling, you do not cluster clusters as the Lancet did. No Lambert, that is not called multistage clustering either. Stop lying.

So now you have succeeded once again in putting up a smokescreen strawman, and then demolishing it. Good job. Not only that, but you are being intellectually dishonest here. What you just demonstrated - that's not cluster sampling. You also gave a very misleading definition of stratified sampling. Governate A would not get "two samples" while B "one sample". With stratified sampling, each governate gets one sample, but they are of different sizes according to PPS. The elements of these samples are chosen via SRS across the entire strata.

Then you proceeded to invent a new definition of cluster sampling. I asked you to show me a single example of a "clustering of clusters" study, literature, anything. Instead of doing so, you just further your dishonesty and whip up a few charts that will make the less inclined believe you. "Lookie, pictures!"

I know perfectly well what sampling is, and I know what stratified and cluster sampling are as well. What you have demonstrated here is nothing short of ludicrous. All you have to do is read the excerpts I have printed in my most recent post on this subject to see that Lambert is not being candid.

Also missing from this "rebuttal":

1. Explaining how it was legitimate to non-randomly pick provinces to pair up.

2. Explaining how it was legitimate to pair up provinces without any rationale other than a "belief".

3. Explaining how clustering of clusters is even supported at all in statistics.

4. Explaining how a sample is still random when 2 consecutive non-random decisions affect the entire sample.

5. Explaining how a sample is still random when the provinces in the pairings ended up with a far higher chance of being excluded from the sample than the rest of the provinces.

This is just ridiculous Lambert, I am laughing my ass off.

This is what I have learned from this and other threads:

The man has more energy than you, or I, or a battalion of arguers-from-Enlightenment-principles.

Best,

D

Congratulations, Seixon. Just when I thought you couldn't possibly be less persuasive...

Seixon,

1. Not every aspect of a survey design has to be chosen randomly. Do you think that the sample size should be chosen randomly? The size of each cluster? The sampling unit?

2. This was explained in my post. Pairing dissimilar provinces increases the width of the confidence interval.

3. Ummm, it's text book stuff. You really did do a course in statistics? Look at, for example, the [notes](http://www.ms.unimelb.edu.au/~s620374/sampling/index.html) from a University of Melbourne course on sampling. Read the chapter on cluster sampling. Though you should probably read the one on stratified sampling as well.

4. This one has been explained to you multiple times.

5. I drew you a nice picture in my post. Did you notice that the average was the same no matter which method was used?

OK, I answered five of your questions. Now answer two of mine. Where did you do this stats course where you claim to have gotten an A? And what was the name of your instructor?

Bruce R: "how was the researchers' estimate of the success of pairing violent and non-violent provinces reflected in the Lancet study's final confidence intervals?"

Tim L. "they used the results they found to boot-strap a probability distribution, so the more different the pairs are, the more variation in the distribution and the larger the confidence interval you get."

Bruce, the paper states they used the bootstrap method to determine that all clusters were interchangeable (using the null hypothesis that they were interchangeable-really uninformative). So, they determined that they did not have to account for the variance between clusters in the final analysis. But, we can see from the results that the clusters varied enough and the usual 'shrug, good enough for such a population' is only reasonable in a study that is replicated and gives the same result.

Tim,

1. No, but every decision to do with actually choosing the sample has to be random. You know this, so why are you trying to run away from the truth?

2. Would you like to show me any literature or ANYTHING that shows clumping of clusters in a 2nd stage of cluster sampling??? What you said would be true if they had just started out with 11 clusters instead. Then the CI would be expanded as you say, but the sample wouldn't be biased. The way the JHU team did it, they biased it by treating different provinces unequally and ensuring that some had more of a chance of being in the sample than others, even taking PPS into consideration.

3. Want to point out where in that PDF file it talks about clumping of clusters??? It even says in that file that you do SRS of clusters. So again, when are you actually going to show something other than the Lancet study that uses clumping of clusters??

4. Yes, Kevin tried to explain it, but he was using expected values, which is not what we were after. We were after the probabilities, which were altered by the pairings. For example, Missan would have had a 61% chance of being sampled in the initial round, while in the 2nd paired up round, it suddenly only had 34% of being sampled. In other words, in the first round, Missan had a 39% chance of not being in the sample. Yet after being paired, it suddenly had a 66% chance of not being in the sample. This was of course due to a completely non-random process.

That's not biasing the sample??? Give me a break Tim.

5. Yes, very nice pictures. Only problem is that they don't mimic the entire Lancet process, and you cannot claim that Missan still had an equal chance (PPS) of being sampled because the numbers are right there to tear you down.

What you said is "cluster sampling" is not cluster sampling. In cluster sampling, you don't distribute clusters with a winner-takes-all approach. You do it via SRS, just like the PDF you linked to says. Do I really need to quote your own source against you?

Come on Tim, level with us.

But, we can see from the results that the clusters varied enough and the usual 'shrug, good enough for such a population' is only reasonable in a study that is replicated and gives the same result.

If I can tease this out a bit:

I used to replicate studies in lab with test tubes, petri dishes, 2" pots - controlled environments. When I did field studies of veg crops, I did not 'replicate' studies exactly, as I could not duplicate weather, soil moisture, insolation. A team going back to Eye-rack and performing a survey will not be replicating the ululation-inducing Lancet study.

Best,

D

Dano,

I think the point was that Lambert's little strawman doesn't even begin to draw parallels with the Lancet study.

What he is essentially doing is showing a graph of E(X) by doing 1,000 trials. The problem with this is that he has chosen some arbitrary numbers, and we don't get to see how often governate B is really chosen for the sample. Not to mention that his "cluster sampling" isn't really cluster sampling. I'd be tempted to make a much more illustrative example, one with real cluster sampling, and then one with the Lancet method, and see how those compare.

In fact, him saying that stratified sampling means that A gets two samples, while B gets one is just wrong. In stratified sampling, there is only one sample per strata, although the size of it will vary with the population in each strata. Lambert's simplistic example doesn't give us any sense of this. Also, the mortality rates in Iraq are not random, such as his numbers in this example are. Just one of many things wrong with this smokescreen...

It seems that, with one exception, we are all agreed that the method used produces an unbiased sample. The interesting questions relate to the possibility that the sample may be a freak. To my mind the best answer to this is to look at Figure 1 on page 3 of the study. What jumps out at you is that Sulaymaniyah is the only place where mortality fell and Kerbala is the only place where it held constant. Everywhere else things got worse and in several cases they got a great deal worse. In the light of this it is very hard to believe that mortality could actually have fallen.

The CI tells us the same thing in a more erudite way. An interesting thought-experiment is to ask: how much do you have to increase the standard deviation of the sample in order to get the result Bruce R and Jet are interested in, with enough of the distribution in negative territory to leave intact the hypothesis that the invasion actually reduced mortality. Bear in mind that a "one-tailed" test is appropriate since the alternative hypothesis is that mortality rose, whether by a little or by a lot. So we have to go from having the 2.5% mark at 8,000 excess deaths to having the 5% mark at zero deaths. By my rough calculation, using a normal distribution, we would need to increase the standard deviation of the sample by about one-third.
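Kevin's rough calculation checks out numerically. A sketch under the same normal-approximation assumption he states (the variable names are mine):

```python
from statistics import NormalDist

point_estimate = 98_000   # excess deaths, the study's central estimate
ci_lower = 8_000          # the reported 2.5% mark of the confidence interval

# Standard deviation implied by treating the interval as normal:
z_two_tailed = NormalDist().inv_cdf(0.975)            # about 1.96
sd_implied = (point_estimate - ci_lower) / z_two_tailed

# Standard deviation that would put the one-tailed 5% mark at zero:
z_one_tailed = NormalDist().inv_cdf(0.95)             # about 1.645
sd_needed = point_estimate / z_one_tailed

increase = sd_needed / sd_implied - 1                 # about 0.30
```

So the standard deviation would indeed have to grow by roughly 30%, about one-third, before the data became consistent with no increase in mortality.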

By Kevin Donoghue (not verified) on 03 Oct 2005 #permalink

Eudoxis, Dano:

Regrettably, nothing you're saying would make one assess the Lancet study's methodology as equal or superior to, say, the UNDP study's methodology. The big advantage, of course, is the Lancet study could be performed *faster.* Concur the results are non-replicable.

I'm not suggesting the probabilities are not adequately accounted for in the confidence interval, nor does it appear any problem with province-pairing could possibly have influenced the mean so significantly as opponents have suggested. But a study that at first glance appeared to say Iraqi fatalities were 98,000, plus or minus 90,000, is hardly an example of sterling precision: dare I suggest it may even not be worth Tim and others nailing themselves to the prow for it?

The research team evidently made some decisions about personal safety, cost, and time to publication that are still fair game for second-guessing, even if the statistics themselves are beyond question (sorry, Seixon). But given a choice between this and the UNDP paper for a cite on the cost of the war, would anyone now pick this one? Which still makes me wonder why the team took the road they chose, opting for publication speed over the greater measure of accuracy (and narrower interval) one would think they might have been able to obtain through a broader sampling of Iraqi households, for instance.

Kevin,

I don't think anyone is suggesting that the mortality in post-invasion Iraq went down. You are thrashing yet another strawman.

BruceR, you still haven't admitted that your own calculations cemented my findings that the pairings of the provinces were completely fraudulent using coalition mortality as an indicator, since the JHU team used NO indicator, well, aside from "belief".

I guess you all see it fine for Lambert to warp the definition of cluster sampling, where his own source proves him wrong, and toying around with the definition of stratified sampling.

His picture examples here don't even demonstrate what the Lancet study was all about, or anything that has to do with my main points.

The sample was biased towards central Iraq, and biased towards the more populous regions in Iraq. The probabilities for exclusion from the sample were greatly amplified by the pairing process, e.g. violating the definition of a random sample.

You all still don't seem to give a damn that Lambert keeps claiming the pairing process is consistent with cluster sampling, although he has not given a single example or any literature citing this to be the case. In fact, the literature he just cited proves him wrong, as it says cluster sampling is done by distributing the clusters via SRS.

Lambert's example demonstrates, I guess, a single cluster sample. Not enough with that, the random numbers his samples generate doesn't even have anything to do with the Lancet study, as the samples there did not produce random numbers.

The denial is astounding here. None of these issues are relevant, it seems, because they are too bothersome to try and invent excuses for.

That Lambert's own sources prove him wrong seems to alert nobody. It seems I have come to an echo chamber.

The UNDP study is superior because it didn't cut corners and bias their sample for the sake of convenience.

If the JHU team wanted to cut down on travel, why didn't they just use 15 clusters instead of 33, and triple the sample sizes of those 15 clusters?

Oh, right, because 15 clusters would be frowned upon, even though that would have been an unbiased sample. Instead, they toyed around with their sample to get it the way they wanted it, so they could still claim to have 33 clusters (important since 15 would give a considerably higher DE).

BR:

I merely chose the opportunity to clarify replication, as some of The Posse(TM) innocently or purposefully don't understand replication, and they have spread the confusion.

Certainly the Lancet paper can be improved upon when bombs aren't raining down from the heavens. I'm not arguing that one paper is better than another, and never have. I merely argue that the Lancet study is robust - and a first - and thus can be improved upon.

Hint: loud, long ululation is not a valid rebuttal.

Best,

ÐanØ

I don't think anyone is suggesting that the mortality in post-invasion Iraq went down.

From the bottom of my heart, thank you. That is the only important conclusion. If you believe it, then you believe in the Lancet study.

I actually believe that Seixon got an "A" in a statistics course. Everyone who did stats at university goes through this stage of believing that any deviation from the Platonic Form of the statistical study is a horrible sin which cannot be redeemed. The attitude usually survives about ten minutes into the first practical assignment.

Or to put it another way, Seixon, let's cut to the chase. The only way in which the grouping of the clusters would have affected the randomness of the sample, is if it was informative. In other words, the sample was random unless the clusters were grouped by someone who knew that he was doing so in order to group low-violence clusters with high-violence ones in order to eliminate the low-violence ones. In the bluntest terms possible, the sample was random unless the grouping procedure was fraudulently carried out by the survey team in order to intentionally create a dishonest survey.

Do you really have the balls to accuse the survey authors of this? Are you prepared to do so under your real name and accept the potential legal consequences of doing so?

Bruce R,

The main reasons given for doing a small-scale study were: a limited budget, the risks faced by the survey teams and the hope that the occupation authorities could be pressured into doing a larger study by having it demonstrated that the thing can be done. The theory that the work was rushed doesn't hold up. A rushed job wouldn't stand up to hostile inspection the way this paper has.

As to choosing between the Lancet and the UNDP, we don't really have to. The sensible thing to do is look at all the respectable work which is available. In one of the papers which Tim Lambert links to (and BTW, thanks to him and Tom Doyle for that) Checchi and Roberts list seven estimates of violence in Iraq. If Roberts is happy to cite other evidence there's no reason why the rest of us should hesitate. But attempts to discredit serious work using arguments like Seixon's fully deserve to be shown up for what they are.

Roberts also raises an important point about confidence intervals. In the sort of humanitarian emergency he is mostly concerned with, in places like Darfur, should relief agencies really insist on 95% confidence before they announce a finding that a serious increase in mortality has taken place? That approach was originally adopted for laboratory use, for testing whether certain drugs enhance the sex-drive of hamsters and suchlike questions. It doesn't make much sense when applied to famines, wars and epidemics. Perhaps the sensible thing would be to publish a table or figure giving a whole range of confidence levels. If nothing else, it would discourage the Kaplans of this world from waffling about dartboards.

I actually believe that Seixon got an "A" in a statistics course.

dsquared, I think you might usefully look at the previous thread, where Tim Lambert and John Quiggin, amongst others, formed a different impression. Still, I know you don't shy away from expounding surprising theories. I look forward to a post on CT or your own blog about the great potential of Seixonian Probability Theory.

By Kevin Donoghue (not verified) on 03 Oct 2005 #permalink

I think you might usefully look at the previous thread, where Tim Lambert and John Quiggin, amongst others, formed a different impression.

I have the greatest respect for Tim and John both, but working in universities, I think they both have overoptimistic estimates of the correlation between grades and understanding of the underlying subject. For what it's worth I got an alpha minus from Oxford University in International Economics but I still have to have the Ricardian theory of comparative advantage explained to me every couple of months.

Tim Lambert:

This article by Checchi and Roberts goes into much more details of the mechanics of conducting surveys of mortality. (Thanks to Tom Doyle for the link.)

You're quite welcome, and thank you for the gracious acknowledgment.

Unfortunately, Relief Web changed the URL for the Checchi/Roberts article, so your link (and the one I provided in an earlier thread) doesn't work. The current correct URL is:

http://www.reliefweb.int/rw/lib.nsf/db900SID/KKEE-6GEQEN?OpenDocument

All the best,

*[Fixed. Thanks again. Tim]*

By Tom Doyle (not verified) on 03 Oct 2005 #permalink

Bruce R.

"Regrettably, nothing you're saying would make one assess the Lancet study's methodology as equal or superior to, say, the UNDP study's methodology."

My comment was merely a specific answer to your specific question about estimating error associated with the difference between violence in the regions. Shorter answer: they didn't estimate that error.

Dano, I think everybody is very aware that this study can't be replicated on the ground. That's why the less robust method of bootstrapping can be used to estimate errors for a cluster data set with heterogeneity between clusters. You might want to read up on bootstrap replications.

Dano, I think everybody is very aware that this study can't be replicated on the ground. That's why the less robust method of bootstrapping can be used to estimate errors for a cluster data set with heterogeneity between clusters. You might want to read up on bootstrap replications.

I was unclear. My comment was for Stevie Mac's Posse(TM).

Apologies.

D

I apologize in advance for hijacking this thread, which truly is about something else, but I do think the more interesting question here is the one Kevin D. has raised more eloquently than I was able to: specifically, given the nature of the topic, the degree to which the first serious study (and I do agree it is that) on any highly contentious and politically charged sociopolitical issue such as this, should sacrifice exactitude for timeliness/impact. Kevin is of the perfectly defensible humanist view that given the choice, scientists might want to bend toward what could be seen as the greater good (ie, getting the word out in time to save lives). I find I have qualms with that. But to be fair, my training is as a historian and perhaps I reflexively am taking the longer view.

The statistics lessons have been both useful and a fun college refresher. But I'm still gravitating back to what I see as the larger issue.

Take it as a hypothetical, instead. If the Lancet had a choice between this article in October, before the American election, and another survey with a significantly tighter distribution that wouldn't be ready until January, should this paper have been the one they published, assuming they could only publish the one? What about a study with tremendous precision that wouldn't be publishable until 2010? I just wonder how far one should go in keeping one's statistical powder dry in such situations.

Good questions, BR, and this gets back to the philosophy of reductionist science.

Many who wish to maintain objectivity use the Platonic model and Cartesian methods - the subject-object relationship. What we've found, however, is that this relationship allows the object to be devalued and thus exploited.

My experience is with plants, and Russian botanists don't do random sampling, they do relevés and do other things to eliminate bias. This method also allows them to become intimately involved with a place, which narrows the gap between subject and object.

Narrowing the gap between subject and object makes it harder to exploit the object.

Now, Bruce, in your example I'd say the scientists who published when they did (your presumed early) had a narrower gap between subject and object. Is the object they studied less subject to devaluing?

Well, we have a long way to go, but we can see where the start is from here.

Best,

D

dsquared, working in a university I think I have a pretty good idea that students can get a good mark in a course with only a superficial understanding of the subject matter. Their knowledge is often extremely fragile -- change things around a little bit and they are lost, though most realize this, unlike Seixon.

Seixon, I would like to contact the person who taught the statistics course you did. Please tell me their name and institution.

dsquared,

"From the bottom of my heart, thank you. That is the only important conclusion. If you believe it, then you believe in the Lancet study."

OK, let me see if I follow you. Since I am not mentally retarded and understand that mortality went up in Iraq post-invasion, you know, due to the war and all... Then I agree with anything the Lancet study says? Then I believe that the Lancet study is bullet-proof? What kind of logical disconnect is that?

I know for a fact that mortality went up post-invasion, that is only logical. There was an invasion, the US military shot down thousands of Iraqi soldiers, and thousands of civilians always get killed in any invasion, especially considering that Saddam stored weapons in hospitals and within civilian infrastructure like the coward he is.

What does that have to do with the Lancet study? This study takes almost every shortcut possible, and completely undermines their own study by using methodology that it seems they have invented entirely on their own in order to either get the results they wanted, or ensure that their study looked more robust than it was.

No one has commented on how Lambert reinvented what cluster sampling is, and was very misleading about what stratified sampling is. Why is that?

No one seems to want to elaborate on why Salah ah Dinh, for example, was paired up while other provinces were not.

These were not random decisions, but arbitrary decisions made by the team. As Lambert's own link shows, when you do cluster sampling, you distribute the clusters via SRS.

That is what the Lancet study did in their initial phase. If they had left it at that, there would be virtually nothing to complain about with this study. Alas, they didn't.

I have challenged Lambert again and again to show me any shred of evidence that cluster clumping is an accepted methodology, only to have him create a bunch of irrelevant strawman pictures, redefine cluster sampling, and call it a day.

dsquared:"In the bluntest terms possible, the sample was random unless the grouping procedure was fraudulently carried out by the survey team in order to intentionally create a dishonest survey."

Not necessarily. They could have genuinely believed that the provinces were similar, even though there was no rationale for explaining that, and they still were, by any indicator, wrong.

That is why I say that they introduced an unknown bias to the study, or at least, that they biased the study towards central Iraq and towards the more populous provinces of Iraq. What the result of this bias is impossible to know, since we don't know what the real mortality is like in the unsampled provinces.

To use an example, they paired up Texas and Arizona by assuming they were similar without showing any reason to believe so, and then oversampled Texas by excluding Arizona. Now, doesn't this bias the sample towards Texas? Yes, it does. The only way we could establish that this wasn't a bias would be to show, from a previous study, that the two states were similar along the lines of what we were sampling for. The Lancet team didn't do that.

So, it doesn't matter if they had a nefarious intent or not. The result is still the same.

dsquared:"Do you really have the balls to accuse the survey authors of this? Are you prepared to do so under your real name and accept the potential legal consequences of doing so?"

Well, given the other evidence, that they cut out about 25% of Iraq in the name of "safety" and then ensured that Fallujah was in their sample... that the Lancet website lied about the conclusion of the study... that this led to it being used in headlines across the media right before the presidential election... that they made up rationales to pair up provinces... that they used unsupported methodology... that it's no secret that Les Roberts is against the war...

I don't know, seems to me that the evidence points in the direction of some intent to guide the results of this study in a certain direction.

What legal consequences would there be for me to come out and say that they manipulated the study for political reasons? Exactly, none. Not trying to stifle dissent now are you?

I cannot prove that this was the case unless I was given access to all of their work and files, but the evidence is definitely there that something doesn't smell right.

Dano,

"Certainly the Lancet paper can be improved upon when bombs aren't raining down from the heavens. I'm not arguing that one paper is better than another, and never have. I merely argue that the Lancet study is robust - and a first - and thus can be improved upon."

Robust? Oh man. A study that has a CI of 8,000-194,000 is "robust"? A study that uses an unsupported method of clumping clusters is "robust"? A study that pairs provinces according to nothing other than a "belief" is "robust"?

Wow. I'm guessing if you were a Bush-supporter, you'd say that the Iraq war was the most brilliant thing ever undertaken, if your standards are that low.

Oh, apropos bombs raining down: the UNDP carried out their study in April-May 2004. This was when the battle in Fallujah was taking place, and was the most violent time of the entire post-invasion period.

Magically, they managed to interview 21,668 households, in all 18 provinces of Iraq. This also shows that the JHU team could have carried out a better and more thorough study if they had really wanted to do so.

Which brings me to BruceR's point about the hypothetical of them waiting longer to do a better study. What about it folks?

Why didn't the JHU team go with their original cluster sampling, you know, the initial one they had before they decided to slice and dice it as they saw fit?

Hell, they could have cut it down to 20-25 clusters and just increased the number of households in each and this would have cut down the amount of travel since not all provinces would have been sampled via SRS distribution of the clusters.

This would have made a more precise study, but would perhaps have taken a few weeks more to conduct, possibly a month. What was the profound need to release the study before the presidential election? If that was such a need, why did they not carry out the study earlier and spend more time on it?

The UNDP was there in April-May 2004 (and August 2004)... why couldn't the JHU team do that?

See, I'm asking a lot of questions, because I know that most of the answers will be very uncomfortable for most of you to answer.

Lambert has invested his entire credibility into this study, so I don't foresee him ever conceding anything about it. Everyone should take notice that he has gone into redefining cluster sampling in order to seem like he is correct. I am still waiting for any shred of evidence that supports cluster clumping as statistical methodology.... Tick, tock.

Seixon: name of institution and your instructor please. People are going to start wondering if you ever did a stats course if you fail to answer.

Lambert,

Look, I know you didn't like getting caught with your pants down redefining cluster sampling at all, but this arguing from authority thing is really starting to piss me off.

I took stats in high school, and I took stats in college in 2004 and got an A. What college, and what teacher I had, is none of your damn business. Especially when you can't even answer my questions and continue to create strawmen instead of actually debating my points.

Your own source proved you wrong, you gave misleading or false definitions of cluster and stratified sampling, you won't even comment on the obvious bias of arbitrarily selecting some provinces for this and that, and you won't comment on how the probabilities for being included in the sample were fundamentally altered by the unsupported clumping process, a process for which you still have not shown a single example in either the literature or a study.

That's a whole lot of loose ends Lambert, and knowing the school and teacher I had for stats won't really help you at all in tying them up.

Here's a report from July 12 2005 by an Iraqi group claiming that 128,000 Iraqis have been killed in the war thus far--

http://washingtontimes.com/upi/20050712-090927-2280r.htm

On the Lancet survey, I suspect the authors would have done a larger one if they'd had the resources and time. They probably would have liked to have done more specifically on Fallujah (just to see if their one neighborhood was a fluke). I don't see anything wrong with trying to publish before the elections, though it was naive to think many Americans would change their vote as a result (if that was the idea).

And Seixon, the point of Lambert's simulation is to illustrate the effects of clumping clusters--it doesn't change the expected value, but increases the spread, which is what everyone has been saying. The more unlike the provinces are, the greater the spread. I've found this discussion to be educational, but the point has been made pretty clearly now, over and over again. As for whether the Lancet authors engaged in deliberate fraud, it's all speculation. The number they got wasn't that out of line with the UN survey.

By Donald Johnson (not verified) on 04 Oct 2005 #permalink

Seixon, I'm sorry to say this but I think you're out of your depth. Look, I don't like Lambert's style either and I think he has few qualms about being misleading if it helps him nail an ideological enemy to a cross... but on this issue you're not fighting Lambert, you're fighting the English language. And losing. I think this sums up the argument about bias:

dsquared: "In the bluntest terms possible, the sample was random unless the grouping procedure was fraudulently carried out by the survey team in order to intentionally create a dishonest survey."

Seixon: "Not necessarily. They could have genuinely believed that the provinces were similar, even though there was no rationale for explaining that, and they still were, by any indicator, wrong."

The bias you mention above is just about getting things wrong. But the point is if they were being honest, a priori, the survey team could equally have been wrong in either direction. So no bias.

That doesn't mean the conclusion of the survey is correct. Kevin Donoghue raises the question of whether the result was a freak result... which is perfectly possible in an unbiased survey and a reasonable question to raise. But that is unrelated to issues of bias.

The bias you hint at elsewhere is intentional fraudulent sampling (as dsquared said). Even if this is true (and I have no comment on that), this is not sampling bias. You should admit this semantic point to show you are sincere.

btw, given our disgraceful defamation laws you could get sued for calling the survey team liars. Indeed -- Tim could sue you and you could sue Tim for comments on this board! Defamation laws should be changed, but that's a different debate.

Lambert's simulation is a fraud. The only reason it works out that way is because he let the sample result be random, which it wouldn't be if you were measuring mortality. Also, his simulation is the same as conducting a cluster sample with one cluster, just like the Lancet method is. That is not cluster sampling.

Let's say you want to interview 90 households in Basrah and Missan.

With stratified sampling, you would sample 60 in Basrah and 30 in Missan, randomly chosen within each.

With SRS, you would sample 90 across the entire area of both combined.

With cluster sampling, assuming 3 clusters with 30 households each, you would randomly distribute the 3 clusters with SRS using PPS. This would most likely end up with Basrah getting 2, and Missan getting one. Within each province, the clusters would be placed randomly, and then 30 households from each of those locations would be sampled. Obviously the precision rises with the number of clusters, as if you have 90 clusters, it is the same thing as stratified sampling.

Now, Lancet didn't do any of this. As Lambert so dishonestly portrayed as cluster sampling, they gave all 3 clusters to one of the two provinces, in effect the same as only having one cluster with 90 households in it. That leaves 90 households sampled in one of the provinces, and none in the other. That would be fine, except that the cluster would be larger than all the other clusters used in the study, which biases it towards Basrah as the province most likely to win it.

Lambert, you should redo your example a little more honestly. Meaning, the simulation must be more realistic, and the mortality rates of each province must not be random as you made them. If you do this, you will get a different result.

Of course, this also doesn't take into account that the provinces subjected to this process were not chosen at random, as Lambert's simulation takes as a given.

In other words, if you used what Lambert describes as "cluster sampling" above, then Baghdad would end up with all 33 clusters. That isn't cluster sampling.

Doing a cluster sample with one cluster, containing all the households in that single cluster, would be a more honest way of accomplishing the same thing. Of course, this would, I'm guessing, produce a horrendous DE.
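For what it's worth, the distinction being argued over can be sketched in a few lines of Python. The population figures and cluster count below are invented for illustration (they are not real census data); the only point is the allocation mechanics: standard cluster sampling draws each cluster independently by PPS, while the clumped scheme makes a single all-or-nothing PPS draw.

```python
import random

random.seed(0)

# Invented figures: Basrah roughly twice Missan's population,
# 3 clusters to allocate. Not real census data.
POP = {"Basrah": 2_000_000, "Missan": 1_000_000}
CLUSTERS = 3

def cluster_sample_pps():
    """Standard cluster sampling: each cluster is allocated by an
    independent PPS (probability proportional to size) draw."""
    names, weights = zip(*POP.items())
    alloc = dict.fromkeys(POP, 0)
    for _ in range(CLUSTERS):
        alloc[random.choices(names, weights=weights)[0]] += 1
    return alloc

def clumped_sample_pps():
    """The clumped scheme under dispute: one PPS draw decides which
    province receives every cluster."""
    names, weights = zip(*POP.items())
    winner = random.choices(names, weights=weights)[0]
    return {name: CLUSTERS if name == winner else 0 for name in POP}

# Both schemes give Missan the same expected number of clusters
# (3 x 1/3 = 1), but the clumped scheme is all-or-nothing.
n = 100_000
avg_cluster = sum(cluster_sample_pps()["Missan"] for _ in range(n)) / n
avg_clumped = sum(clumped_sample_pps()["Missan"] for _ in range(n)) / n
print(avg_cluster, avg_clumped)  # both near 1.0
```

Under either scheme Missan expects one cluster in three; what differs is how that expectation is realized, which is where the rest of the argument in this thread lives.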

John,

"The bias you mention above is just about getting things wrong. But the point is if they were being honest, a priori, the survey team could equally have been wrong in either direction. So no bias."

A bias is when some part of the population is more or less likely to be chosen than the rest. The way they conducted this study, the population within the provinces not selected for the grouping process was given a higher chance of being chosen than the rest. That is exactly what bias is.

The sample was biased towards central Iraq and the more populous regions of the pairings.

Whether this produces a higher or lower mortality rate doesn't matter, because we cannot know this due to not knowing the mortality rates of the 6 provinces that were excluded.

Let me throw this one out to Bruce: I suspect that Roberts knew

1. That there was no excess mortality survey being done

2. That he did not have much funding (he knew that for sure)

3. Any survey large enough to come to the attention of the US or UK would be closed down.

4. No further surveys would be allowed once the initial results were published.

So he had one chance to do a limited survey.

A household in a governorate not selected for pairing had the same chance of being chosen as a household in a paired one.

By Kevin Donoghue (not verified) on 04 Oct 2005 #permalink

my mistake, this is just nonsense and I no longer have any idea whether Seixon is being a blackboard nitpicker or just blowing smoke.

I know for a fact that mortality went up post-invasion, that is only logical. There was an invasion, the US military shot down thousands of Iraqi soldiers, and thousands of civilians always get killed in any invasion

Indeed, which is why one would obviously expect the excess deaths number to be positive if the USA invaded, say, Sussex. However, the country that they actually did invade was Saddam-era Iraq, in which one would have thought that the pre-invasion death rate was higher than it needed to be because Saddam was murdering people. For the excess deaths to rise, it would have to be the case that, over a period of eighteen months, we killed more of them (or more exactly, more of them died as a consequence of our actions) than were dying before. In, as I say, Saddam's Iraq.

So, there was decent reason to believe that the estimated excess deaths figure would have been negative, or that if it was positive it would be a low enough number that zero would be well within the 95% confidence interval. So if you believe that zero is *not* in the 95% confidence interval, then you believe this either on the evidence of the Lancet study (which is the only study to have given an estimate of total excess deaths) or on no evidence at all. Since you claimed to have done a statistics course, I assumed that you had concluded that the death rate went up based on the evidence rather than based on strange and incorrect a priori ideas of your own.

You are also guilty of two fairly fundamental misrepresentations of the study's results. Since soldiers in the Iraqi Army would not have been part of households during the immediate pre-war period, their casualties are unlikely to have been a material contributor to the excess deaths estimate. And you are also wrong in your implied claim that the excess violent deaths are concentrated in time around the months of March and April 2003; there is a chart which demonstrates that they are not.

Meanwhile, I am glad to see that you have retreated from your claim that the grouping process was "like grouping Texas with California" to "like grouping Texas with Arizona". You are still a fair ways off in your analogy though, since Texas is not geographically contiguous with Arizona. Perhaps the next stage of your analogy ought to be "like grouping New Mexico with Arizona" or "like grouping Wyoming with Montana", which would rather make it clear how weak an argument you have here.

by the way, this assertion:

they gave all 3 clusters to one of the two provinces, in effect the same as only having one cluster with 90 households in it

is wrong, isn't it?

and this assertion:

Hell, they could have cut it down to 20-25 clusters and just increased the number of households in each and this would have cut down the amount of travel since not all provinces would have been sampled via SRS distribution of the clusters. This would have made a more precise study

is also wrong for most sensible assumptions about within-cluster and between-cluster variance, isn't it?

Be clear here; you are accusing a team of eminent scientists of either incompetence or dishonesty. If proved, this would be enough to wreck a career, which is why accusations of this sort ought not to be made anonymously.

IANAL, but from what I know of libel law an allegation against a respected scientist made in, say, an unsigned pamphlet printed by a bunch of school kids is unlikely to result in damages being awarded. Apart from the fact that the kids probably have no money, it would be difficult to argue that the scientist's reputation has really been harmed.

Les Roberts is about as likely to sue The Onion as to sue Seixon.

By Kevin Donoghue (not verified) on 04 Oct 2005 #permalink

Eli,

1. There was a survey being done. The UNDP was doing one in April-May and August 2004.
2. Why even fund a study that will provide such lackluster precision? Oh wait, I know!
3. Ludicrous. The UNDP carried out a study comprising 21,668 households.
4. Again, ludicrous for the same reason as above.

Thus again showing that there was no imminent need to do this survey... other than trying to influence the US presidential election... and the precision? Didn't matter, apparently. Why not? Guess.

Kevin,

"A household in a governorate not selected for pairing had the same chance of being chosen as a household in a paired one."

If Missan had not been paired up, it would have had a total chance of 61% to be in the sample (using a binomial distribution with p=0.028 and n=33). The result of the pairing changed this to 34%.

Yup, 61%, 34%, same thing.

dsquared,

Even taking Saddam Hussein's knack for killing people into consideration, sending his entire army out to get their brains blown out, using civilian infrastructure, etc. would undoubtedly have given a higher mortality rate during a war than even "peace" time under Saddam Hussein. Let's not forget the insurgency now, eh?

"So if you believe that zero is not in the 95% confidence interval, then you believe this either on the evidence of the Lancet study (which is the only study to have given an estimate of total excess deaths) or on no evidence at all."

I don't believe 0 is in the confidence interval because it is ludicrous to believe that it would be. A war between 150,000 American forces and tens of thousands of Iraqi forces, and an insurgency on top of that, how in the world would that be the same as Saddam's "quiet" 1.5 year period from before the war?

No evidence? Well, just relying on a little thing called logic, is all. Oh, and plus the UNDP study has been conducted, which you have seemingly ignored here.

"You are also guilty of two fairly fundamental misrepresentations of the study's results. Since soldiers in the Iraqi Army would not have been part of households during the immediate pre-war period, their casualties are unlikely to have been a material contributor to the excess deaths estimate. And you are also wrong in your implied claim that the excess violent deaths are concentrated in time around the months of March and April 2003; there is a chart which demonstrates that they are not."

Huh? The soldiers would not have been part of the households in the preceding 18 months? Uh, yes they would have. I don't think even the Lancet study says anything like that.

I never claimed that the deaths were concentrated around the months of March and April 2003. Not sure where you pulled that out from...

"Meanwhile, I am glad to see that you have retreated from your claim that the grouping process was "like grouping Texas with California" to "like grouping Texas with Arizona". You are still a fair ways off in your analogy though, since Texas is not geographically contiguous with Arizona. Perhaps the next stage of your analogy ought to be "like grouping New Mexico with Arizona" or "like grouping Wyoming with Montana", which would rather make it clear how weak an argument you have here."

The reason I was using Texas and California is because they comprise almost 26% of the US population, the same as was excluded from the Lancet sample.

Yes, it might be better to use, in the case of Missan and Basrah, Montana vs. Wyoming. However, that is misleading because we know a lot more about those two states than we do about Basrah and Missan. Not only that, but the violence between these two could have been very different, which the Montana vs. Wyoming comparison doesn't capture. Also, there's a difference between violence during regular times and violence during war. As we saw with Fallujah, one small area can get massacred while other comparable areas were not.

"by the way, this assertion:

they gave all 3 clusters to one of the two provinces, in effect the same as only having one cluster with 90 households in it

is wrong, isn't it?"

No. Why? Isn't distributing 3 clusters with 30 households with a single trial the same as distributing 1 cluster with 90 households? It is.

"Bla bla...

is also wrong for most sensible assumptions about within-cluster and between-cluster variance, isn't it?"

It would have been more precise than what the Lancet study came up with. Lancet oversampled 6 provinces with their pairing process. If you just had fewer clusters to distribute with larger sample sizes, you would not be oversampling any of the provinces, and your result would have been more precise than what the Lancet study did.

"Be clear here; you are accusing a team of eminent scientists of either incompetence or dishonesty here. If proved, this would be enough to wreck a career, which is why accusations of this sort ought not to be made anonymously."

They did what they did to reduce the places they needed to travel to. Given the circumstances, it can be defended to have compromised the precision of the study in order to do this, even though again, this gets murky when they quite purposely went to Fallujah.

Dishonesty? Well, the Lancet journal lied on their website about the results of the study. Can you answer me why they did that?

Seixon, I asked about the name of your instructor because I think he or she would have been interested in your use of statistics. But why don't you contact him or her yourself and find out what he or she thinks of it. Or perhaps you realize that there is something wrong with the stuff you have written.

One cluster of size 90 is not the same as three clusters of size 30. If you believe that, then presumably 90 clusters of size 1 is also the same, right?

And if Missan wasn't paired up its chance of getting a cluster would have been 100%.

Let's try again. A household in a governorate not selected for pairing had the same chance of being chosen as a household in a paired one.

By Kevin Donoghue (not verified) on 04 Oct 2005 #permalink

And if Missan wasn't paired up its chance of getting a cluster would have been 100%.

Tim, not for the first time, you have underestimated Seixon's propensity to muddle. He is trying to calculate Missan's probability at the outset of getting one or more clusters with and without pairing. Needless to say, he gets part of the answer wrong.

By Kevin Donoghue (not verified) on 04 Oct 2005 #permalink

Lambert,

I seriously doubt my teacher would give a damn about what I was doing with statistics, and quite frankly, wasting his time because you won't be candid seems a bit frivolous. Thus, I will not waste his time with this, as I am quite sure he has enough to do as it is.

"One cluster of size 90 is not the same as three clusters of size 30. If you believe that, then presumably 90 clusters of size 1 is also same, right?"

I guess I wasn't clear...

When you clump 3 clusters of 30 households and distribute them together, you get almost the same result as you would by having 1 cluster with 90 households. The similarity, as I explained, is that one province gets all of the households. Of course, within the province that wins the 3 clusters, those 3 clusters will be distributed differently than the 1 cluster. Yet within such a small geographical area, this will not make much of a difference.

The similarity was with how the clusters and the households in them were distributed. With both 1x90 and 3x30-clumped, all of the households would be in one of the two provinces. This would not be the case with genuine cluster sampling.

"And if Missan wasn't paired up it's chance of getting a cluster would have been 100%."

During the initial cluster sampling, Missan had, as I calculated, a 61% chance of being sampled. This after distributing 33 clusters via cluster sampling, using PPS SRS. Do you want to explain how its chance would have been 100%?

Kevin,

"Let's try again. A household in a governorate not selected for pairing had the same chance of being chosen as a household in a paired one."

Ah, so the households' chances have no relation to the province's chances? How do you figure that?

If Missan doesn't get sampled, neither does its households. I don't see how you are sidestepping that fact.

"Tim, not for the first time, you have underestimated Seixon's propensity to muddle. He is trying to calculate Missan's probability at the outset of getting one or more clusters with and without pairing. Needless to say, he gets part of the answer wrong."

Yes, at the outset. What else are we supposed to be calculating? I am not muddling at all, you guys are muddling what I am saying and sidestepping things I am saying. How did I get part of the answer wrong? Geez. This is like talking to a wall that says, "you are always wrong," on it.

Seixon: How did I get part of the answer wrong?

I left that as an exercise. You will never get the hang of it unless you do your homework. (Hint: Missan's population is 34% of the combined population of Basrah and Missan.)

Ah, so the households' chances have no relation to the province's chances?

There is a relation but it is more subtle than you think. For a clue, see my comment (number 158) in the earlier thread:

http://scienceblogs.com/deltoid/2005/09/lancet40.phpall-comments/#comme…

By Kevin Donoghue (not verified) on 04 Oct 2005 #permalink

dsquared,

Strike my comment about the Lancet study not saying something about military perhaps not being accounted for. I see it says that the deceased had to have been living in the household at the time of death, and must have been living there for 2 months up until that point. Now I don't know anything about how the military worked in Iraq, so I can't know whether or not military deaths were therefore accounted for.

Kevin,

"There is a relation but it is more subtle than you think. For a clue, see my comment (number 158) in the earlier thread:"

Yes, here you are pretending that the probability of Missan not winning the clusters at all is irrelevant. As I said, if you operate with a GIVEN that Missan has won the clusters, then and only then do the households have an equal chance of being selected in either case.

Then the prickly fact remains that with the Lancet method, Missan had a 66% chance of ending up with zero, and thus no households sampled, and thus the households would have 0% chance of being sampled. Without the pairing, this figure would have been 39% (or according to Lambert, 0%).

There would be no reason to conduct cluster sampling between Basrah and Missan, as the result would be virtually the same as in the original cluster sampling.

You're just sticking to your already backfired guns Kevin. Give it up.

Seixon, do you understand that when people say that pairing done as the Lancet study did doesn't change the expectation value but does increase the variance, that in plain English it means that there's a chance the bloodier of the two provinces will get all the clusters and sends the calculated death rate upwards, or that the more peaceful province gets all the clusters and sends the calculated death rate downwards?

That's what Tim's simulation showed and furthermore, I did the same thing in a simple example where I assumed that the survey actually counts all the deaths in the chosen province and then extrapolates that number to include both. If the provinces are different, they'll get a result that is too high or too low, but the expectation value is equal to the true value.

If a million different groups did the Lancet survey exactly as they did on the same days, the average value of those million groups would probably be right on top of the true value of the death toll, but the individual surveys are going to get results more scattered than if they hadn't engaged in pairing. So the study isn't biased--it's just going to be less accurate than a survey without pairing. Is there something wrong with what I just said?
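The point about the expectation staying put while the spread widens is easy to demonstrate in a toy simulation. All the figures below are invented: province A is twice B's size and, by assumption, bloodier, with some noise on each cluster's count.

```python
import random
import statistics

random.seed(1)

# Invented toy numbers: A is twice B's size; A's clusters average
# 10 deaths, B's average 4, with Gaussian noise. Three clusters.
NAMES, WEIGHTS = ["A", "B"], [2, 1]
RATE = {"A": 10.0, "B": 4.0}

def one_survey(clumped):
    """Estimate the per-cluster death count from 3 clusters, allocated
    either by one all-or-nothing PPS draw or by independent PPS draws."""
    if clumped:
        winner = random.choices(NAMES, weights=WEIGHTS)[0]
        provinces = [winner] * 3
    else:
        provinces = [random.choices(NAMES, weights=WEIGHTS)[0] for _ in range(3)]
    draws = [random.gauss(RATE[p], 2.0) for p in provinces]
    return sum(draws) / 3

clumped = [one_survey(True) for _ in range(20_000)]
unclumped = [one_survey(False) for _ in range(20_000)]

# Same average (unbiased), but the clumped estimates are more spread out.
print(statistics.mean(clumped), statistics.mean(unclumped))
print(statistics.stdev(clumped), statistics.stdev(unclumped))
```

Both means land near the true population average (8.0 under these assumed weights), while the clumped standard deviation is visibly larger: no bias, less precision.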

The interesting point is whether, by chance or if you prefer, devious liberal plotting, the Lancet team happened to survey bloodier-than-average places. You were trying to show that with troop casualty figures, I think, and someone else disagreed with your figures. I have no idea who is right, but that'd be a much better use of energy than what you've been doing with the sloppy use of the term "biased".

By Donald Johnson (not verified) on 04 Oct 2005 #permalink

On a non-Seixon related topic, I read part of the Roberts article that Tim linked and aside from the other estimates of Iraq casualties, there was also a statement that said (paraphrasing from memory) that sensitivity analysis shows that the Iraq Body Count coverage is 20 percent. Anyone know anything about sensitivity analysis and how you'd arrive at a figure in Iraq? I assume it means the fraction of the dead that are likely to be counted by IBC. I was trying to do something like this comparing the UNDP death toll for children in the first year with the IBC death toll for children in two years, but I don't know the error bar on the UNDP number. Taken at face value, though, the UN survey found about 3000 deaths in people under 18 in their time frame, a little over a year, and IBC found about 1300 in the first two years, so IBC is finding much less than half. If you just assumed most of the UN death toll is composed of civilians and compared it to IBC, the ratio is about 2 to 1, I think, but of course we don't know what fraction of the UN numbers are civilians.

By Donald Johnson (not verified) on 04 Oct 2005 #permalink

Seixon:

Now I don't know anything about how the military worked in Iraq

Do you think it is likely that it would have worked in a way which had its soldiers at home living with their families during the two months leading up to the war?

Donald:

I think that Roberts is probably referring to a study that's mentioned in the closing paras. of the Lancet study in which a passive reporting system similar to IBC counted roughly a seventh of the deaths later established.

Yes, here you are pretending that the probability of Missan not winning the clusters at all is irrelevant.

No, I am not. As I explained previously, when calculating the probability that any particular household in Missan is sampled, we can make use of the fact that if Missan gets no clusters then the probability that the said household will be sampled is precisely zero. So the probabilities which have to be summed are: (Probability that Missan gets 1 cluster and said household is one of the 3,262 households surveyed) + (Probability that Missan gets 2 clusters and said household is one of the 1,631 households surveyed) + (Probability that Missan gets 3 clusters and said household is one of the 1,087 households surveyed) + ...etc., up to a maximum of 33 clusters. The thing to notice is that as the number of clusters increases, the probability of the household being surveyed also increases.

For nifty ways of calculating these sums, see any introductory textbook.
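This summation can also be checked numerically. The sketch below uses invented round numbers: H, the household count, is made up, and only the figures already in the thread (p = 0.028 and the 34% population share) are taken from the comments above. The two sums agree exactly, which is the point of the argument: the pairing leaves each household's inclusion probability unchanged.

```python
from math import comb

N_CLUSTERS, HH_PER_CLUSTER = 33, 30
H = 100_000            # invented number of households in Missan
p_missan = 0.028       # Missan's own PPS share (from the thread)
share = 0.34           # Missan's share of the Basrah+Missan pair
p_pair = p_missan / share   # the combined pair's PPS share

def binom_pmf(k, n, p):
    """P(exactly k successes in n independent trials)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Unpaired: clusters land in Missan directly; a given household is
# surveyed with probability k*30/H when Missan holds k clusters.
unpaired = sum(
    binom_pmf(k, N_CLUSTERS, p_missan) * k * HH_PER_CLUSTER / H
    for k in range(N_CLUSTERS + 1)
)

# Paired: clusters land in the pair; Missan then wins all of them
# with probability equal to its population share.
paired = sum(
    binom_pmf(k, N_CLUSTERS, p_pair) * share * k * HH_PER_CLUSTER / H
    for k in range(N_CLUSTERS + 1)
)

print(unpaired, paired)  # identical: both equal 33 * 0.028 * 30 / H
```

By linearity of expectation both sums collapse to 33 x p x 30/H, so the equality holds for any H large enough that a province never runs out of households.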

By Kevin Donoghue (not verified) on 04 Oct 2005 #permalink

As I understood it, calculating the expected value depends on a binomial distribution. The Lancet pairing does not provide a binomial distribution, as it is equal to one trial, whereas SRS would be equal to 3 trials.

Donald,

All that was a waste of space. I have asked anyone here to show me that clumping clusters is accepted statistical methodology, and so far Lambert and the crew have come up with Null.

Kevin,

Does your nifty textbook also talk about clumping clusters? Or is Tim Lambert the only one who has such a textbook? I'm seriously getting tired of waiting for verification from statistical literature that I am wrong about the cluster clumping being unsupported methodology.

Also, your leaning back on the expected value thing is really getting boring.

66% vs. 39%. That's all I really have to say, and you haven't had anything that defeats that single fact. (I'm wondering if Mr. Lambert is going to explain how it is really 0% instead of 39%...)

66% vs. 39%. That's all I really have to say....

Is 39% supposed to be a correction of one of the figures you gave in comment number 39? When it popped up in comment number 46 it looked like a typo. Lest you think it corrects the mistake I referred to, it doesn't.

Incidentally, you don't have to refer to a textbook. If you do it on a spreadsheet you will get the same result: A household in a governorate not selected for pairing has the same chance of being chosen as a household in a paired one.

By Kevin Donoghue (not verified) on 04 Oct 2005 #permalink

Kevin,

For Missan, with the pairing, there's a 66% chance that none of Missan's households will be sampled, no matter how many clusters there are. Regardless of anything, Missan's households have a 66% chance of not being sampled under the pairing process.

Now, without the pairing, Missan's households have a chance of being sampled if Missan gets one or more clusters. The chance of Missan getting 1 or more clusters is 61%. That is using a binomial distribution with p=2.8% (Missan's PPS value) and with 33 trials. Which means that Missan has a 39% chance that it will not be sampled at all.

Now, when Missan gets paired, regardless of how many clusters the pair gets, Missan's households will have a 66% chance of not being sampled.

In comparison, Basrah had a 16% chance of exclusion from a genuine random cluster sampling, as was done initially. Then, because of the pairing, it had a 34% chance of exclusion, regardless of any number of clusters or anything else.
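For what it's worth, the arithmetic behind the 39% figure checks out under the binomial model stated above (p = 2.8%, 33 independent trials); whether that model describes the allocation actually used is the matter in dispute:

```python
# Probability that Missan gets zero clusters under an assumed binomial
# model: 33 independent trials, p = 2.8% (Missan's PPS share) per trial.
p, n = 0.028, 33
p_zero = (1 - p) ** n
print(round(p_zero, 3))   # about 0.392, i.e. roughly 39%
```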

>For Missan, with the pairing, there's a 66% chance that none of Missan's households will be sampled, no matter how many clusters there are.

Here you refer to the situation after the initial allocation of clusters and before it is determined whether Basrah or Missan will get them.

>Now, without the pairing.... Missan has a 39% chance that it will not be sampled at all.

Here you refer to the situation before the initial allocation of clusters. Do you see why Tim Lambert misunderstood what you were trying to do earlier? You jump from one stage of the process to the other.

However, your main problem is that you continue to focus on the probability of Missan getting sampled. That's part of the calculation of course. What you need to move on to is the probability of a Missan household getting sampled. Your best course is to do it on a spreadsheet. If you do it right the probability will be the same for the two sampling methods.

By Kevin Donoghue (not verified) on 04 Oct 2005 #permalink

Kevin,

"Here you refer to the situation after the initial allocation of clusters and before it is determined whether Basrah or Missan will get them."

Uh, it doesn't matter how many they get, kiddo. The probability is tied to their populations; no matter what, Missan will have a 66% chance of exclusion.

"However, your main problem is that you continue to focus on the probability of Missan getting sampled. That's part of the calculation of course. What you need to move on to is the probability of a Missan household getting sampled. Your best course is to do it on a spreadsheet. If you do it right the probability will be the same for the two sampling methods."

Yes, if Missan isn't sampled, neither are its households. Right?

We have to compare the initial sampling with the grouping process. Just forget everything about conducting SRS in the 2nd phase. That would be meaningless to do anyways.

The fact remains that Lancet's 2nd phase is not random sampling. Random sampling entails distributing each element independently, or each cluster independently.

Now if you, or Lambert, or anyone wants to show me ANYTHING that indicates that clumping clusters together in a 2nd phase into a single cluster, and then breaking them off again once they are distributed as a block, is supported, scientific, and good statistical methodology, please, for the love of human intelligence, show it to me.

Tim claimed this clumping action was cluster sampling - bzzzt, wrong.

Tim claimed this clumping action was multistage clustering - bzzzt, wrong.

Come on Tim, put up or concede.

Kevin,
I have a different probability question for you. If Seixon Googled "Total Probability Theorem" what would be the probability that he figures out what you're talking about?

The initial assignment of clusters to governorates was not done by simple random sampling as Seixon believes. I quote:

>We assigned 33 clusters to Governorates via systematic equal-step sampling from a randomly selected start. By this design, every cluster represents about 1/33 of the country, or 739 000 people, and is exchangeable with the others for analysis.

Since Missan has 685,000 people its chance of getting a cluster in the initial assignment was 685/739=93%. (In an earlier comment I incorrectly said that it was 100% because I thought Missan had more than 739,000 people.) Note that the probability of Missan being sampled is different from what you get with SRS. But the expected number of clusters in Missan and hence the probability of a household in Missan being sampled is the same whichever scheme you use.

Seixon, I've given you a reference on clustered sampling. Your failure to understand it is not my fault.
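The systematic equal-step scheme quoted above is easy to simulate. The province populations below are invented (only Missan's 685,000 and the 739,000 step come from the quoted passage); the simulation just illustrates that Missan's chance of getting a cluster comes out near 685/739 ≈ 93% regardless of where it falls in the randomly ordered list:

```python
import random

STEP = 739     # thousands of people per cluster step (24,387 / 33)
MISSAN = 685   # Missan's population in thousands (from the quoted passage)
# Invented populations for 17 other governorates, chosen only so the
# total comes to exactly 33 * 739 = 24,387 thousand.
OTHERS = [1400] * 16 + [1302]
assert MISSAN + sum(OTHERS) == 33 * STEP

TRIALS = 50_000
hits = 0
for _ in range(TRIALS):
    provinces = OTHERS + [MISSAN]
    random.shuffle(provinces)           # list order is random
    start = random.uniform(0, STEP)     # random start within the first step
    positions = [start + i * STEP for i in range(33)]
    # Locate Missan's interval on the cumulative population line.
    lo = 0
    for pop in provinces:
        if pop == MISSAN:
            break
        lo += pop
    if any(lo <= x < lo + MISSAN for x in positions):
        hits += 1

print(hits / TRIALS)   # about 0.93, i.e. 685/739
```

Because Missan's population is smaller than the step, it gets either zero clusters or one, and its expected number of clusters (685/739) matches what SRS with PPS would give.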

Sorry to interrupt all the good fun you're having beating up Seixon, but can I point out two things:

1. The Lancet 2-stage sample design, with 12 provinces selected in the first stage, makes it almost impossible to accurately calculate the correct standard errors. That's because the methods for calculating SEs in a complex random sample depend on having a "large" sample. And 12 provinces doesn't count as a large sample.

In fact, that's probably the reason why the standard methodology for these types of surveys uses 30 clusters: 30 is the smallest number that can be considered "large" in this context.

2. The Lancet article makes it fairly clear that the authors did not take into account the two-stage sample design when calculating the SEs. For example, the software they use can't estimate SEs for 2-stage sample designs.

I can't be completely sure exactly what they did to calculate the SEs, but I'm reasonably confident that the Lancet SEs are based on ignoring the first stage province sampling, and treating the design as simply 33 clusters with 30 observations in each one.

In the end, getting the SEs exactly right isn't the be-all and end-all (compared to reducing bias). So maybe it's better to bash Seixon. You're innumerate, Seixon! You still haven't posted your transcript, Seixon!

Ragout, you seem to be implying that the criticism of Seixon is somehow unjustified. Do you agree with his claim that the two-stage design produces bias?

I'm open to persuasion on the question of whether they accounted for the pairing when they calculated the CIs. It certainly seems unlikely to make much difference if they treated it as a one-stage design when bootstrapping.

Lambert,

Missan was not the first province on the list, thus it would not have the probability you say.

Again, you just operate with some assumption so that what you say will sound correct. Regardless of that, the provinces had the same proportional chance of receiving a cluster, as that process emulates SRS with PPS.

I love how you claim I don't understand the link you gave me on cluster sampling. Your arrogance reeks all the way across the world. So does your dishonesty. Not once in that paper did it say anything about clumping clusters, in fact, it talked almost exclusively about using SRS to distribute clusters.

But hey, what do I know? I'm illiterate! Just another addition to the "adjectives to use for Seixon" list.

The two-stage design introduces a bias because of the way it was done. The provinces chosen were not randomly chosen, and the pairings were also not randomly chosen. The pairings weren't chosen due to any rationale other than an arbitrary one. That no randomy a samply make.

Now I have whipped up a more honest simulation of a cluster sample vs. the Lancet sample.

Unlike Lambert, I do not think it realistic to think that a sample of mortality in a certain region will vary from 1-13 or 1-6. Instead of these unrealistic measures, I used the coalition death rates from Basrah and Missan. Using these as a mean in a normal distribution, I conducted 250 trials.

For Basrah, the random normal distribution had mean μ=27, sd=2. For Missan, μ=13, sd=1.

Here are the results of this more realistic simulation:

As you can see, wildly different from Tim's dishonest smokescreen "simulation". The mean does not matter as much as the median, because we are looking at what result we will be getting most often. This shows that the Lancet method is vastly different from cluster sampling, and why there are serious problems with doing things in this manner.

This simulates the problem before us much better because it is implausible that the mortality rates in the provinces will vary as much as 1-13 or 1-6.

Yet again Lambert, instead of explaining or proving your case, you resort to arguing from authority. I mean damn, you can't even find a single paragraph or a page that says that clumping of clusters is an accepted methodology? That's all it would take. I've been waiting over a week for it, tick-tock.

Me's a starting to a think that it's not a going to be appearing....

This is amazing.

Seixon: A bias is when some part of the population is more or less likely to be chosen than the rest.

Then there was no bias... because the probability of any household being selected was equal. Unless you are suggesting fraud. And if you are suggesting fraud, then there is still no sampling bias. So admit your mistake.

Seixon: Whether this produces a higher or lower mortality rate doesn't matter

Well then there is no bias in the expected value... only a potential increase in variance. Which is exactly what everybody has been trying to tell you.

Kevin, I admire your restraint and patience!

Seixon: "Uh, it doesn't matter how many they get, kiddo. The probability is tied to their populations, no matter what, Missan will have a 66% chance of exclusion."

But what matters is each household's chance of exclusion. That is a function of the province's chance of exclusion, and the household's chance of exclusion from within the province. And the chance of each Iraqi household's exclusion is identical (and equal to 1 minus their chance of inclusion, funnily enough).

Unless you're suggesting a fraudulent survey, in which case -- there is still no sampling bias.

In order to demonstrate bias, you have to be able to point out exactly where it comes in. I.e. "in this step, a province with the higher death rate has a higher chance of getting selected than the province with the lower death rate". I don't see it. Yes, I think we're all convinced that the error estimates are going to be off; but it's not been demonstrated to be asymmetrical.

Lambert,

No, of course I don't agree with Seixon that the Lancet's multi-stage sampling scheme causes bias. But hasn't that point been kind of beaten to death?

And you're ignoring Seixon's potentially valid point: that the particular sample drawn in the Lancet study seems to have been more violent than average. Admittedly, someone (BruceR?) seems to have refuted this point.

Seixon,

Are you aware that almost every survey that involves a personal visit has a multi-stage sampling scheme much like the Lancet's? Here's one example.

In your example, what precisely is wrong with the "Lancet cluster grouping"? It's true that the estimate from this sampling scheme will never be right on the nose: sometimes it will be too high, and sometimes it will be too low. It's also true that it's right on average (unbiased). Also, the estimate would get a lot better if you had a few more provinces (say, 12 as in Lancet).

It's simple to see where asymmetry could have been introduced.

The goal of pairing regions was to reduce travel time. So they picked distant regions and paired them with neighbors that, to the authors' best estimation (see? bias need not be fraudulent), were similar in levels of violence. However, levels of violence relate directly to the parameter of interest. We know that distant regions were less violent than proximal regions. We know that the distant regions were more often paired with more violent regions.

It doesn't matter that the clusters were assigned randomly between one of two paired regions because it is the choice of paired compared to unpaired where the bias occurs.

Perhaps it's easier to understand when this is turned around. Let's say the authors wanted to increase travel time. The best way to do this would be to find some of the regions with main highway arteries and major cities, pair them up with "similar" neighboring regions, randomly distribute the clusters between regions in a pair and, with some probability, several of the most urban regions are left out of the study while all of the most distant regions are left in the study.

I must be missing something because this just seems too obvious.

Seixon, when you did your simulations I'm sure you discovered that no matter what distribution you used (even the ridiculous ones you used) the mean was the same. Your justification for this is absurd:

>The mean does not matter as much as the median, because we are looking at what result we will be getting most often.

The mode is the result that you get most often, not the median. And neither is relevant since the Lancet was reporting the mean.

Also, this is wrong:

>Missan was not the first province on the list, thus it would not have the probability you say.

The ordering of the provinces on the list does not make a difference to their chance of being sampled. If it did, the process would be biased.
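The mean/median/mode distinction is easy to check on a small skewed sample (the toy data set here is arbitrary):

```python
import statistics

data = [1, 2, 2, 2, 3, 9]          # a small right-skewed sample
print(statistics.mean(data))       # about 3.17 (pulled up by the outlier 9)
print(statistics.median(data))     # 2.0
print(statistics.mode(data))       # 2 (the value occurring most often)
```

In a roughly symmetric sample the three coincide; skew pulls the mean away from the median and mode.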

>The Lancet article makes it fairly clear that the authors did not take into account the two-stage sample design when calculating the SEs. For example, the software they use can't estimate SEs for 2-stage sample designs.

Ragout,

Which software are you referring to here? They mention three: Mark Myatt's, EpiInfo and STATA. I'm not familiar with any of them, nor with bootstrapping; my stats are pretty old-fashioned. Given that the CIs were bootstrapped, do you reckon it matters? (I'm not implying it doesn't, I don't know enough about it to have a view.)

As to the possibility that the Lancet sample just happened to be unusual, I think if any point has been beaten to death it's that one. Heiko Gerhauser (spelling?) specialised in that critique. Everyone seems to have finished up with the same view as they started with. I'm agnostic about it. The sensible thing to do is combine the Lancet figures with other sources and cobble together a guess.

By Kevin Donoghue (not verified) on 05 Oct 2005 #permalink

Tim,

Correct, I mixed up mode and median. Median is still a better indicator than the mean in this instance. The mean was off by about 1 between the two graphs I gave.

Also, because of using a random start, Missan will not have the probability you speak of at all. It depends on what number they come up with for the random start.

Also, as long as you chose the order of the list randomly, there will be no bias according to which province is first on the list.

I see you still haven't sourced the allegation that the Lancet methodology is supported.... zzzzzz....

Ragout,

That study you linked to isn't even similar to the Lancet one at all. The study you linked to conducted sampling according to the norm. There was no cluster clumping in that study. So again, can anyone show a study that used cluster clumping? Can anyone show that this is supported statistical methodology?

True, the study, if conducted hundreds of times, would eventually average out in result. How unfortunate, then, that the Lancet study was only conducted ONCE! It doesn't matter that if you conducted the Lancet study 1000 times that the result will average out. We are talking about a one time thing there. As you can see from my simulation, Lancet's result will either be too high or too low in the paired provinces (when they have different violence levels). With real cluster sampling, the result will always be within the right range. That's what's important here.

I'm taken aback that you, Ragout, cannot see the vast differences in methodology between the Lancet study and the one you linked to. It is like night and day.

z,

I did point out where this happened. It happened when they decided to pick 12 provinces arbitrarily; this made those provinces have different chances than those that were left alone. On top of that, each pair was also biased by the fact that they clumped the clusters. This biased it towards the more populous of the two. This would not have been the case if they had used SRS for distributing the clusters in the 2nd phase. As you could see from the study, 4 of the 5 pairs that were unequal in population ended up giving ALL the clusters to the most populous. As I have pointed out numerous times, this type of methodology is nonexistent and I have yet to receive any evidence from Lambert or anyone else that it is valid to do things in such a way.

eudoxis,

Thanks for nailing it right on the head. The denial "in the room" is astounding. I can't even get a single source to show that the methodology is valid! I've been waiting for almost 2 weeks now....

The software I was referring to is EpiInfo, which allows a single clustering variable. They also mention using some specialized package "written for Save the Children." I don't know what that is, but it seems unlikely that it would allow for a very complex design.

I'm not that familiar with bootstrapping for survey SEs either. But as I understand it, the bootstrapping is supposed to replicate the original sampling scheme. So, I think they resampled clusters rather than individual observations. That is, they repeatedly drew a random sample of 33 clusters from their original dataset of 33 clusters (with replacement). They could have done something much more complex with bootstrapping, of course, but they present it like they're just replicating the design they specified in EpiInfo.

I think the SEs will be underestimated from this bootstrapping procedure (or other methods that ignore the sampling of provinces) if the paired provinces have population means that are different. So it depends on how well they were able to pair similar provinces.

If Seixon is right that the pairings were poor, then the reported SEs could be underestimated substantially. Seixon isn't that persuasive, so I think Lambert is probably right to guess that the pairings don't affect the SEs that much.
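The resampling scheme Ragout describes (repeatedly drawing 33 clusters with replacement from the original 33) can be sketched as follows. The per-cluster death counts are invented and this is not the authors' actual code, just the general percentile-bootstrap technique:

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Invented per-cluster death counts for 33 clusters.
clusters = [random.randint(0, 6) for _ in range(33)]

def bootstrap_ci(data, reps=2000, alpha=0.05):
    """Percentile CI for the mean, resampling whole clusters with replacement."""
    means = []
    for _ in range(reps):
        resample = [random.choice(data) for _ in data]   # 33 draws, with replacement
        means.append(statistics.mean(resample))
    means.sort()
    lo = means[int(reps * alpha / 2)]
    hi = means[int(reps * (1 - alpha / 2)) - 1]
    return lo, hi

lo, hi = bootstrap_ci(clusters)
print(statistics.mean(clusters), (lo, hi))
```

Resampling whole clusters (rather than individual households) captures within-cluster correlation, but, as Ragout notes, it cannot capture variation coming from a first-stage sampling of provinces that the scheme ignores.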

The median or the mode is the best indicator to use in my simulation, as Wikipedia summarizes:

>The median is primarily used for skewed distributions, which it represents more accurately than the arithmetic mean. Consider the set { 1, 2, 2, 2, 3, 9 }. The median is 2 in this case, as is the mode, and it might be seen as a better indication of central tendency than the arithmetic mean of 3.166.

>Calculation of medians is a popular technique in summary statistics and summarizing statistical data, since it is simple to understand and easy to calculate, while also giving a measure that is more robust in the presence of outlier values than is the mean.

You sure that the mean is the best figure to use Tim? I think you're being intellectually dishonest, again. In non-skewed distributions, the median and mean will be very similar, as is evidenced by my cluster example: mean = 21.65, median = 21.78. In the Lancet cluster grouping example, mean = 22.42, median = 26.

Now, I seem to recall from way back in the day that if the median was more than the mean, the distribution was skewed upwards, while if it was lower, the distribution was skewed downwards.

Correct? So since the Lancet cluster grouping is skewed upwards, it would be faulty to compare that with the other according to the mean.

Feel free to correct me, I'm just recalling what I was taught back in high school. Just seems that Wikipedia is in agreement with my methods.

Seixon: A simple google search of (mean median skewed) would have corrected you. But since it appears that you need help, see:

http://www.amstat.org/publications/jse/v13n2/vonhippel.html

>Many textbooks teach a rule of thumb stating that the mean is right of the median under right skew, and left of the median under left skew. This rule fails with surprising frequency. It can fail in multimodal distributions, or in distributions where one tail is long but the other is heavy. Most commonly, though, the rule fails in discrete distributions where the areas to the left and right of the median are not equal. Such distributions not only contradict the textbook relationship between mean, median, and skew, they also contradict the textbook interpretation of the median.

By Chris Jarrett (not verified) on 05 Oct 2005 #permalink

Eudoxis, you could deliberately pair provinces with wildly different death rates and the expected value still wouldn't be affected. The variance would go up, because when you made your choice you'd have picked a province which was either peaceful or exceptionally violent and so your calculated death toll for the sum of the two would be too high or too low.

By Donald Johnson (not verified) on 05 Oct 2005 #permalink

Tim Lambert: The initial assignment of clusters to governorates was not done by simple random sampling as Seixon believes.

Thanks for that. I must admit I had been taking Seixon's word for it instead of referring back to the study. That was foolish of me. My excuse is that I'm not the guy who is accusing a distinguished group of researchers of using a procedure which generates a biased sample, then sneaking it past the referees of a prominent journal. What's Seixon's excuse? Where does that leave his critique? Are we now asked to believe that the sampling was biased anyway, even though it was not actually done in the way he thought it was done? (Clearly there would have been no bias even if it had been done the way we both supposed, but that's neither here nor there at this stage.)

By Kevin Donoghue (not verified) on 05 Oct 2005 #permalink

Chris,

In my haste, it seems I got it backwards. That still doesn't mean that using the median isn't the better choice in this example. Clearly with the Lancet distribution, the median gives us a better indication of the typical result, and not the mean. In fact, if you put the mean as a horizontal line on that graph, it will hardly touch a single result. Now how's that for a typical result?

Donald,

"Eudoxis, you could deliberately pair provinces with wildly different death rates and the expected value still wouldn't be affected. The variance would go up, because when you made your choice you'd have picked a province which was either peaceful or exceptionally violent and so your calculated death toll for the sum of the two would be too high or too low."

Quite correct, sir, except there's only one problem: the pairings were not random, and neither was the choice of provinces to be paired. If they were, there would be no bias. There would just be a horrifying lack of precision, as the Lancet study demonstrates quite well with its gigantic CI. You're leaving out the facts, Donald.

Lambert,

How do you know that the Lancet study used the mean? It doesn't say so. Regardless of that, in the case I was demonstrating, the median would be a better choice to show what result was typical. How you doing on finding documentation for cluster-clumping?

Kevin,

"What's Seixon's excuse? Where does that leave his critique? Are we now asked to believe that the sampling was biased anyway, even though it was not actually done in the way he thought it was done?"

The clusters were picked via SRS, just in a different way. Not only that, the method they used is heavily accepted and documented as statistical methodology. There is nothing wrong with how they distributed the clusters in the initial sampling. Claiming that it was not SRS is very misleading. I suggest you guys read up on a simple guide over at the Centers for Disease Control website that details how to use this method.

SRS entails distributing each cluster independently with a random trial each. That was done. Lambert, you are starting to fall completely off the wagon now.

>Claiming that it was not SRS is very misleading.

Seixon,

Well, how would you describe your claim that "Missan has a 39% chance that it will not be sampled at all" - extremely misleading, perhaps?

Even if you had been right about the method used, that would still leave you with the wrong answer for the Lancet approach: your 66% figure for Missan and 34% for Basrah only makes sense if one or other of them gets at least one cluster in the initial allocation. With SRS, which is what you were assuming when you calculated those figures, there is no guarantee of that.

You simply haven't thought this thing through as you should have done before publishing your critique. You are making up your case as you go along and tossing a baby off the sled whenever the need arises.

By Kevin Donoghue (not verified) on 05 Oct 2005 #permalink

No Seixon, even if you pair wildly different provinces with wildly different death rates there's no bias if you choose which one to give the clusters to based on using the procedure outlined in the Lancet paper, where the chance of province A getting all the clusters is proportional to its population. It'd be dumb to pair wildly different provinces, because it decreases the precision (by increasing the variance), but the expected value wouldn't be changed and it wouldn't be a biased sample in the technical sense.

By Donald Johnson (not verified) on 05 Oct 2005 #permalink
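A quick simulation illustrates Donald's point with two deliberately mismatched, invented provinces: the average of the paired estimator lands on the true combined toll, but individual runs scatter widely, which is exactly the variance cost he describes:

```python
import random
import statistics

# Two invented provinces with very different death tolls.
POP_A, DEATHS_A = 1_000_000, 9_000   # violent province
POP_B, DEATHS_B = 500_000, 500       # peaceful province
TRUE_TOTAL = DEATHS_A + DEATHS_B

def paired_estimate():
    # Winner takes all clusters, with probability proportional to population;
    # its (perfectly measured) death toll is extrapolated to both provinces.
    if random.random() < POP_A / (POP_A + POP_B):
        return DEATHS_A * (POP_A + POP_B) / POP_A
    return DEATHS_B * (POP_A + POP_B) / POP_B

estimates = [paired_estimate() for _ in range(100_000)]
print(statistics.mean(estimates), TRUE_TOTAL)   # mean lands close to 9,500
print(statistics.stdev(estimates))              # but the spread is large
```

Any single run returns either 13,500 or 1,500, never 9,500; only the average over hypothetical repetitions equals the true total, which is what "unbiased" means here.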

A question for Seixon (don't ask me why I bother, I really don't know): you obviously attach great importance to the fact that the survey's pairing method changes the probability that Missan will receive zero clusters. Even a cursory look through these threads will show that you consider this a very telling point. Yet until very recently you were under the impression that, without pairing, Missan would have a 39% chance of getting zero clusters.

Yet, when Tim Lambert informed us that "since Missan has 685,000 people its chance of getting a cluster in the initial assignment was 685/739=93%", you were quite unmoved by this. You wrote: "There is nothing wrong with how they distributed the clusters in the initial sampling."

So, without pairing, 39% is fine by you and 7% is also fine by you. Yet, because pairing changes the odds, pairing is reprehensible. You are quite unmoved by the argument that any particular Missan household has the same probability of being selected under any of the systems being considered (which is what matters to the rest of us). When pairing comes into it, and only then, Missan's probability of getting zero clusters is important to you. How come? If that probability matters when we pair governorates, surely it should also matter when we don't?

By Kevin Donoghue (not verified) on 05 Oct 2005 #permalink

Kevin,

How was 39% misleading? That was according to a binomial distribution with 33 trials with p=2.8%. Of course with the equal-step method used, this changes slightly, but I'm not sure how that would be calculated, especially since the list is made randomly, and a random start is used.

Moving on to the pairings, I said that Missan's households had a 66% chance of exclusion. This does not depend on the number of clusters that either Missan or Basrah gets. That probability is entirely dependent on the populations of Missan and Basra, and nothing else. Even if Missan got 3 clusters, and Basra got 0, then it would still be the same. Of course, they would not have paired up Basra if it got 0, so I think you can start to see where the bias comes in...

>Yet until very recently you were under the impression that, without pairing, Missan would have a 39% chance of getting zero clusters.

>Yet, when Tim Lambert informed us that "since Missan has 685,000 people its chance of getting a cluster in the initial assignment was 685/739=93%", you were quite unmoved by this. You wrote: "There is nothing wrong with how they distributed the clusters in the initial sampling."

Well, Lambert was wrong because he made an unsupported assumption, thus I was unmoved. Not only that, but the methodology they used for the initial sampling is accepted and widely used. You should have read the walk-through over at the CDC website. Or are you and Lambert now proposing that the Lancet study suddenly got this wrong? lol

Lambert did bring up a good point though, that the probability of initial exclusion would not be exactly 39%. Well, if Missan was first on the list, then it would be as Lambert said, 7%. However, the probability that Missan would be first on the list is 1/18. As for the probability of it getting a cluster on other places on the list, well frankly, I think that's going to be a bit too hard to calculate. So basically, generally, it would be 39%. If you want to calculate all the probabilities depending on it being 1st, 2nd, 3rd, and so forth on the list, be my guest.

When pairing comes into it, and only then, Missan's probability of getting zero clusters is important to you. How come? If that probability matters when we pair governorates, surely it should also matter when we don't?

Hehe. Maybe because all the provinces had that exact same probability (PPS) in the initial sampling, and not just Missan? Then we get to the pairing, and then it's only some of the provinces that have that probability changed, including Missan.

Take your own advice, think things through before writing.

Donald,

"No Seixon, even if you pair wildly different provinces with wildly different death rates there's no bias if you choose which one to give the clusters to based on using the procedure outlined in the Lancet paper, where the chance of province A getting all the clusters is proportional to its population. It'd be dumb to pair wildly different provinces, because it decreases the precision (by increasing the variance), but the expected value wouldn't be changed and it wouldn't be a biased sample in the technical sense."

First, due to the one-off methodology, you are automatically biasing it towards the more populous of the two paired provinces. If it wasn't a one-off methodology, then this would not be true.

Secondly, due to the fact that the pairings were not done randomly, this also represents a bias due to the choice being made by the author of the study for their own reasons, instead of randomly. That's what bias is all about baby.

You continue to overlook these two facts again, and again, and again.

What you're saying would be true if:
a. the pairings were done randomly
b. the clusters were not distributed with an unsupported methodology

Seixon, the reason I keep saying this is that I convinced myself of it sometime early in the previous thread. I divided Iraq into three imaginary provinces with three different populations and three different numbers of deaths and then, to keep it simple, assumed that one counted all the deaths in whichever provinces one chose to sample. I arbitrarily said province C would be counted and provinces A and B would be paired, with the probability of province A (or B) having its deaths counted being proportional to its population. The winning province then has its death toll extrapolated to cover both provinces A and B.

It sounds like you're calling it a bias to pick the "winning" province by weighting its chances by its population, but it isn't, because the resulting "expectation value" is exactly equal to the true death toll. Of course the actual calculated death toll will be too high or too low if the provinces have different death rates, which is why you should pair similar provinces if possible, but the expected value comes out equaling the true death toll. It's all very simple algebra and the results are still valid even if it's true (which it might well be) that the more populous provinces suffer disproportionately more fighting and more disease and so forth, because in my model the death tolls were simply DA, DB, and DC, with no assumptions whatsoever on how they might depend on the populations of my three imaginary provinces, and the expected value of my procedure came out equaling DA + DB + DC. It took about two lines of algebra.

Of course the real procedure doesn't involve making perfect counts of the death tolls in the selected provinces, but so long as the real procedure is an unbiased sample of the people within the selected provinces, the expected value for each province would equal the true value and my little analogy would still be valid. At any rate, pairing provinces wouldn't be a source of bias, just a source of imprecision.

I think you'd do better for your anti-Lancet case if you'd focus on trying to show that by sheer bad luck (or malevolent liberal plotting or whatever), the Lancet team ended up sampling places that had higher than average death rates. That's quite possible no matter how valid the statistical methodology might have been and how honestly it was carried out--Tim's computer simulations show a lot of dots that are way above the average. You and Bruce R (or somebody--I forget who) were having a disagreement about what your military casualty data showed--whoever is right, that's a more relevant way to approach the problem, I think.

By Donald Johnson (not verified) on 05 Oct 2005 #permalink

Donald,

The expected value is a red herring that Kevin and others have latched onto at my blog to obfuscate the issue at hand here. An expected value is only relevant if you are determining an event that will be recurring. Like, what's the expected value of flipping a coin 6 times. You don't calculate an expected value for flipping the coin a single time.

The Lancet study flipped the coin a single time in each of the pairs, so whatever expected value you derive with all your algebra is completely irrelevant since it is only happening one time. What matters then is the probability of that one event in each pairing.

You can see this quite clearly with the cluster grouping graph I created. It's either going to be high, or low. The "expected value" you are talking about is more like the mean, which doesn't appear on that graph at all.

You calculate expected values for a series of random events. Not a single random event. Right?

So why are you guys talking about expected values?

Thus, because it was a single random event, it biased the pairings because of the populations in each. If each cluster was given its own random event, then this would not be the case, because then the smaller province would truly have a proportional chance to get clusters.

That is why Lambert and everyone else have been unable to show even a single piece of evidence that pairing in such a manner is statistically sound. It's not.

Think about it this way...

We are going to distribute 30 clusters, to the 18 provinces in Iraq. However, instead of distributing them with SRS or an equivalent method, we only have two random events, distributing 15 clusters in each. Now, where do you think these are going to end up?

Yesterday I recreated the Lancet study, using the equal-step sampling method with a random start, as is so gracefully outlined over at the Centers for Disease Control website I referred to earlier.

Now, since we are doing a 15x2 distribution, I would conduct two random trials, each consisting of drawing a random number between 1 and 24,393,000. Now which provinces are most likely to land these two? Baghdad (21%) and Ninawa (9.6%). Don't you think that this biases the sample towards these provinces?

It does, because all of the clusters are suddenly gone with two drawings. You can just throw those expected values right out the window. Why?

With the Basra-Missan example, you're always going to end up with either 0 or 3. No matter what. 0 or 3. Therefore there is no use in defining E(Missan) = 1.02 clusters and E(Basra) = 1.98 clusters. They will never get anywhere near those results. They will get 0 or 3.

With genuine cluster sampling, these clusters will be distributed with SRS or an equivalent method, and then once again our expected values become meaningful. Basra or Missan can get 0, 1, 2, or 3. The expected value is still the same as before, 1.02 for Missan, and 1.98 for Basra. But now, there is an actual chance of Missan getting 1, and Basra getting 2. In fact, if you repeated a genuine cluster sample, you would expect Basra to get 2 most of the time, and Missan 1.

With the Lancet method, it will always be 0 or 3. No matter how many times you repeat the experiment, they will always get 0 or 3, and Basra will end up getting 3 most of the time, while Missan will get 0 most of the time.

The expected value is meaningless with the Lancet method, something I have been trying to drill into certain heads around here for a long time now. Of course, the only way of defeating my point about that is to claim that the expected values mean something.

I think I have just conclusively shown that not to be the case. Which I also did with my graph.

So the Lancet researchers sought to pair "similar" governorates to reduce the variance between clusters and the imprecision in their national estimate. Perfectly understandable. Would anyone care to venture:

(a) whether their results were in the end imprecise;

(b) what this says about the variance of their clusters, paired and unpaired;

(c) what this suggests about the phenomenon under study;

(d) and why precision matters.

The question of bias is rather moot if the result spans a range so wide that the error bars nearly equal it. On the other hand, when the central estimate is asserted to be "conservative" despite the error margin, perhaps there's some value in examining the study for hidden biases.

PS. Though I can respect dsquared's minimalist position of "more than zero excess deaths," this is not how the Lancet survey's results are usually presented by others, its lead author included.

Oh, and something I forgot to address, Donald, which relates to your comment slightly, Nom, is that BruceR's tabulation shows that my conclusion was correct: the military death rates of the coalition, proportional to population, show that 5 out of the 6 pairings were in fact not "similar". In fact some of them were wildly different.

BruceR tried to dance around the issue by not mentioning this and instead leaving out Anbar while claiming that the coalition death rates for all of Iraq were similar to the coalition death rates of the excluded provinces. Of course, this only worked out if you exclude Anbar, which was in fact not excluded from the sample. Nor can I really see any reason to be excluding Anbar at all. Must have been because otherwise he would have seemed completely wrong and only bolstered my claim.

Which is exactly what he did.

We've been talking about expected value because it's relevant on the subject of bias. The fact that pairing provinces decreases the precision is well understood by everyone and so the really interesting question is whether the provinces picked were (presumably by the luck of the draw) bloodier than average and how one would tell if this is the case. I haven't looked at the data you and Bruce R argued about--sounds too much like work to me. But that seems more like the argument people should be having.
Though that said, whatever its flaws it seems to me that the Lancet estimate of violent deaths isn't that far out of line with the UN survey (given the different time periods), so we can safely say that "several" tens of thousands of people had died violently by Sept. 2004. If the Lancet midrange number is a bit higher I'd go with the UN number for the period it covered since it was a larger survey.

What I've always found interesting about the Lancet paper (as I keep mentioning) is more what I'd call the reporting element it contains about what was happening in Fallujah. You can't do much with it statistically, it's just one neighborhood, but along with what they report they saw in Fallujah (vast areas that looked as bad or worse), it suggests that there might have been massive civilian casualties there.

I've googled around a bit and in my casual searching I couldn't find a single journalist who interviewed Fallujah refugees about what happened there in the months between the first and second assaults, and for that matter there also aren't that many interviews with Fallujah residents concerning the assaults themselves--there's one leftwing guy (Dahr Jamail) who seems to be by far the most interested. Basically it either hasn't been possible for most of the press to do this or they just aren't interested in that story. I suspect that this has generally been the case with respect to civilian casualties caused by coalition forces--once it became dangerous for Western reporters to roam freely around Iraq (it was safer in the earlier part of the occupation), information about civilian deaths caused by our side was likely to dry up.

By Donald Johnson (not verified) on 05 Oct 2005 #permalink

Donald,

"We've been talking about expected value because it's relevant on the subject of bias. The fact that pairing provinces decreases the precision is well understood by everyone and so the really interesting question is whether the provinces picked were (presumably by the luck of the draw) bloodier than average and how one would tell if this is the case."

The expected value is relevant on the subject of bias? Because...? Actually, altering of probabilities has more to do with bias than expected values do.

I'm sorry you didn't get the point in my last comment, which I felt demonstrated it quite nicely. The expected value issue is a red herring. Assigning an expected value to the way the Lancet study did things is meaningless because their methodology would never ever come up with those expected values.

I don't care whether their survey overestimated or underestimated the total, the point for me is that they introduced bias to the sample at least once, if not twice. First by arbitrarily (or out of convenience) picking out 12 provinces that would be exclusion candidates. Then by the actual pairing up of those 12 on nonexistent/wrong/fraudulent grounds.

Whether or not they did it with fraudulent intent is irrelevant: the bias is still there.

Yes, tens of thousands of Iraqis have died. I think everyone here is on board with that. I subscribe to the notion that the real number is somewhere around 50,000.

The UNDP study and the Lancet study don't mesh at all. The wildly different rates on baby deaths or whatever it was shows this. I think that the UNDP study probably underestimated the death toll to some degree, but it in no way meshes with the Lancet study.

I find it hilarious that after over a week, Lambert still hasn't shown a single shred of evidence that cluster-clumping is valid methodology.

He claimed that the clumping constituted "multistage clustering" which is ridiculously false.

He claimed that the clumping constituted cluster sampling. Again, false.

He made plots of a supposed simulation of cluster sampling, which he tried to pass off as what the Lancet study did. I have shown with my plot that this was a dishonest trick.

He claimed that equal-step sampling is not a simple random sampling process. Again, false.

He claimed that the Lancet study reported the mean of the distribution. Again, unsupported.

He claimed that the order of the list in equal-step distribution did not matter, or else it would be biased. Again, false since the list order is supposed to be produced randomly.

How much more dishonesty is it going to take to convince you people that Lambert is futilely defending a study that he has hooked his entire credibility on for almost a year?

Donald Johnson: "Eudoxis, you could deliberately pair provinces with wildly different death rates and the expected value still wouldn't be affected. "

No, that would just be wrong. Clusters can be paired if they are similar. This can be repeated until, at the end, you end up with just one, but a very imprecise or degraded cluster that won't be useful. But if you only pair some of the clusters, they have to be similar in order to maintain the integrity of the study. And therein lies the problem.

The error introduced by incorrect decisions in pairing is harder to deal with than the increase in variance caused by correct pairing of clusters.

The expected value is not a red herring. A sampling bias would change the expected value of a survey. The expected value wasn't changed by the survey method, therefore there was no sampling bias.

This is not an opinion, it is a fact. Seixon is denying this fact. Unfortunately, when a person continually denies an obvious fact there are only two reasons and neither are pretty -- dishonesty or stupidity.

Seixon says (my emphasis):

>Lambert did bring up a good point though, that the probability of initial exclusion would not be exactly 39%. Well, if Missan was first on the list, then it would be as Lambert said, 7%. However, the probability that Missan would be first on the list is 1/18. As for the probability of it getting a cluster on other places on the list, well frankly, **I think that's going to be a bit too hard to calculate. So basically, generally, it would be 39%.** If you want to calculate all the probabilities depending on it being 1st, 2nd, 3rd, and so forth on the list, be my guest

I give you Seixon's law of probabilities: If a probability is too hard for Seixon to calculate then its value is 39%.

The answer, by the way, is that the probability is 7% no matter where it is on the list. Which is why you don't need to randomly order things when you do systematic sampling.
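The claim that list order is irrelevant under systematic (equal-step) PPS sampling can be checked numerically. This is a sketch with invented populations, not the study's: a province whose population is smaller than the sampling step receives 0 or 1 clusters, with probability equal to its population divided by the step, no matter where the shuffle puts it on the list:

```python
import math
import random

def systematic_pps(pops, n_clusters, rng):
    """One systematic (equal-step) PPS draw over a randomly ordered list.
    Returns the number of clusters each unit receives."""
    order = pops[:]
    rng.shuffle(order)                   # random list order each trial
    total = sum(p for _, p in order)
    step = total / n_clusters
    start = rng.uniform(0, step)         # random start in [0, step)
    counts = {}
    cum = 0.0
    for name, p in order:
        lo, hi = cum, cum + p
        # count selection points start + i*step with lo <= point < hi
        first = max(0, math.ceil((lo - start) / step))
        last = min(n_clusters, math.ceil((hi - start) / step))
        counts[name] = max(0, last - first)
        cum = hi
    return counts

pops = [("Alpha", 1_300_000), ("Beta", 700_000), ("Gamma", 2_000_000)]
rng = random.Random(7)
trials = 50_000
hits = sum(systematic_pps(pops, 4, rng)["Beta"] > 0 for _ in range(trials))
print(hits / trials)   # hovers near 0.7 = 700,000 / 1,000,000 (the step)
```

Because the start is uniform on [0, step), every stretch of a given length on the cumulative population line catches a selection point with the same probability, which is why shuffling the list changes nothing.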

[The UNDP study and the Lancet study don't mesh at all. The wildly different rates on baby deaths or whatever it was shows this]

Whatever it was, it wasn't infant mortality.

Pairing wildly different provinces doesn't change the expected value, Eudoxis. It's something to be avoided if possible because of what it does to the variance, but it doesn't change the expectation value. I agree along with everyone that if pairing is to be used, it should be done with provinces that have similar death rates. Here's the algebra--

Two provinces A and B have populations NA and NB and death tolls DA and DB. You sample one of them using the Lancet procedure to choose between them. A has a chance of being sampled of NA/(NA + NB) and B has a chance of NB/(NA + NB). If you pick A, you take whatever death toll you measure there (for simplicity, assume it's the exact value DA) and multiply it by
(NB + NA)/NA.

Then the expected value for the death toll is

NA/(NA + NB) times DA times (NA + NB)/NA +
NB/(NA + NB) times DB times (NA + NB)/NB

which equals DA + DB.

Nothing in the algebra changes if the death rates DA/NA and DB/NB are wildly different. What does matter, of course, is that if they are wildly different your two possible answers are also wildly different, one too high and one too low; if the two death rates are the same, then your two answers are the same. So it'd be a good idea to pair provinces with similar death rates if you're going to pair them at all.
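The cancellation Donald describes can be verified exactly with rational arithmetic. The numbers below are arbitrary stand-ins, deliberately given very unequal death rates to make the point:

```python
from fractions import Fraction as F

NA, NB = F(1_330_000), F(685_000)   # invented populations
DA, DB = F(2_000), F(9_000)         # invented death tolls, wildly unequal rates

# PPS weight times measured toll times scale-up factor, summed over outcomes
expected = (NA / (NA + NB)) * DA * ((NA + NB) / NA) \
         + (NB / (NA + NB)) * DB * ((NA + NB) / NB)

print(expected == DA + DB)   # the PPS weights and scale-up factors cancel exactly
```

The equality holds for any populations and any death tolls, which is the algebraic content of "pairing is imprecise but not biased."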

Anyway, this is why I think people who suspect the Lancet number is too high should concentrate on trying to show that by sheer bad luck (which could happen), they picked out an overabundance of places that were more violent than average. I note that they did pick out a really violent neighborhood in Fallujah, but they also picked out a neighborhood in Sadr City where nobody had died a violent death, so perhaps the statistical flukes went both ways.

By Donald Johnson (not verified) on 06 Oct 2005 #permalink

Seixon: "In my haste, it seems I got it backwards."

It seems that that is one of your favorite outs you use when you make a mistake.

You should just simply admit that you're wrong. But you won't. You'll continue to demonstrate to everyone that you don't have a clue.

By Chris Jarrett (not verified) on 06 Oct 2005 #permalink

On the point which Donald Johnson and Eudoxis are discussing (the consequences if the pairings were deliberately chosen to consist of a relatively violent governorate and a relatively peaceful one): it seems to me Donald is right in saying that we should avoid the term sampling bias to describe this sort of thing, since it wouldn't affect a household's probability of being chosen; but Eudoxis is obviously right that it would have implications for "the integrity of the study" (or more bluntly, the integrity of the researchers).

However there is a subtle point which tends to get overlooked here: the theory is a load of crap. If the study team wanted to cook the results in that way, they would presumably pair violent governorates with peaceful ones which have a significantly smaller population. If they are about equally populous, there is a roughly 50:50 chance of drawing the "wrong" one. Looking at the study, we see that one pair, Sulaymaniyah and Arbil, had equal populations (1.1m) while another, Kerbala (1.05m) and Najaf (1.02m), were pretty nearly equal. Not only that, but by creating six pairs Roberts was leaving a high probability that one or more of the smaller governorates would get selected. This of course happened, as Missan (0.69m) got in at the expense of Basrah (1.33m). So if we ignore the two pairs with similar population levels, on the assumption that the team presumably wasn't trying too hard to fix the outcome there, we get three "successes" out of four. And the "successes" are? Ninawa, Salah ad Din and Kerbala - three of the least violent clusters in the study.

In order to swallow the idea that the pairs were chosen with a view to cooking the outcome, you have to believe that the study team were not only dishonest but also pretty clueless about probability theory. From where I'm sitting the Lancet authors are not the most obvious candidates for that description.

By Kevin Donoghue (not verified) on 06 Oct 2005 #permalink

Chris,

I have admitted many times when I got it wrong. In that instance, I seriously just got it backwards. Go to my blog, there I admit what, 5 mistakes? How many have admitted mistakes on here? Exactly.

Besides, it was irrelevant that I got it backwards, because the point still stands. The median is a better figure to use when the distribution is skewed, which is the case with the Lancet method.

Lambert,

I told you if you really wanted to find the general probability of Missan getting a cluster, to go ahead and figure it out. Instead of doing that, you make some lame attack on me.

The figure would not be 7% no matter where Missan was on the list. The real number is irrelevant as I tried explaining to Donald or whoever it was: ALL THE PROVINCES HAD THE SAME PPS PROBABILITY IN THE INITIAL SAMPLING.

Good job on once again being intellectually dishonest Lambert. What you're doing is basically like pointing out irrelevant spelling mistakes in someone's argument.

Donald,

For the love of...

The expectation values are irrelevant with the Lancet method. Can't you see that? I've drawn you a plot. I have written it out in plain English. I have written numbers. What more do you need? Expected values are used to determine what one would expect after repeating a random trial many times.

The Lancet only flipped the coin once. Therefore, the expected value is irrelevant.

From calculating the expected value, you will get 2 clusters for Basrah, and 1 for Missan. No matter what strategy you use. That's nice, and I have agreed with that. However...

With the Lancet method, you will ALWAYS GET 0 OR 3. ALWAYS. Therefore the expected values of 2 and 1 are completely irrelevant. You will never ever attain these values, because you are not doing a series of trials, but ONE.

I guess I have to put up my plot graph again:

See this? See the Lancet cluster grouping? See how there are no results in the middle there? You know, where the mean Lambert likes to use would be? The "expected values" would also lie in the middle there. With a true cluster sampling, those expected values would be possible to attain, therefore they have meaning with cluster sampling. With the Lancet method, they are irrelevant because it is IMPOSSIBLE for those expected values to be attained.

John,

"The expected value is not a red herring. A sampling bias would change the expected value of a survey. The expected value wasn't changed by the survey method, therefore there was no sampling bias.

This is not an opinion, it is a fact. Seixon is denying this fact. Unfortunately, when a person continually denies an obvious fact there are only two reasons and neither are pretty — dishonesty or stupidity."

Yes, instead of actually proving your case, just go with the "Seixon is stupid, and I am right" method of argumentation. Good job!

SO HOW ARE YOU GUYS DOING FINDING DOCUMENTATION FOR THE LANCET STUDY'S METHODOLOGY?

Seixon,

Putting aside for now the points already made to you, since you have shown your ability to evade them, let me raise another matter that puzzles me. Increasingly, as your forward defences crumble, you are retreating to this line: nobody has produced an example of an earlier study where clusters were paired randomly in order to reduce travel or the like. (You reject Ragout's example as insufficiently similar and my guess is that you will do the same with any future offerings.) You can rest assured that I'm not going to come up with an example, firstly because I don't know of any; secondly because, as has been pointed out to you many times, it's really not up to the rest of us to do your homework for you and, lastly and most importantly, because I cannot see that it matters in the slightest.

Your attitude on this last point is what puzzles me. You really do seem to think that you have a compelling argument here. Has it never occurred to you that every statistical technique in use today was quite unprecedented the first time it was used? In many fields it is a tall order to get a research paper published in a reputable journal unless some new technique is used. Researchers are expected to come up with novel ways to tackle problems. It is largely by doing so that they win the respect of their peers. Has there ever been a case where a reviewer advised the editor of a journal against publication of a paper solely because it introduced a new procedure? I seriously doubt it. What reviewers are supposed to do is pass judgement on whether the novelty is appropriate in the circumstances. In this case the verdict was favourable.

The challenge for you is to show that pairing was an unsound move in this particular case. If you can do that, it doesn't matter if thousands of studies have done it before. If you can't, you will get nowhere by simply protesting "there is no precedent for this!" like some fussy old bureaucrat in Yes, Minister. Innovation is what it's all about. Are you an enemy of progress? Why do you hate America so?

By Kevin Donoghue (not verified) on 06 Oct 2005 #permalink

[With the Lancet method, you will ALWAYS GET 0 OR 3. ALWAYS. Therefore the expected values of 2 and 1 are completely irrelevant. You will never ever attain these values, because you are not doing a series of trials, but ONE.]

How much would you pay in order to play a game which would give you 0 if an unbiased coin came up heads and $10 if it came up tails?

Seixon, I know that you can't get 1 or 2 clusters in a paired province the way the Lancet procedure worked, but only 0 or 3. In my own example there was no chance of getting the expectation value for the death toll, but only a result which was too high or too low.

So yeah, I know that the expected value refers to what you'd expect to get if a million Lancet teams went through the exact same procedure and then averaged their death tolls. Any given actual survey with paired provinces is going to give results that are too high or too low for the paired provinces, depending on which province won the 3 clusters and what its death rate was. If the death rates are similar it won't matter much. If they're much different it's quite possible the calculated toll will be too low as well as too high.
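Donald's "million Lancet teams" framing can be simulated directly. A sketch with invented populations and death rates (loosely echoing the Basra/Missan example, but not real data): each realization of the paired scheme really is all-or-nothing, yet its long-run average matches both the true toll and the per-cluster scheme; only the spread differs:

```python
import random
import statistics

POP_A, POP_B = 1_330_000, 685_000   # invented populations
RATE_A, RATE_B = 0.004, 0.010       # invented per-capita death rates
TOTAL = POP_A + POP_B
P_A = POP_A / TOTAL
TRUE_DEATHS = RATE_A * POP_A + RATE_B * POP_B

def paired(rng):
    # All-or-nothing pairing: one PPS draw, winner takes all 3 clusters.
    rate = RATE_A if rng.random() < P_A else RATE_B
    return rate * TOTAL

def per_cluster(rng):
    # Each of the 3 clusters is allocated by an independent PPS draw.
    rates = [RATE_A if rng.random() < P_A else RATE_B for _ in range(3)]
    return (sum(rates) / 3) * TOTAL

rng = random.Random(1)
paired_runs = [paired(rng) for _ in range(100_000)]
cluster_runs = [per_cluster(rng) for _ in range(100_000)]

print(statistics.mean(paired_runs))    # both hover near TRUE_DEATHS
print(statistics.mean(cluster_runs))
print(statistics.pstdev(paired_runs) > statistics.pstdev(cluster_runs))
```

Neither scheme is biased; pairing buys less travel at the cost of larger variance, which is the trade-off being argued over in this thread.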

Kevin, that's interesting about the provinces--I'll look at it later if I have time. On what a hypothetical evil Lancet team would do, I think I'd do it differently. If they pair a large violent province with a small peaceful one, they'd be taking the risk that the small one would win and greatly skew the results down. The victory of the large violent province over the small peaceful one in such a case wouldn't skew the results upwards so much. So if they want to skew the data they might as well go ahead and pretend the random number gave them the province they wanted, and then pair them in the way that maximizes the death toll. In that case, I'd pair a small violent province with a large peaceful one and claim that the small one won and I'd also make darn sure that when I singled out a place like Sadr City I sample a neighborhood where somebody got killed and I'd decrease the Fallujah death toll so it wouldn't be such an outlier and could be included. But maybe it's all part of some subtle overarching plot to raise the death toll by some fractional amount, and Sadr City's bloodlessness was meant to throw us off the scent. Anyway, since the UN number for war deaths isn't that out of line with the Lancet figure for violent deaths, it doesn't seem that province selection (random or contrived) had a very big effect.

I wish they had deliberately contrived to drop a few more clusters in Fallujah, doing a separate analysis for that town. But that would have been very risky at the time.

By Donald Johnson (not verified) on 06 Oct 2005 #permalink

Kevin,

Did you even read the example Ragout gave? I'm guessing you didn't, because your recent comment is seriously dishonest and smacks of desperation. I will quote from the example Ragout cited, and I will ask you to show me how in the world the Lancet study is similar:

Survey estimates are derived from a stratified, multi-stage cluster sample. The primary sampling units (PSU's) composing the first stage of the sample were counties, groups of counties, or large metropolitan areas. Large PSU's were included in the sample automatically and are considered to be self-representing (SR) since all of them were selected. The remaining PSU's, called nonself-representing (NSR), because only a subset of them was selected, were combined into strata by grouping PSU's with similar geographic and demographic characteristics, as determined by the 1990 Census.

The initial 1990 design consisted of 93 SR PSU's and 152 NSR strata, with one PSU per stratum selected with probability proportionate to population size.

Go ahead, tell me how this is like the Lancet study. I will enjoy reading it.

>You can rest assured that I'm not going to come up with an example, firstly because I don't know of any; secondly because, as has been pointed out to you many times, it's really not up to the rest of us to do your homework for you and, lastly and most importantly, because I cannot see that it matters in the slightest.

So again, Seixon you're an idiot, we cannot show you how or why, because the points you bring up are so irrelevant that we can't be bothered with even posting a single paragraph showing your error.

Great job spinning Kevin. Lambert has come with an alarming amount of false statements, and in his desperate quest to take me out, he almost wandered into hanging his hat on the initial sampling process being fraudulent. Ouch.

Instead of being concerned with Lambert making a number of directly false statements, now you have created the most absurd argument I have read for a long time. That the great JHU researchers have indeed invented a new type of sampling, and that I must be a rube for not recognizing the immediate brilliance of it. Oh boy.

Innovation? For what, creating imprecise mortality studies? Oh gee, sign me up for some of that!

dsquared,

How much would you pay in order to play a game which would give you 0 if an unbiased coin came up heads and $10 if it came up tails?

Again with the irrelevant anecdotes. What on Earth does a gambling example have to do with distributing clusters to create a random and unbiased sample for a mortality study?

In your example, I would wager up to $5 if I were playing it a series of times because in that case, my combined winnings would outpace my wagers. Just one problem: I would have to play it a series of times to ensure this.

Unlike gambling, the Lancet study was only being carried out a single time. Which is what I made quite clear, which is why the expected values are meaningless. So are the averages of conducting the experiment.

Also, by changing your wager, you are also changing your earnings. You could not do this with the Lancet study.

Say, I wager $1 each time. Then my expected net winnings would be $4 per time. $2, it would be $3. $3 it would be $2. $4 it would be $1. $5 I would come out even.

Thus, by changing the wager, you are changing the expected values. With the Lancet, you did not have this choice.
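For what it's worth, the wager arithmetic in the coin-game digression is easy to verify (a trivial sketch of the fair-coin game dsquared proposed: $0 on heads, $10 on tails):

```python
# Fair coin: $0 on heads, $10 on tails, so gross expectation is $5 per play.
ev_gross = (0 + 10) / 2

for wager in range(1, 6):
    net = ev_gross - wager
    print(wager, net)   # net expectation falls from $4 at a $1 wager to $0 at $5
```

Changing the wager shifts the net expectation exactly as stated, though of course none of this bears on whether a one-shot draw has a meaningful expected value.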

In fact, the two scenarios are so different that I fail to see what you wanted to accomplish by bringing it up. You know, other than trying to derail the discussion once again since you are losing the battle.

Donald Johnson, when the study's authors pair regions that they assume have similar levels of what they are measuring, and the regions are actually dissimilar, naturally, the expected values don't change. But the actual values are different. This results in bias. And we agree that such a bias can go either direction. Also, if there is any connection between the reason for pairing (miles traveled) and the statistic to be measured (deaths), then there is clear bias - but of a different type.

As for how the clusters were assigned after regions were combined, I don't see a problem there at all. I think I've addressed that before.

I am inclined to think that because of the nature of war, particularly localized bombing, undersampling of actual deaths is probably more of a problem than oversampling.

My main problem with the study, however, is the rosy picture it presents of the mortality rates before the war. They just don't comport with any of the on-the-ground studies done around that time (none actually cover that time period directly). They certainly don't make sense in light of what we know about that time period in Iraq. So, I take the mortality rates from the study seriously but the determination of excess deaths with reservation.

Separately, and addressed generally, there is too much of a link implied between bias and fraud in these discussions. The authors present a long list of potential sources of bias in their discussion. These biases are not included in the error or variance presented in the results, because it's not known to what extent these biases are affecting the study. But the authors know that each of these areas is a type of problem that can undermine the integrity of the study. That's why they are mentioned.

Kevin Donoghue: "If the study team wanted to cook the results in that way..."

I think you might be looking for someone else.

Donald,

OK, so you are starting to see that the expected value of this experiment is meaningless?

Yes, you are correct, the death toll will either be too high or too low if we assume that the provinces are unequal in death rate. That's why the variance rises. However, the fact that the more populous region has more of a chance at winning the coin toss biases it towards that province. This bias would not exist if the clusters were distributed in another way.

The chosen provinces weren't random, and the pairings weren't random.

As for Kevin's hypothesis regarding nefarious motives: can't you guys see that the way the Lancet study is set up, it is far more susceptible to nefarious conduct than had it been carried out in another way?

In fact, if they only wanted to visit 11 provinces, then all they would have to do is reduce the number of clusters. If they wanted to reduce travel and limit the number of provinces to visit, they could have conducted the initial sampling with fewer clusters.

In fact, I will carry out the same initial sampling with equal-step random start as they did, but this time, with 15 clusters. My results:

Anbar: 1
Diala: 1
Karbala: 1
Wasit: 1
Arbil: 1
Najaf: 1
Dehuk: 1
Babil: 1
Dhi Qar: 1
Qadisiyah: 1
Ninawa: 1
Tamim: 1
Baghdad: 3

13 provinces sampled. But alas, only having 15 clusters would run up the DE, and give off the impression of an imprecise study. Tut, tut, we can't have any of that, now can we?

eudoxis,

OK, so you agree with one of my points, but perhaps not all of them. I want to ask you how come you don't think that the one-off distribution of the clusters in the pairs resulted in biasing it towards the more populous of the two. I have tried, in vain, to find any methodology that mirrors this, and have found none. Lambert hasn't provided any, Ragout tried to claim he did, but did not. Kevin has now spun this into claiming that perhaps the Lancet study invented a brand new methodology.

As I argued, distributing clusters in this way does not create a random sample because n elements aren't distributed randomly on their own. Tim and the gang then argued that distributing any amount of elements or clusters with a single trial would still be "random". So in other words, according to them, distributing 1,000 interviews across the United States with a single random trial would result in a "random sample".

What are your thoughts on that?

You talk about them bringing up biases in their discussion section. What you didn't mention, however, is that they mostly discounted biases that would bring the toll upwards, and gave credence to ones that could have brought it down. This led them to state that they believed the estimate was "conservative". For example, they said that they told the households that they would not gain anything from the interview. Then they claimed that the households may have lied about living people to receive more rations. When it came to Iraqis possibly lying about dead people (in the belief of getting compensation, e.g.) they discounted this and said that it was against Iraqi culture to do such a thing. Wha??

So in other words, they would bring up biases, but then they would say that the ones which might have brought the estimate down were credible, while the ones that would have brought it up were not. Again, leading them to claim their estimate was "conservative".

Isn't this problematic??

I also don't see why everyone here is hard-linking bias to fraudulent intent. I think that is being done because they want to discount the possibility of Les Roberts conducting the study unethically, therefore discounting the possibility of any bias being there at all.

All I am doing is showing that there was bias introduced into this study by the authors. Whether or not they did this with fraudulent intent is up to the eye of the beholder. The fact of whether or not the bias exists is by this point undeniable.

I think I have enough evidence that points to possible unethical conduct, but I will not let that get in the way of just showing that there is in fact a bias introduced by the authors into this study.

I mean, no one here has yet answered why the Lancet website lied about the conclusion of the study, which is directly what sparked news headlines across the world, including here in Norway.

All tough questions are suppressed, and instead, I am being lynched for daring to take on the Oracle, Tim Lambert.

His numerous false statements in this discussion, all in the attempt of derailing my points, should indicate to one and all that Lambert has lost this discussion and is solely conducting damage control.

Seixon, I understood the effect of pairing the first time I set up my fictitious Iraq survey problem sometime in the previous thread. I suspect everyone in this discussion understands that if you pair two provinces and give all the clusters to one through the procedure given in the Lancet study, it's likely to give a death toll that is too high or too low for the two provinces added together unless the two provinces had similar death rates--this became clear to me when I first did the super-simple algebra of my toy survey a week or so ago and I might have been one of the last people around here to get this. So it's best to try and pair similar provinces, but if they failed and paired non-similar ones it's just another source of error, not something that makes the entire survey completely worthless.

I was about to go on to make my usual point about whether this made much difference in reality and then say that it didn't, based on the UN report, and then add my customary lament that more sampling wasn't done in Fallujah since the real unsolved mystery of this war is how many civilians the US may have killed when the press wasn't looking, but I've said all this before (about five or ten times). There ought to be a limit to how many times a person can say the same thing in one thread and if there is, I've reached it.

Eudoxis brought up what might be new points for this thread--I have the vague impression that the UN survey showed a significant increase in child malnutrition because of the war, which would seem to support the Lancet claim of an increase in infant mortality. As for the Lancet study, as someone who used to rant against the sanctions during the 90's I was surprised by the relatively low infant mortality rate the Lancet team found for the prewar period, but maybe the sanctions death toll was somewhat exaggerated (something that wouldn't surprise me regarding the number of Saddam's victims as well, but I don't know). Or maybe the Oil-for-Food program had been successful in greatly reducing their impact. In general I've gotten a bit more skeptical about nearly every number or claim I read about Iraq both before and after the war.

By Donald Johnson (not verified) on 06 Oct 2005 #permalink

Donald,

"So it's best to try and pair similar provinces, but if they failed and paired non-similar ones it's just another source of error, not something that makes the entire survey completely worthless."

How many errors and unscientific compromises does a team of researchers need to make before you are willing to say that their conclusion is basically worthless?

Pairing up Basra and Missan, among others, and using the one-off method biased it towards the more populous. If SRS was used, then it would have just added to the variance as everyone has been saying. The reason for this is that, for example, Missan had a theoretical 39% chance of being excluded in the initial sampling, but because of the pairing, now had a 28.7% chance of exclusion. That's in the case of SRS. With the Lancet method, it was 66%, so you can see how the Lancet method biased it. The bias stems from using an unsupported methodology of redistributing all of the clusters within a pairing to one of the provinces with a single random trial with any number of clusters.
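As an aside, the exclusion probabilities being argued over here are easy to compute directly. The shares below are assumptions for illustration, not the study's actual figures: under an independent (SRS-style) allocation, a province with national population share p misses all n clusters with probability (1-p)^n, while under the pairing scheme it is excluded exactly when its partner wins the single draw.

```python
# Illustrative population shares -- assumptions, not the study's data.
p_missan = 0.03   # Missan's assumed share of the national population
q_basra = 0.66    # Basra's assumed share of the Basra/Missan pair
n = 33            # clusters allocated nationally

# Independent allocation: excluded only if every cluster lands elsewhere.
p_excluded_srs = (1 - p_missan) ** n

# Pairing scheme: one trial decides the whole pair.
p_excluded_pairing = q_basra

print(f"exclusion under SRS:     {p_excluded_srs:.2f}")      # ~0.37
print(f"exclusion under pairing: {p_excluded_pairing:.2f}")  # 0.66
```

Note that a larger exclusion probability is not, by itself, bias; whether it is depends on the inclusion probabilities staying proportional to population, which is exactly the point the two sides are disputing.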

That comes in addition to arbitrarily selecting some provinces for this pairing process, and not others, and the composition of the pairings.

I showed you what they could have done to legitimately cut down the number of provinces they would need to travel to, by reducing the number of clusters.

Of course, I also showed that they were not willing to do this, because a survey with less than 30 clusters would ring off alarm bells with most casual academic readers.

They purposely sought to oversample 6 provinces of Iraq, and purposely sought to not sample 6 others. This was by design, not by chance. Therein lies the flaw that undermines the entire conclusion of the study.

Seixon: Yes, instead of actually proving your case, just go with the "Seixon is stupid, and I am right" method of argumentation. Good job!

Actually -- I did both. I did leave open the possibility that you were dishonest instead of stupid, but truth be told I don't think you're intentionally dishonest.

In case you missed the point -- let me repeat. I explained how the expected value did matter because sampling bias would change the expected value, and yet the expected value of the survey did not change. Therefore there was no sampling bias and your suggestions to the contrary are disproved. Not an opinion, Seixon. This is fact. It has been explained to you several times.

I am not introducing an empty slur when I mention stupidity. If the word has any useful definition, you have met it on this forum.

By John Humphreys (not verified) on 07 Oct 2005 #permalink
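To illustrate the expected-value argument above, here is a small Monte Carlo in the spirit of the simulations in the post: two provinces with different true death rates, estimated either by sampling both in proportion to population (stratified) or by giving every sample to one province chosen with probability proportional to population (the pairing scheme). All numbers are toy values, not the study's data.

```python
import random

def simulate(method, reps=200_000, seed=1):
    """Province A holds 2/3 of the population (true rate 13);
    province B holds 1/3 (true rate 6)."""
    rng = random.Random(seed)
    rate = {"A": 13.0, "B": 6.0}
    share = {"A": 2 / 3, "B": 1 / 3}
    draws = []
    for _ in range(reps):
        if method == "stratified":
            # Sample both provinces, weight by population share.
            est = sum(share[k] * rng.gauss(rate[k], 3.0) for k in rate)
        else:
            # Pairing: one proportional draw decides who gets all samples.
            prov = "A" if rng.random() < share["A"] else "B"
            est = rng.gauss(rate[prov], 3.0)
        draws.append(est)
    m = sum(draws) / reps
    v = sum((d - m) ** 2 for d in draws) / reps
    return m, v

true_mean = 2 / 3 * 13.0 + 1 / 3 * 6.0
m_strat, v_strat = simulate("stratified")
m_pair, v_pair = simulate("paired")
# Both means land on the true value (no bias); pairing's variance is larger.
print(true_mean, m_strat, m_pair, v_strat, v_pair)
```

The simulated means agree with the population value under both schemes; the only difference shows up in the variance, which is the "same expected value, less precision" point being made in the comments.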

Seixon, "I want to ask you how come you don't think that the one-off distribution of the clusters in the pairs resulted in biasing it towards the more populous of the two."

It does and it's supposed to. It's a proportional distribution; a region that is larger should have greater representation.

"...So in other words, according to them, distributing 1,000 interviews across the United States with a single random trial would result in a "random sample". What's your thoughts on that?"

The sample is random but, and I think this is what you are getting at, the sample doesn't have much of a normal distribution and isn't very representative of the population until it reaches a certain size (that's why n>=30 for clusters).

"I think I have enough evidence that points to possible unethical conduct..."

I don't think so. Though I can't read hearts and minds, it's clear that the bias in the study is incidental and not deliberately done to skew results. The results have subsequently been used or abused in political ways and there is much to say about unethical conduct in that regard, but this team of researchers carried out a fine study and a fine analysis.

"I mean, no one here has yet answered to why the Lancet website lied about the conclusion of the study, which is directly what sparked news headlines across the world, including here in Norway."

I take issue with "lied". The study was presented as a means to a political and humanitarian end. That's been said many times. Nothing in science or epidemiology is ever as clean as cookbook statistics. All studies come with a series of caveats and pitfalls. Someone, like you, who can understand some of these is several steps ahead of a public who is just fed the bottom line. The bottom line here isn't wrong. It's just highly uncertain and you've pointed to some of the uncertainties.

I think you might be looking for someone else.

Eudoxis,

It wasn't my intention to equate your view with Seixon's. On my reading of your exchange with Donald Johnson, you were examining the merits of some watered-down version of Seixon's theory about pairing, without necessarily championing it. One possible version might involve a (possibly subconscious) disposition on the part of the researchers to seek out violent areas by pairing violent governorates with peaceful ones which have a significantly smaller population. As I pointed out, the actual outcome doesn't provide much support for either Seixon's theory or a weaker version of it.

By Kevin Donoghue (not verified) on 07 Oct 2005 #permalink

eudoxis,

"It does and it's supposed to. It's a proportional distribution; a region that is larger should have greater representation."

Yes, a larger region should have a greater representation. But not ALL the representation. A smaller region should have less of a representation. But not be WITHOUT representation. Get my drift?

"The sample is random but, and I think this is what you are getting at, the sample doesn't have much of a normal distribution and isn't very representative of the population until it reaches a certain size. (that's why n>=30 for clusters)."

Ergo, calling it a "random national estimate" is basically tantamount to a lie. Especially when n=1....

"I take issue with "lied". The study was presented as a means to a political and humanitarian end. That's been said many times."

OK, so the entity that reviewed and published this study, the Lancet medical journal, shouldn't they know that the figure wasn't just "civilians"?

They had on their website "100,000 civilians". Don't you think this is a lie? The study doesn't conclude this, and the Lancet medical journal, if they actually read the damn thing, should have known that. So why did their website say that??

Humphreys,

Obviously you didn't take a hint from the plot graph I made. Expected values only mean something if you are going to repeat the experiment a number of times, which is not the case with the Lancet study, since it was only done once. You obviously didn't understand that expected values become meaningless when the repetitions will never happen.

I guess I will have to make another plot graph of the distribution comparing the Lancet method with genuine cluster sampling....

"Yes, a larger region should have a greater representation. But not ALL the representation. A smaller region should have less of a representation. But not be WITHOUT representation. Get my drift?"

No...

Are you saying the study is biased because the set of Iraqis who are 47 years old, missing their left leg, dislike the color yellow, and have never eaten soup on a Wednesday when it was raining is totally WITHOUT representation? There is a difference between being without representation, and without any chance of being represented. It's like the difference in odds of winning the lottery between buying one ticket and buying no ticket. It's a small numerical difference, but a major conceptual difference.

Kevin Donoghue, "One possible version might involve a (possibly subconscious) disposition on the part of the researchers to seek out violent areas by pairing violent governorates with peaceful ones which have a significantly smaller population."

That is a loaded hypothetical. Sorry, but I don't want to impugn the researchers' motives in their study design. Let's just say that the variation of the population within the cluster is different than what the team expected. That is what happens when dissimilar clusters are fused while being treated as similar clusters. The correlation coefficient goes up and the resampling by bootstrapping is unlikely to pick this up. The result is that the precision of the study is affected. I'm curious that anyone can discount such a possibility based on the results of the study. Potential sources of error are not computed.

z,

The basis is that the sample is supposed to be proportionally representative. This isn't true when you distribute 3 clusters with only one trial, even if the probability of that single trial is proportional. Sending all 33 clusters to Baghdad with a single trial would equally not be proportionally representative of Iraq.

It's funny how far Lancet apologists will go in slaughtering how representative the sample is, to the point of undermining its randomness.

Kevin,

"One possible version might involve a (possibly subconscious) disposition on the part of the researchers to seek out violent areas by pairing violent governorates with peaceful ones which have a significantly smaller population. As I pointed out, the actual outcome doesn't provide much support for either Seixon's theory or a weaker version of it."

There is no secret that they sought to only sample the provinces that were around Baghdad. Their study design makes that perfectly clear. I'm uncertain how you can claim that the actual outcome doesn't provide support for my theory or a weaker version of it. How can you say that without knowing the mortality rate of 7 provinces of Iraq?

As eudoxis says, and I have been saying, there is an ambiguous bias introduced into the sample, so we have no idea what effect this had, and we can't see how they corrected for this in their conclusion.

Of course, claiming that the paired provinces were in fact similar makes this a lot easier. Then you can correct for this method of theirs. It seemingly worked, since uncritical people like you took their uncorroborated word for similarity and seek to smear everyone who dares speak up about it.

The numbers I came up with as an indicator, reshuffled by BruceR, show that according to the only other available indicator, coalition death rates, 5 out of the 6 pairings were not similar in the least.

There is still no one who will give a real answer to why the Lancet medical journal lied about the conclusion of this report on their website. There is a precise difference between claiming that 100,000 "Iraqis" died in contrast with "civilians". The report concludes with the former, the Lancet website broadcast the latter.

Why?

Let me just throw this very simple question out there guys:

If there really is no difference between distributing 3 clusters with one trial, and 3 clusters with 3 trials, or any other number, if this really does not affect the probabilities...

Why is the latter always used, and not the former, in cluster sampling?
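For what it's worth, the textbook arithmetic behind the replies that follow can be written out directly: both allocation schemes give province B the same expected number of clusters, so neither is biased, but the all-or-nothing draw has a much larger variance, which is the standard reason independent (or systematic) allocation is preferred. A sketch with an illustrative population share:

```python
# Province B has population share p; n clusters are at stake.
p = 1 / 3
n = 3

# One trial: B gets all n clusters with probability p, otherwise none.
mean_one = n * p
var_one = p * (n - mean_one) ** 2 + (1 - p) * (0 - mean_one) ** 2

# n independent trials: B's cluster count is Binomial(n, p).
mean_indep = n * p
var_indep = n * p * (1 - p)

assert mean_one == mean_indep   # same expectation: no bias either way
assert var_one > var_indep      # far more spread with a single trial
print(var_one, var_indep)       # about 2.0 vs about 0.67
```

The single-trial scheme trades precision for logistics: the estimate is still centered in the right place, it just bounces around more from one hypothetical repetition to the next.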

Pairing provinces the way the Lancet team did isn't usually done because it increases the variance, or in plain English, makes the result less precise. The Lancet paper pointed this out and justified doing it on the grounds that it decreased the travel time by a third.

By Donald Johnson (not verified) on 07 Oct 2005 #permalink

The majority of the violent deaths and probably most or all of the nonviolent deaths in the Lancet study were likely civilians. I think that's true whether you include or exclude the Fallujah numbers. To what extent that represents what's really been happening in Iraq--well, who knows? I suspect civilians are in the majority.

Besides, I don't know, but it seems to me that surveys are less likely to pick up on insurgent deaths. If a family member was an insurgent and died fighting the Americans or the Iraqi government, would you tell a stranger about his death? Maybe there would be awkward questions about why he died. Trying to put myself in that position, I'd probably tell a survey team about innocent people in my family that had been killed, but I'd be very cautious about saying anything if my son or brother had died fighting the Americans. There are death squads in Iraq, after all, and anyway, after decades living under Saddam people are likely to be cautious what they say to strangers.

By Donald Johnson (not verified) on 07 Oct 2005 #permalink

Donald,

Decreasing the number of clusters would have increased the variance and travel time. This would have been also statistically supported. Why didn't they do this? Why can't anyone find any study that has previously carried out a survey in this manner? Why are there no mentions of doing things in this way in sampling documentation?

I know they justified it because it cut down travel. The thing is, there were other legitimate ways of accomplishing this than the way they did things. They also paired the provinces according to "belief" and nothing else.

If they didn't want to travel all over Iraq, why didn't they just decide from the outset only to sample central Iraq, which is what they ended up doing anyways? Then they could have extrapolated their findings to that portion of Iraq, and then they could have come up with a conclusion about that portion of Iraq.

Why didn't they do this? Oh, right, because they wanted to give the impression of a more robust study than what they actually carried out.

"Besides, I don't know, but it seems to me that surveys are less likely to pick up on insurgent deaths. If a family member was an insurgent and died fighting the Americans or the Iraqi government, would you tell a stranger about his death? Maybe there would be awkward questions about why he died."

I just love that you said this. From the Lancet study:

When deaths occurred, the date, cause, and circumstances of violent deaths were recorded. When violent deaths were attributed to a faction in the conflict or to criminal forces, no further investigation into the death was made to respect the privacy of the family and for the safety of the interviewers.

Helps to read the study, ace.

Seixon, "ace", I read the study and recall that passage. Your sarcasm is dumb. Think about the situation. For thirty years there's been a nasty dictator in place and now, under new management, there are Islamic terrorists, Shiite death squads that work for the government, and American troops who sometimes drag innocent people away to prison where they may be tortured. So under the circumstances do you think it possible that people might be a little bit cautious in opening up to a stranger who comes to their door and starts asking questions? Worse, if you had an insurgent in the family who had been killed, would it maybe cross your mind that if you talked about this someone might come later and bust down your door and haul you away? The very fact that you admit a male member of your household was killed by the Americans would make you a prime suspect. Under the circumstances I suspect some insurgent families would deny any deaths. Hell, some people might deny deaths that were completely innocent. I'm a tiny bit paranoid with people who try to ask me questions over the phone--I don't think I'd feel the slightest compunction about lying to a stranger if I lived in Iraq and I thought lying was the safest course.

Notice that the reason the authors give for being cautious is that they feared for the safety of the interviewers--in short, if they asked too many questions about a family of insurgents they might get their heads blown off. So, Seixon, did it cross your mind to think that a family that might kill an interviewer for being too nosey might not have scruples about lying to them? That maybe they'd cover up the death of their family member for their own safety. Or do you assume that a family that is trigger-happy and quite willing to kill a survey team member could be expected to be scrupulously honest to someone they might suspect is a spy for the government or the Americans? It's not like they're going around the neighborhood asking people about their favorite brand of toothpaste.

So no, Seixon, I don't think you can expect a family of insurgents to necessarily reveal their casualties to someone they might regard as a spy, someone they might in fact kill if they feel pressed too hard. And I doubt the Lancet team thought their carefully worded questionnaire would always elicit the exact truth from people who might think their lives depend on not being truthful. You show a touching faith in their methodology when it comes to uncovering possible insurgent deaths.

By Donald Johnson (not verified) on 07 Oct 2005 #permalink

"The basis is that the sample is supposed to be proportionally representative. This isn't true when you distribute 3 clusters with only one trial, even if the probability of that single trial is proportional."

Yes, it is. If you picked one household out of the entire country randomly and asked them how many dead they had and extrapolated to the whole country, that would be proportionally representative. The uncertainty would be very wide, however. You don't have to sample every house, or every town, or every province, or every cluster. You merely have to offer each the chance of being sampled.
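The unbiasedness claim above is easy to check numerically. A toy sketch with made-up household data: sample a single household uniformly at random, scale its death count by the number of households, and the estimator averages out to the true total over repetitions, even though any single estimate is wildly imprecise.

```python
import random

rng = random.Random(42)

# Toy population: 10,000 households, each with 0-3 recorded deaths.
deaths = [rng.choice([0, 0, 0, 1, 1, 2, 3]) for _ in range(10_000)]
true_total = sum(deaths)
N = len(deaths)

# Estimator from a sample of one: every household has inclusion
# probability 1/N, so scale the observed count by N.
reps = 100_000
estimates = [deaths[rng.randrange(N)] * N for _ in range(reps)]
mean_estimate = sum(estimates) / reps

# The long-run average tracks the truth (unbiased), but each single
# estimate is either 0 or a multiple of 10,000 -- enormous variance.
print(true_total, round(mean_estimate))
```

This is exactly the distinction the comment draws: "unbiased" is a statement about the average over hypothetical repetitions, while the width of the uncertainty interval is a separate matter.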

Seixon: I'm uncertain how you can claim that the actual outcome doesn't provide support for my theory or a weaker version of it.

Is my comment on the pairings conspiracy theory unclear?

Why can't anyone find any study that has previously carried out a survey in this manner?

"Where's the precedent for this?" Would that I had a fiver for every time I've heard a crusty old civil servant use that rhetorical question as a device for resisting change. New problems, new methods: that's the way research is done. If you don't like it, join some organisation where regulations trump all. Maybe the army? "If it moves salute it, if it stands still paint it white."

As to the "100,000 civilians" thing, Tim Lambert devoted an entire thread to it. One reason why your questions don't get answers is that they have been discussed many times before.

By Kevin Donoghue (not verified) on 07 Oct 2005 #permalink

Donald,

Most of the interviewers were Iraqis. If the insurgent was, in fact, dead, then why would the family be scared to talk about it? Sure, some times, that may have been the case. But as you said, they also might not tell about any deaths. Or, they might tell about more living people in the hopes of it getting them more rations, as was a standard affair in Iraq for years and years. But no, the Lancet team dismisses the latter and cites a cultural reason for why this would not occur.

Now, I have been in contact with an Iraqi from IraqtheModel, and he says he is not aware of any such cultural phenomenon. Hmmmmmm.

z,

"Representative"? I think the word you were clinging to in your comment there was "random", not "representative". A sample of a single person in the USA is not representative of the country, sorry.

Kevin,

Yes, the genius JHU team invented a whole new methodology! Why, I must be an idiot at not seeing the sheer brilliance of nuking the precision of a study by pairing provinces post-sampling according to a "belief".

I now knight you Premier Lancet Apologist.

If Lambert has taken on the 100,000 lie earlier, can you at least provide a link? Thanks.... Hopefully he won't make a list of false statements there as well...

Eudoxis,

I'm not sure whether we disagree very much at all. AFAICT, the following are the effects of pairing governorates, whether or not the presumption of similarity is false:

The sample remains unbiased, by definition, since each household still has an equal probability of inclusion. The estimators (of the relative risk ratio and excess mortality) are likewise unbiased: their expected values are equal to the population values. The variance of the estimators is increased. Whether the reported CIs fully reflect this depends on whether the sample happens to be sufficiently diverse to capture the underlying variation in the population. The more dissimilar the paired governorates are, the bigger this problem will be. But it really would have to be a whale of a problem to put the main findings of the study in doubt.

To assess whether there is such a problem here, we should look at any available evidence on comparative mortality in the six pairs. If particular pairs of governorates are similar, then whether they, as a group, differ markedly from unpaired governorates doesn't matter. Hence comparisons between paired and unpaired governorates shed no light at all on the problem.

That's my take on it, but I would especially welcome comments from those who have experience with bootstrapping and suchlike, or good ideas about how to detect differences in the six matched governorates.

By Kevin Donoghue (not verified) on 08 Oct 2005 #permalink
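On the request above for input on bootstrapping: the usual design-based approach resamples whole clusters, not individual households. A minimal percentile-bootstrap sketch, with invented cluster-level rates:

```python
import random
from statistics import mean

def bootstrap_ci(cluster_rates, reps=10_000, alpha=0.05, seed=7):
    """Percentile bootstrap CI for the mean cluster death rate:
    resample whole clusters with replacement, as cluster designs do."""
    rng = random.Random(seed)
    n = len(cluster_rates)
    boots = sorted(mean(rng.choices(cluster_rates, k=n))
                   for _ in range(reps))
    lo = boots[int(reps * alpha / 2)]
    hi = boots[int(reps * (1 - alpha / 2)) - 1]
    return lo, hi

# Invented cluster-level rates (deaths per 1,000 person-years).
clusters = [4.2, 5.1, 7.9, 3.8, 6.0, 12.5, 4.9, 5.5, 6.7, 8.1]
lo, hi = bootstrap_ci(clusters)
print(f"95% CI for the mean rate: ({lo:.1f}, {hi:.1f})")
```

The relevant caveat for this thread: if a dissimilar paired governorate never entered the sample, its variation is simply absent from the clusters being resampled, so the bootstrap CI can understate the true uncertainty, which matches the concern about whether the reported intervals fully capture the effect of pairing.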

Yes, the genius JHU team invented a whole new methodology!

Do I detect a hint of envy? It's really quite unbecoming.

By Kevin Donoghue (not verified) on 08 Oct 2005 #permalink

Kevin,

Yes, I am incredibly envious of the JHU team inventing a new way to screw up their sample and study. I'm envious of the ability to oversample certain areas, and exclude other areas, then extrapolate the result from these as if they were similar without any evidence or indication of such similarity.

Yup, I'm envious alright. Envious of their ability to get notorious apologists like you and Tim Lambert.

""Representative"? I think the word you were clinging to in your comment there was "random", not "representative". A sample of a single person in the USA is not representative of the country, sorry."

Is it biased? I believe that was the original assertion regarding the study in question. If it is not biased, and not representative, what is it?

So Seixon, what is your best guess for the number of casualties above and beyond those expected under the Saddam status quo? Shouldn't we be getting to the truth of that number instead of dancing on the head of a pin? Well, it does seem like there are a bunch of angels on that pin; it is not clear to me you are on their side.

The sample is biased due to the oversampling of certain provinces, and the exclusion of others, according to a nonrandom and statistically unsupported methodology.

Selecting 90 households from one province in a pair, and 0 from the other, is hardly representative of those two provinces.

Pinko,

As I have stated, I put the number of deaths at around 50,000 according to the best information we have at this time. That is not comparing it to Saddam's Iraq. Yes, we should be getting to the truth of the real number, instead of defending a sham of a study to the death like Tim Lambert and his crew are trying to do here.

Seixon,

Please could you list all of your peer-reviewed publications in the space below. If you have any on the relevant topic (the Lancet article), that would be even better. I don't expect this to be too hard for you.

Moreover, as I said in an earlier thread, the Bush/Cheney/Blair war party is not interested in counting the dead in Iraq. Just as the aggressors didn't give a hoot about the number of dead as a result of other imperial adventures over the past century, e.g. the Philippines, Korea, Viet Nam, Nicaragua, Panama, Haiti, Chile, Somalia, Indonesia etc. Human life is meaningless when it gets in the way of establishment policy. The powers that be only dredge it up as a convenient pretext when other excuses are proven to be outright lies or when camouflaging their real agenda. Far from proving the Lancet study to be incorrect, all Seixon has done is to shed light on western crimes by revealing how our plutocratic leaders yield collective yawns when requests are made to count 'our victims'. The only thing that scares these people is public opinion, hence all kinds of mendacious propaganda are employed to downplay the findings of those who bother to make an effort to tally up the victims of our economic wars.

Seixon, like other apologists for US (and UK) atrocities, is making an exceptionally feeble attempt, albeit indirectly, to give the impression that the 'crazies' currently occupying the White House actually care about democracy and human life, hence his failed, and frankly abysmal, attempt to legitimise another imperial attack on a defenceless punching bag of a nation. Like the others who write in here in defence of the "American creed", he is living proof of Richard Falk's "one-way moral-legal screen" argument.

By Jeff Harvey (not verified) on 10 Oct 2005 #permalink

Jeff Harvey,

You seem to have neglected to list your own relevant peer-reviewed publications on this topic. Or is that too hard?

Btw, rambling, disjointed deconstructionist criticism from a protist viewpoint on the predatory imperialism of war-mongering Mononchus nematodes will not suffice.

My God! The return of Bubba, who was banned from Quark Soup for his irrational touts!

Bubba, as far as I am concerned I lump you in with all of the other corporate/plutocratic/neocon/crazies et al. who pop up with their imperial blather on various blogs. I'd like to ask you - and Seixon for that matter - how many deaths constitutes a 'bloodbath'? Seixon has tried (and miserably failed) to convince people on this blog that the US/UK war party, in attacking another defenseless state (in this case Iraq), could not have killed 100,000 or more civilians. Without understanding the basis of cluster sampling, he's concluded more-or-less off the top of his head that the real total is probably around 50,000. Only 50,000!!!!! Gee, that is a consolation! That expunges the war party from guilt in a massive crime, doesn't it? Or do 50,000 deaths constitute genocide, for which those who orchestrated it are guilty of crimes against humanity?!

As I said above, those who commit crimes are not at all interested in quantifying them. I am sure that Vladimir Putin is not pushing for an enquiry into determining how many Chechen civilians have been slaughtered by Russian forces since 1999, any more than the US wants to tally up the millions of civilians killed in its drive for global hegemony since the end of WW II. Crude estimates - like those obtained for US direct or proxy wars in Latin America and Asia - are all we have to go on, and they sure weren't made by the aggressors. This enables the US to ignore them as "unofficial counts", which means that (in the public's vision) these crimes didn't happen. It took only a year after the slaughter of up to a million Filipinos by US forces in 1901-02 for the media and historians to enter denial mode - that the "most peace-loving of nations", which espouses human rights and liberty, could be complicit in such a massive crime. This is all part of the "American Creed", which is a form of indoctrination.

As far as peer-review is concerned, at least I publish my research papers through this system, which works because it keeps science safe. Once peer-review is abandoned, then every crackpot theory - from flat earthers to alchemists - attains equal standing. The Lancet is a rigid journal and there is no doubt that the Iraq study went through a very rigorous peer-review process before acceptance and publication.

By Jeff Harvey (not verified) on 11 Oct 2005 #permalink

Bubba,

He also seems to have "neglected" to assert his own favored correct figure for fatalities without bothering to waste time with any of that silly field work, statistical analysis or peer review.

By Ian Gould (not verified) on 11 Oct 2005 #permalink

"Moreover, as I said in an earlier thread, the Bush/Cheny/Blair war party is not interested in counting the dead in Iraq. Just as the aggressors didn't give a hoot about the number of dead as a result of other imperial adventures over the past century e.g. The Philippines, Korea, Viet Nam, Nicaragua, Panama, Haiti, Chile, Somalia, Indonesia etc."

Ironically, the dysministration has just put out an RFP for some means of managing (in the management sense) the War on Terror, including
"a system of metrics to accurately assess US progress in the War on Terrorism and identify critical issues hindering progress."
"Conduct research, analysis and provide recommendations on the best criteria by which progress in the War on Terrorism is to be assessed."
Help the Defense Department to "review, revise and assign relative weights to the metrics system."
"Design and maintain an automated process for assessing, recording, reporting and tracking the status of each metric."
"Establish a means to identify critical issues and/or shortfalls that are inhibiting progress toward accomplishing the objectives comprising the metrics system."
"Identify and report any issues that are inhibiting the completion of the tasks and provide recommendations for issue resolution."
http://blogs.washingtonpost.com/earlywarning/2005/10/the_terrorist_b.ht…
(Doesn't this read like something from The Onion?)

Of course, it's inevitable that what they will in the end contract for will be a CYA system that unerringly assures the managers that they are doing a great job. Hell, we do that now without spending money on some silly system.

Even more astoundingly, this recalls a memo from Rumsfeld two years ago, which shows a startling degree of realistic sanity:
"The questions I posed to combatant commanders this week were: Are we winning or losing the Global War on Terror? Is DoD changing fast enough to deal with the new 21st century security environment? Can a big institution change fast enough? Is the USG changing fast enough?
...
"We are having mixed results with Al Qaida, although we have put considerable pressure on them — nonetheless, a great many remain at large.
"USG has made reasonable progress in capturing or killing the top 55 Iraqis.
"USG has made somewhat slower progress tracking down the Taliban — Omar, Hekmatyar, etc.
"With respect to the Ansar Al-Islam, we are just getting started.
"Have we fashioned the right mix of rewards, amnesty, protection and confidence in the US?
"Does DoD need to think through new ways to organize, train, equip and focus to deal with the global war on terror?"
http://www.usatoday.com/news/washington/executive/rumsfeld-memo.htm
What happened to that guy, and who is this Rumsfeld we have now? And why the hell did it take 2 years for the Bushies to move from that memo to start floundering around for answers to the questions?

"He also seems to have "neglected" to assert his own favored correct figure for fatalities without bothering to waste time with any of that silly field work, statistical analysis or peer review."

Reminds me of a Tom Toles political cartoon from GWI:

Interviewer to American on the Street:
"How much do you think is a fair price for a gallon of gasoline?" "Err... Umm..." "Well, how much do you think the US should pay for military involvement in the Middle East to control petroleum prices?" "Umm... Err..." "Ok, well, how many Arabs do you think it would be OK to kill to keep oil prices down?" "Oh, that's easy. All of them".

"Selecting 90 households from one province in a pair, and 0 from the other, is hardly representative of those two provinces."

"Representative" is not a Yes/No item; it can be considered on a scale from 0-1. Sampling anything always reduces the "representative" nature of the sample. As I said before, the households studied are grievously nonrepresentative of 47 year old balding men missing their left leg who have never eaten soup on rainy Wednesdays. Nonetheless this does not constitute bias; it constitutes sampling error. Picking 90 households from one province and one from the other on a random basis certainly underrepresents the unsampled province; such is the very nature of sampling, some subpopulation is always going to be not selected in the sample, and therefore underrepresented. But is it biased against the unsampled province? Not unless there were built-in mechanisms which reduced the chances of picking that province. And does the observation that the sample does not constitute 100% of the population studied cast substantive doubt on the validity of the conclusions? Not unless it can be demonstrated that there is some sort of correlation between the estimated parameters, and the odds of being in or out of the sample. Otherwise, it's just random handwaving of the "you can never prove to my satisfaction that it never happened" sort.

Seixon, I think plain old lying might be a problem with any survey in Iraq if one is trying to count insurgent deaths along with the others. I don't think people would necessarily lie to get more rations if these interviewers told them they were conducting a survey. But families with insurgents in them are more likely to be paranoid (with very good reason)--they might suspect this harmless-seeming interviewer is trying to gather information about them and that the harmless-looking interviewer might soon be followed by some not-so-harmless American or Iraqi government forces paying a visit to this family. I'm pretty sure I'd think about that if I were an Iraqi whose son or brother had died fighting the Americans. So my guess is that they'd be the most likely ones to lie. There might be other reasons for lying too--the Lancet team made some effort to do random checks and I'm feeling too lazy to go look and see what they were. Asking for death certificates in some cases, I think. But you're not going to catch people who cover up the death of an insurgent family member that way--you might catch them if they, for instance, changed the gender of the dead person from male to female and you asked for a death certificate.

I agree with Kevin's post 118 and since he writes it up better than I could, I'll leave it at that.

By Donald Johnson (not verified) on 11 Oct 2005 #permalink

Enlarging on my previous largitude:
Any sampling procedure will leave some combination of factors unrepresented. Perhaps the definition of bias is that if the sampling procedure were repeated approximately an infinite number of times, the same combination of factors would be represented a proportion of the time which is less than its proportion of the population. If the 47 year old etc. etc. etc. represents 1/10,000,000 of the population, and an infinite series of samples would include him 1/10,000,000 of the time, then the sample is NOT biased, even if the current sample excludes him. Even if 9,999,999 samples in a row exclude him. Much the same for the provinces which were excluded in the current sample.
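That repeated-sampling definition is easy to check with a toy simulation (made-up numbers, not the study's): pair two provinces, A holding two thirds of the pair's population and B one third, and give all of the pair's clusters to B with probability 1/3, otherwise to A.

```python
import random

# Toy check of the repeated-sampling definition of bias above.
# Province B holds 1/3 of the pair's population; under the pairing
# scheme it receives all of the pair's clusters with probability 1/3.
def pair_inclusion_share(trials=100_000, p_b=1/3, seed=0):
    """Fraction of repetitions in which province B ends up sampled."""
    rng = random.Random(seed)
    return sum(rng.random() < p_b for _ in range(trials)) / trials
```

Over 100,000 repetitions B's share of inclusions comes out within a fraction of a percent of 1/3 - its population proportion - so by the definition above the procedure is not biased against it, even though any single sample excludes one province entirely.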

Jeff,

No I don't have any peer-reviewed publications. I am in no position to have anything I publish peer-reviewed. I might ask what in the world that has to do with me asking very tough questions to those who insist on foolishly defending a sham of a study.

First you say that Bush/Cheney/Blair are not interested in counting the dead in Iraq. Then you list up the Philippines, Korea, Viet Nam, Nicaragua, etc. Were they also in power back then? Oooops. Did Clinton care about counting the dead when he bombed Iraq? Sudan? Afghanistan? Kosovo?

Here's a tidbit of information and rational thought that may have escaped you: everyone counts their own dead. That is how it has always been and should be for obvious reasons. What reasons? Practical and logical. I'm sure you'd advocate that our soldiers go around counting dead bodies while they are getting shot at, but that's just not sound military strategy. When a person is dead, they are dead. No amount of counting is going to make them come alive, so what is the use of putting more people in danger to accomplish the task? Why not let each side figure out their own dead in the aftermath? Wouldn't that be more practical, rational, and logical?

I cannot either see that I have apologized for any atrocities, nor attempting to give the impression that the "crazies" in the White House care about democracy and human life.

I'm wondering whether you had so little to write about on this topic that you instead chose to smear, tar, and feather me as a war-monger and bloodluster.

Seixon has tried (and miserably failed) to convince people on this blog that the US/UK war party, in attacking another defenseless state (in this case Iraq) could have killed 100,000 or more civilians. Without understanding the basis of cluster sampling, he's concluded more-or-less off the top of his head that the real total is probably around 50,000. Only 50,000!!!!! Gee, that is a consolation! That expunges the war party from guilt in a massive crime, doesn't it? Or do 50,000 deaths constitute genocide, for which those who orchestrated it are guilty of crimes against humanity?!

Color me amazed that I haven't convinced that many on this blog of my case against the Lancet study. I don't understand the basis of cluster sampling? Really? Would you care to elaborate? Tim Lambert wasn't up to the task, as he made numerous false statements as to what multistage clustering and cluster sampling is.

My number of 50,000 isn't concluded off the top of my head. If you even cared to read anything I was actually saying instead of shooting off knee-jerk reactions, I used the UNDP study, official figures from the Iraq Interior Ministry, and figures from the Icasualties site. It's easy to just sweep my arguments under the carpet when you misrepresent them as mere inventions based on nothing, when they are nothing of the sort. That is what Lambert has been doing the whole time.

The Lancet is a rigid journal and there is no doubt that the Iraq study went through a very rigorous peer-review process before acceptance and publication.

None of which anyone outside Lancet has gotten to read. Oh, and here we go again, believing that this or that journal is above bias and agenda-setting.

I might ask you, since the Lancet peer-reviewed this study so vigorously:

Why did they then lie about the conclusion of the study on their website?

Ouch.

Next!

z,

"Representative" is not a Yes/No item; it can be considered on a scale from 0-1. Sampling anything always reduces the "representative" nature of the sample. As I said before, the households studied are grievously nonrepresentative of 47 year old balding men missing their left leg who have never eaten soup on rainy Wednesdays. Nonetheless this does not constitute bias; it constitutes sampling error.

Wow, that was an amazing amount of spin.

1. Yes, sampling does reduce how representative something is, that still doesn't excuse creating an unrepresentative sample.

2. We were not seeking to be representative of 47 year old balding men missing their left leg who have never eaten soup on rainy Wednesdays. We were seeking to be representative of Iraq. Quit spinning like a top.

3. No, of course not being representative of the 47 year old man you described isn't bias. Awesome job of knocking down a strawman!

Picking 90 households from one province and one from the other on a random basis certainly underrepresents the unsampled province; such is the very nature of sampling, some subpopulation is always going to be not selected in the sample, and therefore underrepresented. But is it biased against the unsampled province? Not unless there were built-in mechanisms which reduced the chances of picking that province.

90-1? In this case, it is 90-0. Quit trying to reframe things, you're just being dishonest. And no, the nature of sampling isn't arbitrarily pairing up provinces based on nothing other than a hunch and then only choosing one from each pair to represent the both of them.

Here again we see the meme of it all happening by mere chance. This is what Kevin and Lambert tried long ago, until I called them out on the fact that all of this did not happen as randomly as they said it did. Whether they didn't read the fine details of the study or not is anyone's guess; there's little doubt they were trying to mislead.

I called attention to the fact that SRS wasn't used when distributing the clusters in the 2nd phase. Lambert then tried to point out that SRS wasn't used in the 1st phase either, even though technically, it was, and the two phases are completely different anyways.

Lambert claimed pairing them up and playing Cut Half The Sample Which Has Been Sampled constituted "multistage clustering". This is utterly false. Multistage clustering would be dividing Iraq into larger areas, such as north, central and south Iraq. After this, distribute clusters via SRS PPS to each of these regions. Then after this, either distribute the clusters randomly via SRS PPS within each region, or divide up the region into smaller regions and repeat the process.

This is not what the Lancet study did as they retroactively paired up provinces, and then did not use SRS to distribute the clusters.

Not a single person here has admitted to Lambert lying through his teeth on this point. Nor has anyone admitted that the methodology used in this study is completely unsupported. Kevin has since spun this into them inventing a new method of sampling and calling me a hater of progress.

Sigh.

Perhaps the definition of bias is that if the sampling procedure were repeated approximately an infinite number of times, the same combination of factors would be represented a proportion of the time which is less than its proportion of the population. If the 47 year old etc. etc. etc. represents 1/10,000,000 of the population, and an infinite series of samples would include him 1/10,000,000 of the time, then the sample is NOT biased, even if the current sample excludes him. Even if 9,999,999 samples in a row exclude him. Much the same for the provinces which were excluded in the current sample.

Alright, here's an experiment you can do:

Distribute the clusters via the 1st phase the Lancet study did, the genuine cluster sample, 1,000 times and see how often each province is sampled.

Then, distribute the clusters via the 2nd phase the Lancet study did, 1,000 times, and see how often each province is sampled. (Of course, the results for 5 of the provinces, the unpaired ones, will be the same as above... Hint, hint...)

The results may shock and amaze you.
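Here is a sketch of what that experiment might look like, with invented province names and populations rather than the study's real figures (the "paired" scheme gives all of the pair's clusters to a PPS-chosen winner):

```python
import random

# Sketch of the proposed experiment, with invented populations (not
# the study's real figures).  Provinces C and D are "paired": one of
# them, chosen with probability proportional to population, receives
# the clusters that fell to either member of the pair.
POPS = {"A": 4_000_000, "B": 2_000_000, "C": 1_500_000, "D": 500_000}
N_CLUSTERS = 8

def allocate_independent(rng):
    """Each cluster drawn independently, PPS across all provinces."""
    provs, weights = zip(*POPS.items())
    return rng.choices(provs, weights=weights, k=N_CLUSTERS)

def allocate_paired(rng):
    """Like above, but C/D are paired and a PPS winner takes both."""
    draws = allocate_independent(rng)
    winner = rng.choices(["C", "D"], weights=[POPS["C"], POPS["D"]])[0]
    loser = "D" if winner == "C" else "C"
    return [winner if p == loser else p for p in draws]

def sampled_freq(alloc, trials=20_000, seed=1):
    """How often each province receives at least one cluster."""
    rng = random.Random(seed)
    hits = dict.fromkeys(POPS, 0)
    for _ in range(trials):
        for p in set(alloc(rng)):
            hits[p] += 1
    return {p: n / trials for p, n in hits.items()}
```

Running it shows exactly the all-or-nothing effect: each paired province is sampled noticeably less often than under independent allocation, because each repetition hands the pair's clusters to a single member. The expected number of clusters per province - and hence each household's chance of selection - is the same under both schemes, which is the crux of the bias-versus-sampling-error dispute in this thread.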

Lastly!

Donald,

You simply ignored what I said to you last time, the quote from the Lancet study. The JHU team didn't ask any questions regarding whether the deceased was a combatant or not, so how do you imagine they would be compelled to lie out of fear?

They never asked, "So, was he a combatant?"

And the possibility of families having both innocent civilian casualties and insurgent casualties seems to have slipped past your intense analysis... Why be paranoid? If your household had an insurgent, just lie and say they were civilian if anyone asked you about it. It isn't harder than that. Of course, that is yet another thing you didn't think about in your quest to undermine yourself.

The argument about possible lying isn't that interesting to me, Seixon, but you're not making much sense to someone of my possibly more paranoid tendencies. If I had a son who'd died fighting the Americans and a stranger came to my door and asked about deaths, the thought would flit through my mind "If I mention I had a 20 year old son killed by the Americans, is someone going to be breaking my door down tonight, because anyone hearing about a 20 year old son killed by the Americans is probably going to suspect he might have been an insurgent and I do have other sons to worry about?" Not an unreasonable fear, given what goes on in Iraq, and it wouldn't surprise me if some survey respondents had it and chose to lie. But I can't prove it and you can't disprove it, though you can't really say it's unreasonable except by repeating what I already know, that the interviewers didn't push the issue. That doesn't really deal with my point.

If the American forces have some way of keeping track of which families have lost young men to American firepower, I'm sure they'd be on top of it, because those are obviously the families with a higher-than-average likelihood of having other members in the insurgency. Unless, of course, virtually all young men killed by American firepower are innocent bystanders. But go ahead and pretend not to understand this--I think I'll just go back into lurk mode.

By Donald Johnson (not verified) on 12 Oct 2005 #permalink

Seixon, I stand by everything I have written here, and no-one other than you is disputing what I have said. I've given up trying to explain statistics to you, because you refuse to learn.

And no, simple random sampling is not the same as systematic equal-step sampling. Under SESS Missan has a 7% chance of not being sampled, while under SRS the probability is 39%. Of course under either scheme or the one the Lancet used, households in Missan have the same chance of being sampled.
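Anyone who wants to check those two figures can do so with a few lines of simulation. This sketch takes Missan's population as roughly 685,000 - an assumed round figure consistent with the percentages, not a quoted one - out of a total of 24,393,000, with 33 clusters and hence a sampling interval of about 739,182:

```python
import random

# Simulation check of the 7% (SESS) vs 39% (SRS) figures quoted above.
# Missan's population of 685,000 is an assumed round figure.
TOTAL = 24_393_000
MISSAN = 685_000
CLUSTERS = 33
INTERVAL = TOTAL / CLUSTERS   # ~739,182

def miss_prob_sess(trials=100_000, seed=2):
    """Systematic equal-step sampling: one random start, fixed steps.
    Missan sits at an arbitrary offset on the cumulative population
    list; its position does not affect the probability."""
    rng = random.Random(seed)
    offset = 10_000_000
    missed = 0
    for _ in range(trials):
        start = rng.random() * INTERVAL
        hit = any(offset <= start + k * INTERVAL < offset + MISSAN
                  for k in range(CLUSTERS))
        missed += not hit
    return missed / trials

def miss_prob_srs(trials=100_000, seed=3):
    """33 independent PPS draws: missed iff every draw lands elsewhere."""
    rng = random.Random(seed)
    p_hit = MISSAN / TOTAL
    return sum(all(rng.random() >= p_hit for _ in range(CLUSTERS))
               for _ in range(trials)) / trials
```

The SESS miss probability comes out near 1 - 685,000/739,182, about 7%, while the SRS miss probability comes out near (1 - 685,000/24,393,000)^33, about 39% - either way each Missan household has the same chance of selection.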

Donald,

So do you think it would be more likely for them to tell interviewers that they had a LIVING 20 year old son in the household?? I tend to think a dead possible insurgent is better for them than a living one... Regardless, they could just lie about the age or gender of the deceased if they wanted to escape scrutiny. Or lie about any deaths. But that would also apply if they had living young males in the household, would it not?

See, like the Lancet study, you are trying to have your cake and eat it too.

Lambert,

You have not given up on anything, since you haven't even attempted. No one has challenged what you have said because they have to cling to you like tree-huggers around here to keep their worldview intact.

SESS is a form of SRS, as each cluster is still distributed with its own random trial, although the trial is limited between 0-739,182 instead of 0-24,393,000. Regardless, each province still has the same PPS chance of being sampled for each cluster.

You claimed that what the Lancet study did was "cluster sampling" and "multistage clustering". This is utterly false and you know it. It's not a matter of me learning what cluster sampling or multistage clustering is, it is a matter of you admitting that you are misrepresenting what they are.

From the very beginning, you misrepresented my arguments. You pretended that my argument was that since most of Iraq was not in the sample, that this meant it was biased. That is not what my argument has been at all, yet you misrepresented my argument in such a way to imply I was a complete moron.

Multistage clustering is when you distribute clusters first via large areas, and then into smaller areas after that, via SRS or an equivalent unbiased method.

Multistage cluster sampling is where you first distribute clusters like the Lancet did in the 1st phase, and then you take a random sample from each of the clusters. Such as first choosing 33 school districts in Iraq, and then randomly sampling a number of students from each district. Or, adding more stages, randomly selecting 30 schools within the school district, and then randomly sampling a number of students from each school.

That's what multistage cluster sampling is.

The Lancet study, in its 2nd phase, does none of this. It pairs up provinces, after sampling, and then only picks one of the two provinces to sample. That is incongruent with any form of cluster sampling.

Instead of admitting this simple fact, you continue to play Mr. Authority and claim I know nothing about cluster sampling although I am demonstrating several examples of such knowledge.

You won't even admit that the images you created for this post were completely dishonest because they do not simulate a real world situation at all.

Only in Lambert's fantasy universe will the sampled mortality rate in a province vary as much as 1-6 or 1-13 without a normal distribution. Of course, in Lambert's fantasy universe, it is OK to base an entire sample in a province on a single randomly selected household, after all, that would be "random" thus meeting Lambert's rigorous requirements for sampling.

Seixon: SESS is a form of SRS, as each cluster is still distributed with its own random trial.... Regardless, each province still has the same PPS chance of being sampled for each cluster.

How wrong can you get? The destination of 23 out of 33 clusters is known with certainty before the random number is generated. Baghdad has 6, Ninawa has 3, Babilion and Thi Qar have 2 each and 10 provinces have 1 each. How on earth can you conclude that "each province still has the same PPS chance of being sampled for each cluster"?
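The point is easy to demonstrate: under systematic equal-step PPS sampling, a province whose population is a whole multiple of the interval gets exactly that many clusters no matter what random start is drawn. A toy sketch with invented populations (three provinces, ten clusters, so an interval of one million):

```python
import random

# With populations that are exact multiples of the interval, the
# allocation is fully determined before the random start is drawn.
def sess_allocate(pops, n_clusters, rng):
    """Systematic equal-step PPS: one random start, then fixed steps
    mapped onto the cumulative population list."""
    total = sum(pops.values())
    interval = total / n_clusters
    start = rng.random() * interval
    alloc = dict.fromkeys(pops, 0)
    bounds, cum = [], 0.0
    for prov, pop in pops.items():
        bounds.append((cum, cum + pop, prov))
        cum += pop
    for k in range(n_clusters):
        point = start + k * interval
        for lo, hi, prov in bounds:
            if lo <= point < hi:
                alloc[prov] += 1
                break
    return alloc
```

With pops of 6,000,000 / 3,000,000 / 1,000,000 and ten clusters, every random start yields exactly 6, 3 and 1 clusters respectively. The randomness only matters for provinces whose populations are not multiples of the interval, which is why only 10 of the 33 cluster destinations were genuinely in play.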

By Kevin Donoghue (not verified) on 13 Oct 2005 #permalink

Kevin,

Glorious. In fact, hilarious.

Why does Baghdad have 6, Ninawa have 3, and so on? Could it be because it is PPS? In fact, it is very similar to stratified sampling.

Baghdad "automatically" gets six because of its population proportion. The same with the others.

Just like stratified sampling.

But feel free to dig yourself yet another hole, one that Lambert started to dig but more wisely did not end up jumping into after I called him off...

...each cluster is still distributed with its own random trial....

Glorious. In fact, hilarious.

Well, quite. That was my point.

By Kevin Donoghue (not verified) on 13 Oct 2005 #permalink

SESS is similar to SRS, not exactly the same. Each element still has the same probability of being chosen. Stratified sampling is the same way, similar to SRS, but not exactly the same.

Again, this is yet another irrelevant tangent to keep us away from discussing the real points here. You know, like Lambert redefining cluster sampling and multistage clustering in order to hold up the Lancet study.

And then there is: ...the same PPS chance of being sampled for each cluster.

Does "each" have its very own Seixonian meaning too, quite unlike that which the rest of us use, just as "bias" evidently has?

By Kevin Donoghue (not verified) on 13 Oct 2005 #permalink

Each cluster is distributed with its own random trial. You know, picking a number between 1 and the sampling interval? Isn't that a random trial? And what is the purpose of this tangent? Are we seeking to delegitimize SESS??? Or that I dared claim that SESS uses SRS?

Each cluster is distributed with its own random trial. You know, picking a number between 1 and the sampling interval?

Are you now asserting that this is done for each cluster?

Indicate precisely what you mean to say,
Yours sincerely, wasting away.

By Kevin Donoghue (not verified) on 13 Oct 2005 #permalink

Care to read the link you supplied in comment number 75?

By Kevin Donoghue (not verified) on 13 Oct 2005 #permalink

Aha. Gotcha. Appears I was in error. Only one random trial is conducted, and then the sampling interval is added to this one random number. However, each cluster is still distributed independently which is what is important. They are also distributed according to PPS which ensures that the distribution is unbiased.

The clusters in Lancet's grouping process did not distribute each cluster independently. Nor did they use SESS, SRS, stratified sampling, cluster sampling, or multistage clustering.

I admitted glossing over one detail, will you admit to anything?

I admitted glossing over one detail, will you admit to anything?

Seixon, although you might find it a bit depressing, I think it would do you good to read back through these two threads and make a list of the details you glossed over. I already admitted to believing something you said about the Lancet study without checking; that's not a mistake I am likely to repeat.

Now I'm going to the pub. I can't honestly say you drove me to drink, because I was going anyway. Good night.

By Kevin Donoghue (not verified) on 13 Oct 2005 #permalink

I am man enough to admit my mistakes, something which seems taboo around here otherwise.

Unfortunately for you, my mistakes are mostly irrelevant, such as this one. The fact that no recognizable sampling procedure was used in the Lancet study's 2nd phase is the elephant in the room.

The main point was that each cluster was distributed independently, which is consistent with all the other sampling methods. The Lancet method did not follow this, and is why the whole thing is flawed. In addition to the arbitrary nonrandom selection of provinces for grouping and the arbitrary nonrandom pairing of provinces, also not supported by any statistical means.

Have a nice night at the pub, maybe you'll get drunk enough to see the elephant.

No, Seixon, you are not man enough to admit your mistakes. You admit some of your mistakes, then you blow a lot of smoke to conceal the fact that your position is untenable. What started all this was the statement on your blog that the study is based on a biased sample. That statement has been proven false. The proof requires only the definition of an unbiased sample and elementary probability theory. This has been explained to you in detail. Yet you have not retracted your false statement.

By Kevin Donoghue (not verified) on 14 Oct 2005 #permalink

"Picking 90 households from one province and one from the other on a random basis certainly..."

"90-1? In this case, it is 90-0. Quit trying to reframe things, you're just being dishonest."

Actually, it was a typo for "none", but I'm sure Dr. Freud would have some suspicions about my subconscious desires to subvert the truth.

"arbitrary nonrandom selection of provinces"
I feel we've reached the nub of something, there:

arbitrary:
seemingly **random** or without reason or system. Dependent on a whim.
http://www.angliacampus.com/public/sec/geog/gn009/glossary.htm

arbitrary:
Uncertain; **random**; accidental; discretionary; outside of central relevance to the methodology, law or principle, therefore accepting of individual choice and subjectivity. (MP)
www.biol.tsukuba.ac.jp/~macer/biodict.htm (still shows up in google definition search, but site appears to be gone now)

"Dr. Freud" BTW I actually meant Sigmund Freud, not any sarcastic allusions to anyone posting here who might suspect I was being more subtly deprecating than I actually have talent to accomplish.

Seixon: I'm still not sure what you are specifically concerned about. Didn't you say at one point that your hunch was 50,000 fatalities? That's well within the confidence interval of the study. A clash between an estimate of 50,000 and one of "101,000 plus or minus 93,000" is a tempest in a teapot indeed. If you were arguing that the number was 0 or below, you'd have a disagreement.

Frankly, it would be a lot easier to argue that the study was biased low, due to their clearly described elimination of Falluja from the primary result, specifically because it was very high. Your description of the bias problem that you see becomes suspiciously vague when it comes to the exact details of how it biases the results towards higher numbers, to the point that most here seem to feel it's more a factor of increasing the sampling error rather than bias. However, the confidence intervals in this kind of study typically are considered with a grain of salt, because the distribution seldom fits the standard curve perfectly.

As for the sampling method not fitting an established textbook procedure, that's not really an issue. Fitting a textbook procedure is nice, particularly for readers who can't follow complicated theoretical arguments (myself included), but lots of studies don't fit a textbook procedure (or they would have already been done years ago). The question is, given the methodology they used, does their logic follow? And it seems to do so for most of us; as I stated, your description of the hole in their logic gets vaguer and vaguer as one tries to pin down exactly how it affects the results regarding bias, rather than precision. It's not that we're clinging to a belief; it's that we've considered your arguments, given quite a bit of consideration, and they have failed to prove your point. Period. Even if you are right, it's Not Our Fault that you have failed to convince us; we haven't brushed you off or failed to listen. What more do you want?

Of course, in the Big Picture, aside from your own hunch not really contradicting the findings of the paper in a meaningful way, it's a bit bizarre for somebody to be accusing the paper of bias, based on their own hunch derived from nothing more than a priori bias, by definition.

z,

The dishonesty rolls on. Arbitrary has many meanings, and of course you only picked one of them to suit your argument, instead of the one that applies to this situation. I will use your excerpt to demonstrate which definition applies when it comes to the Lancet study:

arbitrary: Uncertain; random; accidental; discretionary; outside of central relevance to the methodology, law or principle, therefore accepting of individual choice and subjectivity.

You know that the choice of the provinces was not random and thus that the definition of arbitrary that was applicable was that of a subjective and unsupported decision.

Yes z, 50,000 is within the range of the study. So is 9,000, so is 194,000. Is that supposed to be an argument? You have just shown yourself how meaningless the conclusion of this study is. In other words, they say 100,000 but if anyone else says something higher or lower, you can just chime in with, "oh, well that is within the range of the study so it is correct."

In other words, if I did a poll in the USA between Kerry and Bush before the election, the result being 60% Kerry and 40% Bush, with a range between 1 and 99%, then you would just say that my poll was right no matter what the outcome. You know, since any number would virtually be within the poll's range.

eudoxis tried to explain to you guys how there was bias introduced to this study, but you wouldn't listen to him, either.

Normally if a province is not sampled, by chance, then you just widen the variance of the result because you didn't get to sample that area.

What they did was take two provinces, oversample one and not sample the other, but then project the result across both of them as one entity without even knowing anything tangible about their similarity.

That is hardly sound methodology, and the frivolous spin to get away from this, like Kevin's "oh, well they invented something new" or now z's "oh, well, they were creatively going outside the box" is just laughable.

There is a reason this type of thing is not done, and it is because it is not sound methodology. They compromised the whole study for convenience, which goes to say that perhaps they should have waited and done the study at a later time when they would have been more able to do so.

But no, they had a presidential election to influence. Mystically enough, the Lancet website lied about the results of the study and got the headlines they wanted the Friday before the election.

Gee, how convenient.

>What they did was take two provinces, oversample one and not sample the other, but then project the result across both of them as one entity without even knowing anything tangible about their similarity.

that's your interpretation.

Now I know that you believe you possess the ability to know with absolute certainty others' thoughts and motives even better than they do, but have you ever considered writing to Roberts and asking:

a. other examples of studies where this methodology was employed and
b. the basis on which the pairing was undertaken?

Or are you afraid the International Left-wing Journalist Conspiracy to Suppress the Truth will send hit-men after you?

Try not to start out: "Dear scumbag, I bet you wish Saddam was still in power..."

By Ian Gould (not verified) on 15 Oct 2005 #permalink

Actually, Seixon, your last formulation in post 156 I wouldn't quarrel with. They did take paired provinces which they thought were similar (but maybe they were wrong), and then they used a random procedure to pick between them and the winner got all the clusters. They then let that winning province stand for both of them. The expected value is unchanged, so there's no bias in the technical sense, but if the provinces happened to be very different the result would be too high or too low. Doing this with several provinces (six pairs? I forget), sometimes you'd expect too high and sometimes too low a result. As far as I can tell, everyone in this thread understands this.
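
Donald's point here — same expected value, wider spread — can be checked with a quick simulation in the spirit of Tim's graphs above. A minimal sketch, using the post's toy numbers (draws of 1-13 for the bigger province, 1-6 for the smaller, three clusters between them):

```python
# Toy simulation: province A (two-thirds of the pair) yields draws in
# 1-13, province B (one-third) yields draws in 1-6, as in the post.
import random

random.seed(1)

def sample_a():
    return random.uniform(1, 13)

def sample_b():
    return random.uniform(1, 6)

def stratified():
    # A gets 2 of the 3 clusters, B gets 1, by population share.
    return (sample_a() + sample_a() + sample_b()) / 3

def paired():
    # Winner-take-all: B wins all 3 clusters with probability 1/3.
    f = sample_b if random.random() < 1/3 else sample_a
    return (f() + f() + f()) / 3

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

n = 100_000
strat = [stratified() for _ in range(n)]
pair = [paired() for _ in range(n)]

print(abs(mean(strat) - mean(pair)) < 0.1)  # True: means agree (no bias)
print(var(pair) > var(strat))               # True: pairing widens the spread
```

That is, when the paired provinces differ, the lottery doesn't shift the average (unbiased in the technical sense), but individual runs land further from it — exactly the extra uncertainty being discussed.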

One thing I think you said earlier I might have done if I'd written the paper--they could have just used their results to stand for the provinces actually sampled. I'm not going to check, but if the unsampled provinces amounted to maybe 20 percent of the population then the midrange death toll for the sampled portion would have been around 80,000, and the violent death toll (midrange) a little under 50,000. The impact of the paper would have been about the same (it had none, politically, because the US press couldn't have cared less). They could have said that assuming the unsampled provinces on the whole weren't that different from the sampled ones (and that's a reasonable thing to assume) the overall death rate (midrange) was 98,000.

Their procedure wasn't ideal and someone might argue they should have presented their results the way I just did, but I think the overall picture would stay the same, unless someone could show that the unsampled provinces were so different from the sampled ones that it drastically lowered the overall excess mortality. That would imply that the invasion drastically lowered the mortality in the unsampled places, unlike what happened in the sampled regions where mortality went way up. That seems unlikely. And remember, they actually threw out Anbar Province, where it is very very likely mortality went up drastically.

BTW, I don't agree with your post about who had the incentive to lie to the survey team, but will let that argument die, as it wasn't that important.

By Donald Johnson (not verified) on 15 Oct 2005 #permalink

I looked up the paper and the unsampled provinces had a total population of about 6 million, or about 25 percent. Actually, on second thought I take back what I said in the previous post, since if you only looked at the sampled provinces you've now got to take into account that you've oversampled some, and I guess the arithmetic involved in getting a death toll is a little more complicated than simply multiplying the Lancet number by .75. But if I do that anyway as a crude estimate then you have about 75,000 excess deaths, 45,000 of which are violent, still pretty shocking unless you think the unsampled provinces had a huge decrease in mortality to balance it out. And since the choice was random, there's no reason to think the unsampled provinces as a whole were much different from the sampled ones (i.e., the process of selection was unbiased), and so as a hypothetical Lancet author I'm back giving the 98,000 figure for the whole country.

BTW, by the roll of the dice Anbar Province lost out in only getting one cluster--its population was 1.26 million and a cluster represents .739 million, so 2 clusters would have been closer to what it deserved. Basrah with 1.33 million was originally going to get 2 and Missan with .639 was going to get 1. But by the roll of the dice Missan got 3, Anbar 1, and Basrah nothing. But I'm not complaining--this was all fairly done. The only part of the whole process that does seem unfair came when Anbar's cluster got thrown out for being too violent.
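
As a sanity check on those dice (populations in millions as quoted above; under the pairing scheme, as described in the post, the winner takes the pair's clusters with probability equal to its population share):

```python
# Populations (millions) as quoted in the comment above. This assumes,
# per the study's pairing scheme, that the lottery is weighted by
# population share.
basrah, missan = 1.33, 0.639

p_missan_wins = missan / (basrah + missan)
p_basrah_wins = basrah / (basrah + missan)

print(round(p_missan_wins, 2))  # 0.32 -- the roll Missan happened to win
print(round(p_basrah_wins, 2))  # 0.68
```

So Missan taking all three clusters was roughly a one-in-three shot: unlikely-ish, but nothing a fair lottery wouldn't produce regularly.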

Anyway, Seixon, I come back to what I said earlier--you'd have done better to have stuck to trying to make a case that by sheer bad luck the Lancet authors picked places that were above-average in violence. I think you'd have some trouble making this case when Anbar got thrown out, as I keep saying, and when Sadr City's one cluster was in an unscathed neighborhood, but it probably would have been a better thread than the one we had. Not that I'm complaining--I learned a little bit about statistics.

I should have stuck to my original resolution to remain here solely as a lurker--I'm only posting this to correct my mistake in the preceding post. This thread is starting to look like another quagmire, much like Iraq, and I'm having trouble getting out.

By Donald Johnson (not verified) on 15 Oct 2005 #permalink

Ian,

You see, I thought I'd come here to try and solicit the information you tell me to get from Roberts. I thought that the most defiant Lancet-apologists should be able to come up with this, yet I have found that this was not quite correct. The aim was to not have to contact Roberts and waste his time if I was simply just overlooking something simple.

After a few weeks on this blog, I see that perhaps I have good reason to write Roberts and ask him just what in the hell he was doing with this study.

I see there is still stunning silence regarding the Lancet lying on their website.

Donald,

Yes, clearly it would have been much more honest to have just projected their numbers to the areas that were in fact sampled, although I'm not sure if the result would be 75,000 or not. There were other ways for the JHU team to have designed their study to accomplish much the same as they eventually did, although they would most likely have increased the Design Effect even more, and moved the CI even wider.

Their manhandling of the sample led to a false sense of security in the result, as the resultant sample is more similar to a cluster survey conducted with far fewer clusters than 33, such as 11 or 15.

Whether their result was too high or too low is anyone's guess. That's why I say that the way they did things is a problem. It was also biased to arbitrarily pick out some provinces to be paired up, while not doing it with others, according to apparently entirely subjective reasons.

The study design ensured that 6 provinces were excluded. Which provinces were excluded was not entirely random, as 5 of the provinces were saved from this possibility.

An unbiased survey would not have done this. An unbiased survey does not ensure that particular areas of population are to be excluded while others are protected.

In other words, the possibility of Missan OR Basra being excluded was 100% (repeat 5x for the other pairs). In an unbiased sample, this would not have been the case.

And Donald, it was not unfair to exclude Anbar in the end, because it was an outlier. In any case, Anbar was sampled. Nobody would have taken them seriously if they came up with 250,000+ as their number, and they knew that.

Much like they knew that 100,000 could at least be passed off realistically, which is why the Lancet website lied to get this figure out there.

Well Seixon, as a hypothetical Lancet author I might have presented the data both the way they did present it, and maybe also in the more limited way as in--here's the death toll we calculate for the provinces in Iraq that were actually sampled. I don't think you can get the number as simply as I did (multiplying by .75), but that's probably not far off. What would happen to the confidence interval I couldn't guess.

But it wasn't unfair to do what they did--that's where we disagree. (You vs. everyone else here). Pairing provinces wasn't the ideal way to sample, though I don't blame them given the circumstances. Most of the two massive threads produced on this subject have been around your use of the term "bias"--everyone agrees that pairing provinces with differing death rates adds to the uncertainty in the number for the reasons hashed out endlessly before.

By Donald Johnson (not verified) on 17 Oct 2005 #permalink

I think eudoxis has helped me establish that at least the grouping process introduced a bias. He sees the distribution of the clusters after the groupings as unproblematic and unbiased, but the actual pairing up of provinces is a source of bias.

So if the distribution of the clusters within the pairs is not a bias, then it must be something else, because it certainly is not legitimate.

There is no support for clustering clusters and I think the lack of evidence to show otherwise confirms this. Whether clustering clusters introduced a bias, or was just sampling error or something else, it is clearly something wrong.

If the people at this blog were actually interested in getting to the truth, then this would have been resolved a long time ago. Instead of just admitting that there's a problem, and that perhaps I have framed the problem in the wrong terms, the problem is covered up and denied.

I'm tempted to write up a generic study design based on what the Lancet study did and then get several statisticians or professors to comment on it. That might be the only way to get any honest unbiased answers.

I have now come to the following conclusion:

I. The grouping process introduced a selection bias, as explained by eudoxis and me.

II. The resultant sample was not random due to the distribution of clusters within pairs, as explained most succinctly by nikolai:

"A random sample isn't one in which each unit (in this case cluster) has an equal probability of being selected. A random sample is one in which each possible combination of n units (in this case clusters) has an equal probability of being selected. The change in wording ensures that the chance of each unit being selected is independent of the chance of any other unit being selected."

What you haven't shown is that the grouped provinces were actually significantly different (in fact, other commenters have given decent reason to believe that they weren't). In other words, you've spent christ knows how many words on minor departures from the Platonic Form of Randomness without even considering whether it makes a hair of empirical difference.

Given how much worse things got in the clusters sampled, the omitted governorates would have to have seen very substantial improvements indeed to reverse the survey's conclusions. We can say right off the bat that Najaf and Ramadi (both cities in omitted governorates) aren't going to help you very much here.

dsquared,

BruceR's new tabulations showed that I was correct, that in fact 5 out of the 6 pairs were not similar at all which was even worse than my initial finding of 4/6. To avoid having to admit this, BruceR said that the death rate across Iraq was 28 per million, while the death rate in the surveyed provinces (excluding Anbar) was 27 per million. That is nice, except for the following facts about the paired provinces: the death rate for the 6 chosen provinces was 23 per million, the death rate for the 6 excluded provinces was 13 per million.

As they would say in Norwegian: Uff Da.

Of course, this is just something that compounds the two flaws I summarized, which you summarily ignored commenting on.

Seixon,

1. The word "bias" means something in statistics. Stop misusing it.

2. Systematic equal-step sampling does not fit the definition of random sampling that you offered. Where is your condemnation of this procedure? Heck, where is your condemnation of the UNDP survey? I mean, except for Baghdad, they had the same number of clusters in each governorate, so each household did not have the same chance of being sampled.

Lambert,

The UNDP study weighted the results from each governorate afterwards and they sampled each governorate independently. The clusters they used, the PSUs, were of varying size within each governorate and chosen via PPS within each governorate. Thus there is no reason to attack that methodology at all as it is completely statistically sound, showing that your simplistic and shallow representation of it leaves much to be desired.

SESS does fit that definition, since it distributes each cluster independently and the procedure ensures a normal distribution of the clusters. On top of that, SESS is an academically supported and documented procedure, which is statistically sound.

Yes, bias does have a special meaning in statistics, and selecting some provinces to have their variances changed while leaving out others is selection bias. Thanks again for stopping by.

You are just continuing the worst kind of intellectual dishonesty that has become a staple of your Argument from Authority to masochistically defend this study.

Seixon writes: SESS ... distributes each cluster independently and the procedure ensures a normal distribution of the clusters.

Evidently "independence" has a different meaning in Seixonian statistics, just as "bias" does. One random number determines where 33 clusters go; that's independence? And what on earth is meant by ensuring a normal distribution of the clusters?

We have a serious communication problem here, as this comment of Seixon's on his own blog indicates:

This appears to violate something called the "ekvivalenssetning" (equivalence theorem) in my statistics book....

Well, for all I know maybe Roberts et al. have flouted the ekvivalenssetning principle, but as yet we haven't been given any reason to suppose that matters.

By Kevin Donoghue (not verified) on 18 Oct 2005 #permalink

Kevin,

I find it hilarious that you guys are so off the rails now that I have to actually defend a procedure that is statistically sound and academically documented, as evidenced by the CDC.

Yes, independent, as in that each cluster gets distributed on its own, that it isn't grouped with any other cluster. Capisce?

A normal distribution of the clusters? You know, like each province getting a number of clusters proportionate to their population and proportionate to their chance of getting clusters with a normal distribution.

Of course it doesn't matter to you that they violated the principle which I have summarized in point II here. Nothing matters, because the study is robust and credible! What I say has never mattered, nor does any objective look at statistics, it seems.

If I didn't already make it clear, I was in error in describing the distribution between pairs as biased. It was just not done randomly or in step with any statistically supported methodology.

If you guys want to unearth methodology that says it is OK to not distribute sampling units independently, or that changing cluster sizes twice during the process of sampling is OK, then please document it.

The selection bias of choosing those 12 provinces still stands though, and I will be developing a write-up about this, which it seems I have neglected to elaborate on so far.

I find it hilarious that you guys are so off the rails now that I have to actually defend a procedure that is statistically sound and academically documented....

You don't have to defend SESS. What you were asked to explain was why you got so excited about the fact that pairing governorates changes the probability of Missan getting zero clusters. The switch from SRS (the method you originally thought was in use for the first round) to SESS (the method actually used) also changes that probability. You were being inconsistent. Yes, it's hilarious, but the guy wearing the clown-pants is Seixon. The rest of us have no major problem with SRS, SESS or the pairing method used by Roberts et al., for reasons which have nothing to do with whether they are "documented". The Lancet study is itself a perfectly respectable document.

Yes, independent, as in that each cluster gets distributed on its own, that it isn't grouped with any other cluster.

No, not every cluster "gets distributed on its own"; Baghdad gets at least six, come hell or high water. Those six go together which means each one does not "get distributed on its own".

BTW, you need to learn what a normal distribution is. Detached Observer explained it to you on your own blog, but evidently he might as well have been talking to the wall.

If I didn't already make it clear, I was in error in describing the distribution between pairs as biased.

No, you didn't make it clear and yes, you certainly were in error. That's why Tim Lambert described you as innumerate. He was right, but fortunately the condition is curable.

The selection bias of choosing those 12 provinces still stands though, and I will be developing a write-up about this, which it seems I have neglected to elaborate on so far.

If at long last you have a serious point to make it would, of course, be interesting to see it.

By Kevin Donoghue (not verified) on 18 Oct 2005 #permalink

I didn't closely follow the argument you had with Bruce R, Seixon. Are you talking about American combat deaths per province as a proxy for political violence?

Anyway, supposing you're right, I'd still say that the vagaries of sampling seem to have gone both ways. Again, Sadr City would be a place where you'd expect a cluster to show a lot of deaths, and the one they picked had no violent deaths. Fallujah had the opposite problem--as they say, probably the most violent city in Iraq at the time, and they end up with a neighborhood apparently hit so hard they had to exclude it. They'd have gotten a higher death rate for their results if they'd found a Fallujah neighborhood where only 5-10 died violently, which would still be horrendous, but maybe wouldn't have led to them tossing it out as an outlier. And anyway, as I said and dsquared said, the 25 percent of Iraq that wasn't sampled would have had to have had a pretty dramatic decrease in death rate to cancel out what was found in the portion they did hit.

I'm trying to avoid getting sucked too deep into the argument you're having about the process of choosing samples, but it just isn't a bias to pick out some provinces and guarantee clusters for them while holding a lottery to determine which of the remaining provinces get clusters. If it worked out as you say, with 5 of the 6 lotteries going to the more violent province, that would skew the results upwards, but that's a random sampling error like getting a Sadr City neighborhood with no violent deaths or a Fallujah neighborhood with so many deaths that it had to be excluded. (Though, as they also imply, it could be places like that Fallujah neighborhood which are producing a large fraction of the casualties.) Those events skewed things downward. As a rough estimate the Lancet paper is okay and we always knew it was a very rough estimate. Though I do think it'd have been useful for the paper to have contained an estimate strictly for the provinces sampled.

By Donald Johnson (not verified) on 18 Oct 2005 #permalink

Kevin,

I know perfectly well what a "normal distribution" is in statistics; in fact, I employed one in handing Lambert his ass over his incredibly dishonest plot graph, creating my own more realistic plot graph. I'm sorry my wording threw you off when I talked about the distribution of clusters across the provinces, as I was not employing the statistical jargon of "normal distribution" in that instance. Perhaps if you would learn to comprehend the meaning of the words in front of you in the context presented, then it would not be a problem. Especially if you weren't constantly looking for angles to "get" me.

SESS emulates what one would expect from carrying out an SRS experiment many times. Thus, if you distributed the clusters with SRS 1,000 times, you would most likely get the result that was attained using SESS. This is to ensure that the sampling of clusters does not get thrown off by a freak selection.
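
For readers following along, here is a sketch of the equal-step (systematic PPS) idea being described: one random start, then a fixed interval through the cumulative population, so a unit's cluster count cannot stray far from its proportional share. The province names and populations are made up for illustration.

```python
# Sketch of systematic PPS ("equal-step") cluster allocation. One
# random start in [0, step), then hits at fixed intervals through the
# cumulative population. Populations below are placeholder values.
import random

def systematic_pps(populations, n_clusters, rng=random):
    """Allocate n_clusters across units via one random start."""
    total = sum(populations.values())
    step = total / n_clusters
    start = rng.uniform(0, step)
    hits = [start + i * step for i in range(n_clusters)]

    counts = dict.fromkeys(populations, 0)
    cumulative, it = 0, iter(populations.items())
    name, pop = next(it)
    for h in hits:
        # Advance to the unit whose population span contains this hit.
        while h >= cumulative + pop:
            cumulative += pop
            name, pop = next(it)
        counts[name] += 1
    return counts

random.seed(2)
provinces = {"A": 6.0, "B": 2.0, "C": 1.0, "D": 3.0}  # millions (made up)
print(systematic_pps(provinces, 12))
# {'A': 6, 'B': 2, 'C': 1, 'D': 3} -- exactly proportional, whatever the start
```

With these (deliberately tidy) populations, every random start yields the same proportional allocation — the "no freak selection" property. Note also Kevin's point further down: a single random number fixes where all the hits land, which is what distinguishes this from independent per-cluster draws.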

All of this is a red herring, since the Lancet method of doling out clusters in the grouping process was neither SRS nor SESS nor any other statistically sound method. When are you guys going to admit that?

Yes, Baghdad will get 6 clusters no matter what. That still does not mean that the 6 clusters are grouped together. They are still doled out one by one after adding the sampling interval. If you want to keep holding to this, then I can just point to the fact that the Lancet method grouped ALL of the clusters together, and not a few here and a few there. The Lancet method does not emulate an SRS process either, as the results of the Lancet method will never be similar to those of an SRS experiment repeated ad nauseam.

Don't let that stop you though. Apples are oranges, apparently, since they are both fruits. The citric attributes of oranges are irrelevant, apparently, as are the texture attributes of an apple. It's all the same because it has to be the same in order for you to look a winner.

I first took on this thing with the distribution of clusters between pairs as a non-random process. In an unwise move, I was harassed into thinking this was incorrect, which led me to believe that the process was biased. It was not. My first hypothesis was correct, which nikolai and my statistics book have duly confirmed to me. An innumerate is someone who is unfamiliar with math concepts and methods. I am obviously familiar with all of these concepts; I was just swayed into changing my argument based on harassment, which I should not have done.

The only bias that happened in the Lancet study sampling was when they chose 12 provinces to undergo the grouping process, and I can mathematically prove this. I have written it all out on a notepad in front of me and I am contemplating sending it to Lambert via e-mail to make sure that I have not made any mistakes in my math.

So, will you relieve us of the red herring that Lambert slapped down, which you snapped up to use?

The Lancet method isn't SESS, it isn't SRS, it doesn't emulate anything close to these statistically sound and documented methods.

To deal with this argument, you will either have to show:

a. That it is statistically sound to change the cluster size in a cluster sample twice during the sampling process or;

b. That it is statistically sound to violate the principle of a random sample by grouping together all the sampling units and not distributing these independently to receive a result that does not emulate an SRS sampling.

This is only one of my points. The other is that the study introduced a selection bias by choosing 12 provinces to go through the grouping process. I have written up a mathematical proof of this, and will either post this on my blog soon, or will send it to Lambert in order to rectify any errors made before publication.

Donald,

The issue BruceR and I discussed was how many violent Coalition deaths transpired within each governorate before Lancet did their study. This was because the JHU team "believed" that the governorates they paired up were similar in violence, while offering no reference or evidence of this. I made the initial error of looking at all Coalition deaths, including non-violent, and came to the conclusion that 4 out of the 6 pairings were indeed not similar according to this indicator.

The reason for using this indicator was that the level of violence against Coalition forces would be indicative of the violence against Iraqis. In the absence of any other indicator, this one was apparently the only available and defensible one to use. Now, BruceR caught me on my error of including non-violent deaths. He also took some issue with the time span I used which went past the time they carried out their survey.

So he retabulated and found the violent death rates of Coalition forces up until the time they carried out their survey. In doing this, BruceR actually strengthened my findings. His new tabulation shows that 5 out of the 6 pairings were mildly to wildly not similar according to this indicator. Thus absent any substantiation by the JHU team of similarity between the provinces, I have to conclude that their belief was wrong and thus the pairings were not credible.

Donald, it is a bias to single out some provinces to undergo a new process while others do not. You are correct that the lotteries within the pairs were not biased; I have now withdrawn this allegation on my part and gone back to my original stance that these lotteries just were not statistically sound and in step with the principle of random sampling. The bias was in the selection of some and not all of the provinces to be paired up. I have done some math on this and will show it soon, from which I think it will become very clear.

I have now written up an analysis of the selection bias in the study and sent it to Tim Lambert via email. I hope he will be responsible and honest enough to read through it and make any corrections he finds necessary. I hope he can do this by tomorrow evening, at which point I will be posting it on my blog in the case Lambert does not find any errors. At the same time, I will possibly send it to Les Roberts and other interested parties for comment.

I'm a little confused about the coalition death rate numbers, Seixon. BTW, that's not sarcasm--I'm a little confused. Maybe I should track down the posts where you and Bruce R talked about this, but I'm going to bed soon and don't feel like websurfing. Anyway, if Bruce R is right that the coalition death toll across all of Iraq was 28 per million (presumably in Sept. 2004) and in the surveyed area it was 27 per million (excluding Anbar), then if you use American deaths by violence as a rough estimate of violence levels, it sounds like the Lancet sample picked out a fairly representative portion of Iraq. But then you point out that the provinces that were picked from the 6 pairs had an average death rate of 23 per million vs. 13 per million in the excluded provinces. So yeah, for those provinces it sounds like (by chance) they picked out the more violent provinces in the paired areas, but somehow overall they ended up with provinces that, excluding Anbar, were very slightly less violent than Iraq as a whole.

So since Anbar was excluded, I'm guessing that's responsible for the fact that the average level of violence (using this troop death number as a guide) in the sample was a tiny bit less than average--27 rather than 28. So it does sound like things evened out.

Just for the sake of working things out, I just pulled out my Lancet paper and a calculator and wrote down the numbers you gave.
Not that one should take the troop deaths too literally as a guide, but I took your numbers and the populations of the provinces in the Lancet paper and came up with the following. They're not exact--I rounded and you presumably rounded when you gave me those statistics about death rates. Also, it's late and I should be in bed. Anyway,

Total population of Iraq = 24.4 million.

Troop deaths = 28 per million or 28 * 24.4 = 683

Total population of the 6 provinces that were chosen = 7.8 million
Population of rejected 6 provinces = 5.6 million

(You'd expect the rejected ones to total less, given how the lottery worked).

Since troop death rates in the 6 chosen were 23 per million
Troop deaths in 6 chosen provinces = 7.8 * 23 = 180

Troop deaths in 6 rejected provinces = 5.6 * 13 = 73

Total troop deaths in the 6 pairs = 253.

Now notice that if you used the chosen province death rates and extrapolated to include the rejected provinces, you'd get

23 * (7.8 + 5.6) = 308.

In other words, by pairing provinces you'd overestimate the total American death rate by 308 - 253, or 55 deaths. Significant, but not exactly earthshattering. And according to you, apparently the overall death rate if Anbar is excluded is slightly less than 28, so using the provinces the Lancet team ended up using you'd end up with a total American death toll that's slightly less than the true value.
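
Donald's arithmetic above can be replayed in a few lines (rates per million and populations in millions, as given; small differences from his totals come from his rounding of the intermediate figures):

```python
# Re-running the troop-death arithmetic from the comment above.
# Numbers are the ones quoted in the thread, not independently checked.
pop_chosen, pop_rejected = 7.8, 5.6    # millions
rate_chosen, rate_rejected = 23, 13    # deaths per million

actual = pop_chosen * rate_chosen + pop_rejected * rate_rejected
extrapolated = rate_chosen * (pop_chosen + pop_rejected)

print(round(actual), round(extrapolated), round(extrapolated - actual))
# 252 308 56 -- close to the 253 / 308 / 55 figures in the comment
```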

If I'm understanding you right, you've raised an interesting issue, but Bruce R's argument is valid. Probably by excluding Anbar the Lancet team cancelled out the (probably small) distortion they got from selecting provinces that were a bit more violent than the ones not sampled.

On the other hand, if I misunderstood your numbers I just wasted 15 minutes and a sheet of paper. I'm going to bed.

By Donald Johnson (not verified) on 18 Oct 2005 #permalink

Obviously the exclusion of Anbar threw things off, and yes it would appear that the sample that the Lancet did end up getting would have almost been representative of Iraq according to that indicator.

That still does not resolve the fact that they paired up provinces which were not similar and the other methods they used to get this sample.

Instead of being a good Samaritan, Lambert has just responded to my numbers with the usual arrogance that has tainted his whole response to this. While I was dozing off to sleep last night, I came up with one figure that I forgot to include, although it would not make any real difference in the final figures I came up with. I wish Lambert could have answered me more concretely rather than just, again, brushing me off with arrogance. He did give me a little hint, so I will follow up on that. I spent about an hour writing that analysis up last night, so as I said, there are good chances I made some errors. I already came up with one myself afterwards. I guess I might have to release some of the numbers on here so that less arrogant souls can show me if/where I am going wrong.

I have never claimed to be a statistical genius, but making an error here and an error there does not make me an innumerate, Lambert. I could call you an innumerate for claiming that clustering clusters is the same as multistage clustering, since this is obviously wrong. However, I know you were just being intellectually dishonest so I will let that one pass since no one else here cares...

Seixon, I have explained things to you concretely. That's what this post did. Your response was to falsely accuse me of dishonesty. I explained clustering to you concretely, with links to web sources giving extensive details. Your response was to falsely accuse me of dishonesty.

I gave you my honest evaluation of your work like I always do. Why should I go into the details of all the many mistakes when folks have already calculated it for you in comments on your blog? You'll just ignore the answer because you don't like it.

Lambert,

Your cluster graph in this post is dishonest because it does not emulate a real-world situation. The measured mortality rate from a sample of an area will not vary as much as 1-6 or 1-13 with each of these having the same probability of being the sample result. That is preposterous. I already created a new plot graph showing what a real-world scenario would look like, using normal distributions for the 1-6 and 1-13 ranges, which you did not do. The resulting graph was much different from yours and pointed out obvious problems.

You have not linked to anything that says clustering of clusters is accepted methodology. In fact, I think the only one who attempted to do so was Ragout, and his link did not reflect anything close to this at all.

You cannot start a study with cluster size of 30 households across Iraq, only to then change it to 90 when you arbitrarily pair up provinces (pairings which should have been done BEFORE the initial distribution to follow accepted methodology) and then when you randomly pick one of the provinces in a pair, change the cluster size back to 30!!

I know perfectly well what cluster sampling entails, and what the Lancet study did in the "grouping process" is not consistent with cluster sampling at all. Your insistence on not admitting this is intellectual dishonesty.

Plus, the people at my blog did not calculate the figures for what I am currently working on, and what I provided to you. At my blog, the probabilities of Missan's households getting chosen were calculated operating on the given that 3 clusters were won by the pair, and comparing this to what would have transpired with SRS.

That is not what I am calculating now, and not what I provided to you to look over. If you couldn't see the difference, then I suggest you take a harder and more honest look at what I sent you.

At my blog the following was calculated:

Basra households' probability of selection, operating on the given that there are 3 clusters within the pair:

90/190,000 = 4.7E-4; 4.7E-4 x 0.66 = 3.1E-4

For Missan:

90/97,900 = 9.2E-4; 9.2E-4 x 0.34 = 3.1E-4
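The two calculations can be checked in a few lines. This is a minimal sketch using the population figures quoted in the comment; note the two household probabilities are not just approximately equal but identical by construction, since the population terms cancel.

```python
# Check that Basra and Missan households have the same selection
# probability, given that 3 clusters (90 households) go to the pair.
# Population figures are the ones quoted in the comment above.
basra_pop, missan_pop = 190_000, 97_900
total = basra_pop + missan_pop

# The winning province takes all 90 households; each province wins
# with probability proportional to its population (0.66 vs 0.34).
p_basra_hh = (90 / basra_pop) * (basra_pop / total)
p_missan_hh = (90 / missan_pop) * (missan_pop / total)

print(p_basra_hh, p_missan_hh)  # both equal 90/287,900, about 3.1e-4
```

The population of the province cancels out of its own term, which is the algebraic reason the pairing leaves each household's probability unchanged.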

So I was shown to be wrong: distributing the pair's clusters with one random trial did not produce a biased result. That's fine, I have agreed to this.

Now, what I am calculating now, and what I sent to you, is the probability from the beginning, taking into consideration that the pair might end up with 1, 2, or 3 clusters (the only possible values).

The probability of a household in Iraq being chosen in the initial sampling was calculated to be 2.84E-4. I showed this to be true for both Missan and Dehuk, to control the result.

Then, since Missan was paired, I wanted to calculate its households' resultant probability of being chosen as a result of that process.

Obviously this is not the same as we were calculating on my blog, and your claim that it is the same thing is yet again lazy dishonesty on your part.

You said I would ignore the answer because I didn't like it? Really? I took the answers from my blog and have accepted them now. Now I am asking you to take a look at a NEW set of numbers. Yet you ignore them and claim they are the same as something else when they are not. Hypocrite?

On a side note, I found the following passage in the primer on mortality studies written by Les Roberts, which I thought was worth a look for those who believe everything published in the Lancet, or other journals, is automatically gospel:

Following the Gulf War of 1991, the destruction induced and the economic sanctions imposed resulted in concern in some quarters for the health of the Iraqi people. A research team from Harvard University hired and trained young, well-educated Jordanians to collect a sample of data from within Iraq in an attempt to measure under-five mortality. The results, published in The Lancet, implied that hundreds of thousands of children had died as a result of the invasion and the disruption that followed. Post-publication, a review of the data showed that one interviewer recorded many, if not most, of the excess deaths. An embarrassing retraction was requested by a Harvard review committee.

Wha? The study was published even though there was an obvious manipulation bias in it? Who woulda thought that possible...

"They compromised the whole study for convenience, which goes to say that perhaps they should have waited and done the study at a later time when they would have been more able to do so."

Well, there we get to the nitty of the gritty. All such studies produce a "best estimate". Is no estimate better than an imperfect estimate? Conversely, is there an impermeable barrier separating studies good enough from studies not good enough? Does it depend on the importance of the subject being studied? Is the barrier for the first and only study of a subject lower than for subsequent studies?

More precisely, if the paper had not been published because the methodology was not preapproved, would we still be subject to a constant barrage of how the war was worth it because Saddam would have killed even more Iraqis if he wasn't stopped? Would the roughly 20,000 individual deaths recorded and verified by groups like Iraq Body Count still be criticized as outrageously high, rather than suddenly held up as reliable estimates of total deaths?

Pretty much every study contains strong and weak assertions, robust and nonrobust conclusions, and people adjust their confidence in each accordingly. As with the Gun Debate going on elsewhere in Deltoid: "John Lott proved that More Guns = Less Crime" "No he didn't" "Well, he proved that More Guns doesn't equal More Crime". Etc. If you want to quibble with the 198,000 estimate, nobody's asserting that it's a particularly precise estimate. On the other hand, you argue with my position that the various other estimates are consistent with this one, even without requiring broadening of the CI95. Would you be arguing that this study was invalid if it resulted in an estimate of 198,000 lives saved, plus or minus 100,000, and that the estimates of 20,000 excess dead were better? Were you arguing that the 20,000 figures were correct before this study was published? Because most of those who now adhere to that figure were in fact arguing that it was terribly inflated before they came to embrace it.

As for any political motivation of the authors, perhaps; although the lead time before publishing is considerable and certainly the time taken from the start of the idea was very long. But what would you have them do? You do know that in pharmaceutical clinical trials, they will terminate the study if preliminary results look like the drug is resulting in excess morbidity or mortality. Not in keeping with scientific objectivity and the principle of establishing the terms of the study a priori, but hopefully understandable. If a researcher finds something that is killing people, do they not have a moral and ethical obligation to get that information out as soon as possible, and if that means before an imminent decision point where that data might have a greater effect, so much the more urgent?

"On a side note, I found the following passage in the primer on mortality studies written by Les Roberts which I thought was worth some people looking at."

Are you quoting Roberts' primer because you feel he hasn't got a good handle on inherent bias in methodologies?

Roberts has been doing this kind of study since 1992, including Bosnia, Congo, and Rwanda. He did this type of survey of the Congo for the International Rescue Committee three times, which were cited by Colin Powell and Tony Blair to support a UN Security Council resolution that all foreign armies leave the Congo and a grant of $140 million in aid to that country, as well as a US State Department pledge of $10 million in emergency aid. So, although it does not reference this particular survey and methodology, his general track record among pro-war officials would seem to be in his favor.

"Les [Roberts] has used, and consistently uses, the best possible methodology"
-Bradley A. Woodruff, epidemiologist at the Centers for Disease Control and Prevention

"The study is not perfect. But then it does not claim to be. The way forward is to duplicate the Lancet study independently, and at a larger scale."
-The Iraqi war: Counting the casualties, The Economist, Nov 4th 2004

z,

The UNDP were already in the process of putting together a study that took mortality into consideration, although it was not a mortality study per se. What you bring up is essential to the reason the study was published in the Lancet: no mortality study had been released yet, and something was better than nothing, despite all its errors and flaws. I'm not sure why the Lancet lied about the study's conclusion on their website; maybe you can have a stab at that.

I never subscribed to the IBC as accurate, namely because at the outset it had problems with counting the same deaths multiple times due to overlap in press reports. Beyond that, of course it will not pick up all the deaths that occur. So I never used the IBC to gauge anything, because it is obviously unreliable for many reasons. That is why I didn't use it in concluding on my own what the likely death toll is, while using the UNDP study and the Iraqi government figures, plus the Icasualties site (most specifically for Iraqi police/military deaths).

Roberts is a mortality study veteran, I know. The quote you used from Woodruff is also one I am very aware of, as I quoted it in my initial blog post about the study. Roberts may have consistently used good methodology, but that is not the question here. The question here is: did he use good methodology THIS time?

I would be very interested in seeing the methodology used by Roberts on the Congo, Rwanda, and Bosnia studies. Have any links for those? I have looked and can't find them.

The excerpt I gave from Roberts's primer was just to show that sometimes things are published in the Lancet without being particularly well vetted... Only the most daring of us speculate this to be because of political bias. The Lancet lying about the JHU study's results right before the presidential election tends to put that into the same light. But there I go again, riding the elephant in the room. Sorry.

"But there I go again, riding the elephant in the room. Sorry."

And oddly enough, last night I dreamed about an elephant in the room. No lie. But in deference to popular demand I shall refrain from speaking of it, despite the fact that no colored bits of naughty anatomy were involved.

Seixon, no matter how many times you repeat your falsehoods they remain false. The Lancet did not lie about the conclusions of the study -- you are just projecting.

Nor is it right for you to accuse me of dishonesty based solely on your profound ignorance of statistics and probability. If you had a clue about probability, you would know that your claim that a normal distribution with a mean of 6 and a standard deviation of 1 is somehow realistic for the number of deaths in a cluster is just ridiculous. If the province is completely uniform and everybody has the same independent chance of dying, then you get a Poisson distribution, which has the nice property that its mean and variance are equal. If deaths cluster together or the rate varies within the province, then the variance will be even higher. Your distribution, with a mean of 6 and a variance of 1, has the variance too low by a factor of at least 6 and is completely unrealistic. See if you can figure out what the variance was for my distribution.
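The Poisson point can be illustrated with a short simulation; this is a sketch using only the standard library (the sample size and seed are arbitrary choices, not anything from the thread).

```python
import math
import random

# If everyone in a province has the same small independent chance of
# dying, the death count in a cluster is roughly Poisson, and a
# Poisson distribution's variance equals its mean. So a Normal(6, 1)
# model understates the spread by a factor of about 6.
random.seed(0)

def poisson(lam):
    # Knuth's inversion method for Poisson draws, stdlib only.
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

n, mean = 100_000, 6
draws = [poisson(mean) for _ in range(n)]
m = sum(draws) / n
var = sum((d - m) ** 2 for d in draws) / n
print(round(m, 2), round(var, 2))  # both near 6; Normal(6, 1) has variance 1
```

If the death rate also varies across a province, the simulated variance only grows, which is the direction of Lambert's argument.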

Lambert,

The Lancet didn't lie? OK, let's see, this is what they had on their website from the time the study was released until the headline was replaced:

New Early online publication: 100 000 excess civilian deaths after Iraq invasion

The study itself says:

Making conservative assumptions, we think that about 100 000 excess deaths, or more have happened since the 2003 invasion of Iraq.

When violent deaths were attributed to a faction in the conflict or to criminal forces, no further investigation into the death was made to respect the privacy of the family and for the safety of the interviewers.

Many of the Iraqis reportedly killed by US forces could have been combatants.

Really Lambert? Really? Either the editors and reviewers of Lancet are illiterate or in this case, liars. I guess you'll have to choose, or try to spin your way out of this one.

You are right, Lambert, using an SD of 1 for a mean of 13 (not 6) was too low. I could do the plot again, just give me an SD to use, big guy. It's not going to make that much of a difference, I'm afraid. The Lancet method is still going to be wildly erratic compared to normal cluster sampling.

I have no problem calling you dishonest, because that is exactly what you are. Instead of showing how the Lancet method is similar to ANYTHING that resembles cluster sampling, you just continue to claim it to be so and stone-wall.

You also newly claimed that the figures I am working on have already been calculated at my blog, yet another fallacy.

I'm projecting? Oh, give me a break.

"The Lancet method is still going to be wildly erratic compared to a normal cluster sampling."

We've all more or less agreed that it will have some larger degree of error; the assertion was that it would be BIASED. In particular, biased towards higher mortalities, I assume, is your objection.

Just seventeen days after Tim Lambert posts a link to a paper by Checchi and Roberts, Seixon suggests we read it. (Thanks, Seixon, I read it a fortnight ago.) The hot news item? A well-known, 10-year-old story about how a team of Harvard researchers made a mistake, which Harvard quite properly required them to correct when it came to light. Relevance to the study by Roberts et al.? None whatsoever.

Many moons ago Tim Lambert wrote regarding the sample used by Roberts et al.:

At most two (one from small arms, one adult male from bombing) of the deaths outside Falluja were combatants. In other words, 95% or more of the 100,000 excess deaths were civilians, so it is not wrong to describe the findings as "about 100,000 civilian deaths".

Now Seixon wants us to go back to arguing about whether that is true. Meanwhile, we are assured that his definitive critique will be coming up shortly, just as soon as he figures out what his problem is, now that it isn't sampling bias any more. I won't be in the least surprised if his startling conclusion is something like this:

This clumping of clusters was likely to increase the sum of the variance between mortality estimates of clusters and thus reduce the precision of the national mortality estimate.

At which point we will be able to agree that he has got something right at last.

By Kevin Donoghue (not verified) on 19 Oct 2005 #permalink

Kevin,

The point was that many people, even in comments on this blog, have said that the study must be credible because it was published in the Lancet journal. I was just using Roberts's own words to show that the Lancet is no more immune from bias than any other institution. If that isn't relevant to this discussion, where people have used the Lancet's supreme credibility as an argument, then again, I must be riding that big old elephant stomping through the room.

I don't care that Lambert said the same dishonest things many moons ago. Do you need me to excerpt more from the actual study? I guess you do, since some people just don't want to let go:

Many of the Iraqis reportedly killed by US forces could have been combatants. 28 of 61 killings (46%) attributed to US forces involved men age 15-60 years, 28 (46%) were children younger than 15 years, four (7%) were women, and one was an elderly man. It is not clear if the greater number of male deaths was attributable to legitimate targeting of combatants who may have been disproportionately male, or if this was because men are more often in public and more likely to be exposed to danger. For example, seven of 12 (58%) vehicle accident-related fatalities involved men between 15 and 60 years of age.

So I've got to ask Kevin, do I believe Lambert, or my lying eyes?

Kevin, you still don't seem to have picked up on the two points I made a few posts back. I have withdrawn the stance that the distribution of clusters within pairs produced a bias. I have moved back to my original stance that this process was not according to the principle of a random sample, and as I have been saying all along, completely unacceptable statistically. Instead of facing this uncomfortable fact, Lambert projected onto SESS and cluster sampling as if they were the same thing as what the Lancet study did in the 2nd phase. Dishonest.

I am currently working out whether or not the 12 selected provinces' households had their probabilities altered from being grouped up. In other words, all the provinces' households had a 2.84E-4 probability of being chosen from the initial genuine cluster sampling.

I am now trying to see whether or not, for example, Missan's households still have a 2.84E-4 probability of being chosen after being paired up with Basra, and undergoing the exclusion process.

This is not the same thing as I was doing before. Instead of Lambert pointing out specifically where my numbers were wrong, he dishonestly claimed that the numbers were already calculated in the comments on my blog.

And now we are back to a plethora of red herrings about the Lancet study supposedly not saying what it says, because Lambert says it says something else, and so naturally the goons around here must listen to him instead of their lying eyes.....

When are you people going to wake the hell up?

You've done it Tim!

You're almost at the perpetual motion machine: Post on the Lancet, Seixon denies, someone corrects, Seixon thrashes, lather, rinse, repeat.

Now if you can just turn all these bytes into work...

D

Now Dano, if only you cared that Lambert is saying that the Lancet study said something different than what it said! Now that would be amazing.

I have discovered one error that Lambert was hinting at, but I will have to go over this more tomorrow as it is getting late.

I'm dying to hear how the Lancet study didn't really mean that they didn't know how many of the dead were combatants, while Lambert claims it could only have been two.....

Colour me old-fashioned, but I think that before you claim that the sampling is biased you should have actual evidence that it is. Seixon's been jumping up and down screaming bias for a month now and he still hasn't figured out how to calculate the probabilities. Here's another hint for you Seixon: if pairing doesn't introduce bias when there are three clusters allocated to a pair, do you think it will bias things when there are two clusters in a pair?
Oh, and the 100,000 number was after excluding Falluja, so the relevant question is the number of possible combatant deaths outside Falluja. There were at most two.

"I have withdrawn the stance that the distribution of clusters within pairs produced a bias."

Missed that part.

"I have moved back to my original stance that this process was not according to the principle of a random sample, and as I have been saying all along, completely unacceptable statistically."

Well, if it's not random then it's biased, and if it's not biased, it's random. But never mind; I think what you're saying is that the margin of error is unacceptably large. It should be possible to calculate what the final standard error would be, I imagine, by brute force; add the additional variance from the additional sampling of the 6 pairs to the variance from the straightforward clustering, then use that as the error in an ANOVA. Although I can remember the general procedure from classes where we had to handle nonstandard ANOVAs in such a fashion, I'm nowhere near inclined to do it, even if I could still manage the details.

Seixon writes: I am currently working out whether or not the 12 selected provinces' households had their probabilities altered from being grouped up.

Then he writes: This is not the same thing as I was doing before.

Actually Seixon, it is the same thing and if you do it right you will get exactly the answer which numerous people, here and on your own blog, have calculated for you. Let P1 be the probability that, without any pairing, a Missan household gets sampled; let P2 be the probability with pairing; show P1=P2.

What I think you are getting at is this: previously you thought of the number of Basrah-Missan clusters as a constant 3, but you are now concerned with the case where the initial allocation has not yet been carried out, so that the number of clusters at stake is a random variable. The answer is still P1=P2, for reasons which have already been explained to you.
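The P1 = P2 claim can also be checked by brute force with a Monte Carlo sketch. The populations below are toy numbers (not the real Basrah/Missan figures), chosen so the pair splits 2:1, and the simulation tracks one fixed household in the smaller province.

```python
import random

# Province A has 200,000 households, B has 100,000; 3 clusters of
# 30 households are at stake (a 2:1 split if the provinces are not
# paired). We track how often one fixed household in B is sampled.
random.seed(1)
A, B, CLUSTER, TRIALS = 200_000, 100_000, 30, 1_000_000

hits_unpaired = hits_paired = 0
for _ in range(TRIALS):
    # Stratified (no pairing): B always gets exactly 1 cluster, so
    # the tracked household is sampled with probability 30/B.
    if random.randrange(B) < CLUSTER:
        hits_unpaired += 1
    # Paired: B wins all 3 clusters with probability B/(A+B) = 1/3,
    # otherwise none; the household is then sampled with probability
    # (clusters won) * 30 / B.
    clusters_b = 3 if random.random() < B / (A + B) else 0
    if random.randrange(B) < clusters_b * CLUSTER:
        hits_paired += 1

print(hits_unpaired / TRIALS, hits_paired / TRIALS)  # both near 3.0e-4
```

Both frequencies settle around 30/100,000 = 3e-4: pairing changes the variance of the outcome, not the per-household selection probability.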

By Kevin Donoghue (not verified) on 19 Oct 2005 #permalink

Lambert et al,

I have recalculated and it seems that the probability is still the same. Thanks to only myself, I now know this to be the case. One problem I immediately see with this is the following:

By definition, a sample of size n is random if the probability of selecting the sample is the same as the probability of selecting every other sample of size n.

Unfortunately, I have not determined that every sample of size n has the same probability of being selected. I have only determined that each household, on its own, has the same probability of being selected.

Right off the top of my head, with the Lancet study design there is a 0% probability of choosing a sample in which both Basra and Missan appear. Does this not violate that principle?

I think this is more what eudoxis was hinting at, and also what nikolai was saying. Thus, a bias can still be present if it violates that principle, which I'm not certain is mathematically quantifiable.

Also, I am more interested in the bootstrapping aspect of the study, as they say:

As a check, we also used bootstrapping to obtain a non-parametric confidence interval under the assumption that the clusters were exchangeable.[17] The confidence intervals reported are those obtained by bootstrapping.

Bootstrapping requires a random sample, so if the violation of the principle of above is true, then bootstrapping is useless. Also, they worked under the assumption that the clusters were exchangeable. I have already shown this to be false using coalition death rates and I feel tempted to find any relevant indicators in the UNDP study to corroborate it.
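For readers unfamiliar with the technique being argued about, here is a minimal percentile-bootstrap sketch over cluster-level rates. The cluster rates are invented for illustration; nothing here reproduces the study's actual data or code.

```python
import random

# Percentile bootstrap: resample clusters with replacement many
# times and read a non-parametric 95% interval for the mean rate.
random.seed(2)
cluster_rates = [4, 7, 2, 9, 5, 6, 3, 8, 5, 7, 6, 4]  # hypothetical

boot_means = []
for _ in range(10_000):
    resample = [random.choice(cluster_rates) for _ in cluster_rates]
    boot_means.append(sum(resample) / len(resample))

boot_means.sort()
lo, hi = boot_means[249], boot_means[9_749]  # 2.5th and 97.5th percentiles
print(lo, hi)  # brackets the sample mean of 5.5
```

Resampling clusters this way treats them as exchangeable draws, which is exactly the assumption Seixon is disputing: the bootstrap quantifies sampling variability, but it cannot repair a sample that was not drawn by the assumed process.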

Lambert, there were 13 violent deaths of males between 15 and 59 years of age outside of Fallujah. Nowhere in the study does it say that only two of them could have been combatants. If you believe the study said so, please print the relevant excerpt.

"Thanks to only myself, I now know this to be the case."

I think we have now established the cause of the inflammation.
Here's a prescription.
1 lb. beefsteak, with 1 pt. bitter beer every 6 hours.
1 ten-mile walk every morning.
1 bed at 11 sharp every night.
And don't stuff up your head with things you don't understand.

Just leave a banker's draft for 50 guineas with Penelope on your way out.

Cheers.

Nabakov,

Did you even read what I wrote? The probability of a sample of 30 households containing households from both Basra and Missan is 0%. The probability of a sample of 30 households being from Basra is what, 30 x 3E-4 = 9E-3? This violates the definition of a random sample. Yes/No?

Using bootstrapping only works if you are operating with a random sample, and the assumption that the clusters were exchangeable was not substantiated.

I swear, commenting on this blog is like a game of hide-and-seek with the elephant somewhere in the room. Lambert and his friends giggle every time I open up a cupboard and don't find the elephant, even though the elephant is somewhere to be found. In the process, they keep making false statements to distract me and make me look in places where the elephant can't be found.

In the meantime, I found Roberts's studies (I & II) in the Congo. He did not use this "grouping process" in those studies. In fact, they are almost entirely similar to the Iraq one, except for not using the bootstrap method and not using the cluster-clumping/grouping process.

With that said, he worked under the assumption that two provinces in Eastern Congo experienced similar mortality rates, and only sampled one of them. The one province was excluded before sampling began, though. That province was much more likely to have been similar to the other than the various Iraq pairings, as well. Also, nowhere in these studies does Roberts claim that the sample "remained a random national sample" or anything similar, as he did in the Iraq study.

I will now be looking for indicators in the UNDP study to use to compare the Iraqi provinces that were paired up.

Seixon, Lambert has a post on December 9 2004 that tells where the figure of two possible insurgents comes from. One of the Lancet skeptics (Mike Harwood) emailed Les Roberts and asked him something that the rest of us were wondering about--what was the breakdown of the non-Fallujah deaths? You could figure out a lot from the paper, but not everything. Roberts replied and if you look at the reply you see where Tim got his number.

Short answer--21 violent deaths outside Fallujah, 9 from the coalition forces. Four of these deaths were male. Of the three shooting deaths, one was the right age to be an insurgent; of the other two, one was too old and one was a guard shot by accident. One bombing death was of the right age to be an insurgent.

It crossed my mind that, as I explained earlier and as the Lancet paper mentions, respondents might lie about having an insurgent in the family, dead or alive, as it might attract unwanted night-time visitors if you went around telling perfect strangers about how your son of military age was killed by the Americans. People might wonder why, you know, and might pay a visit to find out if you have other military-age males in the household who might know something. So one could speculate that some of the people supposedly killed by violent crime (I think 7) were actually insurgent deaths reported as murder victims. You might be more open to that speculation since it doesn't increase the absolute number, but would merely shift the deaths from one category to another. (Fewer innocent victims and more insurgents in the bodycount.) A coverup as a reaction to an interviewer's question (which would mean an undercount of the death toll) is also possible. But we know independently that violent crime is killing many thousands in Iraq, so the Lancet number of 7 deaths in their sample is reasonable.

By Donald Johnson (not verified) on 20 Oct 2005 #permalink

Seixon, thanks for the DR Congo links. I admit I do giggle when you open a cupboard and find no elephant inside, but that's because I'm pretty sure there is no elephant on the premises.

By Kevin Donoghue (not verified) on 20 Oct 2005 #permalink

Donald,

So we are operating under the assumption that any male Iraqi that is not killed by coalition forces is automatically not an insurgent? Also, are we operating under the assumption that all non-violent deaths of males were also not insurgents?

Please, do tell.

13 violent male deaths. 2 coalition inflicted, 11 non-coalition inflicted. All 11 of those were not insurgents? All 7 of the adult males in accidents were not insurgents? All 5 of the adult males who suffered heart attacks or strokes were not insurgents?

How does this jibe with this from the study:

When deaths occurred, the date, cause, and
circumstances of violent deaths were recorded. When
violent deaths were attributed to a faction in the conflict
or to criminal forces, no further investigation into the
death was made to respect the privacy of the family and
for the safety of the interviewers.

In other words, any number of those 11 could have been insurgents because they failed to ask whether or not they were out of safety reasons.

Quite notable that every single death in Fallujah was described as being a result of bombing and none were from small arms fire, even though the US invaded and wrapped up the whole city at least twice.

In summary, any of those 11 "non-coalition" deaths could in fact have been coalition deaths, and otherwise combatants because they did not press on about the circumstances.

The non-Fallujah bombing statistics indicate this is correct, since only one man was bombed, while 4 children and one woman were. Compare that to the Fallujah stats, and the general stats (46% males, 46% children, 7% women, 1% elderly), and you see that those 11 "non-coalition" deaths are not quite what you make them out to be.

So is a sample of size n of Iraqi households equally likely to contain households from both Basra and Missan as any other sample of size n? 0% vs...?

Anyone want to take on that aspect?

Bootstrapping would also not alleviate any problems arising from pairing the provinces and sampling only one of them, as far as I can understand from reading this.

Any comments on that?

Seixon, I'll break it down into three categories.

Males killed by criminals--Statistically, I think it'd be unlikely that these guys would be insurgents, because most males in Iraq aren't insurgents and I'm also guessing that criminals pick on a random sample of the male population. If anything, they might tend to avoid well-armed males who are insurgents. (I'm leaving aside the possibility of lying about who killed them, which could skew things in various ways.)

Males killed by accidents, heart attacks, etc...--I think Tim (and I) were only talking about the violent deaths, but sure, it's possible that some of these others were insurgents. Again, though, think of the odds. Most people in Iraq aren't insurgents and heart attacks don't single out insurgents (who might be younger and healthier than the average Iraqi male), so if you list, say, 10 heart attack deaths (a number I invented), it's possible, but statistically unlikely that any of these were insurgents, unless insurgents make up 10 percent or more of the male population.
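Donald's odds argument can be made concrete with one line of arithmetic. The insurgent fractions p below are hypothetical, chosen only to show the scaling; the figure of 10 deaths is the invented number from his comment.

```python
# If a fraction p of adult males are insurgents and deaths hit males
# at random, the chance that at least one of 10 such deaths is an
# insurgent is 1 - (1 - p)^10, which stays small for small p.
for p in (0.01, 0.05, 0.10):
    print(p, round(1 - (1 - p) ** 10, 3))
```

At p = 1% the chance of even one insurgent among 10 random male deaths is under 10%, which is the sense in which "most of these probably weren't insurgents" unless insurgents are a sizeable fraction of the male population.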

Males killed by insurgents--I don't think there were that many in the Lancet study, but if there is any insurgent-on-insurgent violence (and there has been, with shooting between insurgents who want to focus on the occupiers and insurgents who want to kill civilians), then some of these might be insurgents. I'd still guess it's statistically more likely that these are civilians either deliberately killed or caught in the crossfire, but because of internecine fighting among insurgents this group has the strongest possibility of containing insurgents among its number. (As usual, assuming truthful responses.)

On Fallujah, you're making my own favorite point for me--the Lancet cluster there showed a huge death toll from aerial attacks and it makes one wonder what went on there when Western reporters weren't around. You have air support when ground troops go in, so you'd expect deaths that way back in the spring 2004 assault. If they did things the way I read they did them in Vietnam, you might have helicopters or planes taking out homes where fire came from--that's safer than sending infantry in to root them out, if you ignore the risk to civilians. But the largest single death toll in the Fallujah cluster occurred in August 2004. The study ended in September 2004 and the final assault occurred in November, so you don't expect to see deaths from that one. I know from a NYT report that the US was bombing Fallujah between the first and second assaults, and this one cluster suggests (but of course doesn't prove) that the bombing might have been causing very high casualties. I've always thought the Fallujah cluster (if the responses were honest) is the most interesting and possibly revealing thing about the entire study--the rest of it gives us an overall violent death rate which isn't that far off the UNDP report for the shorter period of time. The Fallujah data might be telling us how bad things were in one part of the country right after the UNDP report ended.

BTW, I think the one real contribution you've made (along with Bruce R) was the idea of troop deaths as violence indicators. That was a good idea, though I think when you combine what you found with Bruce's point, it suggests that the various sampling flukes (my own private technical term) more or less cancelled out. It'll be interesting if you find something in the UNDP report that can be used as an indicator. You won't take my advice, but it would have been good if you'd stuck to these empirical approaches and left the statistics theory alone. (Though again, I've personally learned a few things from all this.)

By Donald Johnson (not verified) on 20 Oct 2005 #permalink

Donald,

I like you; you're one of the more intellectually honest people I have dealt with here.

As for those 11 males, I think I may have found a key to them in the Lancet study:

Table 2 includes 12 violent deaths not attributed to coalition forces, including 11 men and one woman. Of these, two were attributed to anti-coalition forces, two were of unknown origin, seven were criminal murders, and one was from the previous regime during the invasion.

There we have at least two more whose cause of death is unaccounted for, though of course one of them could have been the woman. As for the seven criminal murders: as you said, most of these would probably not be insurgents, but some of them could be, especially taking into consideration what you said about lying.

I find it severely lacking of Lambert to claim as fact that only 2 of those 13 males could have been insurgents, especially given that 1-2 more were never assigned a category, and that some of the criminal murders might also have been insurgents. We just don't know, so claiming to know is disingenuous. Roberts said he didn't know; why isn't this good enough for Lambert? Oh, right, because he took to heart the fact that the Lancet website overstated the conclusion of the study on the eve of a presidential election...

As far as heart attacks and strokes go, without knowing more specifics on these deaths, I wouldn't think it unusual for this to happen to an insurgent who has been fighting for his life, running away, in an intense state of fear. Once again, I am not claiming that any or all of these were insurgents. I am only soliciting the assurance that it is a possibility.

You said that you thought possible that some of the ones who got killed by insurgents might be insurgents themselves due to fighting between them, 1-2 of the males were killed in this fashion.

In other words, claiming that tops 2 males out of the 13 were insurgents is sweeping a lot of possibilities under the rug.

About Fallujah, the cluster they had there gave a point estimate of 200,000 deaths for that 3% of Iraq (739,000 people). The population of Fallujah has been said to be around 200-300k. As Fallujah was the main focus of the 739,000 people in that area, I think it would be quite unlikely that the Coalition killed almost the entire city of Fallujah...

Would be great if they at some point did a wider study on Fallujah to get a more representative look of the casualties there.

I have looked in the UNDP study, and I can't really find anything that would be a very good indication of Iraqi mortality. I have written down 5 indicators, but I don't feel any of them are more accurate than the Coalition death rates we already derived. They are: forced change of residence; damage to dwelling from military action; weapons never being shot in neighborhood; weapons being shot everyday in neighborhood; and household member a victim of crime in the past 4 weeks.

On the first, 3 pairs are similar; on the second, 4-5; on the third, none; on the fourth, 1-2; on the last, the differences are too small to discern similarity.

I'm trying to cool down the flame war, Seixon. I got myself banned from a well-known liberal blog (deservedly so) when I got very heated and rather obnoxious with the guy who ran it. I still think I had a good point to make, but when you make people angry that gets lost. (Incidentally, you'd have been on the other guy's side--I was attacking this liberal from the left.) It's best not to accuse people of lying and fraud unless you've got rock solid evidence. I exclude politicians, because their very job makes them almost certain to be liars, but it's a good rule for most other people.

On the number of insurgent deaths, if you put aside the possibility of respondents twisting the truth for various understandable reasons (which would make all surveys in Iraq very hard to interpret, not just the Lancet one), I think Tim's number is probably right, but not certain. First, the people most likely to be killing insurgents are Americans, so if you've got male Iraqi deaths of military age and you know some were killed by Americans, then those are the ones most likely to be insurgents. The victims of violent crime are statistically almost certain to be mostly innocent civilians, simply because insurgents are a small portion of the male population.
The insurgent-on-insurgent violence does occur, but as far as I know it's very sporadic--I'd assume most of the insurgents who get killed get killed by coalition forces. If the largest number of people in the survey (taken at face value) who could be insurgents killed by Americans is two, then it'd be a real fluke if you had one or two more in the same survey killed by criminals or enemies in the insurgent movement. Though that said, the murder rate is incredible in Baghdad--Robert Fisk said there were 1000 bodies in the Baghdad morgues in July. So who knows who is killing whom and for what reasons? That murder rate could also include death squad killings by people associated with the Iraqi government, for that matter.

I doubt Tim was even thinking of heart attacks and car accidents (I wasn't), but again, with insurgents being a small chunk of Iraq, you'd expect them to make up a small fraction of the heart attacks and car wrecks, though one could argue about whether they'd be over- or under-represented in relation to how many insurgents there are.

By Donald Johnson (not verified) on 20 Oct 2005 #permalink

I forgot to comment about Fallujah. Roughly 25 percent of that neighborhood supposedly died, so I think that gives the point estimate of 200,000 for the province. Nobody believes that, of course. You use common sense--for one thing it wasn't the entire province that was being bombed, AFAIK. One could argue that maybe it was 25 percent of Fallujah, around 50-75,000 and I saw an interview or a letter by the Lancet authors where they suggested maybe that was true. Still pretty hard for me to believe--I think we've had a major news blackout with regard to what goes on when Americans bomb places where Western reporters can't go, but I have trouble believing that many people could have died without some indication leaking through. Maybe I'm wrong. But cut it down some more and my skepticism drops quite a bit. I did google a bit and found articles claiming there was a Red Crescent official who said 6000 people died in Fallujah and it wasn't clear if he meant the final assault or the whole period. (How did this unnamed official know? I don't have a clue.) It was hard to find anything about the period between the assaults, except for a NYT article that I clipped in the fall, talking about civilian casualties in the bombing, but giving no numbers. So I'm guessing many thousands of civilians died there, both in the bombing and in the final assault, and the Fallujah cluster in the Lancet reflects what happened in the hardest hit areas during the bombing. (Since I'm just guessing, maybe many of those Fallujah males were insurgents, presuming that their presence made the neighborhood a target.)

By Donald Johnson (not verified) on 20 Oct 2005 #permalink

In summary, saying that it could have been at most two is being a bit conservative. I'm not claiming any number, such as two, I am just saying that Lambert is being disingenuous by claiming that there is a certain number to be used here. The only reason he is doing it is because he wants to put up some kind of defense for the Lancet saying it was 100,000 excess civilians, when even the study doesn't say that a single time in its text.

As I also said, I'm not sure how Lambert can discount the possibility of insurgents dying in accidents, getting heart attacks, and so on. I also cannot see how he knows that those two Iraqi adults who were not attributed to anything are not insurgents.

Face it Lambert, the number is not just for civilians, and the study itself never even claims this to be the case.

Even if it was 95,000 civilians, and 5,000 insurgents... how can you say that saying "100,000 excess civilian deaths" is anything but a false statement, especially when the study never even says this??

Geez...

And why the lack of comments about every sample of size n being equally likely to be selected? Because a selection of sample size n that included households from both Basra and Missan was impossible, whereas one that did not was possible? Is that why? Or am I just opening up yet another cupboard in hope of an elephant?

What about the bootstrapping method, and its inapplicability for widening the confidence interval to compensate for excluded areas? As far as I understood, that isn't what bootstrapping is even for...

What about the assumption that the provinces they paired were similar, without providing any substantiation, and me giving at least circumstantial evidence that this is not the case?

No?

As is becoming clear to me now, the Lancet study did not, and could not, have accounted for the increased variance caused by the exclusion of the 6 provinces. Some at this blog talked about bootstrapping, and said that this would solve the problem. From reading about bootstrapping, I do not believe this to be the case, as it only mimics the results of resampling your population many times based on your original sample. In other words, if your original sample excluded 6 provinces, your bootstrap is not going to help you determine anything about those exclusions, since it depends on your original sample being a representative sampling of the population.

In the Lancet study, they say this:

This clumping of clusters was likely to increase the sum of the variance between mortality estimates of clusters and thus reduce the precision of the national mortality estimate.

It "was likely" because they couldn't determine what it was. The national mortality estimate therefore does not reflect this added imprecision. Hence:

As a check, we also used bootstrapping to obtain a non-parametric confidence interval under the assumption that the clusters were exchangeable.

In other words, their bootstrap assumed that the 6 excluded provinces had similar levels of violence to their partner provinces, thus not alleviating the problems with their exclusion and the "likely" increase in variance they talked about earlier.
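To make the limitation concrete, here is a minimal sketch of a percentile bootstrap over clusters (in Python, with made-up death counts, not the study's data): every replicate resamples only the clusters that were actually interviewed, so areas that contributed no clusters can never influence the resulting interval.

```python
import random

random.seed(1)

# Hypothetical violent-death counts for 33 sampled clusters,
# with one Fallujah-like outlier. Illustrative numbers only.
clusters = [0, 1, 0, 2, 1, 0, 0, 3, 1, 0, 2, 0, 1, 1, 0, 4, 0,
            1, 2, 0, 0, 1, 3, 0, 1, 0, 2, 1, 0, 0, 1, 2, 52]

def bootstrap_ci(data, n_rep=10_000, alpha=0.05):
    """Percentile bootstrap CI for mean deaths per cluster.
    Each replicate resamples the observed clusters with
    replacement, so an area with no sampled clusters can
    never enter any replicate."""
    means = sorted(
        sum(random.choices(data, k=len(data))) / len(data)
        for _ in range(n_rep)
    )
    return (means[int(n_rep * alpha / 2)],
            means[int(n_rep * (1 - alpha / 2))])

lo, hi = bootstrap_ci(clusters)
print(f"95% CI for deaths per cluster: ({lo:.2f}, {hi:.2f})")
```

Note that nothing in the procedure widens the interval to account for governorates that were never visited; it only quantifies the variability among the clusters that were.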

Some were asking earlier whether they corrected for this pairing process. I think this just about answers that. What do I know though? I'm just an innumerate.

Iraq Body Count released a new report including a break-down by province. Now we no longer have to go by the coalition death rates to find out if the pairings the Lancet study did were correct. Civilian death rates (deaths per million, to the nearest 10 deaths) by pair from the IBC report:

  1. Ninawa/Dehuk: 560/0
  2. Sulaymaniyah/Arbil: 80/130
  3. Tamim/Salahuddin: 970/960
  4. Karbala/Najaf: 920/750
  5. Qadisiyah/Dhiqar: 90/650
  6. Basra/Missan: 1250/50

Yup. Those are all "similar" alright. Nothing to see here, Lancet is right, I'm wrong, moving right along....
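To put the disparities on one scale, divide the larger rate in each pair by the smaller. This little script does exactly that, using only the rates listed above (Dehuk's zero makes the first ratio undefined):

```python
# IBC-derived civilian death rates (per million) for each pair,
# exactly as listed above.
pairs = {
    "Ninawa/Dehuk": (560, 0),
    "Sulaymaniyah/Arbil": (80, 130),
    "Tamim/Salahuddin": (970, 960),
    "Karbala/Najaf": (920, 750),
    "Qadisiyah/Dhiqar": (90, 650),
    "Basra/Missan": (1250, 50),
}

for name, (a, b) in pairs.items():
    big, small = max(a, b), min(a, b)
    if small == 0:
        print(f"{name}: undefined (one rate is zero)")
    else:
        print(f"{name}: {big / small:.1f}x")
```

Pairs 3 and 4 come out within about 1.2x of each other, while Qadisiyah/Dhiqar and Basra/Missan differ by roughly 7x and 25x on these numbers.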

Iraq Body Count derives their minimum and maximum estimates using specific data extraction methods that rely almost exclusively on a specific list of pre-approved media sources. Any deaths that went unreported or unmentioned by these media outlets were not counted. Their emphasis is on tabulating only those deaths for which there are tangible and specific records: no unreported deaths of any kind are counted.

Their FAQ section at their web site contains the following statement:

We are not a news organization ourselves and like everyone else can only base our information on what has been reported so far. What we are attempting to provide is a credible compilation of civilian deaths that have been reported by recognized sources. Our maximum therefore refers to reported deaths - which can only be a sample of true deaths unless one assumes that every civilian death has been reported. It is likely that many if not most civilian casualties will go unreported by the media. That is the sad nature of war. (my italics)

In other words, Iraq Body Count themselves state that their count is low, and likely does not even account for a simple majority of civilian casualties. This qualification puts their estimate well within one standard deviation of the Lancet study mean.

Seixon, it's not enough to tabulate some convenient numbers--you need to consider where the data came from and what it actually represents. This is doubly true if you're going to accuse another team of having questionable data and methods.

Interesting ... these figures are for the entire period since the war, though, not the period covered by the Lancet survey. I think that makes a massive difference, because the Dhi Qar figure is going to be swelled by the fact that Nasriya has been a big centre of the insurgency over the last year, while nowhere in Qadisiyah has to the same extent. Also worth noting that IMO Iraq Body Count massively underestimates even what they are trying to count, because when they get a report of "a family" having been killed they count it as four deaths, when the average size of a non-extended family in Iraq is six. This is a completely ad hoc assumption made "for conservatism".

The problem pairings appear to be 1, 5 and 6 on Seixon's list; the others look OK [and I suspect that 5 looks bad because of the different sampling periods]. Of these, Ninawa/Dehuk doesn't really contribute anything to the central estimate; they found a more or less unchanged death rate in this province, as you'd expect because it's in the far North. Missan/Basra was a quite significant contributor to the estimate, but in this case they sampled Missan, which is the province with the lower IBC count. So Seixon's case that the grouping process makes a practical difference has to rest on Qadisiyah/Dhi Qar.

I put the numbers in a spreadsheet. In three cases the clusters were moved to a governorate with a higher death rate, in three cases to a lower death rate. Making the dubious assumption that the IBC isn't a biased measure of the death rate in each governorate, I find that the net effect of the pairing process was to make a small reduction in the estimate of about 4,000 deaths. So Seixon should go with 102,000 instead of 98,000.

"In three cases the clusters were moved to a governorate with a higher death rate, in three cases to a lower death rate. "

That's bias! In fully half the cases, the higher numbers were used, whereas the lower numbers were used in only three cases!

But seriously folks; these are total deaths, not death rates.

As I posted somewhere else, the majority of the deaths cited by IBC have been sourced by mortuaries, medics, Iraqi officials, and police (in that order). Journalists were only the primary source for 8% of the deaths in the IBC database. So it seems like yet again certain people are resorting to convenient arguments that are not based in fact.

In fact, four of the pairs had the most violent one chosen, with two having the least violent one chosen. Using the IBC numbers from BBC, and using the UNDP figures for the current population of Iraq, I tabulated the death rates on my blog.

Here's some numbers:

Paired provinces (the 12):
Population - 14.8M
Civilian deaths - 7602
Death rate - 510/M

Sampled provinces (from pairs):
Population - 8.4M
Civilian deaths - 4348
Death rate - 520/M

Excluded provinces (from pairs):
Population - 6.4M
Civilian deaths - 3254
Death rate - 510/M

Sampled provinces (all):
Population - 20.2M
Civilian deaths - 23995
Death rate - 1190/M

Sampled provinces (excluding Anbar):
Population - 18.8M
Civilian deaths - 21700
Death rate - 1150/M

Unsampled provinces:
Population - 7M
Civilian deaths - 3375
Death rate - 480/M

All of Iraq:
Population - 27.1M
Civilian deaths - 27370
Death rate - 1010/M

All of Iraq (sans Anbar):
Population - 25.8M
Civilian deaths - 25075
Death rate - 970/M

If we go by the IBC's numbers and the methodology of the Lancet study, we would have expected a mortality of 1150/M, since Anbar was excluded at the end for being an outlier.

Had all of Iraq been sampled, we would have expected 1010/M. Excluding Anbar, we would have expected 970/M.

So, Lambert, if we're going to play that game (which I think is pretty dumb really, since it doesn't really prove anything), the conclusion is that the Lancet methodology would have overestimated by about 16,000 deaths.

98,000 × (970/1150) ≈ 82,660

Of course, this doesn't really mean anything, since the 100,000 number is too imprecise to use such as this.
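For anyone who wants to check the arithmetic, the rates above follow directly from the listed populations and death counts (rounded to the nearest 10 per million), and the scaled estimate comes from the ratio of the actual to the expected rate. A quick sketch:

```python
# Populations (millions) and IBC civilian deaths, as listed above.
groups = {
    "Paired provinces (the 12)": (14.8, 7602),
    "Sampled provinces (from pairs)": (8.4, 4348),
    "Excluded provinces (from pairs)": (6.4, 3254),
    "Sampled provinces (all)": (20.2, 23995),
    "Sampled provinces (excluding Anbar)": (18.8, 21700),
    "Unsampled provinces": (7.0, 3375),
    "All of Iraq": (27.1, 27370),
    "All of Iraq (sans Anbar)": (25.8, 25075),
}

for name, (pop_m, deaths) in groups.items():
    rate = round(deaths / pop_m / 10) * 10  # per million, nearest 10
    print(f"{name}: {rate}/M")

# Scale the study's 98,000 estimate by actual vs expected rate:
scaled = int(98_000 * 970 / 1150)
print(scaled)  # 82660
```

Running it reproduces every rate in the list above and the roughly 82,660 scaled figure.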

My main point is that the pairs aren't "similar" as they claimed in making their study, which throws their whole methodology off the wagon. Their methodology is only valid if those provinces were in fact similar, which, by two different data sets, they unfortunately were not.

Obviously Lambert keeps ignoring this, which he has proven in recent comments where he continues to claim that the net effect of the pairings was that less violent ones were chosen on average, which isn't true.

I would appreciate an elaboration on this claim:

I find that the net effect of the pairing process was to make a small reduction in the estimate of about 4,000 deaths

Would you be so kind, Mr. Lambert?

Obviously, and mine is just about finished. Meanwhile, I have solicited data from FAFO and the contact for the ILCS. Hopefully I will get it!

In the mean time, I correlated some of the available ILCS numbers with the IBC numbers.

The ILCS gives their number of 24,000 "war-related" deaths in Iraq, and then carves that number up into regions. I took these same region definitions, applied them to the IBC numbers, and did a cross-check.

The only problem with this is that the ILCS data is current as of May 2004 (August 2004 for the northern region), while the IBC data is current as of December 2005. Setting that aside, I found the following by adjusting around Baghdad's rate, since Baghdad was defined as a region of its own...

South:
ILCS - 2420
IBC - 630

Central:
ILCS - 990
IBC - 870

North:
ILCS - 250
IBC - 60

The ILCS figures are adjusted by multiplying the original ILCS figure by 1.95, which was the factor between the ILCS and IBC numbers for Baghdad.

This rough evaluation seems to show that the North and South were both underestimated by a factor of approximately 4, while the figure for Central is very close. This is what we would have expected as far as media reporting bias. Here I have attempted to quantify the factor this bias played.
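Expressed as ratios, the adjusted figures imply the following regional undercount factors (a rough sketch using only the numbers listed above):

```python
# Adjusted ILCS vs IBC war-related death figures per region,
# as listed above.
regions = {
    "South": (2420, 630),
    "Central": (990, 870),
    "North": (250, 60),
}

for name, (ilcs, ibc) in regions.items():
    print(f"{name}: undercount factor {ilcs / ibc:.1f}")
```

This prints factors of about 3.8 for the South, 1.1 for the Central region, and 4.2 for the North, consistent with the rough factor of 4 mentioned above.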

This matters little in the end... since most of the pairings are in the same region, and thus would be affected similarly by reporting bias (something I hadn't even thought of until now!).

If I get more precise data from the ILCS, I will be able to determine with more certainty that, in fact, most if not all of the pairings conducted by the Lancet study were incorrect.

Seixon, I gave details of my calculations in the linked spreadsheet. Your calculations in comment 212 are wrong. You fail to account for the fact that some governorates were oversampled after the pairing process.

Lambert,

I did not have any software to open your ODS file, but I guess I will find some, hopefully it is freely available.

You fail to account for the fact that some governorates were oversampled after the pairing process.

I'm not quite sure what you mean by this. The rate found in each governorate was meshed into a national rate, while each governorate in a pairing was supposed to represent two governorates, not one.

I will have a look at your numbers if I find software to open it, and get back to you.

From a cursory look at your spreadsheet, it seems like you are trying some hocus pocus here.

First of all, you use the populations that the Lancet study gives, which are not correct. The UNDP figures for population size are more current than the ones the Lancet study uses.

Second, you do deaths per cluster, then multiply this by the different numbers of clusters received in each of the rounds.

The problem with this is that if they only had 2 clusters, they would of course correlate this differently to the entire governorate they were sampling than if they had 3 clusters.

In other words, let's take an example from your spreadsheet. Ninawa.

You give deaths per cluster in the initial sampling as 1216 for Ninawa. Then for deaths per cluster after grouping, you give Ninawa 1621 due to it receiving one more cluster from the grouping process.

The problem with this is that you are not doing death rates, but numbers of deaths. In other words, it doesn't matter if additional clusters are given, because this is made up for by the population of the governorate versus the number of clusters sampled.

In other words, first we have 1216 for 3 clusters. Applied to Ninawa as a whole, each cluster representing 739,000 people... This would give a death rate for Ninawa of 1288 deaths per million.

Second we have 1621 for 4 clusters. Applied to Ninawa as a whole, this would give a death rate for Ninawa of 1288.

The death rate is still the same.
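Whatever the right absolute level, the sameness claim is easy to verify in a few lines. This sketch assumes (my reading of the spreadsheet, which Lambert may correct) that 1216 and 1621 are totals over 3 and 4 equal clusters of 739,000 people each; on that reading the per-person rate barely moves when the extra cluster is added:

```python
CLUSTER_SIZE = 739_000  # people represented by each cluster

def rate_per_million(total_deaths, n_clusters):
    """Death rate when each cluster stands in for CLUSTER_SIZE people."""
    return total_deaths / (n_clusters * CLUSTER_SIZE) * 1_000_000

before = rate_per_million(1216, 3)  # initial allocation
after = rate_per_million(1621, 4)   # after grouping added a cluster

print(f"before: {before:.1f}/M, after: {after:.1f}/M")
```

The two rates agree to within a fraction of a death per million, because 1621 is almost exactly 1216 scaled up by 4/3.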

So I would like you to explain how this shows that the study methodology actually underestimates the number by 4,000 deaths.

Seixon, I had a couple of questions about your 217 post. I'm confused about what you're saying. Were the IBC numbers for Baghdad 1.95 times lower than the ILCS numbers (presumably we're talking about violent deaths)? Also, what's this adjustment you made? Were the IBC numbers 4 times smaller in the North and South?

If I understand you correctly it sounds like you're presenting evidence that the IBC numbers are a serious undercount. But that may not be what you're saying.

By Donald Johnson (not verified) on 02 Jan 2006 #permalink

Donald,

The ILCS and the IBC data are not directly comparable in their size, since the IBC data is current as of Dec 2005, while the ILCS data is only current as of May 2004. Beyond that, the ILCS numbers are a projection based on a survey, while the IBC data are based on actual reported deaths.

What I was demonstrating is the difference between the regions across the two data sets. It suggests that the IBC data may have undercounted the Northern and Southern regions by about a factor of 4, while the Central region was undercounted by a negligible factor of about 1.1.

Lambert and others had questioned the reliability of the IBC data, saying that the more rural regions (North and South) would be undercounted. I challenged them to go on the record and say that the discrepancy was large enough that the provinces would still be similar. No one has.

The three most dissimilar pairings according to the IBC data differ by factors of 9, 23, ~500. However, two out of these 3 pairings lie within the same region as defined by the ILCS.

Thus an underreporting bias with a factor of 4 would still not change the fact that these pairings were incorrect.

I have solicited the raw numbers for each governorate from the UNDP and FAFO to determine more accurately whether the inherent reporting bias in the IBC numbers plays any role in whether or not the pairings were similar.

If this initial finding is any indication, Lambert & Co might want to start scrambling for the next excuse.

Seixon, you are confused again. Column F in my spreadsheet is a death rate. It's not deaths per million people, but deaths per 739000 people (the cluster size).