Kevin Drum comments on Lott’s weighting scheme. He also links to a January posting which has this explanation from Lott of how he might have got a weight of 1/8 from his weighting procedure (my emphasis):

Whether it is possible depends upon how finely you do the weighting. If you do something as simple as national weighting, you are right, it would not be likely. But if you are willing to put in the effort to break things down into enough categories it becomes quite likely. I just looked up some different numbers from 2000 to give you a rough idea. In Montana, black males make up .14 percent of the population. In Mississippi, they make up 18.8 percent. That is a difference of 134 fold, quite a bit bigger than your 8/1 ratio. Obviously, this is an extreme difference and the difference that Lott must have come across is only about 1/17th as large. If he broke things down by age in addition to race and gender, I am sure that you could easily get a difference much bigger than 134 fold. My impression is that at least on this point Lindgren is “making a mountain out of a mole hill.”
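The fold-difference arithmetic in that explanation is easy to check. This minimal sketch uses only the two percentages quoted; `fold_difference` is a hypothetical helper, and treating a weight of 1/8 as an 8-to-1 ratio follows the quote's own framing.

```python
def fold_difference(larger_share, smaller_share):
    """How many times larger one population share is than another."""
    return larger_share / smaller_share

# Black males as a percentage of the 2000 population, as quoted above.
montana = 0.14
mississippi = 18.8

ratio = fold_difference(mississippi, montana)
print(f"Mississippi vs. Montana: {ratio:.0f}-fold")  # 134-fold
# Compare with the 8-to-1 ratio implied by a weight of 1/8.
print(f"That is about {ratio / 8:.0f} times the 8-fold gap")  # about 17
```

So the 8-to-1 ratio is roughly 1/17th of the most extreme state-level gap quoted.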

Lott is now claiming that he did do national weighting, so by his own admission, it is not likely that he could have got a weight of 1/8.

James Lindgren writes:

The weighting problem you note for the claimed 1997 survey bears repeating. Things are more serious than I think some readers might suspect from your example. I’m not sure that I can convey the problem adequately in words.

The purpose of the sort of weighting Lott claims to have done is to adjust things so that if there are a few more of any one group in your sample in a state and a few fewer of any other group, you weight their answers accordingly, so the total is still the same. For example, assume you start out with 200 Californians in your sample, with a few more whites in your sample than in the population and a few fewer blacks. You reweight the sample so it still comes out at 200 Californians, but some are weighted more than 1 and others are weighted less than 1. The average of all respondents’ weights is exactly 1, and the sum of the weights of all 200 respondents is still 200. (I leave out the complication of more complex sampling designs and design effects that Lott said he did not use and that he did not have the expertise to use.)
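A minimal sketch of the within-state weighting described above, assuming simple post-stratification: each respondent's weight is their cell's population share divided by that cell's share of the sample. The function `poststratification_weights` and the California counts and shares are hypothetical, chosen only so that whites are slightly over-represented and blacks slightly under-represented.

```python
from collections import Counter

def poststratification_weights(respondent_cells, population_shares):
    """One weight per respondent: population share / sample share of their cell."""
    n = len(respondent_cells)
    counts = Counter(respondent_cells)
    # The sample share of a cell is counts[c] / n, so the weight is share * n / count.
    return [population_shares[c] * n / counts[c] for c in respondent_cells]

# Hypothetical sample of 200 Californians: whites slightly over-represented,
# blacks slightly under-represented relative to the (hypothetical) population shares.
sample = ["white"] * 146 + ["black"] * 14 + ["other"] * 40
shares = {"white": 0.70, "black": 0.08, "other": 0.22}

weights = poststratification_weights(sample, shares)
print(round(weights[0], 3))    # white weight, below 1
print(round(weights[146], 3))  # black weight, above 1
print(round(sum(weights), 6))  # 200.0: every cell is filled, so the total is preserved
print(round(sum(weights) / len(weights), 6))  # 1.0
```

Because every cell has at least one respondent, the weights in each cell add up to that cell's population share times 200, and the shares add up to 1, so the weighted total stays at 200.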

So there are 36 categories for each state and 51 jurisdictions, which means a staggering 1836 categories with 1836 different individual weights. But Lott says he had only 2424 respondents in his 1997 study. A small state with about half a million people in the 1990 census (Wyoming, Vermont, DC, North Dakota, etc.) would have only about 4-6 respondents, but 36 categories for weighting that handful of respondents.
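The cell counts above can be reproduced with a few lines of arithmetic. This sketch assumes the 2424 respondents were allocated to states in proportion to population and uses the 1990 census resident population of 248,709,873; the function name and the 450,000 to 600,000 range for a "small state" are illustrative assumptions.

```python
US_POP_1990 = 248_709_873   # 1990 census resident population
N_RESPONDENTS = 2424        # respondents Lott reports for 1997
CATEGORIES_PER_STATE = 36
JURISDICTIONS = 51          # 50 states plus DC

total_cells = CATEGORIES_PER_STATE * JURISDICTIONS
print(total_cells)                            # 1836
print(round(N_RESPONDENTS / total_cells, 2))  # average respondents per cell

def expected_respondents(state_pop):
    # Assumes the sample was allocated to states in proportion to population.
    return N_RESPONDENTS * state_pop / US_POP_1990

for pop in (450_000, 600_000):
    print(pop, round(expected_respondents(pop), 1))
```

With roughly 1.3 respondents per cell on average nationally, and only 4 to 6 respondents spread across a small state's 36 cells, most of those cells must be empty.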

This doesn’t work in practice. Let’s say that all 4 of the respondents in a given small state were white: one white male aged 35, one white male aged 51, one white female aged 22, and one white female aged 45. Let’s also say that each of these four categories (white males in their 30s, white males in their 50s, white females in their 20s, and white females in their 40s) makes up 7% of the state’s population. Each respondent is 25% of the state’s sample but stands for a category that is only 7% of its population, so each would have their answer weighted by .07/.25=.28. These four categories with data together make up 28% of the population and thus would contribute 28% of the state’s total in Lott’s study. But where does the other 72% of the state’s total come from? There are no respondents to weight in the categories making up the other 72% of the state. In all, 32 of the 36 categories would be empty.

So adding all 4 respondents together, the total weighted count for the small state is only (4*.28)=1.12 of a respondent. Yet the total weighted number of respondents for that state should equal 4 respondents, not 1.12 respondents. The small state would contribute only 1.12 respondents to Lott’s totals, not the 4 people it is supposed to.
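The small-state example can be run directly. This is a minimal sketch using the same hypothetical `poststratification_weights` helper (population share divided by sample share); the cell labels are illustrative, and splitting the remaining 72% equally among the 32 empty cells is an added assumption that does not affect the result, since only their combined share matters.

```python
from collections import Counter

def poststratification_weights(respondent_cells, population_shares):
    """One weight per respondent: population share / sample share of their cell."""
    n = len(respondent_cells)
    counts = Counter(respondent_cells)
    return [population_shares[c] * n / counts[c] for c in respondent_cells]

# The four respondents, each in a different cell that is 7% of the state.
filled = ["white M 30s", "white M 50s", "white F 20s", "white F 40s"]
shares = {cell: 0.07 for cell in filled}
# Assumption: the other 32 cells split the remaining 72% equally.
for i in range(32):
    shares[f"empty {i}"] = 0.72 / 32

weights = poststratification_weights(filled, shares)
empty_share = sum(s for cell, s in shares.items() if cell not in filled)
print([round(w, 2) for w in weights])  # [0.28, 0.28, 0.28, 0.28]
print(round(sum(weights), 2))          # 1.12, not 4
print(round(empty_share, 2))           # 0.72 of the state has no one to weight
```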

Note that in my example above for weighting California, you had 200 respondents both before and after weighting. But with Lott’s method, a small state would have 4 respondents before weighting and 1.12 respondents after weighting, if you are lucky. If you are unlucky, you might get a respondent from a small group, who receives a low weight, or 2 respondents from the same group, and end up with an even smaller share of the total. When these are added to the very large states, which would have enough respondents to cover most of the 36 categories for each state, the resulting data would be garbage, because large states would overcontribute to the total.
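A sketch of how this distorts the combined totals, under stated assumptions: a hypothetical large state with 200 respondents spread over all 36 cells (equal population shares, purely for simplicity) is pooled with the small state from the example. The "unlucky" case puts 2 of the 4 small-state respondents in the same 7% cell.

```python
from collections import Counter

def poststratification_weights(respondent_cells, population_shares):
    """One weight per respondent: population share / sample share of their cell."""
    n = len(respondent_cells)
    counts = Counter(respondent_cells)
    return [population_shares[c] * n / counts[c] for c in respondent_cells]

CELLS = [f"cell {i}" for i in range(36)]

# Hypothetical large state: 200 respondents covering all 36 cells, equal shares.
large_shares = {c: 1 / 36 for c in CELLS}
large_sample = [CELLS[i % 36] for i in range(200)]

# Small state from the example: four cells of 7%, the other 32 share 72%.
small_shares = {c: (0.07 if i < 4 else 0.72 / 32) for i, c in enumerate(CELLS)}
lucky_sample = CELLS[:4]                              # four different cells
unlucky_sample = [CELLS[0], CELLS[0], CELLS[1], CELLS[2]]  # two share a cell

large_total = sum(poststratification_weights(large_sample, large_shares))
lucky_total = sum(poststratification_weights(lucky_sample, small_shares))
unlucky_total = sum(poststratification_weights(unlucky_sample, small_shares))

before = len(lucky_sample) / (len(lucky_sample) + len(large_sample))
after = lucky_total / (lucky_total + large_total)
print(f"large state total: {large_total:.2f}")          # 200.00
print(f"small state, lucky: {lucky_total:.2f}")         # 1.12
print(f"small state, unlucky: {unlucky_total:.2f}")     # 0.84
print(f"small state share before weighting: {before:.2%}")
print(f"small state share after weighting:  {after:.2%}")
```

The large state keeps its full 200 because none of its cells is empty, while the small state shrinks to about a quarter of its size, so its share of the pooled total falls accordingly.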

In short, weighting by 36 categories in each state sounds like something someone might come up with off the top of his head, but it is hard to believe that anyone would actually go through with it once they realized that their weighted totals didn’t come close to adding up to 2424. I think it highly unlikely that a social scientist would actually try to weight by 1836 categories after realizing that the great majority of the weights couldn’t be used. Indeed, Lott didn’t use the unworkable 1836-category weighting in his latest 2002 survey.

If there were some way to estimate the counts for the missing cells (which there isn’t in this case) and most of the cells were not empty, one might consider using such a method.

It doesn’t help Lott’s current account of the sampling and the weighting that he first told me he drew a random sample from the CD but didn’t remember how, and then wrote: “Not true. I told Jim that one of the students had a program to randomly sample the telephone numbers by state. My guess is that it was part of the CD, but on that point I can’t be sure.”

When Lott called me on this matter, I was listening closely to Lott’s answers to questions about methods to see what details he could remember on the spot. I asked him how he drew the sample from the CD, which he said that he thought he got from a student. Lott said that he didn’t remember how he sampled, but assured me it was drawn randomly. I remember being disappointed in his answer because I thought that a social scientist would probably remember how he solved this problem of getting a random sample (there are several solutions). Also, I am absolutely positive that he did not mention pre-stratifying the sample by state, which is a form of stratified proportional sampling, not random sampling. He seems to be claiming now that he drew the sample proportionally by state then randomly within states. This discrepancy is not large by itself, but it is troubling nonetheless.
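To make the distinction concrete, here is a minimal sketch contrasting a simple random sample drawn from a whole listing with a proportionally stratified sample (allocate the sample to states in proportion to their size, then sample randomly within each state). The listing frame, state names, and function names are hypothetical; this is not a reconstruction of the student's program or the CD.

```python
import random
from collections import Counter

def simple_random_sample(frame, n, rng):
    """Draw n (state, number) entries uniformly from the whole frame, ignoring state."""
    return rng.sample(frame, n)

def proportional_stratified_sample(frame, n, rng):
    """Allocate n across states in proportion to frame size, then sample within each.

    Rounding the per-state allocations can make the total differ slightly from n.
    """
    by_state = {}
    for entry in frame:
        by_state.setdefault(entry[0], []).append(entry)
    sample = []
    for entries in by_state.values():
        k = round(n * len(entries) / len(frame))
        sample.extend(rng.sample(entries, k))
    return sample

# Hypothetical frame: 900 listings in a large state, 100 in a small one.
frame = [("Large", i) for i in range(900)] + [("Small", i) for i in range(100)]
rng = random.Random(1997)

srs_counts = Counter(s for s, _ in simple_random_sample(frame, 50, rng))
strat_counts = Counter(s for s, _ in proportional_stratified_sample(frame, 50, rng))
print(srs_counts)    # per-state counts vary from draw to draw
print(strat_counts)  # always 45 Large and 5 Small
```

Under simple random sampling the number drawn from each state varies by chance; under proportional stratification it is fixed in advance, which is why the two are different designs.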

I realize this comment about the weighting problem is confusing. But I find it hard to believe that Lott (or anyone) would use the weighting method he claims to have used in 1997, not just because it is substandard, but because it doesn’t work at all when most of the cells are empty. Almost anyone trying to use it would immediately realize this and reduce the number of categories, such as by removing the age breakdowns, which I gather from comments on the internet is what Lott did in his 2002 survey.
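A sketch of why collapsing categories helps, assuming the same share-over-sample-share weighting. The four respondents come from the small-state example; the collapsed shares (white males and white females each 45% of the state, leaving 10% in non-white cells with no respondents) are hypothetical numbers chosen only for illustration.

```python
from collections import Counter

def weighted_total(respondent_cells, population_shares):
    """Sum of weights, where each weight is population share / sample share of the cell."""
    n = len(respondent_cells)
    counts = Counter(respondent_cells)
    return sum(population_shares[c] * n / counts[c] for c in respondent_cells)

# The four respondents from the example, as (race, sex, age decade).
respondents = [("white", "M", 30), ("white", "M", 50),
               ("white", "F", 20), ("white", "F", 40)]

# Fine cells: each respondent's race-sex-age cell is 7% of the state.
fine_shares = {r: 0.07 for r in respondents}
fine_total = weighted_total(respondents, fine_shares)

# Collapsed cells: drop age. Hypothetical shares: 45% white male, 45% white female.
collapsed = [(race, sex) for race, sex, _ in respondents]
collapsed_shares = {("white", "M"): 0.45, ("white", "F"): 0.45}
collapsed_total = weighted_total(collapsed, collapsed_shares)

print(round(fine_total, 2))       # 1.12
print(round(collapsed_total, 2))  # 3.6
```

Dropping the age breakdown fills more of the population with respondents, so the weighted total moves much closer to the 4 respondents actually in the sample, though any remaining empty cells still pull it below 4.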