by Michael Maltz

Anyone who has looked closely at the data used by John Lott in coming up with his “findings” in More Guns, Less Crime (MGLC) would come to the same conclusion: MGLC = GIGO. First, the data are so full of holes as to be unusable for the analyses he conducted. Second, Lott badly miscalculated the crime rates. Third, he ignored a major discontinuity in the data. [And fourth, as Ayres and Donohue have shown, even if the first three points were not valid (which they are), he did it wrong.] It’s a bit technical, so bear with me.


1. Gaps in the FBI Crime Data.

Lott used the FBI-collected Uniform Crime Reports (UCR) in his analyses, aggregated to the county level. I’ve been involved in studying the characteristics of the UCR since 1995, which I started during my tenure as a Visiting Fellow at the Justice Department’s Bureau of Justice Statistics (BJS). Many agencies don’t report their crime to the FBI—since the UCR is a voluntary system, they don’t need to do so. [Those interested in the reasons for this should read Bridging Gaps in Police Crime Data.]

In fact, there was a noticeable decline in UCR coverage in the 1990’s, which was not uniform in space and time. Between 1977 and 1992 (the years focused on in the first edition of MGLC), some states had rather significant decreases in the population covered by UCR reports: decreases of more than ten percent, for example, in Indiana, Louisiana, Massachusetts, Mississippi, New Mexico, and Tennessee.

Handling a ten percent decrease in a state’s reporting is problematic, but not fatal, especially since the FBI makes sure that all agencies in jurisdictions of over 100,000 in population provide data. For agencies that don’t report, the FBI imputes their data; it feels that this will permit it to make national, regional, and state estimates of crime despite the missing data, and thus publish relatively comparable data from year to year.

The decreases in crime reporting within states were not uniform, either. In many counties there were many years in which not a single agency provided crime data to the FBI. When a county is missing most or all of its data, there is no foundation on which to build a reliable estimate, and the FBI makes no county-level estimates. However, the National Archive of Criminal Justice Data (NACJD) takes the FBI data, imputes missing data [more on this later], and makes a county-level data set available on the web. Lott (mis)used that NACJD data set.

2. Miscalculation of Crime Rates.

The NACJD county-level data set provides both “crimes reported” and “population of the reporting agencies” for each county. But one can’t calculate the crime rate by using the “crimes reported” for the numerator and “reporting population” for the denominator.

If no agencies in a county send in reports, the denominator would be zero, leading to indeterminate rates. In fact, there are cases where there is reported crime in a county, but the reporting population is zero, leading to an infinite crime rate. This occurs when the reported crime comes from an agency that the FBI calls a “zero-pop” agency (such as transit police, park police, or campus police, or any agency with overlapping jurisdiction with the primary policing agency). They are so termed because they are not associated with a population—were they associated with a population, this would then in essence be double-counting that population.

To deal with these indeterminate and infinite crime rates, Lott chose to use a different denominator, the Census Bureau’s estimate of the county population. What this meant was that, in cases where no agency in a county provided crime reports, Lott calculated the crime rate to be zero.
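To make the arithmetic concrete, here is a minimal sketch in Python. The function name and the county figures are hypothetical, chosen only for illustration; they are not taken from the NACJD files or from Lott's data set.

```python
import math

def rate_per_100k(crimes, population):
    """Crimes per 100,000 residents.

    Returns nan for 0/0 (indeterminate) and inf when crimes are
    reported against a zero population (e.g. a "zero-pop" agency).
    """
    if population == 0:
        return math.nan if crimes == 0 else math.inf
    return 100_000 * crimes / population

# Hypothetical county in which no agency sent in reports.
reported_crimes = 0
reporting_population = 0
census_population = 50_000  # assumed Census Bureau estimate

# Reporting population as denominator: indeterminate.
rate_reporting = rate_per_100k(reported_crimes, reporting_population)

# Census population as denominator: a rate of exactly zero,
# even though nothing is actually known about crime in the county.
rate_census = rate_per_100k(reported_crimes, census_population)

# Hypothetical county where only a zero-pop agency reported 12 crimes.
rate_zero_pop = rate_per_100k(12, 0)
```

The switch of denominator removes the undefined values, but it does so by turning "no data" into "no crime."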

To make matters worse, if an agency reported its crime to the FBI, but did not provide more than 5 months of crime data, NACJD did not record this agency’s crime or population in its county tally. So not only were non-reporting agencies omitted from the data Lott used; agencies reporting 5 or fewer months were omitted as well. [The crime figures for an agency reporting 6-11 months were imputed by multiplying the known count by 12/(months reported), to inflate the figures to a full-year estimate.]
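The rule just described can be sketched in a few lines of Python. This is only an illustration of the rule as stated here, not NACJD's actual code; the function name is hypothetical, and returning None stands in for "left out of the county tally."

```python
def nacjd_style_annual_count(known_count, months_reported):
    """Full-year estimate under the pre-1994 NACJD rule as described above.

    Returns None when the agency is dropped (5 or fewer months reported);
    otherwise scales the known count by 12 / months_reported.
    """
    if not 0 <= months_reported <= 12:
        raise ValueError("months_reported must be between 0 and 12")
    if months_reported <= 5:
        return None  # agency's crime and population both omitted
    return known_count * 12 / months_reported

# Hypothetical agencies: 60 crimes in 6 months, 40 crimes in 5 months.
six_month_estimate = nacjd_style_annual_count(60, 6)   # scaled to a full year
five_month_estimate = nacjd_style_annual_count(40, 5)  # dropped entirely
```

Note the cliff between 5 and 6 months: one more monthly report moves an agency from contributing nothing to contributing a doubled count.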

Lott claims that these many data gaps amount to random error, easily handled by standard statistical techniques. This is absolutely false: the error isn’t random, and standard techniques don’t apply.

3. “Break in Series.”

The imputation method used by NACJD was different than the one used by the FBI. The FBI uses the same formula, 12/(months reported), but its cutoff is 2 months. If an agency submits reports for 0–2 months, the FBI does not use that agency’s data. Instead, it uses the following imputation scheme: it calculates the aggregate crime rate (total crime/total pop) of similar agencies in that state, and multiplies that figure by the agency’s population.
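A rough Python sketch of the FBI scheme as described in this paragraph follows. It is not the FBI's code: the function name is hypothetical, and "similar agencies" is represented simply as a list of (crimes, population) pairs that the caller has already selected.

```python
def fbi_style_annual_count(known_count, months_reported, population,
                           similar_agencies):
    """Full-year estimate under the FBI rule as described above.

    similar_agencies: list of (crimes, population) pairs for comparable
    agencies in the same state (how they are chosen is assumed, not shown).
    """
    if not 0 <= months_reported <= 12:
        raise ValueError("months_reported must be between 0 and 12")
    if months_reported >= 3:
        # Enough months on file: inflate the known count to a full year.
        return known_count * 12 / months_reported
    # 0-2 months: ignore the agency's own data and apply the aggregate
    # rate (total crime / total population) of similar agencies.
    total_crime = sum(crimes for crimes, _ in similar_agencies)
    total_pop = sum(pop for _, pop in similar_agencies)
    return total_crime * population / total_pop

# Hypothetical agency of 10,000 people that reported only 2 months.
peers = [(200, 40_000), (100, 20_000)]
estimate = fbi_style_annual_count(5, 2, 10_000, peers)
```

An agency reporting 4 months would be scaled up under this rule, whereas under the 5-month cutoff described earlier it would have been dropped from the county tally altogether.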

In 1996 BJS told NACJD to stop using the imputation method it had been using in earlier years, and start using the FBI method. It did, starting with the next data set it received from the FBI, the 1994 data set. So from 1994 on it used the same imputation method as the FBI, and included the following statement in the documentation of these data sets:

Break in Series.

These changes will result in a break in series from previous UCR county-level files. Consequently data from earlier year files should not be compared to data from 1994 and subsequent years because changes in procedures used to adjust for incomplete reporting at the [agency] level may be expected to have an impact on aggregates for counties in which some [agencies] have not reported for all 12 months. [Emphasis added.]

Lott either ignored or (more likely) didn’t notice this warning and, in the second edition of MGLC, added the years 1993-1996 to his data set. When we informed him of this change, he added a dummy variable to his data set to account for the 1993-1994 change, and noted that it was not significant. This is hardly the way to deal with a break in series, especially when the publisher of the series explicitly states that the data are not comparable.

So there you have it, a down-and-dirty description of the data set that Lott analyzed to produce the MGLC findings. He used various “fudge factors” to compensate for the data’s deficiencies; although his findings have the same color as fudge, I suggest you take the smell test before sampling them.

I notified Lott of these major mistakes in August of 2000, but he did not change the content of his many speeches and op-ed articles pushing his policy. I’ve held back on my criticism until now for a number of reasons. First, I relied on Lott’s scholarly integrity to acknowledge his mistakes; obviously, I made a mistake in this reliance—this was before I made the acquaintance of Mary Rosh. Second, Lott owns a bigger megaphone; the groups he caters to seem to have a lot more money and can generate much more publicity than the groups opposed to his policy recommendations. It was not until I found Tim Lambert’s blog earlier this year that I realized that there are other ways of bringing truth to bear on this issue.