One of the disagreements between Lott and the NAS panel is on the question of whether the models fit the data. Joel Horowitz explains the problem in Appendix D of the report. For people who don’t like equations, I’ll try to explain the issue with some pictures. The graphs below show straight lines (these are the models in our example) fitted to two different data sets. While the line is roughly the same distance from the data points in both cases, the one on the left is a bad fit, while the one on the right is a good fit. The points on the left lie on a curve and not a straight line, so it is obvious that the line does not fit the data, but it is less obvious what makes the line a good fit for the points on the right.

The bottom pair of graphs show the residuals, the differences between the data points and the straight line. For the bad fit, the residuals show an obvious pattern and are not random. For the good fit, they show no pattern and look like random noise. You can use a statistical test to check whether the residuals are random. Such a test is called a *specification test*.

The points lie on a curve: the model is misspecified

The points are roughly on the line

The residuals show a definite pattern

The residuals look like random noise
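For readers who would rather see numbers than pictures, here is a minimal sketch of the same idea in Python. Everything in it is an assumption made for illustration: the data are synthetic, the function names `fit_line` and `reset_f_stat` are mine, and the test is a simplified RESET-style check that adds only the squared fitted values to the straight-line model. It is not the procedure Horowitz ran on the panel's models.

```python
import numpy as np

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns fitted values and residuals."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    return fitted, y - fitted

def reset_f_stat(x, y):
    """RESET-style F statistic: does adding fitted**2 to the line reduce the error?"""
    fitted, resid = fit_line(x, y)
    rss_line = np.sum(resid ** 2)
    # Augmented model: intercept, x, and the squared fitted values from the line.
    X_aug = np.column_stack([np.ones_like(x), x, fitted ** 2])
    beta, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    rss_aug = np.sum((y - X_aug @ beta) ** 2)
    dof = len(x) - X_aug.shape[1]
    # One added regressor, so the numerator has one degree of freedom.
    return (rss_line - rss_aug) / (rss_aug / dof)

# Synthetic data (assumed for illustration): same noise, different true shapes.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
noise = rng.normal(0.0, 1.0, size=x.size)
y_curved = 0.2 * x ** 2 + noise    # points lie on a curve
y_linear = 1.0 + 0.8 * x + noise   # points lie roughly on a line

f_curved = reset_f_stat(x, y_curved)
f_linear = reset_f_stat(x, y_linear)
print(f"curved data: F = {f_curved:.1f}")
print(f"linear data: F = {f_linear:.1f}")
```

With 50 points the statistic is compared with an F distribution on 1 and 47 degrees of freedom, whose 5% critical value is about 4. A value far above that means the straight line is rejected as the correct specification, which is what should happen for the curved data. A correctly specified model will still fail about one time in twenty at that level, so a single pass or fail on its own proves little.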

In Appendix D of the NAS Panel report, Joel Horowitz examines the models used by Lott and finds (my emphasis):

None of the models examined by the committee passes a simple specification test called RESET (Ramsey, 1969). That is, none of the models fits the data. This raises the question whether a model that fits the data can be found. For example, by estimating and testing a large number of models, it might be possible to find one that passes the RESET test. This is called a specification search. However, a specification search cannot circumvent the curse of dimensionality. If the search is carried out informally (that is, without a statistically valid search procedure and stopping rule), as is usually the case in applications, then it invalidates the statistical theory on which estimation and inference are based. The results of the search may be misleading, but because the relevant statistical theory no longer applies, it is not possible to test for a misleading result. Alternatively, one can carry out a statistically valid search that is guaranteed to find the correct model in a sufficiently large sample. However, this is a form of nonparametric regression, and therefore it suffers the lack of precision that is an unavoidable consequence of the curse of dimensionality. Therefore, there is little likelihood of identifying a well-fitting model with existing data and statistical methods. In summary, the problems posed by high-dimensional estimation, misspecified models, and lack of knowledge of the correct set of explanatory variables seem insurmountable with observational data.

Lott’s response to this was:

Professor Horowitz’s discussion of the reset tests seem too strong since I provided the panel with the reset tests done for a wide range of estimates. Even accepting that the Reset test is appropriate (and no one else on the panel also uses this test in their work), there are many estimates where the results pass this test and he should thus conclude that those indicate a drop in violent crime.

This seems to miss the point of Horowitz’s discussion above. Lott does not seem to be at all familiar with RESET, since he does not capitalize the name correctly (RESET is an acronym for “regression equation specification error test”). I asked Horowitz to comment and he replied:

I am not aware of any RESET results from Lott. He did not know what RESET is the only time I discussed it with him. All the models of Lott and his critics to which I applied RESET failed. That is, the hypothesis of correct specification was rejected. Of course, one can always make passing RESET or any other specification test a model selection criterion, thereby guaranteeing a “correctly specified” model. This would be a serious misuse of RESET or the other test, and the results would be meaningless.

As I explain in the signed appendix of the NRC report, the curse of dimensionality and lack of knowledge of the correct explanatory variables make it impossible to use Lott’s or similar data to draw conclusions about the effects of right-to-carry laws on crime. Although the models I tested fail RESET, it would be easy to make lots of models that pass the test and imply contradictory conclusions about the effects of right-to-carry laws. The fact that Lott may have found some models that pass RESET is neither surprising nor significant.

There have also been previous findings that Lott’s carry-law models fail specification tests and hence do not fit the data. Black and Nagin’s 1998 Journal of Legal Studies paper found that Lott’s original model also failed a specification test. In his response to Black and Nagin, Lott simply ignored the problem.