One of the disagreements between Lott and the NAS panel is on the question of whether the models fit the data. Joel Horowitz explains the problem in Appendix D of the report. For people who don't like equations, I'll try to explain the issue with some pictures. The graphs below show straight lines (these are the models in our example) fitted to two different data sets. While the line is roughly the same distance from the data points in both cases, the one on the left is a bad fit, while the one on the right is a good fit. The points on the left lie on a curve and not a straight line, so it is obvious that the line does not fit the data, but it is not so clear why a line fits the points on the right.
The bottom pair of graphs show the residuals, that is, the differences between the straight line and the data points. The one on the left shows an obvious pattern and is not random. The one on the right shows no pattern: the residuals look completely random. You can use a statistical test to check whether the residuals are random. Such a test is called a specification test.
[Figure, four panels. Top left: the points lie on a curve, so the model is misspecified. Top right: the points are roughly on the line. Bottom left: the residuals show a definite pattern. Bottom right: the residuals look like random noise.]
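For readers who would rather poke at numbers than look at pictures, here is a minimal sketch of the same idea in Python with NumPy. Everything in it is an assumption made for illustration: the data are made up (they are not Lott's), and the helper names `line_residuals` and `sign_runs` are hypothetical. It fits a straight line to a curved data set and to a linear one, then counts how often the residuals change sign as you move along the x axis. Patterned residuals stay on one side of the line for long stretches, so they produce few runs; random residuals flip sign all the time.

```python
import numpy as np

def line_residuals(x, y):
    """Fit y = a + b*x by least squares and return the residuals."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

def sign_runs(resid):
    """Count runs of same-signed residuals (few runs suggest a pattern)."""
    signs = np.sign(resid)
    return 1 + int(np.sum(signs[1:] != signs[:-1]))

rng = np.random.default_rng(0)            # made-up data for illustration
x = np.linspace(0.0, 10.0, 60)
noise = rng.normal(0.0, 1.0, x.size)
y_curved = 0.5 * (x - 5.0) ** 2 + noise   # points lie on a curve
y_linear = 2.0 + 1.5 * x + noise          # points lie roughly on a line

runs_curved = sign_runs(line_residuals(x, y_curved))
runs_linear = sign_runs(line_residuals(x, y_linear))
print("runs of same-signed residuals, curved data:", runs_curved)
print("runs of same-signed residuals, linear data:", runs_linear)
```

Counting runs is only a crude stand-in for a proper specification test, but it captures what the eye sees in the residual plots.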
In Appendix D of the NAS Panel report, Joel Horowitz examines the models used by Lott and finds (my emphasis):
None of the models examined by the committee passes a simple specification test called RESET (Ramsey, 1969). That is, none of the models fits the data. This raises the question whether a model that fits the data can be found. For example, by estimating and testing a large number of models, it might be possible to find one that passes the RESET test. This is called a specification search. However, a specification search cannot circumvent the curse of dimensionality. If the search is carried out informally (that is, without a statistically valid search procedure and stopping rule), as is usually the case in applications, then it invalidates the statistical theory on which estimation and inference are based. The results of the search may be misleading, but because the relevant statistical theory no longer applies, it is not possible to test for a misleading result. Alternatively, one can carry out a statistically valid search that is guaranteed to find the correct model in a sufficiently large sample. However, this is a form of nonparametric regression, and therefore it suffers the lack of precision that is an unavoidable consequence of the curse of dimensionality. Therefore, there is little likelihood of identifying a well-fitting model with existing data and statistical methods. In summary, the problems posed by high-dimensional estimation, misspecified models, and lack of knowledge of the correct set of explanatory variables seem insurmountable with observational data.
Lott's response to this was:
Professor Horowitz's discussion of the reset tests seem too strong since I provided the panel with the reset tests done for a wide range of estimates. Even accepting that the Reset test is appropriate (and no one else on the panel also uses this test in their work), there are many estimates where the results pass this test and he should thus conclude that those indicate a drop in violent crime.
This seems to miss the point of Horowitz's discussion above. Lott does not seem to be at all familiar with RESET, since he does not capitalize the name correctly (RESET is an acronym for "Regression Equation Specification Error Test"). I asked Horowitz to comment and he replied:
I am not aware of any RESET results from Lott. He did not know what RESET is the only time I discussed it with him. All the models of Lott and his critics to which I applied RESET failed. That is, the hypothesis of correct specification was rejected. Of course, one can always make passing RESET or any other specification test a model selection criterion, thereby guaranteeing a "correctly specified" model. This would be a serious misuse of RESET or the other test, and the results would be meaningless.
As I explain in the signed appendix of the NRC report, the curse of dimensionality and lack of knowledge of the correct explanatory variables make it impossible to use Lott's or similar data to draw conclusions about the effects of right-to-carry laws on crime. Although the models I tested fail RESET, it would be easy to make lots of models that pass the test and imply contradictory conclusions about the effects of right-to-carry laws. The fact that Lott may have found some models that pass RESET is neither surprising nor significant.
There have also been previous findings that Lott's carry-law models fail specification tests and hence do not fit the data. Black and Nagin's 1998 Journal of Legal Studies paper found that Lott's original model also failed a specification test. In his response to Black and Nagin, Lott simply ignored the problem.
Tim, could you recommend any sources for RESET?
One would imagine, given his past problems with producing evidence for his claims, that Lott will be able to prove that the data he gave to the NAS panel came complete with RESET tests; in fact, you would think it would be quite easy for him to do so. Alternatively, of course, this could well be another one of his little faux pas that the gun rights movement forgives him.
RESET is a gorgeous old-fashioned piece of statistics. It's really mindless and crude, but surprisingly powerful. Basically, if you want to do a RESET test on a regression, you add the squared, cubed and fourth powers of the fitted values of the dependent variable into the regression equation, rerun the equation and then do an F-test of the hypothesis that the coefficients on these new variables are all zero. Then you add a few squares and cubes of the explanatory variables and do the same.
If the F-tests fail to reject the hypothesis that the coefficients are zero, then you've passed the RESET test. If not, then the test is telling you "something is up". The problem with RESET is that it's in some senses too powerful; it detects omitted variables, nonlinearities in the relationship and some forms of autocorrelation. Which means that all you know is that one of the very many things that can go wrong with a regression has done.
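For anyone who would rather see the recipe than read it, here is a minimal sketch in Python with NumPy. It is only an illustration on made-up data: the names `rss` and `reset_f` are hypothetical, it adds powers of the fitted values from the first regression (the usual form of the test), and it stops at the F statistic and its degrees of freedom rather than looking up a p-value.

```python
import numpy as np

def rss(X, y):
    """Least squares of y on X: return residual sum of squares and fitted values."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    return float(np.sum((y - fitted) ** 2)), fitted

def reset_f(X, y, powers=(2, 3, 4)):
    """RESET-style F statistic for a linear model (X must include a column of ones)."""
    n, k = X.shape
    rss_restricted, fitted = rss(X, y)
    # Standardise the fitted values before taking powers, for numerical
    # stability. The fitted values lie in the column space of X, so the
    # augmented regression spans the same space as with the raw powers.
    z = (fitted - fitted.mean()) / fitted.std()
    X_aug = np.column_stack([X] + [z ** p for p in powers])
    rss_full, _ = rss(X_aug, y)
    q = len(powers)            # number of added terms
    df2 = n - k - q            # residual degrees of freedom, augmented model
    f_stat = ((rss_restricted - rss_full) / q) / (rss_full / df2)
    return f_stat, q, df2

rng = np.random.default_rng(42)            # made-up data for illustration
x = np.linspace(0.0, 10.0, 100)
noise = rng.normal(0.0, 1.0, x.size)
X = np.column_stack([np.ones_like(x), x])

f_linear, q, df2 = reset_f(X, 1.0 + 3.0 * x + noise)        # truly linear
f_curved, _, _ = reset_f(X, 1.0 + 0.3 * x ** 2 + noise)     # really a curve
print(f"linear data: F({q}, {df2}) = {f_linear:.2f}")
print(f"curved data: F({q}, {df2}) = {f_curved:.2f}")
```

A large F statistic is the test saying "something is up". To turn it into a p-value you would compare it against the F distribution with (q, df2) degrees of freedom, for instance with `scipy.stats.f.sf(f_stat, q, df2)` if SciPy is installed.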
That's why RESET is best used as a "post-test" to validate your final results, rather than a working test to try and narrow down your specification; once you've got a model that passes RESET, it will probably also pass a load of other tests. So if you just search around for a model that passes RESET then even if you've just data-mined and found it by chance on that dataset, none of your other tests will warn you.
(here endeth the lecture)
(no, it continueth)
The intuition behind RESET is that the higher-power terms are correlated with the independent variables, but don't have a linear model that relates them to the dependent variable. Therefore, if they materially increase the explanatory power of a linear regression model, then there is some structure in the independent variables which is not being captured by the linear regression, and this will show up as residuals which are not white noise.
(yes, that will be much more comprehensible)
Two sources for RESET:
J.B. Ramsey (1969), Tests for Specification Error in Classical Linear Least Squares Regression Analysis. Journal of the Royal Statistical Society, Series B 31, 350-371
W. Krämer & H. Sonnberger (1986), The Linear Regression Model under Test. Heidelberg: Physica
These are from the reset() help page in R, btw: download R and the lmtest package and you can play with it.
dsquared and Kieran, thank you both - I've been messing around with reduced major axis regressions and had thought of nothing more informative than doing a normality test on the residuals and seeing if their central tendency was significantly different from zero.
Thanks to both of you, dsquared and Kieran. Makes life with reduced major axis regressions a bit easier.
I can see it now, John Lott is sitting in his office furiously figuring out how to use a RESET test. Once he figures it out, if he follows past patterns, he'll do a bunch of tests, backdate the results, and then 'prove' that he supplied them to the NAS Panel.
I have to disagree with the emphasis placed on specification tests in this discussion. All flunking a specification test means is that there are hidden variables or non-linear terms that you are leaving out. It does not mean that your choice of variables lacks explanatory power, although it does lead you to misstate the degree of confidence in the coefficients. As a previous poster indicated, you can always "fix" this by adding non-linear terms (or anything that tracks the shape of the residuals), regardless of the validity of the relationship. Conversely, passing the specification test does not prove a cause and effect relationship, although it could provide evidence.
Specification error, and the more insidious problems of autocorrelation, secular trends or heteroskedasticity, will cause you to grossly mistake the degree of confidence one should have in the results. Autocorrelation arises when each data point is correlated with the prior one. For example, in a monthly regression of gun sales against home sales, home sales in one month would be highly correlated with home sales in the next. In an extreme case, it would be as if you were counting the same data twice.
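To make the "counting the same data twice" point concrete, here is a rough sketch in Python with NumPy. The series are simulated stand-ins for regression residuals, not real gun or home sales figures, and the persistence value of 0.9 and the helper names `lag1_autocorr` and `durbin_watson` are assumptions chosen for illustration. It compares a persistent monthly series with pure noise using two standard diagnostics: the lag-1 autocorrelation and the Durbin-Watson statistic, which is close to 2 when there is no autocorrelation and falls toward 0 as positive autocorrelation grows.

```python
import numpy as np

def lag1_autocorr(e):
    """Sample correlation between each value and the one before it."""
    e = e - e.mean()
    return float(np.sum(e[1:] * e[:-1]) / np.sum(e ** 2))

def durbin_watson(e):
    """Durbin-Watson statistic for a series of (mean-zero) residuals."""
    e = e - e.mean()
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

rng = np.random.default_rng(1)   # simulated data for illustration
n = 240                          # e.g. 20 years of monthly observations
shocks = rng.normal(size=n)      # independent noise

# Persistent series: each month is 0.9 times the previous month plus a shock.
persistent = np.empty(n)
persistent[0] = shocks[0]
for t in range(1, n):
    persistent[t] = 0.9 * persistent[t - 1] + shocks[t]

rho_noise, dw_noise = lag1_autocorr(shocks), durbin_watson(shocks)
rho_pers, dw_pers = lag1_autocorr(persistent), durbin_watson(persistent)
print(f"independent noise: lag-1 r = {rho_noise:.2f}, DW = {dw_noise:.2f}")
print(f"persistent series: lag-1 r = {rho_pers:.2f}, DW = {dw_pers:.2f}")
```

When the lag-1 correlation is high, each new month adds far less independent information than the raw count of observations suggests, which is why the usual standard errors come out too small.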