Steve Levitt has a post with a detailed response to Foote and Goetz’s paper. In it, he and Ethan Lieber construct a new, better measure of abortions under which more abortions are associated with less crime. They conclude:
The results we show in this new table are consistent with the impact of abortion on crime that we find in our three other types of analyses we presented in the original paper using different sources of variation. These results are consistent with the unwantedness hypothesis.
In comments to Levitt’s post, Steve Sailer raises objections that do not impress me in the slightest, but Daniel Davies makes a good point here:
Finally and most importantly, this is about as far from a double blind trial as you can get. I’ve written in the past about the perils of data mining in econometrics, and to be honest, all that is lacking in the series of changes to the data and the model that the Freakonomics blog presents is a phalanx of dwarves singing “Hi Ho, Hi Ho, It’s Off To Data-Mine We Go”. What has happened here is that Levitt and his research assistant have sat down in the knowledge that a perturbation to their model doesn’t deliver their result, and decided to have a think about what kinds of alterations to the data ought to be made.
You don’t need to suggest any intentional dishonesty to say that it is somewhat unsurprising that the outcome of the brainstorming session on “What sort of changes ought one to make to this data, in an ideal world?” was a dataset and model in which the result that Levitt is famous for was present. Even if Levitt and Ethan Lieber had sat down at a table with no computer on it, starting with a blank sheet to discuss the changes to make and not touching the model until they had finished, I would still guess that it would be the easiest thing in the world for someone who was intimately familiar with the dataset to subconsciously put his thumb on the scales. And I don’t think this is what they did; colour me cynical but I would bet quids that lots and lots of iterations of different possible changes to the data were tried. I note once more that there is no accusation of intentionally cooking the books here; medical science certainly doesn’t insist on double blind trials to protect them from unscrupulous doctors.
I think that there’s a general issue here which is endemic to the territory that Levitt chooses to operate in. By their nature, political debates are debates. One side produces arguments, the other side produces counterarguments and so on, so iteratively. This is an environment which is absolutely poisonous to datasets. By the time you’ve been through two or three iterations of a “controversy” like this it’s more or less impossible to pick a model without failing even the most homeopathically weak version imaginable of a double blind criterion. This is why I now say that we’re simply never going to know the truth (by which I mean, even the simple statistical truth about the existence of a comovement, much less the truth about the underlying causal hypothesis) about abortion and crime in the period 1976-2000. Stick a fork in this dataset, it’s done.
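Davies’s data-mining point is, at bottom, a multiple-comparisons argument, and a small simulation makes it concrete. The sketch below uses entirely made-up data (nothing in it comes from the actual crime dataset, and the variable names are hypothetical): if you try enough variant codings of a regressor against the same outcome, the best-looking variant will often clear conventional significance thresholds even when the true effect is exactly zero.

```python
# A minimal simulation of Davies's data-mining worry, on made-up data:
# trying many variant codings of a regressor against the same outcome
# will often produce a "significant" result purely by chance.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_obs, n_variants = 200, 50

crime = rng.normal(size=n_obs)   # outcome: pure noise, true effect is zero
base = rng.normal(size=n_obs)    # a base "abortion measure"

best_p = 1.0
for _ in range(n_variants):
    # each "variant" perturbs the base measure, mimicking an
    # alternative construction or coding of the same variable
    variant = base + rng.normal(scale=0.5, size=n_obs)
    X = sm.add_constant(variant)
    p = sm.OLS(crime, X).fit().pvalues[1]
    best_p = min(best_p, p)

print(f"smallest p-value across {n_variants} variants: {best_p:.3f}")
# The minimum p-value across many tries is biased sharply downward,
# even though by construction there is nothing to find.
```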
I don’t think that the situation is as hopeless as Davies suggests. Foote and Goetz have access to the same data and tools, so we can see whether they can come up with another measure of abortions that makes the results go away. Another possibility is for Donohue and Levitt to present results for a whole slew of alternative formulations of the abortion measure, so we can see whether their results are sensitive to the particular way it is defined; a sketch of what such a check could look like follows.
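For concreteness, here is a hedged sketch of that kind of sensitivity sweep. Everything in it is a hypothetical stand-in (crime_rate, abortion_measures, and controls are assumed names, not variables from the actual study): fit the same regression once per alternative measure and tabulate the coefficient and p-value on abortion.

```python
# A sketch of the suggested sensitivity check: re-run the same crime
# regression under several alternative constructions of the abortion
# measure and compare the coefficients. All names are hypothetical.
import numpy as np
import pandas as pd
import statsmodels.api as sm

def sensitivity_sweep(crime_rate, abortion_measures, controls):
    """crime_rate: (n,) outcome; abortion_measures: dict mapping a
    measure name to an (n,) array; controls: (n, k) covariates."""
    rows = []
    for name, measure in abortion_measures.items():
        X = sm.add_constant(np.column_stack([measure, controls]))
        fit = sm.OLS(crime_rate, X).fit()
        rows.append({"measure": name,
                     "coef": fit.params[1],      # coefficient on abortion
                     "p_value": fit.pvalues[1]})
    return pd.DataFrame(rows)

# If the coefficient flips sign or loses significance across reasonable
# alternative measures, the headline result is fragile; if it is stable,
# the data-mining worry loses much of its force.
```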