How much difference can one coding error make?

By tlambert on August 25, 2003.

In his statement on the coding errors Lott tries to downplay the significance of the errors:

Minor coding errors were discovered in the data set after it was first given out. The files available for downloading on this site have the corrected results using the statistical county level tests employed in Ayres and Donohue's paper ("Shooting Down the More Guns, Less Crime Hypothesis"). The corrections involved a few thousandths of one percent of the data entries and occurred for observations after 1996. There were well over 70,000 observations and over a hundred variables available in the data set, we are dealing with a few hundred data entries that contained mistakes.

However, even a single error can make a dramatic difference to a least squares analysis. To demonstrate this, I generated 1,000 random observations, each with a year between 1981 and 2000 and a random value between 1 and 100. These are the red crosses in the graph to the left. I then introduced just one error by changing the year of the last observation from 2000 to 0, which is similar to one of the few hundred coding errors Lott made. Note that one error in 1,000 observations is a much smaller percentage than the percentage of coding errors in Lott's miscoded data set. The green line is the fit to the data with the error. Correcting the error gives the dramatically different blue line. That single error also changed the result from statistically insignificant to significant at the 8% level.
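For readers who want to try this themselves, here is a rough Python sketch of the same kind of experiment. It is not the code behind the graph: the seed, the uniform sampling, and the helper names `ols` and `leverage` are all assumptions made for illustration, so the slopes and t statistics it prints will not match the figures quoted above. It fits a least squares line to the clean data and to the data with the one miscoded year, and reports how much leverage that single point has.

```python
import random


def ols(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    # Residual sum of squares, used for the standard error of the slope.
    ssr = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    se_slope = (ssr / (n - 2) / sxx) ** 0.5
    return {
        "n": n,
        "xbar": xbar,
        "sxx": sxx,
        "slope": slope,
        "intercept": intercept,
        "t": slope / se_slope,
    }


def leverage(fit, x):
    """Hat value of an observation at x: 1/n + (x - xbar)^2 / Sxx."""
    return 1 / fit["n"] + (x - fit["xbar"]) ** 2 / fit["sxx"]


# Assumed setup: 1000 observations, years drawn uniformly from 1981..2000,
# values drawn uniformly from 1..100. The seed is arbitrary.
rng = random.Random(0)
years = [rng.randint(1981, 2000) for _ in range(1000)]
years[-1] = 2000  # the last observation is the one that gets miscoded
values = [rng.uniform(1, 100) for _ in range(1000)]

# The single coding error: year 2000 entered as 0.
bad_years = years[:-1] + [0]

clean = ols(years, values)
bad = ols(bad_years, values)

for label, fit, x_last in (("clean", clean, years[-1]), ("with error", bad, bad_years[-1])):
    print(
        f"{label:>10}: slope={fit['slope']:.4f} "
        f"intercept={fit['intercept']:.1f} t={fit['t']:.2f} "
        f"leverage of last point={leverage(fit, x_last):.4f}"
    )
```

The leverage figure is the useful part. In the clean data the last observation is one point among a thousand, but once its year becomes 0 it sits almost 2,000 years away from everything else, and its leverage comes out close to 1. The fitted line is then forced to pass almost exactly through that one miscoded point.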
