hambidge writes:
There is no real correlation with total homicide. Why do you say 14 countries? Didn't they leave out N. Ireland, and cook the numbers for Switzerland? Since much disagreement surrounds the use of those two countries, do the analysis again with the remaining 12. One gets a correlation.
OK, Spearman r is 0.64 (p=0.02). (Pearson is misleadingly high because of its sensitivity to outliers.)
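To illustrate why Pearson reacts so strongly to one extreme point while Spearman does not, here is a minimal sketch. The numbers are made up for illustration (they are NOT the homicide data under discussion), and the helper names are my own.

```python
import math

def pearson(x, y):
    # Ordinary product-moment correlation.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    # Ranks 1..n; assumes no tied values.
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    # Spearman r is Pearson r computed on the ranks.
    return pearson(ranks(x), ranks(y))

# Hypothetical data: 11 weakly related points plus one extreme point.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 40]
y = [6, 1, 9, 3, 11, 2, 8, 5, 10, 4, 7, 60]

if __name__ == "__main__":
    print("Pearson :", round(pearson(x, y), 3))
    print("Spearman:", round(spearman(x, y), 3))
```

Because Spearman only sees the rank of the extreme point, that point counts as "largest" and nothing more, whereas Pearson weights it by its distance from the mean.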
So the U.S. point is an outlier. Painfully obvious, wouldn't you say?
Leave out the U.S., and the correlation disappears.
Hardly. Spearman r is 0.53 if you do this. (And Pearson r is almost identical.)
And p=0.0905 (approximately).
No, since you chose to exclude the point which caused the largest decrease in the correlation coefficient. p=.09 is roughly the probability that a random permutation of 11 numbers will have a Spearman r of magnitude 0.53 or higher. What we require is a different probability: take a random permutation of 12 numbers, delete the one point whose removal causes the largest decrease in the magnitude of Spearman r, and then compute the magnitude of Spearman r for the remaining 11. How often do we get 0.53 or higher? This is a little more difficult to compute :-), since the stats texts do not tell us how to do it. I computed it by simple Monte Carlo methods and got p=.03 (approximately).
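For anyone who wants to try this, the procedure can be sketched roughly as follows. This is not the program I actually ran, just a minimal reconstruction of the steps described above. The function names, the trial count and the seed are arbitrary choices, and "largest decrease in magnitude" is read as "the deletion that leaves the smallest |r|".

```python
import random

def ranks(values):
    # Ranks 1..n; a permutation has no ties, so this is enough.
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    # Tie-free formula: r = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def min_abs_r_after_deletion(x, y):
    # Try deleting each point in turn; keep the smallest |r| that results,
    # i.e. the deletion that hurts the correlation the most.
    return min(
        abs(spearman(x[:i] + x[i + 1:], y[:i] + y[i + 1:]))
        for i in range(len(x))
    )

def monte_carlo_p(n=12, threshold=0.53, trials=10000, seed=0):
    # Fraction of random permutations whose worst-case leave-one-out
    # |Spearman r| still reaches the threshold.
    rng = random.Random(seed)
    x = list(range(n))
    hits = 0
    for _ in range(trials):
        y = x[:]
        rng.shuffle(y)
        if min_abs_r_after_deletion(x, y) >= threshold:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    print("estimated p:", monte_carlo_p())
```

The estimate varies a little with the seed and the number of trials, so treat any single run as approximate.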
Last time I checked, p>0.05 means "not significant".
There is nothing magical about 0.05. It's best to give the p value and let the reader decide whether to reject the null. A p of 0.09 could arise by chance only 1 time in 11, so it is usually considered to be of borderline significance.
Why didn't you mention "p" here, when you did just a few lines above?
My program reported a "p" value to me, but I did not report it because I realized that it was incorrect since we excluded the point that caused the largest decrease in r. I had to write a computer program to get the correct value.
In any case an r value of 0.53 cannot be described as non-existent.
Graph the points. Look at them. Then tell me that any real scientist would not consider the U.S. data point very suspicious.
Once again, any scientist worth his salt would scoff at the notion that a correlation is real if it depends on the inclusion of one data point out of 12,
True, but this is not the case here.
especially when that point is so far out of whack with the rest of the data.
I wish there were some cut and dried method for dealing with outliers, but there isn't. Leaving them in and using a robust method seems the safest thing to me. I will concede that there is room for differences of opinion on this issue. (As opposed to my differences with Brandon and Kleck, who insist on using Pearson without excluding all outliers.)




