There is no real correlation with total homicide.
Why do you say 14 countries? Didn’t they leave out N. Ireland, and
cook the numbers for Switzerland?
Since much disagreement surrounds the use of those two countries,
do the analysis again with the remaining 12.
One gets a correlation.
OK, Spearman r is 0.64 (p=0.02). (Pearson is misleadingly high
because of its sensitivity to outliers.)
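To see why Pearson is the wrong tool here, consider a small sketch (synthetic numbers, not the actual country data): eleven weakly ordered points plus one extreme point. Pearson, which works on the raw values, is dominated by the extreme point; Spearman, which works only on ranks, cares only that the point is largest, not how large it is.

```python
def pearson(x, y):
    # Standard Pearson product-moment correlation.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    # Spearman r is just Pearson r computed on the ranks
    # (no ties in this example, so no tie correction needed).
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rk, i in enumerate(order):
            r[i] = rk
        return r
    return pearson(ranks(x), ranks(y))

# Eleven jumbled points plus one far-out point at (100, 100):
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 100]
y = [5, 1, 8, 2, 9, 3, 10, 4, 11, 6, 7, 100]

print(pearson(x, y))   # inflated toward 1.0 by the single extreme point
print(spearman(x, y))  # moderate: reflects only the rank agreement
```

Moving the extreme point from (100, 100) to (1000, 1000) pushes Pearson even closer to 1 while leaving Spearman exactly unchanged, which is the sense in which Pearson is "misleadingly high" with outliers present.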
So the U.S. point is an outlier. Painfully obvious, wouldn’t you say?
Leave out the U.S., and the correlation disappears.
Hardly. Spearman r is 0.53 if you do this. (And Pearson r is almost
the same.)
And p=0.0905 (approximately).
No, because you chose to exclude the point that caused the largest
decrease in the correlation coefficient. p=0.09 is roughly the
probability that a random permutation of 11 numbers will have a
Spearman r of magnitude 0.53 or higher. What we require is the
probability that if we take a random permutation of 12 numbers, delete
the one that causes the largest decrease in the magnitude of Spearman
r, and then compute the magnitude of Spearman r, we get 0.53 or
higher. This is a little harder to compute :-), since the stats
texts do not tell us how to do it. I computed it by simple Monte
Carlo methods and got p=0.03 (approximately).
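The Monte Carlo procedure described above can be sketched as follows (a reconstruction of the described method, not the author's original program; the 20,000-trial default and the function names are my own choices):

```python
import random

def spearman(x, y):
    # Spearman r = Pearson r of the ranks (no ties arise here,
    # since the y values are permutations of distinct integers).
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rk, i in enumerate(order, 1):
            r[i] = float(rk)
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

def p_after_worst_deletion(n=12, threshold=0.53, trials=20000, seed=1):
    """Estimate the probability that a random permutation of n numbers,
    after deleting the single point whose removal most reduces
    |Spearman r|, still has |Spearman r| >= threshold."""
    random.seed(seed)
    x = list(range(n))
    hits = 0
    for _ in range(trials):
        y = list(range(n))
        random.shuffle(y)
        # |r| remaining after the most damaging single-point deletion:
        best = min(
            abs(spearman(x[:i] + x[i + 1:], y[:i] + y[i + 1:]))
            for i in range(n)
        )
        if best >= threshold:
            hits += 1
    return hits / trials
```

Under this procedure the estimate comes out well below the naive p=0.09 for 11 points, consistent with the ~0.03 figure quoted above, because deleting the most damaging point from a random permutation usually still leaves only a weak rank correlation.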
Last time I checked, p>0.05 means “not significant”.
There is nothing magical about 0.05. It's best to give the p value
and let the reader decide whether to reject the null. A p of 0.09
means the observed result would arise by chance only about 1 time in
11, so it is usually considered to have borderline significance.
Why didn’t you mention “p” here, when you did just a few lines above?
My program reported a “p” value to me, but I did not report it because
I realized that it was incorrect since we excluded the point that
caused the largest decrease in r. I had to write a computer program
to get the correct value.
In any case an r value of 0.53 cannot be described as non-existent.
Graph the points. Look at them. Then tell me that any real scientist
would not consider the U.S. data point very suspicious.
Once again, any scientist worth his salt would scoff at the notion
that a correlation is real if it depends on the inclusion of one
data point out of 12,
True, but this is not the case here.
especially when that point is so far out of whack with the rest of the data.
I wish there were some cut-and-dried method for dealing with outliers,
but there isn't. Leaving them in and using a robust method seems the
safest thing to me. I will concede that there is room for differences
of opinion on this issue. (As opposed to my differences with Brandon
and Kleck, who insist on using Pearson without excluding outliers.)