hambidge writes:

There is no real correlation with total homicide.

Why do you say 14 countries? Didn’t they leave out N. Ireland, and

cook the numbers for Switzerland?

Since much disagreement surrounds the use of those two countries,

do the analysis again with the remaining 12.

One gets a correlation.

OK, Spearman r is 0.64 (p=0.02). (Pearson is misleadingly high

because of its sensitivity to outliers.)
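
To make the point about outlier sensitivity concrete, here is a minimal sketch using made-up numbers (these are NOT the country data under discussion). Eleven points with almost no relationship plus one extreme point give a Pearson r near 1, while the rank-based Spearman r stays modest. The function names are just for illustration.

```python
import math

def pearson(x, y):
    # Ordinary product-moment correlation.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    # Ranks 1..n; assumes no ties, which holds for the toy data below.
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    # Spearman r is the Pearson r of the ranks.
    return pearson(ranks(x), ranks(y))

# Hypothetical data: 11 weakly related points plus one far-out point.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 100]
y = [6, 2, 9, 4, 11, 1, 8, 3, 10, 5, 7, 100]

print("Pearson :", round(pearson(x, y), 2))
print("Spearman:", round(spearman(x, y), 2))
```

The single far-out point dominates the Pearson sums, but in the rank version it is just "the largest value" and counts no more than any other point.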

So the U.S. point is an outlier. Painfully obvious, wouldn’t you say?

Leave out the U.S., and the correlation disappears.

Hardly. Spearman r is 0.53 if you do this. (And Pearson r is almost

identical).

And p=0.0905 (approximately).

No, since you chose the point to exclude which caused the largest

decrease in the correlation coefficient. p=.09 is roughly the

probability that a random permutation of 11 numbers will have a

Spearman r of magnitude 0.53 or higher. What we require is the

probability that if we take a random permutation of 12 numbers, delete

the one that causes the largest decrease in the magnitude of Spearman

r and then compute the magnitude of the Spearman r we get 0.53 or

higher. This is a little more difficult to compute :-), since the

stats texts do not tell us how to do it. I computed it by simple

Monte Carlo methods and got p=.03 (approximately).
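
A minimal sketch of the Monte Carlo procedure just described, for anyone who wants to repeat it. This is not the original program: the function names, the trial count, and the seed are assumptions chosen for illustration, and the estimate will vary a little from run to run.

```python
import random

def ranks(values):
    # Ranks 1..n; permutations have no ties, so no tie handling is needed.
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    # No-ties formula: r = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n * n - 1))

def abs_r_after_worst_deletion(x, y):
    # Delete each point in turn, re-rank, and keep the smallest |r|.
    # That is the deletion causing the largest decrease in magnitude.
    return min(
        abs(spearman(x[:i] + x[i + 1:], y[:i] + y[i + 1:]))
        for i in range(len(x))
    )

def monte_carlo_p(n=12, threshold=0.53, trials=20000, seed=0):
    # Fraction of random permutations whose |r|, after the most damaging
    # single deletion, is still at least `threshold`.
    rng = random.Random(seed)
    x = list(range(n))
    hits = 0
    for _ in range(trials):
        y = x[:]
        rng.shuffle(y)
        if abs_r_after_worst_deletion(x, y) >= threshold:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    print("estimated p:", monte_carlo_p())
```

The key design point is that the deletion step is applied inside every simulated trial, so the null distribution reflects the same "pick the most damaging point" selection that was applied to the real data.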

Last time I checked, p>0.05 means “not significant”.

There is nothing magical about 0.05. It’s best to give the p value

and let the reader decide whether to reject the null. A p of 0.09

could arise by chance only 1 time in 11, so is usually considered to

have borderline significance.

Why didn’t you mention “p” here, when you did just a few lines above?

My program reported a “p” value to me, but I did not report it because

I realized that it was incorrect since we excluded the point that

caused the largest decrease in r. I had to write a computer program

to get the correct value.

In any case an r value of 0.53 cannot be described as non-existent.

Graph the points. Look at them. Then tell me that any real scientist

would not consider the U.S. data point very suspicious.

Once again, any scientist worth his salt would scoff at the notion

that a correlation is real if it depends on the inclusion of one

data point out of 12,

True, but this is not the case here.

especially when that point is so far out of whack with the rest of the data.

I wish there was some cut and dried method for dealing with outliers,

but there isn’t. Leaving them in and using a robust method seems the

safest thing to me. I will concede that there is room for differences

of opinion on this issue. (As opposed to my differences with Brandon

and Kleck who insist on using Pearson without excluding all outliers.)