Since my amateur discursion into stochasticity appears to have flushed out all of the mathematically savvy, I'm going to pose a real-life statistics question for you. I have some data that are non-normally distributed (in fact, they really don't seem to fit any distribution well, and, yes, I've tried various transformations; the data still don't fit anything). If the data were normally distributed, I would perform ANOVA to partition the sources of variance (using percent sums of squares).
Since the data aren't normally distributed, ANOVA is not the right test to use. And because I'm using a full-factorial model with four factors, Friedman's test and the other one-way non-parametric tests don't seem to apply either. I should add that, while my data are continuous, I can convert them into ordinal and/or nominal categories if that would help. Logistic regression could circumvent the non-normality problem, but I'm unaware of how one partitions variance with that test (if there's a way to do so, please let me know).
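(To make the goal concrete: if ANOVA were valid, the percent-sums-of-squares partition I have in mind would look like the sketch below. The data frame d and the factor names a, b, c, e are invented for illustration.)

    ## Invented example: data frame d with response y and factors a, b, c, e.
    fit <- aov(y ~ a * b * c * e, data = d)   # full-factorial ANOVA
    tab <- summary(fit)[[1]]                  # the ANOVA table
    round(100 * tab[, "Sum Sq"] / sum(tab[, "Sum Sq"]), 1)  # percent SS per term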
Any ideas? If there are any statistics programs or R packages out there that can handle this, please let me know. Don't be shy...
I remember vaguely that the output for logistic regression (in Stata, SAS, SPSS, etc.) reports the amount of variation 'explained' by the model as the 'deviance'. The amount of variation due to a given factor is then the difference between the deviance with the factor in the model and without it. That same difference is used in the chi-square test for the factor's significance.
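In R, the same idea looks roughly like this; the data frame d, binary response y, and factor names are all made up, but the deviance comparison is the point:

    ## Invented names: binary response y, factors a, b, c, e in data frame d.
    full    <- glm(y ~ a + b + c + e, data = d, family = binomial)
    reduced <- glm(y ~     b + c + e, data = d, family = binomial)

    ## Deviance "explained" by factor a, plus the chi-square (LR) test:
    anova(reduced, full, test = "Chisq")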
R-wise, maybe the examples for the lrm function in Frank Harrell's Design package would help:
http://lib.stat.cmu.edu/S/Harrell/help/Design/html/lrm.html
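For example, something like the sketch below, with an invented data frame d and binary response y (Design's successor is the rms package, where lrm now lives):

    library(Design)   # Harrell's package from the link above (successor: rms)

    ## Invented data frame d with binary response y and factors a, b.
    f <- lrm(y ~ a * b, data = d)
    anova(f)          # Wald chi-square for each term and the interaction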
With a logistic model the variance isn't a good measure of fit, because the domain of the errors is restricted. Maybe your data are nasty enough that optimizing some function of the model errors doesn't give you meaningful results, in which case partitioning that error measure won't tell you anything significant either. You need some idea of the error distribution to make meaningful inferences about apportioning that error among factors.
I had a similar problem.
I'm using JMP (the full-on, expensive version).
JMP lets you decree that your continuous numbers are ordinal, without having to change the actual numbers. (I found it worked better when I truncated my numbers to 3 or 4 significant figures, though.) Then it lets you run a general linear model that gives you logistic regression output instead of parametric output. The GLM can handle ties in the ordinal data, multiple input variables, interactions, and even a little nesting.
It gives you likelihood ratio tests for each effect, like the effects tests in ANOVA, and provides parameter estimates (with P-values and 95% CIs) for each level in each effect. Ergo, the analysis does partition the error for you.
It also has the refs for the analyses it's doing, so you can both figure out how the logistic GLM handles the error, and cite the authorities to back it up.
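For anyone without JMP, a rough R analogue of this workflow is the proportional-odds ordinal logistic model in MASS; the data frame d, response y, and factors a and b below are invented:

    library(MASS)  # for polr, a proportional-odds ordinal logistic model

    ## Mimic JMP's "treat as ordinal" trick: bin the truncated continuous
    ## response into an ordered factor (coarser binning keeps the fit fast).
    d$y_ord <- factor(signif(d$y, 3), ordered = TRUE)

    full    <- polr(y_ord ~ a * b, data = d, Hess = TRUE)
    reduced <- polr(y_ord ~ a + b, data = d, Hess = TRUE)
    anova(reduced, full)   # likelihood-ratio test for the a:b interaction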
Finally, for those of us with advisors unfamiliar with logistic regression, you can run a (technically invalid) parametric ANOVA that is exactly congruent, just by switching your response variable back to "continuous." When the two give the same answer, that side-by-side comparison can be useful for educating advisors.
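In R terms (same invented names as above), that congruent parametric check is just:

    ## The "congruent but technically invalid" parametric version,
    ## for comparison against the ordinal logistic fit:
    summary(aov(y ~ a * b, data = d))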
I can email you an example output if you're interested.
PS
Mike, you've got two copies of the original post.
And they've got different comments.
Read both!