The single most useful piece of advice I can give you, along with a theory as to why it isn't better known, all embedded in some comments on a recent article that appeared in the Journal of the American College of Cardiology

By agelman on March 24, 2010.

Our story begins with this article by Sanjay Kaul and George Diamond:

The randomized controlled clinical trial is the gold standard scientific method for the evaluation of diagnostic and treatment interventions. Such trials are cited frequently as the authoritative foundation for evidence-based management policies. Nevertheless, they have a number of limitations that challenge the interpretation of the results. The strength of evidence is often judged by conventional tests that rely heavily on statistical significance. Less attention has been paid to the clinical significance or the practical importance of the treatment effects. One should be cautious that extremely large studies might be more likely to find a formally statistically significant difference for a trivial effect that is not really meaningfully different from the null. Trials often employ composite end points that, although they enable assessment of nonfatal events and improve trial efficiency and statistical precision, entail a number of shortcomings that can potentially undermine the scientific validity of the conclusions drawn from these trials. Finally, clinical trials often employ extensive subgroup analysis. However, lack of attention to proper methods can lead to chance findings that might misinform research and result in suboptimal practice. Accordingly, this review highlights these limitations using numerous examples of published clinical trials and describes ways to overcome these limitations, thereby improving the interpretability of research findings.

This reasonable article reminds me of a number of things that come up repeatedly on my (other) blog and in my work, including the distinction between statistical and practical significance, the importance of interactions, and how much I hate acronyms.
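To make the point about very large trials concrete, here is a minimal Python sketch. It assumes a two-arm comparison of means with unit variance in each arm, a normal approximation, and a hypothetical true difference of 0.01 standard deviations, which is trivial by almost any practical standard. The function name `two_sample_z` is made up for illustration.

```python
import math

def two_sample_z(effect_sd: float, n_per_arm: int) -> tuple[float, float]:
    """z statistic and two-sided p-value for a difference in means.

    The effect is in standard-deviation units; each arm is assumed to have
    unit variance and n_per_arm subjects (normal approximation).
    """
    se = math.sqrt(2.0 / n_per_arm)          # standard error of the difference
    z = effect_sd / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal p-value
    return z, p

# The same trivial effect, evaluated at increasing sample sizes.
for n in (100, 10_000, 1_000_000):
    z, p = two_sample_z(0.01, n)
    print(f"n per arm = {n:>9,}: z = {z:5.2f}, p = {p:.2g}")
```

Nothing about the effect changes from row to row; only the sample size does, and with it the p-value. A tiny p-value says the difference is probably not exactly zero, not that it matters.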

They also recommend composite end points (see page 418 of the above-linked article). A composite end point is itself a transformation, several outcomes combined into one, and that is a point Jennifer and I emphasize in chapter 4 of our book; it comes up over and over in my applied research and consulting. If I had to come up with one statistical tip that would be most useful to you--that is, good advice that's easy to apply and which you might not already know--it would be to use transformations. Log, square-root, etc.--yes, all that, but more! I'm talking about transforming a continuous variable into several discrete variables (to model nonlinear patterns such as voting by age) and combining several discrete variables to make something continuous (those "total scores" that we all love). And not doing dumb transformations such as using a threshold to break up a perfectly useful continuous variable into something binary. I don't care if the threshold is "clinically relevant" or whatever--just don't do it. If you gotta discretize, for Christ's sake break the variable into 3 categories.
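Here is a rough Python sketch of the less familiar transformations just described. The age cutpoints, the helper names, and the use of tertiles for the three-way split are all hypothetical choices made for illustration, not a recipe from the book.

```python
from statistics import quantiles

# Hypothetical cutpoints: categories are <30, 30-44, 45-64, 65+.
AGE_BREAKS = [30, 45, 65]

def age_indicators(age: float) -> list[int]:
    """Turn a continuous age into 0/1 indicators, one per category except
    the lowest (the baseline), so a regression can fit a nonlinear age pattern."""
    category = sum(age >= b for b in AGE_BREAKS)  # 0, 1, 2, or 3
    return [int(category == k) for k in range(1, len(AGE_BREAKS) + 1)]

def total_score(items: list[int]) -> int:
    """Combine several discrete items (e.g., 0/1 survey answers) into one score."""
    return sum(items)

def three_categories(values: list[float]) -> list[int]:
    """If you must discretize, split at the tertiles into groups 0, 1, 2
    rather than using a single threshold."""
    lo, hi = quantiles(values, n=3)  # two cut points
    return [0 if v < lo else (1 if v < hi else 2) for v in values]
```

For example, `age_indicators(50)` returns `[0, 1, 0]` (the 45-64 category), `total_score([1, 0, 1, 1])` returns `3`, and `three_categories([1, 2, 3, 4, 5, 6])` returns `[0, 0, 1, 1, 2, 2]`. The age indicators enter a regression as separate predictors, so each age group gets its own coefficient and the fitted age pattern need not be a straight line.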

This all seems quite obvious but people don't know about it. What gives? I have a theory, which goes like this. People are trained to run regressions "out of the box," not touching their data at all. Why? For two reasons:
1. Touching your data before analysis seems like cheating. If you do your analysis blind (perhaps not even changing your variable names or converting them from ALL CAPS), then you can't cheat.
2. In classical (non-Bayesian) statistics, linear transformations of the predictors have no effect on inferences for linear regression or generalized linear models: multiply a predictor by a constant and its coefficient and standard error are simply divided by that constant, leaving predictions and test statistics unchanged. When you're learning applied statistics from a classical perspective, transformations tend to get downplayed and are treated as little more than tricks to approximate a normal error term (and the error term, as we discuss in our book, is generally the least important part of a model).
Once you take a Bayesian approach, however, and think of your coefficients as not being mathematical abstractions but actually having some meaning, you move naturally into model building and transformations.
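A small numerical illustration of this contrast, sketched in plain Python under stated assumptions: one predictor, a flat prior on the intercept, and a normal prior on the slope whose strength is summarized by a hypothetical parameter `prior_precision_ratio` (residual variance divided by prior variance). At zero the fit is ordinary least squares; when positive, the fit is the posterior mode, which is equivalent to ridge regression on the slope. The data are made up.

```python
from statistics import mean

def fit_line(x, y, prior_precision_ratio=0.0):
    """Intercept a and slope b for y = a + b*x.

    prior_precision_ratio = sigma^2 / tau^2 for a N(0, tau^2) prior on b,
    with a flat prior on a. 0.0 gives least squares; > 0 gives the
    posterior mode (ridge penalty on the slope only).
    """
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / (sxx + prior_precision_ratio)
    a = ybar - b * xbar
    return a, b

def fitted(x, y, prior_precision_ratio=0.0):
    a, b = fit_line(x, y, prior_precision_ratio)
    return [a + b * xi for xi in x]

# Hypothetical data: height in inches and a made-up outcome.
height_in = [60.0, 64.0, 66.0, 70.0, 74.0]
outcome = [1.0, 2.1, 2.4, 3.9, 5.2]
height_cm = [2.54 * h for h in height_in]  # a linear transformation

ols_in, ols_cm = fitted(height_in, outcome), fitted(height_cm, outcome)
bayes_in = fitted(height_in, outcome, prior_precision_ratio=10.0)
bayes_cm = fitted(height_cm, outcome, prior_precision_ratio=10.0)
print("least squares, inches:", [round(v, 3) for v in ols_in])
print("least squares, cm:    ", [round(v, 3) for v in ols_cm])
print("with prior, inches:   ", [round(v, 3) for v in bayes_in])
print("with prior, cm:       ", [round(v, 3) for v in bayes_cm])
```

Converting inches to centimeters leaves the least-squares fitted values unchanged (the slope just divides by 2.54), but the same prior applied on the two scales gives different fits. That is the practical sense in which putting a prior on a coefficient forces you to decide what the coefficient means and on what scale it lives.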

P.S. On page 426, Kaul and Diamond recommend that, in subgroup analysis, researchers "perform adjustments for multiple comparisons." I'm ok with that, as long as they include multilevel modeling as such an adjustment. (See here for our discussion of that point.)
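The linked discussion isn't reproduced here, but the basic mechanism by which a multilevel model adjusts subgroup estimates is partial pooling. Here is a minimal Python sketch assuming a normal-normal model in which the overall mean `mu` and the between-subgroup standard deviation `tau` are fixed, hypothetical values; a real multilevel model would estimate them from the data. The subgroup estimates and standard errors are invented.

```python
def partial_pool(estimates, std_errors, mu, tau):
    """Posterior mean of each subgroup effect under a normal-normal model.

    Each raw estimate y_j (standard error s_j) is combined with the
    population distribution N(mu, tau^2) by precision weighting, so noisy
    subgroups are pulled further toward mu than precise ones.
    """
    pooled = []
    for y, s in zip(estimates, std_errors):
        w_data = 1.0 / s ** 2     # precision of the subgroup's own estimate
        w_prior = 1.0 / tau ** 2  # precision of the population distribution
        pooled.append((w_data * y + w_prior * mu) / (w_data + w_prior))
    return pooled

# Hypothetical subgroup results from a single trial.
raw = [0.30, -0.10, 0.05, 0.50]
se = [0.15, 0.15, 0.10, 0.25]
shrunk = partial_pool(raw, se, mu=0.10, tau=0.10)
for y, s, t in zip(raw, se, shrunk):
    print(f"raw {y:+.2f} (se {s:.2f}) -> partially pooled {t:+.3f}")
```

The subgroup with the largest raw effect also has the largest standard error, so it is pulled hardest toward the overall mean. That is the sense in which partial pooling can do the job of a multiple-comparisons adjustment: chance extremes among the subgroups get shrunk automatically.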

P.P.S. Also don't forget economist James Heckman's argument, from a completely different direction, as to why randomized experiments should not be considered the gold standard. I don't know if I agree with Heckman's sentiments (my full thoughts are here), but they're definitely worth thinking about.


Andrew, I'd like to take this idea of transformations one step further: think about the process, and build a mathematical model that respects the character of the process. THEN do statistics (analysis of variability and missing information).

Is a variable positive by definition? Then perhaps it is the exponential of something, so taking its logarithm makes good sense. Is a variable really the sum of a large number of separate things? It's likely to have a normal-like distribution... Is your variable constrained to be less than some upper bound? Can you consider variations of your variable to be small with respect to some other scale? Perhaps your variable is a nonlinear transformation of another variable, and the transformation can be represented by a Taylor series? Etc., etc.

Model first... then consider what variability means for your model.

Good insight. I think the mathematicians are going for probability and statistics theory more than classical mathematics.
