The latest in a long series of articles making me glad I don’t work in psychology was this piece about replication in the Guardian. This spins off some harsh criticism of replication studies and a call for an official policy requiring consultation with the original authors of a study that you’re attempting to replicate. The reason given is that psychology is so complicated that there’s no way to capture all the relevant details in a published methods section, so failed replications are likely to happen because some crucial detail was omitted in the follow-up study.
Predictably enough, this kind of thing leads to a lot of eye-rolling from physicists, which takes up most of the column. And, while I have some sympathy for the idea that studying human psychology is a subtle and complicated process, I also can’t help thinking that if the font in which a question is printed is sufficient to skew the result of a study one way or the other, then maybe these results aren’t really revealing deep and robust truths about the way our brains work. Rather than demanding that new studies duplicate the prior studies in every single detail, a better policy might be to require some variation of things that ought to be insignificant, to make sure that the results really do hold in a general way.
If you go to precision measurement talks in physics (and I went to a fair number at DAMOP this year), there will inevitably be a slide listing all the experimental parameters that they flipped between different values. Many of these are things that you look at and say “Well, how could that make any difference?” and that’s the point. If changing something trivial– the position of the elevator in the physics building, say– makes your signal change in a consistent way, odds are that your signal isn’t really a signal, but a weird noise effect. In which case, you have some more work to do, to track down the confounding source of noise.
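For the curious, the logic of that parameter-flipping slide can be sketched as a toy simulation. Everything here is made up for illustration: the `simulate` function, the "leak" that couples a supposedly trivial setting into the measurement, and the noise level are all assumptions, not anything taken from a real experiment. The idea is just to compare the average result with the setting in one state against the average with it flipped, and ask whether the difference is bigger than the scatter allows.

```python
import random
import statistics


def flip_check(runs_a, runs_b):
    """Return (difference of means, standard error of that difference)."""
    diff = statistics.fmean(runs_a) - statistics.fmean(runs_b)
    err = (statistics.variance(runs_a) / len(runs_a)
           + statistics.variance(runs_b) / len(runs_b)) ** 0.5
    return diff, err


def simulate(n, signal, leak, state, rng):
    """Hypothetical measurement: a real signal plus unit Gaussian noise,
    plus a 'leak' that follows a setting that should not matter
    (state is +1 or -1)."""
    return [signal + leak * state + rng.gauss(0.0, 1.0) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the toy example is repeatable

# Same true signal in both cases; only the second has a hidden systematic.
clean_a = simulate(2000, 1.0, 0.0, +1, rng)
clean_b = simulate(2000, 1.0, 0.0, -1, rng)
leaky_a = simulate(2000, 1.0, 0.2, +1, rng)
leaky_b = simulate(2000, 1.0, 0.2, -1, rng)

for label, a, b in [("clean", clean_a, clean_b), ("leaky", leaky_a, leaky_b)]:
    diff, err = flip_check(a, b)
    print(f"{label}: flip difference = {diff:+.3f} +/- {err:.3f} "
          f"({abs(diff) / err:.1f} sigma)")
```

In the clean case the flip difference should be consistent with zero, while in the leaky case it stands well clear of the noise, which is the cue to go hunting for the confounding effect. A real experiment would do this for many parameters at once, and would worry about correlations between them, which this sketch ignores.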
Of course, that’s much easier to do in physics than psychology– physics apparatus is complicated and expensive, but once you have it, atoms are cheap and you can run your experiment over and over and over again. Human subjects, on the other hand, are a giant pain in the ass– not only do you need to do paperwork to get permission to work with them, but they’re hard to find, and many of them expect to be compensated for their time. And it’s hard to get them to come in to the lab at four in the morning so you can keep your experiment running around the clock.
This is why the standards for significance are so strikingly different between the fields– psychologists (and biomedical researchers) are thrilled to see results that are significant at the 1% level, while in many fields of physics, that’s viewed as a tantalizing hint, and a sign that much more work is required. But getting enough subjects to hit even the 3-sigma level at which physicists become guardedly optimistic would quickly push the budget for your psych experiment to LHC levels. And if you’d like those subjects to come from outside the WEIRD (Western, educated, industrialized, rich, democratic) population, well…
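Since the two fields quote significance in different units, here is a minimal sketch of the conversion between "n sigma" and a p-value. It assumes a Gaussian distribution and a two-sided test, which is my choice for illustration (particle physicists often quote one-sided values instead); the function names are just labels I picked.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def p_from_sigma(n_sigma):
    """Two-sided p-value for a deviation of n_sigma standard deviations."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(n_sigma))


def sigma_from_p(p):
    """Number of standard deviations matching a two-sided p-value."""
    return STD_NORMAL.inv_cdf(1.0 - p / 2.0)


for n in (2, 3, 5):
    print(f"{n} sigma -> p = {p_from_sigma(n):.2e}")

for p in (0.05, 0.01):
    print(f"p = {p} -> {sigma_from_p(p):.2f} sigma")
```

Under this convention, 3 sigma works out to a p-value of roughly 0.3%, and 5 sigma to well under one in a million.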
At the same time, though, physicists shouldn’t get too carried away. From some of the quotes in that Guardian article, you’d think that experimental methods sections in physics papers are some Platonic ideal of clarity and completeness, which I find really amusing in light of a conversation I had at DAMOP. I was talking to someone I worked with many years ago, who mentioned that his lab recently started using a frequency comb to stabilize a wide range of different laser frequencies to a common reference. I asked how that was going, and he said “You know, there’s a whole lot of stuff they don’t tell you about those stupid things. They’re a lot harder to use than it sounds when you hear Jun Ye talk.”
That’s true of a lot of technologies, as anyone who’s tried to set up an experimental physics lab from scratch learns very quickly. Published procedure sections aren’t incomplete in the sense of leaving out critically important steps, but they certainly gloss over a lot of little details.
There are little quirks of particular atoms that complicate some simple processes– I struggled for a long time with getting a simple saturated absorption lock going in a krypton vapor cell, because the state I’m interested in turns out to have hellishly large problems with pressure broadening. That’s fixable, but not really published anywhere obvious– I worked it out on my own before I talked to a colleague who did the same thing, and he said “Oh, yeah, that was a pain in the ass…”
There are odd features of certain technologies that crop up– the frequency comb issue that my colleague mentioned at DAMOP was a dependence on one parameter that turns out to be sinusoidal. Which means it’s impossible to stabilize automatically, and instead requires regular human intervention. After asking around, he discovered that the big comb-using labs tend to have one post-doc or staff scientist whose entire job is keeping the comb tweaked up and running properly, something you wouldn’t really get from published papers or conference talks.
And there are sometimes issues with sourcing things– back in the early days of BEC experiments, the Ketterle lab pioneered a new imaging technique, which required a particular optical element. They spent a very long time tracking down a company that could make the necessary part, and once they got it, it worked brilliantly. Their published papers were scrupulously complete in terms of giving the specifications of the element in question and how it worked in their system, but they didn’t give out the name of the company that made it for them. Which meant that anybody who had the ability to make that piece had all the information they needed to do the same imaging technique, but anybody without the ability to build it in-house had to go through the same long process of tracking down the right company to get one.
So, I wouldn’t say that experimental physics is totally lacking in black magic elements, particularly in small-lab fields like AMO physics. (Experimental particle physics and astrophysics are probably a little better, as they’re sharing a single apparatus with hundreds or thousands of collaboration members.)
The difference is less in the purity of the approach to disseminating procedures than in the attitude toward the idea of replication. And, as noted above, the practicalities of working with the respective subjects. Physics experiments are susceptible to lots of external confounding factors, but working with inanimate objects makes it a lot easier to repeat the experiment enough times to run those down in a convincing way. Which, in turn, makes it a little less likely for a result that’s really just a spurious noise effect to get into the literature, and thus get to the stage where people feel that failed replications are challenging their professional standing and personal integrity.
It’s not impossible, though– there have even been retractions of particles that were claimed to be detected at the five-sigma level. And sometimes there are debates that drag on for years, and can involve some nasty personal sniping along the way.
The really interesting recent(-ish) physics case that ought to be a big part of a discussion of replication in physics and other sciences is the story of “supersolid” helium, where a new and dramatic quantum effect was claimed, then challenged in ways that led to some heated arguments. Eventually, the original discoverers re-did their experiments, and the effect vanished, strongly suggesting it was a noise effect all along. That’s kind of embarrassing for them, but on the other hand, speaks very well to their integrity and professionalism, and is the kind of thing scientists in general ought to strive to emulate. My sense is that it’s also more the exception than the rule, even within physics.