In the current discussion about the methodological crisis in psychology, there are calls to (1) include exact self-replications in papers, (2) provide chronological rather than plot-driven accounts of your research, and (3) publish null results. This makes a lot of sense, but what happens when you heed these calls? You might end up with a 14-experiment Behemoth!
That’s at least what’s happened to us with a paper we’re writing right now. It is on the memory effects of direct versus indirect speech. I’ll talk about the contents in a later post. Here I’m talking about the sheer size of the beast. How did we manage to breed our Behemoth?
We started out with a hypothesis (direct speech is “livelier” than indirect speech and should therefore lead to better memory representations) that seemed highly plausible given the literature. We found the exact opposite of what we predicted—a convincing effect, even by Bayesian standards. We tried to replicate it. Same pattern. So now we had 2 experiments but no cigar.
We thought the problem might be with the memory probe that we used. So we used a different one. Again we found the opposite of what you would predict. This finding replicated as well. So that’s 4 experiments and still no cigar.
Then we thought that it had to do with the placement of our memory probe. So we placed it at a later time. No, that didn’t do the trick. Still the opposite of the effect we had originally predicted. We replicated this. So that’s 6 experiments (are you counting along?).
Then one of us thought it had to do with the fact that we used visual probes, where auditory ones might be more appropriate given certain findings in the literature. So we used auditory probes. First we used pure tones. No dice. A resounding null effect, which replicated. That’s 8. Enough, right?
No, because then we thought it was actually stupid to use tones, so we used words instead (the words “left” and “right”). Again null effects, which replicated. So that’s 10. Then we thought it was stupid after all to use words like “left” and “right” that were unrelated to the target sentences, so we used words from the sentences instead. Again the results knocked our prediction right out of the park: two big fat null effects. So that’s 12. (If this gets boring to you, imagine how we felt.)
We thought we couldn’t finish on this note. It would be like a band closing a concert with a drum solo (or two dancing dwarves and a miniature Stonehenge for Spinal Tap fans). We were looking for a grand finale. Our final hypothesis was that direct speech leads to better memory for the exact wording of a sentence than indirect speech. Drum roll... Yes! A convincing effect. And it even replicated! And that’s 14.
So now we’re writing this Behemoth of a paper and then we’re going to try to find a home for it [update: the paper was published in 2013]. I actually think it will be a highly informative and interesting paper. Our null results are meaningful because our experiments were high-powered and we used Bayesian statistics, which enabled us to quantify the strength of evidence for the null. And our other results are partly counterintuitive and partly as expected; taken together, they present a coherent picture. Moreover, we have great confidence in the results because of their high power and reproducibility.
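For readers wondering what “quantifying evidence for the null” can look like in practice, here is a minimal sketch. It uses the BIC approximation to the Bayes factor for a one-sample (or paired) t-test, which is an assumption made purely for illustration and not necessarily the analysis used in the paper. The function name `bf01_from_t` is hypothetical, and no numbers here come from our experiments.

```python
import math


def bf01_from_t(t: float, n: int) -> float:
    """Approximate Bayes factor in favor of the null (BF01).

    Illustrative assumption: BIC approximation for a one-sample or
    paired t-test with n observations, where
        BF01 ~= sqrt(n) * (1 + t^2 / (n - 1)) ** (-n / 2).
    Values above 1 favor the null; values below 1 favor an effect.
    """
    if n < 2:
        raise ValueError("need at least two observations")
    df = n - 1
    return math.sqrt(n) * (1.0 + t * t / df) ** (-n / 2.0)


if __name__ == "__main__":
    # Made-up inputs, only to show how the function is called.
    print(bf01_from_t(t=0.0, n=100))
    print(bf01_from_t(t=3.0, n=100))
```

Under this approximation, a t value of exactly zero gives BF01 equal to the square root of n, so larger samples can deliver stronger evidence for the null. That is the sense in which a high-powered null result can be informative rather than merely inconclusive.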
But I’m still wondering what the paper would have looked like under the old regime.
It might have looked like this. We wouldn’t have run the exact replications, so that leaves us with 7 experiments. We would have started off the paper with one of the last two experiments (of course, we’d actually have run only one of them). A nice effect that is consistent with the literature. Then we would have reported one of the first two experiments (again, the only one we’d have run). Hey, interesting, a counterintuitive effect! Then we would have reported the third experiment, which would perhaps be presented as a conceptual replication of the first experiment. We probably would have also included one of the experiments with the longer time interval.
And that’s it! We wouldn’t have reported the null effects. Instead of 14 experiments, we’d be down to a healthy 4. Instead of the Behemoth, we’d have a sleek foal, which would probably be a lot easier to sell and would make us look like considerably more competent breeders.
So that’s maybe what the new psychology will look like: a collection of large beasts lumbering around in the field instead of a herd of happily prancing foals. But at least the beasts have their feet planted firmly on the ground.
Daniel Lakens (@lakens) suggests via Twitter that the paper is actually not 14 experiments but 7, because the replications can be summarized in a table. According to him, the paper is a warhorse. I agree that this metaphor is more apt.