Earlier this year I taught a new course titled Foundations of Cognition. The course is partly devoted to theoretical topics and partly to methodological issues. One of the theoretical topics is free will and one of the methodological topics is replication. There is a lab associated with the course and I thought we’d be killing two birds with one stone if we tried to replicate a study that was discussed in the first, theoretical, part of the course. The students would then get hands-on experience with replicating a study they were already familiar with. Moreover, we could discuss the results in the context of the methodological literature that we read in the second part of the course.
The experiment I had selected for our replication attempt was Experiment 1 from Vohs & Schooler (2008) on whether a lowered belief in free will would lead people to cheat more. I thought that this was a relatively simple experiment—in terms of programming—that could be run on Mechanical Turk (we needed to be able to collect the data fast, given that it was a five-week course). My first impression after a cursory reading of the article was that we might replicate the result.
In the experiment, subjects read one of two texts, both passages from Francis Crick's 1994 book The Astonishing Hypothesis. One passage argues that free will is an illusion and the other passage discusses consciousness but does not mention free will. These texts were cleverly chosen, as they are similar in terms of difficulty and writing style. After reading the passage, the subjects complete the Free Will and Determinism scale and the PANAS.
Next comes the meat of the experiment. Subjects solve 20 mental-arithmetic problems (e.g., 1 + 8 + 18 - 12 + 19 - 7 + 17 - 2 + 8 - 4 = ?) but are told that due to a programming glitch, the correct answer will appear on the screen and that they can make it disappear by pressing the spacebar. So if a subject does not press the spacebar, we count that as cheating. Vohs and Schooler (V&S) found that the subjects who had read the anti-free-will text cheated more often than those who had read the neutral text. More about the results later.
My graduate student, Lysanne Post, who is collaborating with me on this, contacted the first author of the paper to inform her about our replication attempt. The first author was helpful in providing information that could not be gleaned from the paper. It turns out the experiment was run in 2003 and she did not remember all of the details of that study. But with the information that was provided and some additional sleuthing we were able to reconstruct the experiment.
We ran the experiment on Mechanical Turk, using 150 subjects. This should give us awesome power because the original experiment used 30 subjects and the effect size was large (.82).
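To make the power claim concrete, here is a rough back-of-the-envelope sketch. It uses a normal approximation to the power of a two-sided, two-sample comparison rather than an exact t-test calculation, so it is slightly optimistic for small samples. It also assumes that subjects were split evenly over the two conditions (15 per group in the original, 75 per group in ours), which is my assumption for illustration, and the function name `approx_power` is made up for this sketch.

```python
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test.

    Normal approximation: ignores the (tiny) chance of rejecting in the
    wrong direction and treats the test statistic as normal, not t.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)            # critical value, about 1.96
    noncentrality = d * (n_per_group / 2) ** 0.5  # expected test statistic
    return z.cdf(noncentrality - z_crit)


# Assumption: equal group sizes (not stated above).
power_original = approx_power(0.82, 15)   # 30 subjects in total
power_ours = approx_power(0.82, 75)       # 150 subjects in total
print(round(power_original, 2), round(power_ours, 3))
```

Under these assumptions the approximation comes out at roughly .61 for the original design and above .99 for ours, taking the reported effect size of .82 at face value.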
In V&S's study, subjects in the anti-free-will (AFW) condition reported weaker free will beliefs (M = 13.6, SD = 2.66) than subjects in the control condition (M = 16.8, SD = 2.67). In contrast, we found no difference between the AFW condition (M = 25.90, SD = 5.35) and the control condition (M = 25.11, SD = 5.37), p = .37. Also, our averages are noticeably higher than V&S’s.
How about the effect on cheating?
V&S found that subjects in the AFW condition cheated more often (M = 14.00, SD = 4.17) than subjects in the control condition (M = 9.67, SD = 5.58), p < .01, an effect of almost one standard deviation! In contrast, we found no difference in cheating behavior between the AFW condition (M = 4.53, SD = 5.66) and the control condition (M = 5.97, SD = 6.83), p = .158. Clearly, we did not replicate the main effect. It is also important to note that the average level of cheating we observed was much lower than that in the original study.
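For readers who want to see the two cheating effects on the same scale, the means and standard deviations above can be turned into a standardized mean difference (Cohen's d). The sketch below pools the two standard deviations under the assumption of equal group sizes, which is not stated above, so the value for the original study need not match the effect size reported earlier exactly. The helper name `cohens_d` is just for illustration.

```python
from math import sqrt


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Standardized mean difference, assuming equal group sizes."""
    pooled_sd = sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# AFW condition minus control condition, using the summary statistics above.
d_original = cohens_d(14.00, 4.17, 9.67, 5.58)
d_replication = cohens_d(4.53, 5.66, 5.97, 6.83)
print(round(d_original, 2), round(d_replication, 2))
```

A positive value means more cheating in the AFW condition. With these inputs the original effect is large and positive, whereas ours is small and, if anything, points in the opposite direction.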
V&S reported a .53 correlation between scores on the Free Will subscale and cheating behavior. We, on the other hand, observed a nonsignificant .03 correlation.
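To get a feel for how much uncertainty surrounds a correlation of .03, one can compute an approximate confidence interval with the Fisher z-transformation. This is only a sketch: it assumes the correlation is based on all 150 subjects, which I have not spelled out above, and `correlation_ci` is a hypothetical helper name.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def correlation_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson correlation (Fisher z)."""
    z = atanh(r)                              # transform r to the z scale
    se = 1 / sqrt(n - 3)                      # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return tanh(z - crit * se), tanh(z + crit * se)


# Assumption: n = 150, i.e. every subject contributes to the correlation.
low, high = correlation_ci(0.03, 150)
print(round(low, 2), round(high, 2))
```

Under that assumption the interval comfortably includes zero, which fits the nonsignificant result.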
There was a further issue. About half our subjects indicated they did not believe the story about the programming glitch (we kind of feared that this might happen). We analyzed the data separately for “believers” and “nonbelievers” but found no effect of condition in either group.
What might account for this series of stark differences between our findings and those of V&S? I will discuss some ideas in my next post, where I will also talk about some lessons we learned from this replication attempt. Meanwhile, you may want to read my first post, which discusses why we should do replication studies in the first place.