Monday, March 18, 2013

The Value of Believing in Free Will: A Replication Attempt

Update (February 26, 2014): In early March we'll be submitting a manuscript that includes both the experiment described here and another replication attempt run in the lab.

Earlier this year I taught a new course titled Foundations of Cognition. The course is partly devoted to theoretical topics and partly to methodological issues. One of the theoretical topics is free will and one of the methodological topics is replication. There is a lab associated with the course and I thought we’d be killing two birds with one stone if we’d try to replicate a study that was discussed in the first, theoretical, part of the course. The students would then have hands-on experience with replication of a study that they were familiar with. Moreover, we could discuss the results in the context of the methodological literature that we read in the second part of the course.

The experiment I had selected for our replication attempt was Experiment 1 from Vohs & Schooler (2008) on whether a lowered belief in free will would lead people to cheat more. I thought that this was a relatively simple experiment—in terms of programming—that could be run on Mechanical Turk (we needed to be able to collect the data fast, given that it was a five-week course). My first impression after a cursory reading of the article was that we might replicate the result.

In the experiment, subjects read one of two texts, both passages from Francis Crick's 1994 book The Astonishing Hypothesis. One passage argues that free will is an illusion and the other passage discusses consciousness but does not mention free will. These texts were cleverly chosen, as they are similar in terms of difficulty and writing style. After reading the passages, the subjects complete the Free Will and Determinism scale and the PANAS.

Next comes the meat of the experiment. Subjects solve 20 mental-arithmetic problems (e.g., 1 + 8 + 18 - 12 + 19 - 7 + 17 - 2 + 8 - 4 = ?) but are told that due to a programming glitch, the correct answer will appear on the screen and that they can make it disappear by pressing the spacebar. So if the subject does not press the spacebar we know they are cheating. Vohs and Schooler (V&S) found that the subjects who had read the anti-free-will text cheated more often than those who had read the neutral text. More about the results later.

My graduate student, Lysanne Post, who is collaborating with me on this, contacted the first author of the paper, informing her about our replication attempt. She was helpful in providing information that could not be gleaned from the paper. It turns out the experiment was run in 2003 and the first author did not remember all of the details of that study. But with the information that was provided and some additional sleuthing we were able to reconstruct the experiment.

We ran the experiment on Mechanical Turk, using 150 subjects. This should give us awesome power, because the original experiment used only 30 subjects and the effect size was large (d = .82).
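
As a quick back-of-the-envelope check (not from the original post or paper), the power of a two-sided two-sample t-test at d = .82 can be sketched with the noncentral t distribution; the function and the equal group sizes are my own illustration:

```python
# A sketch of a priori power for a two-sided independent-samples t-test,
# computed via the noncentral t distribution. Equal group sizes assumed.
import math
from scipy import stats

def power_two_sample_t(d, n_per_group, alpha=0.05):
    """Power of a two-sided two-sample t-test for effect size d."""
    df = 2 * n_per_group - 2
    ncp = d * math.sqrt(n_per_group / 2)        # noncentrality parameter
    t_crit = stats.t.ppf(1 - alpha / 2, df)     # two-sided critical value
    # probability of landing beyond either critical value under H1
    return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)

print(power_two_sample_t(0.82, 15))   # original study: 30 subjects total
print(power_two_sample_t(0.82, 75))   # replication: 150 subjects total
```

With 75 subjects per condition, power at d = .82 is essentially at ceiling, whereas the original 15-per-condition design sits well below the conventional .80 threshold.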

In V&S's study, subjects in the AFW condition reported weaker free will beliefs (M = 13.6, SD = 2.66) than subjects in the control condition (M = 16.8, SD = 2.67).  In contrast, we found no difference between the AFW condition (M = 25.90, SD = 5.35) and the control condition (M = 25.11, SD = 5.37), p = .37. Also, our averages are noticeably higher than V&S’s.

How about the effect on cheating?

V&S found that subjects in the AFW condition cheated more often (M = 14.00, SD = 4.17) than subjects in the control condition (M = 9.67, SD = 5.58), p < .01, an effect of almost one standard deviation! In contrast, we found no difference in cheating behavior between the AFW condition (M = 4.53, SD = 5.66) and the control condition (M = 5.97, SD = 6.83), p = .158. Clearly, we did not replicate the main effect. It is also important to note that the average level of cheating we observed was much lower than that in the original study.
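
For comparison, Cohen's d can be computed directly from the summary statistics quoted above. This is a sketch on my part, using the pooled SD and assuming equal group sizes; the paper's exact effect-size computation may differ:

```python
# A sketch: Cohen's d from reported means and SDs, using the pooled SD
# for two groups assumed to be of equal size.
import math

def cohens_d(m1, sd1, m2, sd2):
    """Standardized mean difference (pooled SD, equal group sizes assumed)."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m1 - m2) / pooled_sd

# Cheating: original study (AFW vs. control) versus our replication
d_original = cohens_d(14.00, 4.17, 9.67, 5.58)
d_replication = cohens_d(4.53, 5.66, 5.97, 6.83)
print(round(d_original, 2), round(d_replication, 2))
```

The original difference works out to nearly a full standard deviation, while the replication difference is small and in the opposite direction.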

V&S reported a .53 correlation between scores on the Free Will subscale and cheating behavior. We, on the other hand, observed a nonsignificant .03 correlation.

There was a further issue. About half our subjects indicated they did not believe the story about the programming glitch (we kind of feared that this might happen). We analyzed the data separately for “believers” and “nonbelievers” but found no effect of condition in either group.

What might account for this series of stark differences between our findings and those of V&S? I will discuss some ideas in my next post, where I will also talk about some lessons we learned from this replication attempt. Meanwhile, it might be good to revisit my first post, which discusses the why of doing replication studies.

39 comments:

  1. I look forward to hearing your thoughts on this. In my own research I directly compared a priming study conducted in person and online, and found significant results for the in-person version but not online (not even close). Do you consider the Vohs and Schooler study to be "priming"? My best explanation is that, for any priming effects that actually exist, they could be wiped out by any number of distractions experienced by online participants (e.g., television).

    ReplyDelete
    Replies
    1. Interesting point. I'll consider it in my next post.

      Delete
  2. Plus, how likely is it that one really believes such an online programming glitch?

    ReplyDelete
  3. I think the failed manipulation check and the high number of people who reported not believing the story make it difficult to draw *any* conclusions from this replication attempt, other than that it might not be worthwhile to run all types of studies on MTurk.

    ReplyDelete
    Replies
    1. So your conclusion is that MTurk is not suitable for running these experiments. Surely this is not the ONLY logically possible reason why we were unable to replicate the manipulation check (which produced a large effect in the original study)? What could be so different about MTurk that we couldn't even replicate this huge effect? More about this in my post tomorrow.

      Delete
    2. Of course there are other reasons, including that the effect is essentially non-existent, that the populations of the two samples are sufficiently different, etc.

      There are two issues, however, that I think are at least as likely.

      1. Although MTurk is suitable for running all sorts of experiments (the literature is now quite filled with them), the nature of the DV (especially) does not seem suitable for MTurk (or any other entirely online environment, e.g., ProjectImplicit). Your own participants point this out.

      2. MTurk is not a contamination-free subject pool. It could be that MTurk workers who participate in these studies have seen similar manipulations (if not the same manipulation) and so have become less reactive to them. I think this is a real problem; psychology has gone from the study of college sophomores to the study of MTurk workers.

      Delete
    3. Your first point is a good one, which I will discuss in my upcoming post. Your second one I have a hard time believing. This seems much more likely for psych intro students than for Turkers.

      Delete
    4. Rolf, actually I think there is good evidence for believing that a very large fraction of Turkers (in some cases perhaps even a majority) participating in any given psychology experiment have participated in literally hundreds of previous online psychology experiments, and that those Turkers have probably seen most of the "common" study paradigms in some form or other in the past.

      Here are some resources that get into this and other issues with Turk samples:

      http://experimentalturk.wordpress.com/2012/10/09/slides-from-acr-2012/

      http://experimentalturk.wordpress.com/2012/02/02/experimenting-on-amt-fundamental-articles/

      In particular see the slides "Non-Naivety among Experimental Participants on Amazon Mechanical Turk" from the first link, and the manuscript "Methodological Concerns and Advanced Uses of Amazon Mechanical Turk in Psychological Research" from the second link.

      --Jake

      Delete
    5. Thanks Jake, very useful info! Some of it even from someone at my own university whom I don't even know! I'll definitely need to address this in my next post.

      Delete
    6. I very much agree with Mark on both counts. In particular, the computer-failure cover story in an online environment seems like a flaw in the replication that was not present in the original study (although some people dismiss social psych as a bunch of cheap shots, others have problems replicating the basic designs;)). That the 'failure' of the replication is due to this flaw seems much more likely than that the original finding was not real (although, admittedly, replications can help resolve the true effect size; .82 seems rather high).

      Gabriele indeed also points out in very nice work that MTurkers take zillions of studies and are probably better informed than many psychology students (and, admittedly, many of the 'classics' make it difficult to run studies again). There's also a lot of cross talk between Turkers on experiment, in particular when they do the same studies over and over again (social and moral dilemmas in particular).

      Delete
    7. Just curious, where in the original paper did you read that everyone believed the cover story about the computer glitch? So how do you know the "flaw" wasn't present there? And where in the paper does it say that there was no crosstalk among the subjects? In other words, how can you be so sure (as you appear to be) that the basic design wasn't replicated?

      The cover story is likely less credible in an online environment, but unlike the original study, at least we have information on who did not believe it. We simply have to take it on faith that everyone in the original study believed the cover story. Among the subjects in our study who did appear to believe it, there was also no effect. And the manipulation check occurred before the cheating task, so our nonreplication there is immune to whatever problems there might be with the cover story.

      The potential problems with the Turkers are more serious.

      Delete
    8. Simply the difference in environments introduces a big factor. I CAN believe that, with some role play, participants believe the glitch in a lab environment. I cannot believe it in the online environment.

      I am not saying replication is not a worthy cause and it SHOULD be done (on the contrary, you know that both Mark and I are very much in favor of it and are doing as much as possible to use it as an important tool). However, replications should be done right. I am also not saying that the original study is perfect. I don't have any stake in it and for all I care you don't replicate it. Yes, you have more information, but why not then actually DO the appropriate replication? It's like the hooligan study in Canadian ice hockey contexts....

      Delete
    9. Also, to be clear: the point about cross talk and taking zillions of studies is not related to the most important point (that is, the design). I am not sure whether cross talk and zillions of studies are necessarily a problem here. It was mostly to underline points made earlier.

      Delete
    10. You can believe it but you don't know it. That's what I meant by taking it on faith. You have no foundation for your claim that the design was not replicated other than something you CAN believe.

      Delete
    11. Experiment 1, there's an effect. Experiment 2, there's no effect. Experiment 2 is a conceptual replication, not a close replication, because the design is different. That so much is clear. Why not run the close replication?

      Delete
    12. What is a "close replication" with studies of this type? There is no easy answer. I'll talk about this in my next post. (One drawback of having a two-part post and having people comment after the first part is that people will say things or prompt you to say things that you were going to address in the second part anyway).

      I'm still unclear as to which meaning of "design" you are using. Surely the experiment has the exact same design in the usual sense of the word.

      Delete
    13. I'll wait until you do your next post then:). But if I had seen this study a priori, I would not have predicted it to work. For many reasons.

      Delete
    14. It would have been good to know how much people in the original study believed the cover story. I totally agree.

      Rolf, I also think that your replication is useful (despite my snarky first reaction). It helps to understand how the effect (and more basically the procedures) work in an online environment. In the original lab study participants had face-to-face contact with the experimenter, in the replication they did not. This may have made people in the original study more committed to the research (e.g., pay more attention to the manipulation), which in turn produced the effects. Etc. etc. etc.

      The differences between replication attempts do help us learn about an effect. I am just hesitant to make many judgments about the effect itself based on one or two replication attempts - especially when there are potential issues related to the samples' experience and the interpretation of the cover-story.

      Anyways, I'll stop commenting now since I'm sure you will cover many of these issues in your next post. I just wanted to follow up with a slightly more reasonable update than my original comment :-)

      Delete
    15. Thanks, Mark, for this more reasonable comment.;) I am indeed going to cover these things. A key point of my next post will be that it is difficult to say with experiments of this type what is essential for the effect to occur. The research assistant's acting skills? A key other point is that if this stuff is so relevant, why is it not mentioned in the paper?

      Delete
    16. I think we all agree on that part. We are underreporting, and the whole issue with replication is pointing this out. Some of the reasons for finding an effect we may not even know, so we may not even record that information. But journals need to change in terms of what's being reported. Experimenter A may have a broad cover story and extensive pilot testing, while Experimenter B accidentally finds effects. We don't know why.

      Delete
    17. You're stealing my thunder, Hans...

      Delete
    18. Sorry:). But Mark and I are intending to cover part of this in a paper as well. We can forward it to you at some point for comments as well. So in a way I am stealing your thunder of the thunder that you are stealing from us:).

      Delete
    19. I'll be happy to take a look at your paper, looks interesting. And there probably is plenty of thunder to go around.

      Delete
  4. I think you should change the mechanic.

    I would recommend that you have a puzzle game, where the subject puts together a puzzle of different shapes. (http://goo.gl/SDdA8)

    They should be presented a card with the shape on the screen and told that the answer is on the back and that they can flip the card to check it. This will allow them the choice to cheat, and you can check the state of the puzzle before they flip the card.

    You could also present the card "accidentally" with the answer facing the user. This would allow them the chance to flip it or discard it. We could build in a function to discard, as well as just to move a card to the completed pile and full-scale cheat. This could be pushed with a timer and completion counter.

    You can also introduce an aspect to see how a multiplayer or monitored option affects the outcome.

    I'll make the game for you.

    ReplyDelete
    Replies
    1. The URL was corrupted. It should point to a puzzle game like Tangoes.

      Delete
    2. Thanks, that sounds pretty cool.

      Delete
  5. I think the most curious aspect of the replication attempt is demonstrated in the comments. I suspect people would not have been so critical of the methods and MTurk if the result had shown the expected effect. That is a troubling reaction, because the validity of the experiment should be judged by the methods, not by the results.

    ReplyDelete
    Replies
    1. If we judge by the methods, we end up with the same result. Garbage in, garbage out. What's especially troubling is conducting methodologically weak replication attempts under the guise of doing something important for the discipline.

      Delete
    2. Great point, Greg. Also puzzling is the lack of criticism of the original study. Have people even read the paper?

      Delete
  6. Right. Not true, but nice for confirmatory evidence if you are seeking to get confirmation of your hypothesis that (other) people are bad researchers.

    But, that is why the a priori registration (which Rolf is actually doing in his Frontiers Special Issue) is so nice. Rolf has told me about this experiment before (and before telling me the results) and my response was the same.

    ReplyDelete
    Replies
    1. As you well know, Hans, the goal of the replication attempt was not to show that other people are bad researchers. That goal is stated in the post. The post also states that I initially believed we were going to replicate the effect (why else would I have assigned the paper to my class?).

      Delete
  7. I'm confused. What's the value here? There's no measurement or control of the population sampled, environment, experimenter demands, individual differences, etc. So why are these results so different than those collected in the lab?

    Until we begin studies specifically aimed at answering that question, I fail to see any value in compiling a list of failed replications on M-Turk. It's like asking why polar ice cap melting rates can't be replicated in my kitchen. Context matters.

    ReplyDelete
    Replies
    1. Have you read the original study? I'm looking forward to hearing about all the details you are going to find in there about the population. All I can find is this: "Participants were 30 undergraduates (13 females, 17 males)", but maybe you can do better.

      As for our own data, we have all sorts of information about the subjects (age, gender, level of education, native language, rating of the amount of noise in the environment, operating system, browser, ideas about the purpose of the experiment, and reading time for the passages). I didn't report all of this in the blog because it's a blog and not an article. I will discuss some of it in my next post. I will then also address the question why our results are so different.

      I wonder why you are criticizing the replication attempt for not having all sorts of measures (which, as I've just told you, it does have) that the original study clearly does not have.

      Delete
  8. This is a worthwhile effort and discussion. I must say that I've also failed on two related tasks when it comes to online administration on MTurk: (1) I couldn't make this specific free will manipulation work (meaning the manipulation checks didn't work out), and (2) I couldn't get meaningful variance in various cheating measures that have worked for me in the lab.

    With that said, I'm not entirely sure this is all about the MTurkers or running this online. Though I know of a few studies that have successfully run this manipulation with similar or conceptually related DVs (aggressive behavior, prosocial behavior, etc.), I also know of quite a few failed runs. It would be worthwhile to try to aggregate those findings (a meta-analysis?) to understand when and why this works or doesn't.

    ReplyDelete
    Replies
    1. Thanks. Good to hear something constructive. These are interesting observations. In my next post, which I plan to write tomorrow, I want to systematically address these and other points. Meta-analyses are definitely useful.

      Delete
  9. Meta-analysis is key here - failure to replicate is way too common for my comfort level, even within my own lab! I recall one study I did as a post-doc, finding completely opposite (significantly so) results before and after spring recess. And it was a "simple" memory study. My advisor at the time told me "welcome to human behavior." As they say... if the brain were simple enough to understand, we'd be too simple to understand it.

    ReplyDelete
  10. When a remarkable finding is published, few challenge it. But when the finding does not replicate, readers bend over backwards to explain why the effect might be real. In reports of priming, effects are large and robust, so why do small procedural changes wipe them out entirely? Moreover, a number of online priming effects have been published, including studies with MTurk subjects (e.g., Caruso, Vohs, et al., JEPG 2013).

    ReplyDelete
  11. I found the experimental results a little difficult to understand.
    What is the difference between the two texts?
    What causes the difference in cheating?

    ReplyDelete