(1) The authors note (without boring the reader with details) that philosophers and historians have argued that science plays a key role in the moral vision of a society of “mutual benefit.” From this they derive the prediction that this notion of science facilitates moral and prosocial judgments. Isn’t this a little fast?
(2) Images of the “evil scientist” (in movies usually portrayed by an actor with a vaguely European accent) pervade modern culture. So if it takes only a cursory discussion of some literature to form a prediction, couldn’t one just as easily predict that priming with science makes you less moral? I’m not saying it does of course; I’m merely questioning the theoretical basis for the prediction.
(3) In Study 1, subjects read a date rape vignette (a little story about a date rape). The vignette is not included in the paper. Why not? There is a reference to a book chapter from 2001 in which that vignette was apparently used in some form (was it the only one by the way?) but most readers will not have direct access to it, which makes it difficult to evaluate the experiment. In other disciplines, such as cognitive psychology, it has been common for decades to include (examples of) stimuli with articles. Did the reviewers see the vignette? If not, how could they evaluate the experiments?
(4) The subjects (university students from a variety of fields) were to judge the morality of the male character’s actions (date rape) on a scale from 1 (completely right) to 100 (completely wrong). Afterwards, they received the question “How much do you believe in science?” For this a 7-point scale was used. Why a 100-point scale in one case and a 7-point scale in the other? The authors may have good reasons for this but they play it close to the vest on this one.
(5) In analyzing the results, the authors classify the students’ field of study as a science or a non-science. Psychology was ranked among the sciences (with physics, chemistry, and biology) but sociology was deemed a non-science. Why? I hope the authors have no friends in the sociology department. Communication was also classified as a non-science. Why? I know many communication researchers who would take issue with this. The point is, this division seems rather arbitrary and provides the researchers with several degrees of freedom.
(6) The authors report a correlation of r=.36, p=.011. What happens to the correlation if, for example, sociology is ranked among the sciences?
(7) Why were no averages per field reported, or at least a scatterplot? Without all this relevant information, the correlation seems meaningless at best. Weren't the reviewers interested in this information? And how about the editor?
(8) Isn’t it ironic that the historians and philosophers, who in the introduction were credited with having introduced the notion of science as a moral force in society, are now hypothesized to be less moral than others (after all, they were ranked among the non-scientists)? This may seem like a trivial point but it really is not when you think about it.
(9) Study 2 uses the vaunted “sentence-unscrambling task” to prime the concept of “science.” You could devote an entire blog post to this task but I will move on only to make a brief observation. The prime words were laboratory, scientists, hypothesis, theory, and logical. The control words were…. Well what were they? The paper isn’t clear about it but it looks like paper and shoes were two of them (there’s no way to tell for sure and apparently no one was interested in finding out).
(10) Why were the control words not low-frequency long words (assuming shoe and paper are representative of this category) that are low in imageability like the primes? Now the primes stick out like a sore thumb among the other words from which a sentence has to be formed whereas the control words are a much closer fit.
(11) Doesn’t this make the task easier in the control condition? If so, there is another confound.
(12) Were the control words thematically related, like the primes obviously were?
(13) If so, what was the theme? If not, doesn’t it create a confound to have salient words in the prime condition that are thematically related and can never be used in the sentence and to have non-salient words in the control condition that are not thematically related?
(14) Did the researchers inquire after the subjects’ perceptions of the task? Weren't the reviewers and editor curious about this?
(15) Wouldn’t these subjects have picked up on the scientific theme of the primes?
(16) Wouldn’t this have affected their perceptions of the experiment in any way?
(17) What about the results? What about them indeed? Before we can proceed, we need to clear up a tiny issue. It turns out that there are a few booboos in the article. An astute commenter on the paper had noticed anomalies in the results of the study and some impossibly large effect sizes. The first author responded with a string of corrections. In fact, no fewer than 18 of the values reported in the paper were incorrect. Here, I’ve indicated them for you.
You will not find them in the article itself. The corrections can be found in the comment section.
(18) It is a good thing that PLoS ONE has a comment section of course. But the question is this. Shouldn’t such extensive corrections have been incorporated in the paper itself? People who download the pdf version of the article will not know that pretty much all the numbers that are reported in the paper are wrong. That these numbers are wrong is the author’s fault but at least she was forthcoming in providing the corrections. It would seem to be the editor's and publisher's responsibility to make sure the reader has easy access to the correct information. The authors would also be served well by this.
(19) In her correction (which is about 25% the size of the original paper), the first author explains that the first three studies were rerun because the reviewer requested different, more straightforward dependent variables that directly assessed morality judgments. The original submission had used judgments related to punitiveness or blame, or judgments that were too closely tied to the domain of science. Apparently, many of the errors occurred because the manuscript was not properly updated with the new information. Why did the reviewers and editor miss all of these inconsistencies, though?
(20) And what happened to the discarded experiments? Surely they could have been included along with the new experiments? There are no word limitations at PLoS ONE. Having authored a 14-experiment paper that was recently published in this journal, I'm pretty sure I'm right on this one.
Let’s return to the paper armed with the correct (or so we assume) results.
(21) The subjects in Study 2 were primed with “science” or read the neutral words (which were not provided to the reader) and then read the date rape vignette (which was not provided to the reader) and made moral judgments about the actions in the vignette (whatever they were). The corrected data show that the subjects in the experimental condition rated the actions as more immoral than did the control condition. However, as the correction also states, the standard deviation was much higher in the control condition (28.02) than in the experimental condition (7.96). These variances are highly unequal; doesn’t this compromise the t-test that was reported?
(22) The corrections mention that the high variance in the neutral condition is caused by two subjects, one giving the date rape a 10 on the 100-point scale (in other words, finding it highly acceptable) and the other a 40. The average for that condition is 81.57, so aren’t these outliers, at least the 10 score? (By the way, was this date-rape approving subject reported to the relevant authorities?)
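For readers who want to see how lopsided these numbers are, here is a quick back-of-the-envelope sketch. It uses only the values quoted above from the corrections (the two standard deviations and the control mean); the function names are mine, and the group sizes are not used because I am not certain of them. Note that the control SD already includes the two low scores, so the z-scores below are, if anything, conservative.

```python
# Values quoted from the corrections for Study 2.
CONTROL_MEAN = 81.57   # mean rating in the neutral (control) condition
CONTROL_SD = 28.02     # standard deviation, control condition
PRIME_SD = 7.96        # standard deviation, science-prime condition


def variance_ratio(sd_a, sd_b):
    """Ratio of the larger variance to the smaller one."""
    larger, smaller = max(sd_a, sd_b), min(sd_a, sd_b)
    return (larger / smaller) ** 2


def z_score(x, mean, sd):
    """Distance of a single score from the mean, in SD units."""
    return (x - mean) / sd


print("variance ratio:", round(variance_ratio(CONTROL_SD, PRIME_SD), 2))
for rating in (10, 40):
    print("rating", rating, "z =", round(z_score(rating, CONTROL_MEAN, CONTROL_SD), 2))
```

The variance in the control condition comes out at roughly 12 times that of the prime condition, and the rating of 10 sits about 2.5 standard deviations below its own condition mean. With variances this unequal, the usual alternative to the standard t-test is Welch's test, which does not assume equal variances.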
(23) In Study 3 subjects received the same priming manipulation as in Study 2 and they rated the likelihood that they would engage in each of several activities in the next month, some of which were prosocial and some of which were not. The prosocial actions listed were giving to charity, giving blood, and volunteering. Were these all the actions that were used in the experiment? It is not clear from the paper.
(24) Were the values that were used in the statistical test the averages of the responses to the categories of items (e.g., the average rating for the three prosocial actions)?
(25) And what happened to the non-prosocial activities? Shouldn't a proper analysis have included those in a 2 (prime) by 2 (type of activity) ANOVA?
(26) If this analysis is performed, is the interaction significant?
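To make questions (25) and (26) concrete: if every subject rated both kinds of activity, then activity type is a within-subjects factor, and the prime by activity-type interaction boils down to asking whether the prosocial-minus-non-prosocial difference score differs between the two priming groups. Below is a minimal sketch of that comparison. It is not the authors' analysis; the function name and input layout are my own assumptions, and I use the Welch form of the t statistic, which does not assume equal variances.

```python
from math import sqrt
from statistics import mean, variance


def interaction_t(prime_group, control_group):
    """Welch t statistic for the prime x activity-type interaction.

    Each argument is a list of (prosocial, non_prosocial) mean ratings,
    one pair per subject. This input layout is a hypothetical choice;
    the paper does not describe its data format.
    """
    # Per-subject difference scores: prosocial minus non-prosocial.
    d_prime = [pro - non for pro, non in prime_group]
    d_ctrl = [pro - non for pro, non in control_group]
    # Standard error of the difference between the two group means.
    se = sqrt(variance(d_prime) / len(d_prime) + variance(d_ctrl) / len(d_ctrl))
    return (mean(d_prime) - mean(d_ctrl)) / se
```

A large t here would mean the prime shifts prosocial intentions specifically, rather than just making subjects rate everything as more likely. Reporting only the prosocial items cannot distinguish between those two possibilities.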
(27) In the corrected data the effect size is .85. Doesn’t this seem huge? Readers of my previous post already know the answer: Yes, to the untrained eye perhaps but it is the industry standard (Step 7 in that post).
(28) The corrections state that Study 4 originally contained a third condition but that it was left out at the behest of a reviewer who felt that it muddles rather than clarifies the findings (yes, we wouldn’t want the findings to be muddled, would we?). I appreciate the honesty but was everyone, including the editor, on board with this serious amputation?
(29) The initial version of the corrections (yes, I forgot to mention that there were two versions of corrections) mentioned that there were 26 participants in the control condition and 17 in the experimental condition. Where does this huge discrepancy come from? And does it affect the analyses?
(30) In the discussion it is mentioned that Study 2 investigated academic dishonesty. This was one of the experiments that were dropped, right? Another (minor) addition for the corrections perhaps.
I guess there are a great many more questions to ask but let me stop here. The article uses logical, hypothesis, theory, laboratory, and scientist as primes. I can make a sentence out of those: Absent a theory, it is logical that there is no basis for the hypothesis that was tested in the laboratory and (sloppily) reported by the scientist.