Wednesday, January 27, 2016

Linguistic Cues and Firing Guns: Some Background on a Registered Replication Report

Today the second Registered Replication Report (RRR) was published in Perspectives on Psychological Science. I am one of its authors. Here is some background on the project. I'll take you on a brief trip across three fields: linguistics, cognitive psychology, and social psychology. Then I'll discuss the results of the RRR and conclude with some lessons learned.

Language is a tool with which we can “shape events in each other’s brains with exquisite precision” [1]. Linguistic analysis shows how subtle a tool language is. Take for example grammatical aspect. We can describe the same event by saying He was running to the finish line or by saying He ran to the finish line. Linguists have argued that these utterances “construe” the event differently [2]. Whereas the past progressive (was running) opens up the internal structure of the event, the simple past (ran) describes the event as completed.

To get a better feel for this distinction, let's append a clause to the sentence: He was running to the finish line but he tripped and broke his ankle. This makes perfect sense—he was in the middle of something and then something else happened and he never got to finish the first thing. But now let's append the clause to the perfective sentence: He ran to the finish line but he tripped and broke his ankle. This doesn’t make sense; he’s already at the finish line and so the event’s internal structure is not accessible. There is no room, so to speak, for him to trip and break his ankle.

Unlike linguists, cognitive psychologists are charged with figuring out whether such distinctions affect language processing. A long time ago, I took a stab at this. Take the sentence He stirred/was stirring his coffee. The action of stirring coffee normally involves a spoon. The idea was to measure priming for the word spoon as a function of grammatical aspect, the hypothesis being that there should be more priming after was stirring than after stirred. My student and I found initial evidence for this hypothesis: subjects responded faster to spoon after a was stirring sentence than after a stirred sentence (we obviously used many such sentences). We presented this finding at a conference [3]. The plan was to conduct follow-up experiments, but all of a sudden my student informed me of his decision to move back to his home state, and the project was abandoned.

Fortunately, other researchers took the idea and ran with it [4]. The evidence showed that grammatical aspect indeed seems to modulate how much information about an event we activate. For example, there is more activation for arena after was skating than after skated. My student Carol Madden and I later found that comprehenders are focused on the endpoint of the action in the perfective but don't show a clear temporal focus when presented with the imperfective [5]. Joe Magliano and his student showed that in-progress activities were more likely than completed activities to be perceived as ongoing in the subsequent context [6].

Social psychologists have taken this work one step further [7]. The researchers hypothesized that grammatical aspect can modulate the attribution of intentionality, which could be crucial in legal cases. Take the following vignette.
Varying grammatical aspect had a large effect on judgments of criminal intentionality, as well as on two other dependent measures. For example, criminal intentionality was rated higher in the imperfective case (was firing) than in the perfective case (fired).

At a conference in Seattle in 2011, a group of us were talking about this study. We found it an interesting extension of our own work on aspect. We were also somewhat puzzled by the effect sizes reported in the paper, given the more modest effect sizes usually found in cognitive experiments and the fact that attributing intentionality presupposes a cognitive representation of the event. If anything, you'd expect the more indirect effect (on intention attribution) to be smaller than the more direct effect (on the event representation).

We reasoned that the indirect effect might be relatively large for three reasons. First, the study used only one vignette and had a between-subjects design. Had a cognitive psychologist designed the experiment, they'd probably have had each subject evaluate something like 24 vignettes, 12 per aspect condition, with a counterbalancing scheme and an equal number of fillers thrown in to hide the manipulation. We suspected that within-subjects designs might depress the effect sizes in cognitive experiments: perhaps subjects got used to the manipulation, so that its effects weakened over time. What spoke against this idea was that, in informal analyses of other experiments, we'd never seen much of a decline in the effect across trials, just a general speed-up. But maybe knowing ahead of time whether you'd be reading a single vignette or 48 of them (experimental items plus fillers) made the difference.

Second, vignettes in cognitive psychology are characteristically rather bland affairs (people stirring coffee or skating on ice rinks), whereas here we were dealing with extreme violence. Maybe higher emotional content generates larger effects.

And third, the cognitive experiments assessed response times, and one cannot reliably assess response times in a single-observation between-subjects design. We speculated that ratings might be more stable measures.

We decided to build on this research, given our combined interests in language and legal decision making (some of us even blogged about it), using the same paradigm but larger samples of subjects. We were particularly interested in manipulating the provocation that preceded a violent event. Would people judge the crime differently as a function of the severity of the provocation? We manipulated severity via grammatical aspect. We were also interested in verdicts: would people's ultimate judgments of first- vs. second-degree murder be swayed by grammatical aspect? To cut a long story short, the evidence for an effect of aspect on intentionality judgments was disappointingly weak and mixed, although we found strong effects on a cognitive measure [8]. Based on the results reported in [7], we'd expected to find more robust effects, given that we used upwards of 2.5 times as many subjects.

Was it the case, perhaps, that people were insensitive to aspect altogether in this paradigm? To examine this, we included (from Experiment 2 onwards) a measure that asked subjects: how many times did X hit Y? Here we consistently found highly robust effects of grammatical aspect. If the progressive was used (e.g., was hitting), subjects estimated more instances of hitting than with the simple past (e.g., hit). Evidently, people were sensitive to grammatical aspect in this paradigm but this did not extend convincingly to intentionality judgments. Because of this, our fourth experiment was a close replication of Experiment 3 in [7]. We found no effect.

At this point, we decided to take a few steps back to check the foundations on which we were building. We initiated a registered replication project, performing direct replications of Experiment 3 in [7]. The results have now been published in Perspectives on Psychological Science [9].

The forest plot below represents one of the three main dependent measures, criminal intentionality. It shows the original study at the top (the difference between the simple past and progressive in terms of intentionality ratings). Underneath are the 12 replication studies. The meta-analytic effect (which excluded the MTurk study and the original experiment, as dictated by the protocol) is practically zero; if anything, it goes in the opposite direction from what was predicted. [Note how the one MTurk study is spot on with regard to the meta-analytic effect, whereas the original effect is not.] None of the replication experiments showed an effect in the predicted direction, although two showed an effect in the opposite direction. The same pattern emerged for the two other dependent measures, intention attribution and detailed processing: no effects in the predicted direction.
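For readers less familiar with how a single "meta-analytic effect" is distilled from a dozen labs, here is a minimal sketch of the simplest approach: a fixed-effect, inverse-variance weighted mean, in which more precise studies (smaller standard errors) count more. The function name and all numbers below are hypothetical illustrations, not the RRR data, and the report's own analysis may well differ (for example, by using a random-effects model).

```python
import math


def fixed_effect_meta(effects, std_errors):
    """Hypothetical helper: inverse-variance weighted (fixed-effect) pooled
    estimate of per-study effects, with a normal-approximation 95% CI."""
    # Each study is weighted by 1 / SE^2, so precise studies dominate.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se)


# Made-up per-lab differences (imperfective minus perfective), NOT the RRR data.
effects = [0.10, -0.15, 0.05, -0.20]
std_errors = [0.20, 0.25, 0.22, 0.18]
pooled, ci = fixed_effect_meta(effects, std_errors)
print(f"pooled = {pooled:.3f}, 95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
```

Excluding a study from the meta-analysis, as the protocol did for the MTurk sample and the original experiment, simply means leaving it out of both lists.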

There was one change made to the original vignette. The editor, Alex Holcombe, noted that it is odd to state he was pulling out and he was pointing because you cannot be doing both at the same time. This makes sense in light of the description of the imperfective aspect that I gave at the beginning of this post. In the vignette, was pulling indicates that we're in the middle of a pulling event. Given the temporal dependency between pulling and pointing (you first have to finish pulling a gun before you can point it), you cannot be pulling and pointing at the same time. So the proper way to convey that the action was completed is to use the perfective, pulled. Both the original authors and replicators agreed with the editor's proposed change to the protocol. After all, it would be silly if the original effect hinged on a nonsensical phrase.

And the evidence we have suggests that it doesn't. As the Replication Report mentions, one more replication was run, by Chris Kurby. This experiment used the original text. It could not be included in the meta-analysis because it was not preregistered, but it is available on the Open Science Framework. This study did not yield significant effects for any of the three dependent measures (all ps > .28). For example, the means for criminal intentionality are 3.79 for imperfective and 3.86 for perfective, which is quite similar to the meta-analytic effect shown above.

So what are we to make of this? Evidently, the large original effect thoroughly failed to replicate. Yet our own findings, with much more power, showed some, but inconsistent, effects of grammatical aspect on intentionality judgments and legal decision making, although not in the experiment (Experiment 4) that was modeled after Hart and Albarracín (2011). At the same time, we found strong evidence that our subjects were sensitive to the semantics of the aspect manipulation. This sensitivity, however, clearly failed to extend consistently to intentionality judgments, most notably in experiments that were modeled after the original experiment.

So I think the most reasonable conclusion at this point is this. Linguistic analysis and cognitive processing data suggest that grammatical aspect can have an impact on event comprehension. However, we need to think of a different paradigm to investigate whether these effects extend to social cognition.

There are several lessons we can draw from this.
  1. First conduct a direct replication before you decide to build on a finding. Of course, Richard Feynman already said this (pp. 12-13) a long time ago.
  2. Even if you find large cognitive effects based on a linguistic manipulation, these do not readily translate to consistent effects at the social level. We need to know more about the mechanisms involved.
  3. A single nonreplication may not mean much, but a dozen surely does.
  4. Don’t call something by the wrong name in a scientific contribution, as I did by saying verb aspect rather than grammatical aspect, because later authors will copy your mistake (see references).


[1] Pinker, S. (1994). The Language Instinct. New York, NY: Harper Perennial Modern Classics, p. 15.

[2] Langacker, R.W. (1987). Foundations of cognitive grammar. Vol. 1, Theoretical prerequisites. Stanford, CA: Stanford University Press.

[3] Truitt, T. P., & Zwaan, R. A. (1997, November). Verb aspect affects the generation of instrument inferences. Paper presented at the 38th Annual Meeting of the Psychonomic Society, Philadelphia.

[6] Magliano, J. P., & Schleich, M. C. (2000). Verb aspect and situation models. Discourse Processes, 29, 83-112.

[7] Hart, W., & Albarracín, D. (2011). Learning about what others were doing: Verb aspect and attributions of mundane and criminal intent for past actions. Psychological Science, 22, 261-266. doi: 10.1177/0956797610395393