Specifying the Generalizability of Our Conclusions

The other day, the economist Andreas Ortmann kindly put in a plug for my latest blog post on Facebook. This elicited a response from another researcher, who said:
One of my friends tried to replicate one of Rolf Zwaan's findings on verb aspect and failed ages ago, except they did it in Cantonese and not Dutch. This isn't a perfect replication, but if you read the conclusions, then it should be. So part of the problem isn't even the strength of the data, it's also the fact that being over-certain and over-generalizing conclusions is the standard way psychology papers are written.
The commenter then helpfully provided this link to the article (with which I was not familiar). I found the comment interesting, even though it incorrectly states that our study (see also this post from January) was conducted in Dutch. (You can’t even perform the study in Dutch due to grammatical differences with English.) More importantly, though, the claim that our finding was not replicated in Cantonese is also incorrect in a significant way. More about this in a minute.

The most important aspect of the comment is its relevance to the current discussion about replicability, (hidden) moderators, and specifying the generalizability of effects. So let's consider this more closely.

The comment states that psychology studies routinely overgeneralize conclusions. I wholeheartedly agree with this sentiment. In fact, I'd already been thinking of devoting a blog post to this and will do so in the near future. However, I don't think our article was a particularly good example of this hyperbolic tendency. If anything, our conclusion was rather modest (if not tepid):
The topic of verb aspect must be empirically addressed in future research so that it can yield a better understanding of how the imperfective and perfective aspects affect situation models during comprehension. The present study furthers our understanding of how subtle grammatical cues such as verb aspect influence the representations formed when we read or hear language. However, this is only the beginning of our pursuit to understand how situation models are constructed from our complicated linguistic code.
Everyone will agree that we didn’t exactly go out on a limb here. We didn’t state explicitly that we thought our findings would extend to other languages, such as Cantonese (a language that we have no knowledge of). On the other hand, we also didn’t stipulate that the conclusions would be restricted to English.

The question that concerns us here is what the proper way to state the limitations of the conclusions would have been. One extreme would be to provide a list of languages for which one would expect the effect to replicate. However, this is humanly impossible. Nobody knows all of the world’s nearly 7,000 languages. The other extreme would be to state that the effect would only be expected to replicate in English, but this also presupposes intimate knowledge of all the world’s other languages. A more realistic option would be to state that one expects the effect to replicate in English and possibly to extend to other languages with similar aspectual systems. It might have been good to add this specification, but we didn’t think to do so. The final option would be to say nothing, which is what we did, relying on the reader to infer that we are agnostic with regard to whether the results will replicate in other languages.

Over to the replicability of the finding. Our target finding, called the “perfective-advantage effect” (read the papers if you’re interested in the details), was indeed replicated:
The results from both experiments 1a and 1b show that there is a perfective advantage with accomplishment verbs. This advantage is robust across two different types of perfective aspect markers in Cantonese, zo2 and jyun4. (p. 2413)
Yee-haw! We've made inroads into Cantonese. Our first step toward world domination. India, you’re next!

Hold your horses, pardner! The authors of the Cantonese study observed that research thus far (apparently our finding was replicated in other Asian studies as well) had looked at only one type of verb, called “accomplishments” in Zeno Vendler’s taxonomy. Accomplishments are actions that have an endpoint and that are incremental or gradual. An example of an accomplishment is painting a picture. The action is finished when the picture is completed.

Would the conclusions generalize to other verb types? Here the researchers showed that the perfective advantage reverses to an imperfective advantage (in Cantonese at least) with another verb type, namely “activities,” which do not have an endpoint (e.g., run).

So did our effects replicate? Yes, there is a conceptual replication in a different language, Cantonese, with the same verb type and the same task. And no, the effect flips (in Cantonese at least) with a different verb type and the same task. In other words, the effect replicates and does not replicate, but there is method to the madness, and this is what matters theoretically. Grammatical aspect interacts with lexical aspect (accomplishments vs. activities). As a result, we have a better understanding of how grammatical aspect affects processing.

Should we (in 2003) have mentioned that our prediction only applied to accomplishments? Yes, I think so. However, this limitation had simply not occurred to us.

There are several lessons here.

(1) It is important to specify the limitations of our findings. However, (a) it is not always possible to do so and (b) sometimes we lack the perspective to do so. We definitely shouldn't oversell our results, though.

(2) As is obvious, there are degrees of directness in replications. Replicating a finding in a different language is powerful support for a prediction. It is, however, not a direct replication of an effect.

(3) Science proceeds by performing replications and extensions of predictions and by detecting their limitations (i.e., failed extensions).

(4) It’s easy to misremember studies. I do it all the time.

I plan to return in a future post to explore the question of how best to determine the generalizability of our predictions.

