Thursday, April 24, 2014

Why Do We Make Gestures (Even When No One Can See Them)?

[Image caption: "The gesture doesn't work all the time"]
Why do we gesture? An obvious answer is that we gesture to communicate. Having just taken down his opponent with a two-footed flying tackle, the soccer player puts on his most innocent face while making a perfect sphere with his hands. The gesture conveys the following thought: “Ref, I was playing the ball! Sure, my opponent may be lying there writhing in pain and will soon be carried off on a stretcher, but that’s beside the point. I do not deserve a red card.”

But we also gesture when our conversation partner cannot see us. Years ago I saw a madwoman walking in the Atlanta airport. She seemed to be talking to no one in particular while gesticulating vehemently. For a moment I was worried she might pull out a machine gun and mow us all down. But when she got closer I noticed she was speaking into a little microphone connected to her mobile phone (a novelty at the time). Evidently, the person on the receiving end of her tirade could not see her maniacal gestures.

So why do we gesture when no one can see our hands? According to one explanation, such gestures are merely habitual. We’re used to making gestures, so we keep making them even when nobody can see them. It’s a bit like a child on a tricycle: he lifts his legs, but the pedals keep rotating. There is motion, but it is not functional. The problem with this explanation is that it implies that we expend a lot of energy on a useless activity.

An alternative explanation proposes that gesturing is functional: it helps us retrieve words from our mental lexicon. The speaker says “At the fair we went into a uh…” He falls silent and makes a circular motion with his hand. He then looks relieved and finishes the sentence with “Ferris wheel.” The idea here is that the motoric information that drives our gestures is somehow connected to the word representations in our mental lexicon, and that the latter get activated when the gesture is made. Though plausible, this explanation does not specify why the gesture is needed in the first place. If the motor program that drives the gesture is already present in the brain, then why loop out of the brain to make the gesture?

In a paper coming out today in Frontiers in Cognitive Science, a group of us, spearheaded by graduate student Wim Pouw, ventures an answer to this question.* People make noncommunicative gestures to reduce memory load and to externally manipulate information. We need to keep concepts active across stretches of time while performing a task, for instance solving a problem or conversing over the telephone. Rather than relying on memory alone to keep this information active, we outsource the task to the hands. They provide proprioceptive and visuospatial information that sustains access to a concept over time and allows us to perform operations on it (for instance, manual rotation).

Support for this proposal comes from several sources. One is a classic paper by David Kirsh and Paul Maglio. Kirsh and Maglio observed that expert Tetris players often rotate objects on the screen before inserting them into their slots. The players could have used mental rotation but instead preferred to rely on what Kirsh and Maglio call epistemic actions: operations on the environment that support thought.

Another line of support for our proposal comes from research on abacus users. People who learned arithmetic on an abacus make bead-moving finger gestures during mental arithmetic, even when no abacus is available. The better they are at mental arithmetic, the fewer gestures they make. This is consistent with the notion that noncommunicative gestures are epistemic actions that serve to reduce memory load: when you’re better at a task, performing it requires fewer memory resources, so you need to rely less on gestures.

So the next time you see people make gestures to no one in particular, you know that they’re just giving their memory a little hand. And if you want to know more about this topic, just read our paper.

* Our proposal was inspired by work by Richard Wesp and colleagues and by Andy Clark (see paper for references).

Tuesday, April 8, 2014

The Undead Findings Are Among Us


A few months ago, I was asked to review a manuscript on social-behavioral priming. There were many things to be surprised about in this manuscript, not the least of which was that it cited several papers by Diederik Stapel. These papers had already been retracted, of course, which I duly mentioned in my review. It has been said that psychology is A Vast Graveyard of Undead Theories. These post-retraction Stapel citations suggest that this cemetery might be haunted by various undead findings (actually, if they were fabricated, they weren’t really alive in the first place, but let’s not split semantic hairs).

There are several reasons why someone might cite a retracted paper. The most obvious is that they don’t know the paper has been retracted. Although the word RETRACTED is splashed across the first page of the journal version of the article, it will likely be absent from other versions that can still be found on the internet. Researchers working with such a version might be forgiven for being unaware of the retraction.

But citing Stapel??? It is not as if the press, bloggers, colleagues at the water cooler, the guy next to you on the plane, not to mention Retraction Watch, haven’t been all over this case!


A second reason for citing a retracted article is, obviously, to point out the very fact that the paper has been retracted. Nevertheless, a large proportion of citations to retracted papers are still favorable, just like the Stapel citations.

"Don't expect any help from us."
Does this imply that retracted findings have a lasting pollutive effect on our thinking? A recent study suggests they do. The Austrian researcher Tobias Greitemeyer presented subjects* with findings from a now-retracted study by Lawrence Sanna (remember him?). Sanna reported that elevated height (e.g., riding up escalators) led to more prosocial (the antonym of antisocial) behavior than lowered height (e.g., riding down escalators). The data turned out to be fabricated, which is why the paper was retracted.

Greitemeyer formed three groups of subjects. He told the first two groups about the Sanna study but not the third group. All subjects then rated the relationship between physical height and prosocial behavior.

Next, the subjects wrote down all their ideas about this relationship. At the end of the experiment, half of the subjects who had received the summary (the debriefing condition) learned that the article had been retracted because of fabricated data and that there was no scientific evidence for the relation between height and prosocial behavior. Subjects in the no-debriefing and control conditions did not receive this information. Finally, all three groups of subjects responded to the same two items about height and prosocial behavior that they had responded to earlier.

As you might expect, the subjects in the debriefing and no-debriefing conditions gave stronger estimates of the relation on the initial test than did those in the control condition. More interesting are the responses on the second test, after the debriefing condition (but not the other two conditions) had heard about the retraction. On this test the subjects in the no-debriefing condition had the highest score. But the crucial finding was that the debriefing condition still exhibited a stronger belief in the relation between height and prosocial behavior than did the control condition. So the debriefing lowered belief in the relation, but not sufficiently.

Greitemeyer provides an explanation for these effects. It turns out that the number of explanations that subjects gave for the relationship between height and prosocial behavior correlated significantly with post-debriefing beliefs. A subsequent analysis showed that belief perseverance in the debriefing condition appeared to be attributable to these causal explanations. So retraction does not lead to forgetting, and this cognitive inertia occurs because people have generated explanations of the purported effect, which presumably leads to a more entrenched memory representation of the effect.

But we need to be cautious in interpreting these results. First, it is only one experiment. A direct replication of these findings (plus a meta-analysis that includes the two experiments) seems in order. Second, some of the effects are rather small, particularly the important contrast between the control and the no-debriefing condition. In other words, this study is a perfect candidate for replicating up.

After a successful direct replication, conceptual replications would also be informative. As Greitemeyer himself notes, a limitation of this study is that the subjects only read a summary of the results and not the actual paper. Another is that the subjects were psychology students rather than active researchers. Having researchers read the entire paper might produce a stronger perseverance effect, as the entire paper likely provides more opportunities to generate explanations and the researchers are presumably more capable of generating such explanations than the students in the original experiment were. On the other hand, researchers might be more sensitive to retraction information than students, which would lead us to expect a smaller perseverance effect.

Greitemeyer makes another interesting point. The relation between height and prosocial behavior seems implausible to begin with. If an effect has some initial plausibility (e.g., meat eaters are jerks), retraction might not go very far in reducing belief in the relation.

So if Greitemeyer’s findings are to be believed, a retraction is no safeguard against undead findings. The wights are among us...


*The article is unfortunately paywalled.

Thursday, April 3, 2014

Replicating Down vs. Replicating Up


More and more people are involved in replication research. This is a good thing.

Why conduct replication experiments? A major motivation for recent replication attempts appears to be serious doubt about certain findings. On that view, unsuccessful replications serve to reduce the initially observed effect size into oblivion. I call this replicating down. Meta-analytically speaking, the aggregate effect size becomes smaller with each unsuccessful replication attempt, and confidence in the original finding dwindles accordingly (or so we would like to think). But the original finding will not disappear from the literature.
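
To make the meta-analytic arithmetic concrete, here is a minimal sketch of replicating down in Python. All the numbers are hypothetical: an original study reports d = 1.0 with 20 subjects per group, and a series of larger replications each find d = 0. The pooled (fixed-effect, inverse-variance) estimate shrinks toward zero but never quite reaches it, and the original study of course remains in the literature.

```python
import math

def pooled_effect(effects, variances):
    """Fixed-effect (inverse-variance) meta-analytic estimate and its SE."""
    weights = [1.0 / v for v in variances]
    d = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return d, math.sqrt(1.0 / sum(weights))

def var_d(d, n1, n2):
    """Approximate sampling variance of Cohen's d for two independent groups."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))

# Hypothetical original study: d = 1.0 with 20 subjects per group.
effects, variances = [1.0], [var_d(1.0, 20, 20)]

# Each unsuccessful replication finds d = 0.0 with 50 subjects per group.
for i in range(1, 5):
    effects.append(0.0)
    variances.append(var_d(0.0, 50, 50))
    d, se = pooled_effect(effects, variances)
    print(f"after {i} null replication(s): pooled d = {d:.2f} (SE = {se:.2f})")
```

After one null replication the pooled d has already dropped to about 0.26; after four it sits near 0.08. Smaller and smaller, but never gone.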

[Image caption: "No, I'm not Noam Chomsky"]
Replicating down is definitely a useful endeavor, but it can be quite discouraging: you’re conducting an experiment that you are convinced doesn’t make any sense at all. Suppose someone conducted a priming study inspired by a famous quote from Woody Allen’s Husbands and Wives: “I can't listen to that much Wagner. I start getting the urge to conquer Poland.” Subjects were primed with Wagner or a control composer (Debussy?) and then completed an Urge-to-Conquer-Poland scale. The researchers found that the urge to conquer Poland was much greater in the Wagner condition than in the Debussy condition (in the latter, however, people scored remarkably higher on the Desire-to-Walk-Around-with-Baguettes scale). The effect size was large, d = 1. If you replicate this while convinced the result is bogus, you’re using valuable time and resources that could have been spent on novel experiments. Plus you might feel silly performing the experiment. The whole enterprise might feel all the more discouraging because you are running the experiment with twice or more the number of subjects that were used in the original study: an exercise in futility, but with double the effort.
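
Where does "twice or more the number of subjects" come from? Simple power arithmetic. Here is a back-of-the-envelope sketch using statsmodels; the per-group sample size of the original study is hypothetical, and planning for half the reported effect size is just one common heuristic for skeptical replicators:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

n_original = 20  # hypothetical per-group n of the original study (d = 1)

# A skeptical replicator plans for 90% power to detect an effect only half
# as large as the one reported (d = 0.5), with a two-sided alpha of .05.
n_needed = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.9,
                                alternative='two-sided')

print(f"per-group n needed: {n_needed:.0f}")                   # ~85
print(f"that is {n_needed / n_original:.1f}x the original n")  # ~4.3x
```

Under these (made-up) numbers the replicator needs more than four times the original sample, which is exactly why a replication you expect to fail feels like doubled effort in the service of futility.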

Other replication attempts are conducted because the replicators have at least some confidence in the original finding (and in the method that produced it) but want to establish how robust it is. We might call this replicating up. A successful replication shores up the original finding by yielding similar results and providing a more robust estimate of the effect size. But how is this replicating up? Glad you asked. Up doesn’t mean enlarging the effect size; it means raising the confidence we can have in the effect.
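
Numerically, "raising the confidence" looks something like the following minimal sketch (again with made-up effect sizes): pooling a similar replication with the original study leaves the point estimate roughly where it was but narrows the confidence interval around it.

```python
import math

# Hypothetical numbers: the original study reports d = 0.60 (SE = 0.20);
# a larger replication finds a similar d = 0.55 (SE = 0.12).
studies = [(0.60, 0.20), (0.55, 0.12)]

weights = [1.0 / se ** 2 for _, se in studies]  # inverse-variance weights
d = sum(w * eff for w, (eff, _) in zip(weights, studies)) / sum(weights)
se = math.sqrt(1.0 / sum(weights))

print(f"pooled d = {d:.2f}, 95% CI [{d - 1.96 * se:.2f}, {d + 1.96 * se:.2f}]")
# pooled d = 0.56, 95% CI [0.36, 0.76] -- about half as wide as the
# original study's CI of [0.21, 0.99], with a similar point estimate.
```

The effect size barely moves; what changes is how tightly we can pin it down. That is what "up" means here.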

So while replicating down is certainly a noble and useful enterprise, a case could be made for replicating up as well. A nice recent example appears in a special topics section of Frontiers in Cognition that I’m co-editing. My colleagues Peter Verkoeijen and Samantha Bouwmeester performed a replication of an experiment by Kornell and Bjork (2008) that was published in Psychological Science. This experiment compared spaced (or, more accurately, “interleaved”) and massed practice in learning painting styles. In the massed-practice condition, subjects saw blocks of six paintings by the same artist. In the spaced condition, each block contained six paintings by six different artists. Afterwards, the subjects took a recognition test. Intuitively, you would think that massed practice would be more effective. Kornell and Bjork thought so initially, as did the subjects in the experiments. Kornell and Bjork were therefore surprised to find that interleaved practice was actually more effective.

Verkoeijen and Bouwmeester replicated one of Kornell and Bjork’s experiments. One difference from the original experiment, which was run in the lab, was that the replication was run on Mechanical Turk. However, given that several other replication projects had shown no major differences between MTurk experiments and lab experiments, there was no reason to think the effect could not be found in an online experiment. As Verkoeijen and Bouwmeester note:

For one, nowhere in their original paper do Kornell and Bjork (2008) indicate that specific sample characteristics are required to obtain a spacing effect in inductive learning. Secondly, replicating the effect with a sample from a more heterogeneous population than the relatively homogeneous undergraduate population would constitute evidence for the robustness and generality of the spacing effect in inductive learning and, therefore, would rule out that the effect is restricted to a rather specific and narrow population.

To cut to the chase, the replication attempt was successful (read the paper for a thoughtful discussion on this). Just as in the original study, the replication found a significant benefit for interleaved over massed practice. The effect sizes for the two experiments were quite similar. As the authors put it:

Our results clearly buttress those of Kornell and Bjork (2008) and taken together they suggest that spacing is indeed beneficial in inductive learning.

This is a nice example of replicating up. Moreover, the experiment has now been brought to a platform (MTurk) where any researcher can easily and quickly run replication attempts.

It seems that I’ve basically sung the virtues of successful replication. After all, isn’t any successful replication an upward replication? Of course it is. But I’m not talking about the outcome of the replication project; I’m talking about the motivation for initiating it. Replicating down and replicating up are both useful, but in the long run upward replication is going to prove more useful (and less frustrating).

Perhaps a top tier of journals should be created for solid findings in psychology (see Lakens & Koole, 2012, for a similar proposal). This type of journal would publish only findings that have been thoroughly replicated. The fairest way to go about this would be to have the original authors as first authors and the replicators as co-authors. Rather than trying to remove nonreplicable findings from the literature via downward replication, upward replication basically creates a new level in the literature, entrance to which can only be gained via upward replication.

(I thank Peter Verkoeijen for pointing me toward the Woody Allen quote)


[Update, April 22, 2014: In my next post I discuss a study that would be a good candidate for replicating up.]