Monday, January 28, 2013

The Preliminary Results are in!


Update February 3, 2013: I removed the figure for reasons explained in my next post. However, I wanted to keep the text of this post to document the process. I'm trying to be open as well as accurate.

Holy cow! The data are in already! 

As I explained in my previous post, I was interested in the effects of amusing titles on how readers perceive the results of studies published over the years in Psychological Science. Subjects read the titles and the associated abstracts of twelve articles and indicated their confidence and interest in the results of these studies as presented in the abstracts. They used an 11-point scale to do so, with 0 being extremely low and 10 extremely high confidence and interest in the findings.

The preliminary results are in! It only took about 5 hours to collect the data. I was worried that the Mechanical Turk subjects would find the task too difficult and/or boring. Instead, they seemed to like it quite a bit! 

I haven’t had time to do very detailed analyses yet but I wanted to present the initial findings now. 

My prediction was that amusing titles would lead to lower confidence but higher interest in the findings. I was partly right and partly wrong, as the initial results show.

Indeed, amusing titles lead to lower confidence in the associated findings. Surprisingly (to me at least), they also lead to lower interest in the findings. Overall, the subjects had moderate amounts of confidence and interest in the findings of the twelve studies. 

The differences between amusing and non-amusing titles may be small (at least, a half point seems small to me) but they are highly significant.

The error bars indicate 95% confidence intervals. I also computed Bayes factors. They show that there is very strong evidence (but see my next post) that amusing titles lead to lower confidence in the results of the study than do non-amusing titles. They also show that there is very strong evidence for the surprising result that amusing titles lead to lower interest in the associated results than do non-amusing titles. In other words, I did not just find significant effects because of the large sample size. The Bayes factor guards against false positives like this.
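
The post doesn't show how the Bayes factors were computed. As a rough illustration of why a Bayes factor can temper large-sample significance, here is a sketch using the BIC approximation to the Bayes factor (Wagenmakers, 2007); the function name and the made-up difference scores are mine, and the actual analysis likely used a different default prior:

```python
import math

def bf10_bic(diffs):
    """Approximate Bayes factor for a nonzero mean difference vs. zero,
    via the BIC approximation (Wagenmakers, 2007)."""
    n = len(diffs)
    mean = sum(diffs) / n
    sse0 = sum(d * d for d in diffs)             # null model: mean fixed at 0
    sse1 = sum((d - mean) ** 2 for d in diffs)   # alternative: free mean
    # BIC0 - BIC1; the extra parameter costs the alternative ln(n)
    delta_bic = n * math.log(sse0 / sse1) - math.log(n)
    return math.exp(delta_bic / 2)

# a made-up half-point effect in 200 noisy per-subject difference scores
effect = [0.5 + noise for noise in [-1.0, 1.0] * 100]
print(bf10_bic(effect))  # far above 1: strong evidence for a difference
```

The ln(n) penalty is the point: with a true null and a large sample, the Bayes factor drops below 1 even when p occasionally dips under .05, which is the sense in which it guards against large-N false positives.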

So clearly (but see my next post), amusing titles affect the perception of research in a negative way! 

I guess it’s unusual and a little risky to design an experiment, pre-register it on the same day, run it a few days later, analyze the data the same day that they were collected, write an initial report a few hours later, and post it online immediately afterwards (no kidding!). If only I were younger, I'd attribute it to youthful enthusiasm. Now I don't have an excuse.

I will provide a more detailed report later. 

Wednesday, January 23, 2013

Proposing a study on the effects of amusing titles


Update 1 after feedback from @hansijzerman on Twitter
Update 2 after comments from Thomas Schubert (see below) and Steve Fiore (via Facebook)
Final update after comments from Eefje Rondeel and Michał Parzuchowski (see below) as well as some further thinking on my part.
Final final update. After a test-run, I decided to present titles first and then title-plus-abstract. This makes it harder to ignore the titles and provides me with RTs on the titles. 

In my last post I showed that amusing titles are on the rise in Psychological Science. The journal reached its highest proportion of amusing titles (PAT) in 2012: .41. Similar PATs can be found in recent volumes of Social Psychological and Personality Science and the Journal of Experimental Social Psychology.

I am interested to see what the effects of these titles are on the impression that readers have of the associated research. I am therefore proposing to examine this question empirically in an online experiment. In doing this study, I will combine various things that I have talked about in this blog in the past few weeks.
  • I hereby preregister the design and sampling plan of the experiment.
  • I will use Mechanical Turk to collect the data.
  • I will post the data on the Open Science Framework.
  • I will discuss the results in a subsequent post.

So here is the proposed study.

Hypothesis. Amusing titles lower the confidence in the associated research but may enhance its newsworthiness or perceived interestingness. (This is the hypothesis that seems to be implied by what I have been writing. I’m not sure I completely believe it, but that’s why we do research.)

Subjects. Two-hundred subjects will be recruited from Mechanical Turk. (This is not ideal, as one could argue that MTurk subjects are not the typical readers of psychology articles.)

Stimuli. The stimuli are 12 article titles plus their associated abstracts taken from Psychological Science. There are two versions of each title-abstract pair, one with an amusing title and one with a non-amusing title. Non-amusing versions of titles were created by removing the pre-colon part of the title. An example is:
Something in the Way She Sings: Enhanced Memory for Vocal Melodies. 
The non-amusing version is:
Enhanced Memory for Vocal Melodies. 

After each title-abstract pair, subjects will be asked to indicate their level of (1) confidence and (2) interest in the results of the associated study using a slider that yields values from 1 to 10 (and shows a pretty nifty thermometer). 

There is also one true-false statement per article. This statement summarizes the main findings of the study or states the opposite of the main findings. 

Design. Type of title will be counterbalanced across subjects such that each subject will see six pairs with amusing titles and six without and each version will be seen by an equal number of subjects. This yields a 2 (title condition) X 2 (counterbalancing list) X 2 (question type: confidence vs. interest) design, with title and question type as within-subjects factors and list as a between-subjects factor.
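
As a sketch of the counterbalancing scheme just described (item indices, condition labels, and the alternating assignment rule are my own illustration; the actual experiment software is not specified):

```python
ITEMS = list(range(12))  # the 12 title-abstract pairs

def make_lists():
    """Two counterbalancing lists: each item appears in its amusing
    version on one list and its non-amusing version on the other,
    six of each per list."""
    list_a = [(i, "amusing" if i < 6 else "plain") for i in ITEMS]
    list_b = [(i, "plain" if i < 6 else "amusing") for i in ITEMS]
    return list_a, list_b

def assign(subject_id):
    """Alternate incoming subjects over the two lists to keep them
    (roughly) equal in length."""
    list_a, list_b = make_lists()
    return list_a if subject_id % 2 == 0 else list_b
```

Each subject thus contributes ratings to every cell of the title-condition by question-type design, while across subjects every item is rated equally often in both title versions.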

Titles were selected such that the pre-colon part was judged not to provide additional information relative to the post-colon part. Abstracts were selected that contained few technical terms, making them suitable for a lay audience.

Psychological Science has varied over the years in how it presents titles. Overall, the pre-colon part has been emphasized over the post-colon part. Early on, the pre-colon part of the title was in all caps whereas the post-colon part was in title case. Later, title case was used throughout but the pre-colon part was printed in a larger font than the post-colon part. In recent years, the same font size has been used on both sides of the colon. The present experiment presents the pre-colon part in a larger font (18 pt) than the post-colon part (14 pt), which is consistent with how the full text looks on the Psych Science site, e.g., here. In the non-amusing condition, the post-colon part is presented in 18-pt font.

The true/false judgments are included mainly to gauge the subjects' understanding of the articles. One might hypothesize that amusing titles lead to lower scores than non-amusing titles (I have no idea at this point how difficult this task will be for the subjects, so there might be a floor effect). 

Procedure. Subjects will be instructed that they are to judge the scientific value of 12 psychology studies based on short descriptions of the research that was performed. Each subject will see the 12 studies in a different random order. In each trial, subjects will first see a screen with the title. The next screen will consist of the title of the study plus the abstract. Viewing times for each screen will be measured. The final screen presents the two questions asking subjects about their level of confidence and interest in the study, respectively. Subjects will use sliders to indicate their confidence and interest.

Next, the subjects will judge the twelve true/false statements (in a different random order for each subject). Finally, they will fill out a demographic questionnaire asking about age, gender, native language, education, and interest in and knowledge of psychological research. 


Subject exclusion. Data from nonnative speakers of English will be excluded. Data from the last-run subject(s) of the longest list will be discarded to ensure that the counterbalancing lists are of equal length. In addition, data from subjects will be removed if their viewing times for 2 or more abstracts are < 30 sec.

Data exclusion. If the viewing time is below 30 sec, the associated trial will be eliminated because this time is far too short to meaningfully read the abstract.
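
Taken together, the subject- and trial-level exclusions might look like this (record field names are assumed for illustration; the equal-list-length trimming is left out because it depends on run order):

```python
MIN_VIEW_SEC = 30  # below this, the abstract cannot have been read meaningfully

def apply_exclusions(subjects):
    """Sketch of the pre-registered exclusions: drop nonnative speakers,
    drop subjects with 2+ rushed abstracts, then drop remaining rushed
    trials from the subjects who are kept."""
    kept = []
    for s in subjects:
        if s["native_language"] != "English":
            continue  # exclude nonnative speakers of English
        rushed = [t for t in s["abstract_times"] if t < MIN_VIEW_SEC]
        if len(rushed) >= 2:
            continue  # exclude subjects with 2 or more rushed abstracts
        trials_kept = [t for t in s["abstract_times"] if t >= MIN_VIEW_SEC]
        kept.append({**s, "abstract_times": trials_kept})
    return kept
```

Note that the order matters: a subject with exactly one rushed abstract stays in the sample but loses that one trial.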

Analysis. The data will be analyzed using analysis of variance. The main prediction is that amusing titles yield lower confidence ratings but higher interest ratings than non-amusing titles; in other words, an interaction between condition and question type. Follow-up analyses are planned. Bayes factors will be computed to guard against false positives.
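
The predicted interaction can also be expressed as a per-subject contrast score, which makes the key test concrete (field names are hypothetical; the pre-registered analysis is a standard ANOVA, and this is just an equivalent framing of its interaction term):

```python
import math

def interaction_score(subject):
    """Title effect on confidence minus title effect on interest.
    Under the hypothesis, titles hurt confidence but help interest,
    so this score should be reliably positive."""
    conf_effect = subject["conf_plain"] - subject["conf_amusing"]
    interest_effect = subject["int_plain"] - subject["int_amusing"]
    return conf_effect - interest_effect

def one_sample_t(xs):
    """t statistic of the scores against zero; for a 2 x 2 within-subject
    design its square equals the interaction F."""
    n = len(xs)
    m = sum(xs) / n
    var = sum((x - m) ** 2 for x in xs) / (n - 1)
    return m / math.sqrt(var / n)
```

If the scores cluster above zero, the interaction holds; a mean near zero would mean titles shift confidence and interest by about the same amount.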

I expect to have the results sometime next week.


Monday, January 21, 2013

Amusing Titles in Psychological Science II: What is your PAT?



It may only be a matter of time before the pun rises again, says the BBC. Is that time already upon us in psychological science? Let’s see.

In my previous post, I provided a qualitative analysis of amusing titles in Psychological Science. Here is the quantitative part. I counted the number of amusing titles per year in the journal in the past decade. At least, that’s what I said yesterday. But then I got busy and analyzed all volumes of Psych Science instead.

There were some issues I had to face. First, as Psych Science was trying to find its form, it featured article types like general articles, feature reviews, commentaries, letters to the editor, and so on. I limited my analysis to empirical articles, called research reports and research articles (although there was a period in which they were called “original articles”).

Second, I am a cognitive psychologist and therefore less familiar with social, clinical, and developmental psychology. This is somewhat problematic because I wasn’t always sure whether something was a theoretical construct or an allusion. For example, as a cognitive psychologist I know that “flashbulb memory” is a construct. Otherwise, I might have thought it was part of an amusing title. I probably wasn’t as discerning with regard to the other fields, so my picks might reflect a certain “cognitive bias” and I might have false-alarmed to some titles in the other sub-areas.

Third, there was an issue of what to do with retracted papers by acknowledged fraudsters like Diederik Stapel and Lawrence Sanna. I decided to keep them in. After all, the papers were accepted at the time. Also, it doesn’t look like there’s a correlation between fraud and amusing titles. As I said in my previous post, everyone uses amusing titles.

In discussing the results, I’m introducing a new index, the PAT (Proportion of Amusing Titles). I computed the annual PATs for Psych Science from its inception in 1990 through 2012 (the last complete year).
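
The index itself is a simple proportion. As a toy sketch (the titles and the colon-based stand-in classifier are made up; the blog's actual classification was done by hand, and not every colon title is amusing):

```python
def pat(titles, is_amusing):
    """Proportion of Amusing Titles for one volume."""
    return sum(1 for t in titles if is_amusing(t)) / len(titles)

def has_teaser(title):
    """Crude stand-in: treat a pre-colon teaser as the marker of amusement."""
    return ":" in title

volume = [
    "Enhanced Memory for Vocal Melodies",
    "Something in the Way She Sings: Enhanced Memory for Vocal Melodies",
    "Infants' Perception of Phrase Structure in Music",
    "Sticky Thoughts: Depression and Rumination",
]
print(pat(volume, has_teaser))  # 0.5
```

Because the PAT is a proportion, it corrects for the growing number of articles per volume, which is why the absolute counts below tell a slightly different story.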

Without further ado, here are the results. They clearly show that amusing titles are on the rise in Psych Science. There was an early start of .15 but then the PAT dropped down to 0, making 1993 the only year in Psych Science history without amusing titles. The most amusing year in Psych Science history was 2012, with a PAT of .41. This is the record to beat.

So Psych Science's PAT more than doubled over the years. Of course, the PAT corrects for the number of articles per volume. If we look at absolute numbers, we get an idea of how many amusing titles entered the scientific literature via Psych Science.  The low was zero, as we already know. The absolute high was reached in 2010 with 105 amusing titles! 

What are the PATs from specialty areas within psychological science? Do social psychologists have higher PATs than cognitive psychologists? And is the increase in PATs as seen in Psych Science part of a more general trend? I addressed these questions by comparing Psych Science to two other journals, which I selected because along with Psych Science, they are part of the reproducibility project.

A group of researchers is trying to replicate the findings published in the 2008 issues of Psych Science, the Journal of Personality and Social Psychology (JPSP) and the Journal of Experimental Psychology: Learning, Memory, and Cognition (JEP:LMC). I computed the PATs for the 2008 and 2012 volumes of these two journals and then compared them to Psych Science.

It is clear that—at least for the journals examined—social psychologists have higher PATs than cognitive psychologists. Also clear, however, is that the increase in PATs I found for Psych Science is not in evidence for the other two journals (at least not for the periods I examined).

For a more complete picture, it might be interesting to have PATs for other journals and maybe even for individual researchers, entire areas of research, and fields. Then we could not only ask What is your H-index? or What is your impact factor? but also What is your PAT?

Sunday, January 20, 2013

Amusing Titles in Psychological Science


I recently wrote a post about amusing article titles, pointing to a tendency in the current psychological literature (and probably in other fields as well) to blur the boundaries between scientific and popular-scientific discourse. Here I want to discuss this trend in more detail.

I want to start by saying that I’m not immune to this trend myself. I managed to resist it until 2004, but more than half of my 2012 articles had “amusing” titles. In fact, my co-authors snuck two very similar titles past me: Out of Sight out of Mind and Out of Mind out of Sight. I also had Spreading the Words and Language in the Balance. In my defense, I was only responsible for the last one.

In my previous post, I talked about the reasons for using amusing titles. The main one is the pressure to make your research relevant to a broader audience. But is it true that amusing titles are on the rise?

I take Psychological Science as my test case, examining amusing article titles published in that journal in the last decade. I have published in Psych Science myself (four articles and a fifth one in press) and think it has been a great addition to the field in many ways. There clearly was a need for short incisive articles. Psych Science was the first to fulfill that need and has quickly risen to prominence in the field.

At the same time, it is obvious that the journal has come under fire in recent years. I agree with some of the criticism. For example, when you see a single issue featuring the following titles, you can’t help but wonder what kind of image of psychological science (the field) we are creating.

Sticky Thoughts: Depression and Rumination Are Associated With Difficulties Manipulating Emotional Material in Working Memory
Knowing Your Own Mate Value: Sex-Specific Personality Effects on the Accuracy of Expected Mate Choices
Becoming a Vampire Without Being Bitten: The Narrative Collective-Assimilation Hypothesis
Of Blood and Death: A Test of Dual-Existential Systems in the Context of Prosocial Intentions
Time Crawls: The Temporal Resolution of Infants’ Visual Attention
Power and Choice: Their Dynamic Interplay in Quenching the Thirst for Personal Control
Learning Words in Space and Time: Probing the Mechanisms Behind the Suspicious-Coincidence Effect
Who Took the “×” out of Expectancy-Value Theory?: A Psychological Mystery, a Substantive-Methodological Synergy, and a Cross-National Generalization
The Jekyll and Hyde of Emotional Intelligence: Emotion-Regulation Knowledge Facilitates Both Prosocial and Interpersonally Deviant Behavior

What do I mean by amusing article titles? 

I mean titles that are not directly descriptive of the theory, method, or findings. An example of a descriptive title is this.

Infants' Perception of Phrase Structure in Music

And here is an example of an amusing title.

Serial vs. Parallel Processing: Sometimes They Look Like Tweedledum and Tweedledee but They Can (and Should) Be Distinguished


A descriptive title just names the phenomenon, a theory, a model, the method, or the findings, something like The Effect of X on Y, A Theory of Q, or A New Method for Assessing Z. An amusing title adds nondescriptive information to this or uses nondescriptive information exclusively. (By the way, these two titles are from the very first issue of Psych Science published in January 1990, so the journal was at it at an early age.)

I culled amusing titles from the 2003-2012 issues of Psych Science. It was not always easy to determine what was an amusing title and what was not. For example, I initially false-alarmed to this one.

Chicks Like Consonant Music

It actually is descriptive. So it may be that others would come up with a slightly different set of titles than I did. I think the differences will be small though.

In this post I will share some qualitative observations (if you want the entire list of amusing titles, just contact me). In my next post, I will present some quantitative information and compare Psych Science to two other journals.

What types of amusing titles are there?

Alluring allusions

The authors refer to some literary work, or song—mostly songs, actually (I realize that some of these are also regular expressions, of course).

Don't Stand So Close to Me: The Effects of Self-Construal on Interpersonal Closeness
Running on Empty: Neural Signals for Self-Control Failure
Comfortably Numb: Desensitizing Effects of Violent Media on Helping Others
You Can't Always Get What You Want: Infants Understand Failed Goal-Directed Actions
Something in the Way She Sings: Enhanced Memory for Vocal Melodies

But there are also movies:

Scent of a Woman: Men’s Testosterone Responses to Olfactory Ovulation Cues
Apocalypse Soon?: Dire Messages Reduce Belief in Global Warming by Contradicting Just-World Beliefs

And, yes, literary allusions:

Peace and War: Trajectories of Posttraumatic Stress Disorder Symptoms Before, During, and After Military Deployment in Afghanistan
For Whom the Mind Wanders, and When: An Experience-Sampling Study of Working Memory and Executive Control in Daily Life
How Can I Connect With Thee?: Let Me Count the Ways
How Do I Love Thee? Let Me Count the Words: The Social Effects of Expressive Writing
Who Shalt Not Kill? Individual Differences in Working Memory Capacity, Executive Control, and Moral Judgment
The Jekyll and Hyde of Emotional Intelligence: Emotion-Regulation Knowledge Facilitates Both Prosocial and Interpersonally Deviant Behavior
Local Jekyll and Global Hyde: The Dual Identity of Face Identification

And allusions to linguistic theory:

Colorless Green Ideas (Can) Prime Furiously

As well as odd political allusions:

Misconceptions of Memory: The Scooter Libby Effect

But some managed to resist the alluring power of allusion:

Distraction and Placebo: Two Separate Routes to Pain Control
Two Forms of Spatial Imagery: Neuroimaging Evidence

Kudos to the authors for not sneaking “A Tale of Two” in there.

Proverbial titles

Some titles use (variations on) common expressions, slang, and proverbs.

Read My Lips: Asymmetries in the Visual Expression and Perception of Speech Revealed Through the McGurk Effect
Why the Sunny Side Is Up: Associations Between Affect and Vertical Position
Falling on Sensitive Ears: Constraints on Bilingual Lexical Activation
Connecting the Dots Within: Creative Performance and Identity Integration
Discovering That the Shoe Fits: The Self-Validating Role of Stereotypes
The Left Hand Doesn't Know What the Right Hand Is Doing: The Disruptive Effects of Attention to the Hands in Skilled Typewriting

Stimulus packaging

Some titles are intended to create initial puzzlement (WTF?) by elevating one of the stimulus items from the experiment(s) to titular status.

On Wildebeests and Humans: The Preferential Detection of Negative Stimuli
Cherry Pit Primes Brad Pitt: Homophone Priming Effects on Young and Older Adults' Production of Proper Names
Head Up, Foot Down: Object Words Orient Attention to the Objects' Typical Location
Leaning to the Left Makes the Eiffel Tower Seem Smaller: Posture-Modulated Estimation

Always alliteration

In my previous post on this topic, I mentioned that amusing titles are representative of the poetic function of language. Nowhere is this more obvious than in alliteration.

Animals and Androids: Implicit Associations Between Social Categories and Nonhumans
Connections From Kafka: Exposure to Meaning Threats Improves Implicit Learning of an Artificial Grammar
Of Snakes and Succor: Learning Secure Attachment Associations With Novel Faces via Negative Stimulus Pairings
Company, Country, Connections: Counterfactual Origins Increase Organizational Commitment, Patriotism, and Social Investment
Facing Freeze: Social Threat Induces Bodily Freeze in Humans
Border Bias: The Belief That State Borders Can Protect Against Disasters
Tough and Tender: Embodied Categorization of Gender
Etiquette and Effort: Holding Doors for Others
Money and Mimicry: When Being Mimicked Makes People Feel Threatened
Story Spoilers Don’t Spoil Stories
The Cost of Collaboration: Why Joint Decision Making Exacerbates Rejection of Outside Information
The Cost of Callousness: Regulating Compassion Influences the Moral Self-Concept
The Herding Hormone: Oxytocin Stimulates In-Group Conformity


Why use amusing titles?

These are the most common categories. I think they all serve the same set of causally connected purposes: (1) attract attention to themselves (the poetic function that I talked about in my previous post), (2) therefore be memorable, (3) therefore become a sound bite for the popular and social media, (4) therefore appeal to the general public, and (5) therefore show university administrators and politicians that our work is relevant to the world.

Who are the perpetrators?

Pretty much everyone. I have already turned myself in. Perpetrators include a self-acknowledged fraudster like Diederik Stapel.

The Secret Life of Emotions [retracted]
Emotion Elicitor or Emotion Messenger?: Subliminal Priming Reveals Two Faces of Facial Expressions [retracted]

(Obviously “[retracted]” was not part of the original titles.) But they also include those who are very vocal about the current state of the field and are proposing reforms. For example, Hal Pashler committed:

Measuring the Crowd Within: Probabilistic Representations Within Individuals

And Joseph Simmons, co-author of the well-known article (in Psych Science) on false-positive psychology is guilty of:

Believe It or Not: On the Possibility of Suspending Belief
Moniker Maladies: When Names Sabotage Success

And last but not least, even Nobel Prize winners are in on the act. Daniel Kahneman wrote:

Zeroing in on the Dark Side of the American Dream: A Closer Look at the Negative Consequences of the Goal for Financial Success

Leave it to a Nobel Prize winner to come up with a title with three amusing components!

More in my next post.

Saturday, January 12, 2013

Overly Amusing Article Titles


As the recent hashtags #overlyhonestmethods and #overlyhonestreviews show, scientists have a great sense of humor. This humor is not just evident on Twitter. It is also apparent in the titles of many journal articles. Two favorites of mine are:

[images of the two article titles]

And here is a very recent one not from psychology (I hope you are not eating):

[image of the article title]

I’m not sure I have actually read the two psychology articles (I am sure I didn’t read the fecal one) but their titles certainly have stuck with me. Does having an amusing title make an article more memorable and therefore more likely to be cited? According to the availability heuristic the answer is yes, but the data suggest otherwise.

A citation analysis concluded that the use of exceptionally amusing titles (2 standard deviations above the average rated amusement) was associated with a substantial ‘penalty’ of around 33% of the total number of citations. Of course, this does not mean that amusing titles cause an article to be less cited, but clearly amusing titles are not associated with high citation rates.

Why use amusing titles then? I can think of a few reasons.
  • The authors want to show that they’re not stodgy intellectuals but actually a lot of fun.
  • The authors want to show they’re no strangers to literature or popular culture. (Did you get the Nirvana allusion, nudge nudge, wink, wink?)
  • The allusion is almost inevitable. When a study compares two things, authors can’t seem to avoid starting their title with “A tale of two…” I entered this phrase in Google Scholar and got 168,000 hits! There are tales of two theories, methods, studies, paradigms, diseases, auto plants, serines, webspaces, calcium channels, futures, perspectives, responses, mechanisms, researchers, sciences, phases, semiconductor nanocrystals, fibrillations, and yes, cities.
  • Some people are attracted by the perceived inappropriateness of combining science and humor. It’s naughty. Science is serious business, so using humor is a little like swearing in church or sex in the workplace (not that I would know, of course).
  • The title is used as a clever repartee to the title of an earlier study. A sort of intellectual jiu jitsu: using your opponent’s force against himself. In a class I teach, I assign an article called Turning the tables (which is a punny description of the task used in the experiments) and the response to it, which is called Returning the tables (don’t worry, I don’t assign these articles just because of their titles).
  • The researchers involved in the project informally refer to it in a clever way during lab meetings. The pun sticks and becomes the title of the article.

All of these factors may play a role in the creation of amusing titles but I think there is an even more potent factor. Researchers are under increasing pressure to communicate their findings to a broader audience. A memorable title can be used in a press release and readily be adopted as a headline in a news report. Researchers provide their own ready-made headlines. I have some recent experience with this, where a title intended as a parody of non-literal titles took on a life of its own and may have contributed to us winning an Ig Nobel Award. But more about this in a future post.

The Russian linguist Roman Jakobson developed a theory about the functions of language. In his model, there is a sender who sends a message to a receiver about a certain context through a certain channel using a certain code. For example, I am sending you a message through the visual channel in the code of English about the context of amusing titles.

Each of these components can be emphasized in the message, so there are six functions of language. The ones that are relevant here are the referential function and the poetic function. In the referential function, the focus of the message is on some content area. In the poetic function, the focus of the message is on the message itself. Using the expression piece of shit in the title of a scientific article certainly puts the focus on the title itself (though calling it poetic might seem a stretch).

Multiple functions are at work simultaneously in a text but there is always one dominant function. In the type of titles we have discussed here the poetic function is dominant. This also explains why amusing titles are easier to remember: they focus attention on themselves.

In the current climate, journal articles will likely become less narrative and more report-like, as I described in an earlier post. As a result, the poetic function will give way to the referential function. The focus will be on the content and not on the writing. This does not mean that amusing titles and puns will disappear from scientific discourse though. Nor should they. They have a natural habitat in the blogs that researchers write about their work.

Update
Neil Martin just sent me a whole list of hilarious psychology titles. Thanks, Neil!

Tuesday, January 8, 2013

Opening the Floodgates?


Well, here it is, our call for proposals regarding pre-registered replication studies on cognition. When I say “our” I hasten to add that we stole and adapted the text of this call, with their permission, from Brian Nosek and Daniël Lakens, who are guest-editing a special issue of Social Psychology. Their original text is a great model for how replication studies should be solicited. It might become a standard feature of many journals in the years to come.
           
Our special issue will be an interesting experiment, the results of which we are awaiting with some trepidation. How many submissions can we expect? No idea. Are we opening the floodgates or will it be a slow trickle? Only time will tell. We take courage in our Dutch heritage (and I don’t mean Dutch courage). Taming the waters is in our blood.



Call for Proposals
Special Issue of Frontiers in Cognition
“Replications of Important Results in Cognition”
Guest Editors: René Zeelenberg & Rolf A. Zwaan

A signature strength of science is that the evidence is reproducible.  However, direct replications rarely appear in psychology journals because standard incentives emphasize novelty over verification (for background see Nosek, Spies, & Motyl, 2012, Perspectives on Psychological Science). This special issue, “Replications of Important Results in Cognition,” alters those incentives.  We invite proposals for high-powered, direct replications of important results in all areas of cognitive psychology, ranging from perception to social cognition. The review process will focus on the soundness of the design and analysis, not whether the outcome is positive or negative. 

What are important results?
Importance is subjective but demonstrable.  Proposals must justify the replication value of the finding to be replicated. To merit publication in this issue, the original result should be important (e.g., highly cited, a topic of intense scholarly or public interest, a challenge to established theories), but also should have uncertain truth-value (e.g., few confirmations, imprecise estimates of effect sizes). The prestige of the original publishing journal is not sufficient to justify replication value. 

What replication formats are encouraged?
Proposals should be for direct replications that faithfully reproduce the original procedure, materials, and analysis for verification. Conceptual replications that attempt to improve theoretical understanding by changing the operational definition of the constructs will not be considered for this issue. Articles in the issue can take two forms:

(1) Registered replication.  Authors submit the introduction, methods, and analysis plan for a replication study or studies. These proposals will be reviewed for their importance and soundness. Once provisionally accepted, if authors complete the study as proposed, the results will be published without regard to the outcome. Registered replication proposals also could include: (a) collaborations between two or more laboratories independently attempting to replicate an effect with the same materials, (b) joint replication by the original laboratory and another laboratory, or (c) adversarial collaborations in which laboratories with competing expectations prepare a joint registered proposal and conduct independent replications. Only adequately powered tests of results with high replication value will be considered.

(2) Registered replication + existing replication attempts.  Researchers may already have performed several experiments attempting to replicate published findings. These experiments may be included in the submission, but each submission should include at least one registered replication. Authors should report the experiments they have already completed (including the results) and describe the registered experiment that they plan to run.


How do I propose a replication project?
Interested authors should contact the guest editors before preparing a formal proposal (Rolf Zwaan, zwaan@fsw.eur.nl; René Zeelenberg, zeelenberg@fsw.eur.nl). These pre-proposal discussions will occur in early 2013, with the special issue scheduled for publication in 2014. Deadlines for the formal proposal and final manuscript depend on the type of project.  Registered replication proposals should be submitted by April 1, 2013 to leave time for initial review, revision, provisional acceptance, data collection, manuscript preparation, final review, and acceptance of the final report.








Registered Replications

Replication teams submit a replication proposal for review prior to initiating data collection.  Peer review includes an author of the original study and other relevant experts.  Review addresses two questions: (1) Does the finding have high replication value?, and (2) Is the proposed design a fair replication?  If accepted, the proposal is registered at the Open Science Framework (http://openscienceframework.org/) or equivalent registration site, and then data collection may commence.[1]  Proposals are accepted for publication conditional on following through with a competent execution of the proposed design.

Registered Replication Proposal
Replication proposals include a short introduction that describes the to-be-replicated finding, summarizes the existing evidence about the finding, and articulates why the finding has high replication value.  If the authors have already performed one or more replication attempts and wish to include those in their final paper, these experiments and their results should be described in the proposal. The methods section is the central part of the replication proposal.  Because data collection for the registered experiment cannot start before the paper is provisionally accepted, there are no results or discussion sections for registered experiments in the proposal.  The following should be included in the methods section:

1. Sampling plan.  Power analysis based on the effect size of the existing evidence for the finding; planned sample size, manner of recruitment, and anticipated sample characteristics.  Ideally, studies will achieve .95 power so that, with an alpha of .05, the probabilities of erroneously rejecting and erroneously accepting the null hypothesis are equal.  When such power is not feasible, provide justification. 

2. Materials and procedure.  Ideally, authors obtain the original study materials to maximize comparability of the replication attempt. If not feasible or desirable, explain why.  Procedures are described as completely as possible so that reviewers can identify potential design improvements.  Additional material - such as videos simulating experimental conditions - can be made available to enhance transparency and review.

3. Known differences from original study.  No replication is exact.  Fair replications reproduce the features considered critical for obtaining the result.  Authors describe known differences and explain why these are not critical for a fair replication of the original finding. In general, authors should avoid making “improvements” on research designs unless a reasonable expert would agree that the changes improve the sensitivity of the design to detect the effect.

4. Confirmatory analysis plan.  The ideal analysis plan includes an executable script that would process the data and produce the confirmatory results.  At minimum, authors should describe the data cleaning plan (i.e., exclusion criteria for participants and observations) and the analysis process for evaluating the replication attempt. The plan should also describe the basis for evaluating the success of the replication.
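As an illustration of points 1 and 4, here is a minimal sketch of what a registered, executable analysis script might look like, assuming a two-condition between-subjects design analyzed with a two-sample t-test and a pre-specified 3-SD exclusion rule. All function names and cutoffs here are illustrative assumptions, not requirements of the special issue:

```python
import numpy as np
from scipy import stats

def required_n_per_group(d, alpha=0.05, power=0.95):
    """Approximate per-group n for a two-sample t-test, using the
    normal approximation to the power function."""
    z_a = stats.norm.ppf(1 - alpha / 2)
    z_b = stats.norm.ppf(power)
    return int(np.ceil(2 * ((z_a + z_b) / d) ** 2))

def confirmatory_analysis(group_a, group_b, sd_cutoff=3.0):
    """Pre-specified cleaning (drop observations more than sd_cutoff SDs
    from their group mean), then the registered two-sample t-test."""
    def clean(x):
        x = np.asarray(x, dtype=float)
        z = (x - x.mean()) / x.std(ddof=1)
        return x[np.abs(z) <= sd_cutoff]
    a, b = clean(group_a), clean(group_b)
    t, p = stats.ttest_ind(a, b)
    # Cohen's d from the pooled (averaged) variance of the cleaned groups
    d = (a.mean() - b.mean()) / np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return {"n": (len(a), len(b)), "t": t, "p": p, "cohens_d": d}
```

For a medium effect (d = 0.5) at alpha = .05 and power = .95, the approximation gives about 104 subjects per group, very close to the exact t-test calculation.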

Registering the Replication Project
Accepted proposals, and all shareable materials, are registered and made available publicly through the Open Science Framework (http://openscienceframework.org/) or equivalent registration venue.

Other research teams, including the original study authors, will have the opportunity to submit proposals to conduct a parallel replication following the registered project protocol.  Interested teams should email one of the editors of the special issue for information on applying. If accepted, these parallel replications will be published as very short independent papers that follow the “primary” report and briefly summarize design alterations, confirmatory results, and conclusions.

Final Reports
After data collection and analysis, authors add a results section, a discussion section, and an abstract.  The results section is separated into two parts: confirmatory analysis and exploratory analysis.  The confirmatory analysis section reports the outcome of the registered analysis plan.  Authors may add exploratory analyses to further examine the finding.  Exploratory analyses should not address questions orthogonal to verification of the target of replication.  The discussion section summarizes the findings, draws conclusions about the results, and identifies limitations of the research.  Authors are encouraged to report a meta-analytic estimate and confidence interval for the study combined with the original study and any other replications. 
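The meta-analytic estimate mentioned above can be as simple as an inverse-variance-weighted (fixed-effect) average of the standardized effect sizes. A sketch, with made-up numbers standing in for the original study and one replication:

```python
import numpy as np

def fixed_effect_meta(effects, variances):
    """Inverse-variance-weighted (fixed-effect) meta-analytic estimate
    of an effect size, with a 95% confidence interval."""
    effects = np.asarray(effects, dtype=float)
    w = 1.0 / np.asarray(variances, dtype=float)
    est = np.sum(w * effects) / np.sum(w)
    se = np.sqrt(1.0 / np.sum(w))
    return est, (est - 1.96 * se, est + 1.96 * se)

# Hypothetical numbers: original d = 0.60 (var 0.04), replication d = 0.20 (var 0.02)
est, ci = fixed_effect_meta([0.60, 0.20], [0.04, 0.02])
```

With these made-up numbers the combined estimate is d ≈ 0.33 with a 95% CI of roughly [0.11, 0.56]. A random-effects model would be more appropriate when the replications are heterogeneous.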

After acceptance of the final report, and any other parallel replications, the original study authors may be invited to write a brief commentary.  The commentary could also include reporting of a parallel replication attempt. 


Monday, January 7, 2013

Pre-registration at the journal desk


I was recently asked to co-guest-edit a special issue of Frontiers in Cognition on “failures to replicate.” I liked the idea of a special issue; I just didn’t think it had the right angle. What if someone had “successfully” replicated a study? Would they not be allowed to submit? I was worried this would create a kind of reverse file-drawer problem: only if the replication was unsuccessful would it be a candidate for publication. Others have expressed the same concern.

If you think about it, it makes sense. Nonreplications are in a superficial sense more informative than replications.  Replications are like someone in the desert yelling: “Look over there, an oasis” followed by someone else yelling “Yes, I see it too.” A nonreplication is like the second person yelling “No, that’s not an oasis, it’s a mirage.”   

At a deeper level, however, both are informative. The replication gives us greater confidence in the presence of an oasis. After all, how can we stake our lives on the word of a single dehydrated member of our crew of explorers? The nonreplication decreases our confidence in the presence of an oasis and helps us avoid potentially wasting our resources (or even our lives). Still, nonreplications seem sexier than replications. I fell for this myself when, in a previous post, I said “The most interesting effect occurred…” referring to the one nonreplication in the paper.

So how do we eliminate this inherent bias toward nonreplication? The highly useful Psychfiledrawer site lists replication attempts in psychology. Right now, there are about twice as many nonreplications as replications listed, but it is still early in the game, so this is hardly evidence of a nonreplication bias. On the contrary, curiously, the site reports a successful replication of Bem’s work on precognition (as well as an unsuccessful one). Moreover, we really have no idea what percentage of findings will replicate. The Reproducibility Project will give us an estimate for the 2008 volumes of three different journals.

Still, there is a way to avoid bias, and that is to use pre-registration. The steps required are nicely outlined here. Researchers register their replication attempt beforehand. They indicate why it is important to replicate a certain study, they perform power analyses, and they specify the research plan. This proposal is reviewed and, if it checks out, the paper is provisionally accepted, regardless of the results. Provisionally accepted studies are carried out and the results are included in the paper. The full paper will then be reviewed to make sure the authors have delivered what they promised and to check for methodological accuracy and a fair discussion. The outcome of the experiment will play no role anywhere during the evaluation process.

The editors of Frontiers in Cognition liked our plan and so we are going to go ahead with it. I will provide more information and a call for proposals in my next post.

To close off with an anecdote, here is the labyrinthine route toward nonreplication that we once took. We discussed an interesting paper outside of our research area during a lab meeting. We developed ideas on how to tweak the paradigm described in the paper for our own studies on language.

Our first experiment, titled “Object 1” (maybe we had the precognition that this was the first in a series) was an abysmal failure. Not a failure to replicate—we weren’t even trying to replicate—just a bad experiment. Object 2 was not much better and then we realized we should probably move closer to the original experiment. This is what we did in successive steps in Object 3 through Object 12. By now we were pretty close to the original experiment. Object 13 was our final attempt: a very close replication. Again no effect. We gave up. Apparently, this paradigm was beyond our capabilities.

I discussed our failed attempts with a colleague at a conference. He said he had also had repeated failures to get the effect and then contacted the author (which we should have done as well, of course). He found out there was a critical aspect to the manipulation that was not mentioned in the paper. With this component, the effect proved reproducible.

The authors can be faulted for not including this component in the paper. It wasted a lot of our time, the colleague’s, and probably many other people’s as well. But maybe the authors had simply forgotten to mention this critical detail, or they were not aware of its critical role. This just goes to show that no detail is too trivial to mention in a method section.

There is another point, and maybe it doesn’t reflect well on us. We went about it bass ackwards. Rather than taking the paradigm and running with it, we should have sat down and tried an exact replication of the original finding first— Object 1 in this alternate universe. If we hadn’t been able to replicate the original finding, there probably would not have been alternate Objects 2 through 13 and we would have had a lot of alternate time to run other experiments.

Saturday, January 5, 2013

What shall we do with the drunken subject?



Early in the morning a research assistant is preparing for her first subject. She is a little nervous and quietly rehearses the instructions she is about to give. In walks the subject. His gait is a little unstable and it sure looks like he hasn’t had a shower in a long time. His eyes have trouble focusing as she starts giving the instructions, and he smells funny. He is drunk.

What is she supposed to do? Should she bar him from the experiment? But he did show up on time and he will be docked course credit if she refuses to let him participate. She decides to go ahead with the experiment. Should she tell her supervisor that their first subject was drunk?

That’s the easy question. The answer is yes. But what should the supervisor do? That’s the more difficult question. Dan Ariely describes an experiment that was run in his lab. He discovered there was an inebriated subject in one of the two conditions of the experiment. There was no significant difference between the conditions. No difference unless he threw out the data from the subject who was ten sheets to the wind. This subject performed badly on the task but happened to be in the condition that Ariely had predicted would outperform the other one. He basically dragged down his team.

So initially Ariely threw out the subject’s booze-based data. But then he and his students had second thoughts. Suppose, they reasoned, that the subject had been in the condition that was predicted to do poorly on the task. Then the drunk’s data would have greatly enhanced the effect. He would have been his team's MVP! Ariely and his students probably would not have discarded the data. The group decided to rerun the experiment.

Ariely didn’t say what happened to the original experiment. In line with the emerging view on psychological experimentation that I have been describing in previous posts, the ideal solution would be to (1) keep the original experiment, (2) throw out the dipsomaniac’s data, (3) rerun the experiment, now with an exclusion rule for intoxicated subjects, and (4) report both experiments.

Yes, you did slightly “torture” your data in the first experiment but that’s okay because it’s only an exploratory experiment. The second one is confirmatory. By including both you’re not wasting any data AND you have a replication. By also posting the data, others can see the effect of including or excluding the troublesome subject.

There are two other points here. The first one is that if your effect hinges on one subject, you probably don’t have enough power. My hunch is that there are many such studies in the literature. With larger samples, a single subject doesn’t make the difference.
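A quick simulation makes the point concrete. This is a sketch with made-up numbers (a true effect of d = 0.5, and one extreme score planted in the group predicted to do well), not a reconstruction of Ariely's study:

```python
import numpy as np
from scipy import stats

def p_with_and_without_outlier(n, seed=0, true_d=0.5, outlier=-6.0):
    """Simulate a two-group study and return the t-test p-value with and
    without one extreme subject added to the predicted-to-be-higher group."""
    rng = np.random.default_rng(seed)
    control = rng.normal(0.0, 1.0, n)
    treatment = rng.normal(true_d, 1.0, n - 1)
    with_outlier = np.append(treatment, outlier)  # the drunken subject
    p_with = stats.ttest_ind(with_outlier, control).pvalue
    p_without = stats.ttest_ind(treatment, control).pvalue
    return p_with, p_without
```

With 20 subjects per group, the single deviant score can drag a real effect well away from significance; with 200 per group, its influence is negligible.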

The other point is that it might be good if the field converges on a number of basic subject-exclusion rules. There already seems to be some sort of implicit consensus but it might be good to make this explicit. If my experience is par for the course, most experimenters will have had to deal with subjects who were drunk, stoned (contrary to public perception, we have had many more of those in the United States than in the Netherlands), ill, distraught, preoccupied with an exam, in physical pain, numb from recent dental work, and plain uncooperative. 

There are also subject-exclusion conventions that are based on the data. Data that deviate strongly from the average (for example more than three standard deviations) or that are above or below a fixed threshold are often omitted. 
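For illustration, the two data-based conventions just mentioned might look like this in code; the specific cutoffs (200 ms, 10,000 ms, 3 SDs) are assumptions chosen for the example, not a field-wide standard:

```python
import numpy as np

def trim_rts(rts, low=200.0, high=10000.0, sd_cutoff=3.0):
    """Apply two common data-based exclusion rules to reaction times:
    fixed thresholds first, then a 3-SD criterion on what remains."""
    rts = np.asarray(rts, dtype=float)
    rts = rts[(rts >= low) & (rts <= high)]          # fixed-threshold rule
    z = (rts - rts.mean()) / rts.std(ddof=1)         # standardize survivors
    return rts[np.abs(z) <= sd_cutoff]               # deviation-based rule
```

Note that the order of the rules matters: applying the SD criterion before the fixed thresholds would let an extreme value inflate the standard deviation and shield itself from exclusion.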

Including all of these rules in each and every paper would seem a tad excessive but perhaps there should be a centralized checklist that researchers can refer to in a pre-registration of their experiment. I’d be interested to hear comments on what this list should contain—if people think this is a useful idea, that is.

Often subjects are excluded because they “fail to follow instructions.” It is not always clear what is meant by this. It seems an easy way to brush inconvenient data under the rug. On the other hand, subjects are surprisingly creative at not following instructions. I could fill several posts with examples.

I’ll just give one. The very first subject I ever ran. The task was to read sentences from a computer screen and I was measuring their reading times. The subject, a law student, came out of the sound-attenuated booth and proudly announced that he had read each sentence twice. My first instinct was to raise my arms ostentatiously and yell: “You fool! I’m measuring reading times! You were instructed to read normally!” But then I realized that reading “normally” for a law student probably meant trying to memorize every word. So the subject had followed the instructions. It is just that his interpretation of them differed from mine. I didn't throw out the subject's data.

And then there are examples of subjects that defy classification. We once had an experiment with a practice task, in which subjects judged pairs of words and decided if they were antonyms. This was just to make the subjects familiar with the task of pressing yes and no keys in response to words. One of my graduate students had a bewildering interaction with a subject. I don’t recall the details of the dialogue but here is my rendition of it.

EXPERIMENTER: In this task you are going to judge antonyms. Antonyms are words that have opposite meanings, like high-low, warm-cold, young…
SUBJECT: I get it! Like cat and dog.
EXPERIMENTER: (you're kidding, right?)  No, I mean opposites, like deep-shallow, hard-soft…
SUBJECT: Yes, that’s what I’m saying, like cat and dog.
EXPERIMENTER: (what have you been smoking?) Maybe I didn’t explain it properly. I mean that high is exactly what low is not. When something is not at all low, it is high (which is probably what you are right now).
SUBJECT:  Yes that’s exactly it. If something is not a cat, it is probably a dog.
EXPERIMENTER: Yes (you clown) but if it’s not a cat, it can be a million other things as well. It could be a hamster or a cow or even a garbage truck or an unsolved math problem.
SUBJECT: That doesn’t make any sense. What do garbage trucks have to do with cats? Not as much as dogs, that’s for sure.
EXPERIMENTER: (I’m going to kill you and then I’m going to kill you again) Let’s start with the experiment.

If we had to cover cases like this, there would be no end to the list. But I think it is feasible to generate a list of the most common exclusion rules. Maybe it already exists. If so, I’d love to hear about it. If not, it might be useful to consider which rules should go on the list.