Early in the morning a research assistant is preparing for her first subject. She is a little nervous and quietly rehearses the instructions she is about to give. In walks the subject. His gait is a little unstable, and it sure looks like he hasn’t had a shower in a long time. As she starts giving the instructions, his eyes have trouble focusing, and he smells funny. He is drunk.
What is she supposed to do? Should she bar him from the experiment? But he did show up on time, and he will be docked course credit if she refuses to let him participate. She decides to go ahead with the experiment. Should she tell her supervisor that their first subject was drunk?
That’s the easy question. The answer is yes. But what should the supervisor do? That’s the more difficult question. Dan Ariely describes an experiment that was run in his lab. He discovered there was an inebriated subject in one of the two conditions of the experiment. There was no significant difference between the conditions. No difference unless he threw out the data from the subject who was ten sheets to the wind. This subject performed badly on the task but happened to be in the condition that Ariely had predicted would outperform the other one. He basically dragged down his team.
So initially Ariely threw out the subject’s booze-based data. But then he and his students had second thoughts. Suppose, they reasoned, that the subject had been in the condition that was predicted to do poorly on the task. Then the drunk’s data would have greatly enhanced the effect. He would have been his team's MVP! Ariely and his students probably would not have discarded the data. The group decided to rerun the experiment.
Ariely didn’t say what happened to the original experiment. In line with the emerging view on psychological experimentation that I have been describing in previous posts, the ideal solution would be to (1) keep the original experiment, (2) throw out the dipsomaniac’s data, (3) rerun the experiment, now with an exclusion rule for intoxicated subjects, and (4) report both experiments.
Yes, you did slightly “torture” your data in the first experiment, but that’s okay because it’s only an exploratory experiment. The second one is confirmatory. By including both, you’re not wasting any data AND you have a replication. By also posting the data, you let others see the effect of including or excluding the troublesome subject.
There are two other points here. The first one is that if your effect hinges on one subject, you probably don’t have enough power. My hunch is that there are many such studies in the literature. With larger samples, a single subject doesn’t make the difference.
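The point about a single subject can be made concrete with a leave-one-out check, which is also one way to let readers see the effect of including or excluding a troublesome subject. The Python sketch below (the function names and the data are hypothetical) drops each subject in turn and records how much the difference between the two condition means moves. It looks only at the raw mean difference, not at significance or power, but it shows the arithmetic behind the intuition: removing one subject from a group of n shifts that group's mean by (group mean - that subject's score)/(n - 1), so the same extreme score matters far less in a large sample.

```python
from statistics import mean


def leave_one_out_shifts(group_a, group_b):
    """Return the observed difference mean(a) - mean(b) and, for every
    subject, how much that difference changes when only that subject is
    dropped. Each group needs at least two subjects."""
    observed = mean(group_a) - mean(group_b)
    shifts = []
    for i in range(len(group_a)):
        rest = group_a[:i] + group_a[i + 1:]  # condition a without subject i
        shifts.append(("a", i, (mean(rest) - mean(group_b)) - observed))
    for i in range(len(group_b)):
        rest = group_b[:i] + group_b[i + 1:]  # condition b without subject i
        shifts.append(("b", i, (mean(group_a) - mean(rest)) - observed))
    return observed, shifts


def largest_shift(group_a, group_b):
    """The single subject whose removal moves the difference the most."""
    _, shifts = leave_one_out_shifts(group_a, group_b)
    return max(shifts, key=lambda s: abs(s[2]))


# Hypothetical scores: one very low scorer in condition "a".
small = largest_shift([10, 10, 10, 0], [5, 5, 5, 5])
large = largest_shift([10] * 99 + [0], [5] * 100)
print(small)  # ('a', 3, 2.5)
print(large)  # subject ('a', 99), with a shift of about 0.1
```

With four subjects per condition, the one low scorer accounts for a shift of 2.5 in the mean difference (it doubles from 2.5 to 5 without him); with a hundred subjects per condition, the same score shifts it by only about 0.1.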
The other point is that it might be good if the field converges on a number of basic subject-exclusion rules. There already seems to be some sort of implicit consensus but it might be good to make this explicit. If my experience is par for the course, most experimenters will have had to deal with subjects who were drunk, stoned (contrary to public perception, we have had many more of those in the United States than in the Netherlands), ill, distraught, preoccupied with an exam, in physical pain, numb from recent dental work, and plain uncooperative.
There are also subject-exclusion conventions that are based on the data. Data that deviate strongly from the average (for example, by more than three standard deviations) or that fall above or below a fixed threshold are often omitted.
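Here is a minimal Python sketch of those two data-based conventions. The function name `exclude_outliers`, the default cutoff of 3 standard deviations, and the optional `lower`/`upper` bounds are illustrative assumptions, not an agreed-upon standard; the sketch uses the sample standard deviation from the standard library's `statistics` module.

```python
from statistics import mean, stdev


def exclude_outliers(values, sd_cutoff=3.0, lower=None, upper=None):
    """Split values into (kept, excluded) using two common conventions.

    A value is excluded if it lies more than `sd_cutoff` sample standard
    deviations from the sample mean, or if it falls outside the optional
    fixed bounds [lower, upper]. Requires at least two values (stdev).
    """
    m = mean(values)
    s = stdev(values)  # sample SD (n - 1 denominator)
    kept, excluded = [], []
    for v in values:
        # SD rule; skipped when all values are identical (s == 0)
        far_from_mean = s > 0 and abs(v - m) > sd_cutoff * s
        # Fixed-threshold rule, e.g. implausibly fast or slow reading times
        below = lower is not None and v < lower
        above = upper is not None and v > upper
        if far_from_mean or below or above:
            excluded.append(v)
        else:
            kept.append(v)
    return kept, excluded


# Hypothetical reading times in milliseconds; one subject took 3000 ms.
times = [480, 510, 495, 520, 505, 490, 515, 500, 485, 525, 498, 502, 3000]
kept, excluded = exclude_outliers(times)
print(excluded)  # [3000]
```

One caveat the sketch makes visible: the extreme value itself inflates the standard deviation. In a sample of n values, no value can lie more than (n - 1)/sqrt(n) sample standard deviations from the mean, so with ten or fewer data points a 3-SD rule cannot exclude anything, however extreme the value. That is one more reason small samples and single troublesome subjects make an uneasy combination.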
Including all of these rules in each and every paper would seem a tad excessive but perhaps there should be a centralized checklist that researchers can refer to in a pre-registration of their experiment. I’d be interested to hear comments on what this list should contain—if people think this is a useful idea, that is.
Often subjects are excluded because they “fail to follow instructions.” It is not always clear what is meant by this. It seems an easy way to brush inconvenient data under the rug. On the other hand, subjects are surprisingly creative at not following instructions. I could fill several posts with examples.
I’ll just give one: the very first subject I ever ran. The task was to read sentences on a computer screen, and I was measuring reading times. The subject, a law student, came out of the sound-attenuated booth and proudly announced that he had read each sentence twice. My first instinct was to raise my arms ostentatiously and yell: “You fool! I’m measuring reading times! You were instructed to read normally!” But then I realized that reading “normally” for a law student probably meant trying to memorize every word. So the subject had followed the instructions. It is just that his interpretation of them differed from mine. I didn't throw out the subject's data.
And then there are examples of subjects who defy classification. We once had an experiment with a practice task, in which subjects judged pairs of words and decided if they were antonyms. This was just to make the subjects familiar with the task of pressing yes and no keys in response to words. One of my graduate students had a bewildering interaction with a subject. I don’t recall the details of the dialogue, but here is my rendition of it.
EXPERIMENTER: In this task you are going to judge antonyms. Antonyms are words that have opposite meanings, like high-low, warm-cold, young…
SUBJECT: I get it! Like cat and dog.
EXPERIMENTER: (you're kidding, right?) No, I mean opposites, like deep-shallow, hard-soft…
SUBJECT: Yes, that’s what I’m saying, like cat and dog.
EXPERIMENTER: (what have you been smoking?) Maybe I didn’t explain it properly. I mean that high is exactly what low is not. When something is not at all low, it is high (which is probably what you are right now).
SUBJECT: Yes, that’s exactly it. If something is not a cat, it is probably a dog.
EXPERIMENTER: Yes (you clown) but if it’s not a cat, it can be a million other things as well. It could be a hamster or a cow or even a garbage truck or an unsolved math problem.
SUBJECT: That doesn’t make any sense. What do garbage trucks have to do with cats? Not as much as dogs, that’s for sure.
EXPERIMENTER: (I’m going to kill you and then I’m going to kill you again) Let’s start with the experiment.
If we had to cover cases like this, there would be no end to the list. But I think it is feasible to generate a list of the most common exclusion rules. Maybe it already exists. If so, I’d love to hear about it. If not, it might be useful to consider which rules should go on the list.