Thursday, February 26, 2015

Can we Live without Inferential Statistics?

The journal Basic and Applied Social Psychology (BASP) has taken a resolute and bold step. A recent editorial announces that it has banned the reporting of inferential statistics. F-values, t-values, p-values and the like have all been declared personae non gratae. And so have confidence intervals. Bayes factors are not exactly banned but aren’t welcomed with open arms either; they are eyed with suspicion, like a mysterious traveler in a tavern.

There is a vigorous debate in the scientific literature and on social media about the pros and cons of Null Hypothesis Significance Testing (NHST), confidence intervals, and Bayesian statistics (making researchers in some frontier towns quite nervous). The editors at BASP have seen enough of this debate and have decided to do away with inferential statistics altogether. Sure, you're allowed to submit a manuscript that’s loaded with p-values and statements about significance or the lack thereof, but they will be rigorously removed, like lice from a schoolchild’s head.

The question is whether we can live with what remains. Can we really conduct science without summary statements? Because what does the journal offer in their place? It requires strong descriptive statistics, distributional information, and larger samples. These are all good things, but we need a way to summarize our results, not just so that we can comprehend and interpret them better ourselves and communicate them to others, but also because we need to make decisions based on them as researchers, reviewers, editors, and users. Effect sizes are not banned and so will provide summary information that will be used to answer questions like:
--what will the next experiment be?
--do the findings support the hypothesis?
--has or hasn’t the finding been replicated?
--can I cite finding X as support for theory Y?*

As to that last question, you can hardly cite a result by saying “This finding supports (or does not support) the hypothesis, but here are the descriptives.” The reader will want more in the way of a statistical argument or an intersubjective criterion to decide one way or the other. I have no idea how researchers, reviewers, and editors are going to cope with the new freedoms (from inferential statistics) and constraints (from not being able to use inferential statistics). But that’s actually what I like about BASP's ban. It gives rise to a very interesting real-world experiment in meta-science.

Sneaky Bayes
There are a lot of unknowns at this point. Can we really live without inferential statistics? Will Bayes sneak in through the half-open door and occupy the premises? Will no one dare to submit to the journal? Will authors balk at having their manuscripts shorn of inferential statistics? Will the interactions among authors, reviewers, and editors yield novel and promising ways of interpreting and communicating scientific results? Will the editors in a few years be BASPing in the glory of their radical decision?  And how will we measure the success of the ban on inferential statistics? The wrong way to go about this would be to see whether the policy will be adopted by other journals or whether or not the impact factor of the journal rises. So how will we determine whether the ban will improve our science?

Questions, questions. But this is why we conduct experiments and this is why BASP's brave decision should be given the benefit of the doubt.


I thank Samantha Bouwmeester and Anita Eerland for feedback on a previous version and Dermot Lynott for the Strider picture.

* Note that I’m not saying: “will the paper be accepted?” or “does the researcher deserve tenure?” 


  1. As I've mentioned before, it's pretty strange to accept submissions with p-values but to weed them out after acceptance. That's not so much an experiment in science as a prescriptivist editorial reflex in science reporting: p-values will still be part of the analytical process, and possibly even of the submission and review process.

    To try to take the editors' side (I don't really care since I've never heard of the journal in question before), decent descriptive statistics - both numerical and graphical - often render inferential statistics superfluous. One category is the 'so obvious it hits you between the eyes' one (cf. the intra-ocular trauma test). The other is the one where it's pretty clear that the predictions just don't pan out. In both cases, you don't really need inferential statistics - if inferential stats tell you something different from what a look at the data shows, it's usually the latter that's right. For the remaining cases, it's usually straightforward, for most experimental designs, to compute the inferential statistics yourself if you have to.

    That said, I think that insistence on more or less any reporting requirement, be it standardised effect sizes, power levels or p-values, is fundamentally misguided as it presupposes that anything that works for one's own (often ANOVA-based) studies should work for analyses that are more left-field.
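The claim above, that a reader can usually compute the inferential statistics from well-reported descriptives, can be illustrated with a minimal sketch. It assumes a simple two-group design in which only means, standard deviations, and sample sizes are reported; the function names and the numbers are hypothetical and are not taken from any BASP paper.

```python
import math


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom,
    computed from reported group descriptives only."""
    v1 = sd1 ** 2 / n1  # squared standard error of mean 1
    v2 = sd2 ** 2 / n2  # squared standard error of mean 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Standardised mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


# Hypothetical descriptives: two groups of 50, SD = 2, means 5 and 4.
t, df = welch_t_from_summary(5.0, 2.0, 50, 4.0, 2.0, 50)
d = cohens_d(5.0, 2.0, 50, 4.0, 2.0, 50)
print(f"t = {t:.2f}, df = {df:.1f}, d = {d:.2f}")
```

Turning `t` and `df` into a p-value needs the t distribution, which the standard library does not provide; a statistics package or a printed table would be needed for that last step.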

  2. Regarding your questions:
    "do the findings support the hypothesis?"
    "can I cite finding X as support for theory Y?"
    First, there are actually researchers who don't test hypotheses, but instead do research to answer theoretical questions. So the non-exclusive way to put it is "do the findings answer the researcher's question?"

    Second, the usual way that the research is digested is this. Researcher presents results in the results section. Then in the discussion section she discusses how the results answer the question at hand and how well her answer is supported. The ban won't change anything about this. The additional argumentative step from results to conclusions has always been there and has always been necessary.

    Finally, as John Kruschke pointed out, the editorial neither discusses nor excludes Bayesian parameter estimation. So yes, there is inference beyond NHST...

  3. This is a radical editorial choice for which I see no justification. The reproducibility crisis in science has nothing to do with the framework that you use for statistical inference. Instead, it has everything to do with selectivity (in participant selection, data analysis, and reporting). Any selection mechanism (including the Bayesian ones) through which you funnel noisy data will introduce bias.

    I'm curious to see what comes out of this experiment, but I do not expect any radical changes. In the end, scientists need a way to deal with the uncertainty (noise, randomness) in their data, and the different flavours of statistical inference are just different principled ways of data-based decision making under uncertainty. An editor may prohibit authors from reporting the outcome of these decision-making tools, but he cannot prevent authors or readers from applying their own decision-making tools to the data. They are then trying to solve the same problem (data-based decision making under uncertainty) that professional statisticians have tried to solve for us.

    In my own field (cognitive and systems neuroscience), we do need principled ways of data-based decision making under uncertainty. I can show you the most beautiful 2D maps of brain activity, free for interpretation like a Rorschach plate, but nothing more than smoothed noise. Of course, although correct application of the appropriate statistical techniques will identify these maps as pure noise, this does not imply that any significant result obtained with those techniques will also be scientifically revealing.
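The point about smoothed noise can be made concrete with a small simulation. This is only a sketch: it assumes a 40 x 40 grid of independent Gaussian noise and a simple box filter, not any particular neuroimaging pipeline, and all names are hypothetical. Smoothing makes neighbouring cells strongly correlated, which is what gives a pure-noise map its apparent blobs of structure.

```python
import random


def white_noise(rows, cols, seed=0):
    """Grid of independent standard-normal values (no signal at all)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def box_smooth(grid, radius=2):
    """Replace each cell by the mean of its (2*radius+1)^2 neighbourhood,
    truncating the window at the edges of the grid."""
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                grid[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            out[r][c] = sum(window) / len(window)
    return out


def neighbour_correlation(grid):
    """Pearson correlation between each cell and its right-hand neighbour."""
    xs = [row[c] for row in grid for c in range(len(row) - 1)]
    ys = [row[c + 1] for row in grid for c in range(len(row) - 1)]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


raw = white_noise(40, 40)
smooth = box_smooth(raw)
raw_r = neighbour_correlation(raw)
smooth_r = neighbour_correlation(smooth)
print(f"neighbour correlation: raw = {raw_r:.2f}, smoothed = {smooth_r:.2f}")
```

The raw map shows essentially no correlation between neighbours, while the smoothed map shows a strong one even though it contains no signal, which is why an eyeball judgement of such a map needs some principled check behind it.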


    1. typo: "... that any significant result ..." should be "... that every significant result ...".