Statistical hypothesis testing has always been close to my heart. I’ve long been critical of the use of p values, especially as most people seem to misunderstand their interpretation. I may even have failed a job interview once due to my stance on this. I suspect my interviewers didn’t believe that I knew what I was talking about when the subject came up.
This week, I read another two papers on this theme, one in finance, one in psychology. The first, Evaluating Trading Strategies by Campbell Harvey and Yan Liu looks at empirical evaluation of stock trading strategies. It is a nice illustration of the usual pitfalls of data mining – the bad kind where you do so many tests that some are bound to be statistically significant by chance alone – and has a useful discussion of ways of correcting for such multiple testing.
The second, Kuhberger, Fritz & Scherndl’s Publication Bias in Psychology: A Diagnosis Based on the Correlation between Effect Size and Sample Size samples a thousand published psych articles and finds a negative correlation between sample size and effect size, and the usual clustering of results at just within the statistically significant level. In other words, suggesting massive publication bias problems.
These two papers aren’t anything revolutionarily new, as this stuff has been known and talked about for ages. And yet, lots of people still fall into these traps, especially when it comes to publishing their work, and evaluating previous findings (for example doing a meta-analysis). My first thought was that it must be due to institutional factors: academics must publish, so they will fish and mine and p hack until they find something statistically significant to publish, and journals need to fill their pages somehow so they stick to the old rules, even though everyone knows that it’s all a bit of a sham and not to be trusted.
But then I came across Hoekstra, Morey, Rouder & Wagenmakers’s Robust misinterpretation of confidence intervals. This one interviewed 120 psychology researchers and 442 students about their understanding of confidence intervals for a simple hypothesis test. Both groups were equally misinformed about the interpretation of the test results, such as confidence intervals and p values. And what’s more (quoting from the abstract): “Self-declared experience with statistics was not related to researchers’ performance, and, even more surprisingly, researchers hardly outperformed the students, even though the students had not received any education on statistical inference whatsoever.”
So maybe there’s more to this than just publish or perish?