Irony is still alive

It shouldn’t come as a surprise that psychological studies on “priming” may have overstated the effects. It sounds plausible that thinking about words associated with old age might make someone walk more slowly afterwards, for example, but as has been shown for many effects like this, the results are nearly impossible to replicate.

Now Ulrich Schimmack, Moritz Heene, and Kamini Kesavan have dug a bit deeper into this, in a post at Replicability-Index titled “Reconstruction of a Train Wreck: How Priming Research Went off the Rails”. They analysed all studies cited in Chapter 4 of Daniel Kahneman’s book “Thinking Fast and Slow”. I’m also a big fan of the book, so this was interesting to read.

I’d recommend that everyone with even a passing interest in these things go and read the whole fascinating post. I’ll just note the authors’ conclusion: “…priming research is a train wreck and readers […] should not consider the presented studies as scientific evidence that subtle cues in their environment can have strong effects on their behavior outside their awareness.”

The irony is pointed out by Kahneman himself in his response: “there is a special irony in my mistake because the first paper that Amos Tversky and I published was about the belief in the “law of small numbers,” which allows researchers to trust the results of underpowered studies with unreasonably small samples.”

So nobody, absolutely nobody, can avoid biases in their thinking.

Two views on behaviour change

I was at the Food Matters Live exhibition again today. We tasted lots of drinks for Club Soda’s dry January programme The MOB, which will again have non-alcoholic drink reviews/suggestions for every day of the month. Our tastings included camel’s milk (disappointingly very similar to cow’s milk), half a dozen tree sap drinks (of varying quality), and countless “healthy” fruit juice drinks (healthy to a very varying degree, I would say…). We also tried an “oxygen” drink, which is basically a fruit juice that has been foamed. Rather heroic health claims were made for this concoction as well. I do get that oxygen is very good for you, but I’m not sure eating/drinking it is the best way of absorbing the goodness.

A panel discussion on food and behaviour change had Ben Goldacre and Richard Wiseman on it. BG’s opening was a very good brief statement on how most of the misleading media stories on food and health actually come from academia (in particular press releases). He noted that clean, good quality information is the first thing that is needed by consumers if they want to eat a healthier diet. And that requires good evidence-based guidance from the experts.

RW talked a bit about health information and messaging as well. “Keep it simple, keep it positive” was his summary. We as humans like simplicity and positivity. So far so not controversial.

Where things got more interesting was around the “what is to be done” question. BG was quite adamant that the main issues are top-down: society and culture need to change so that people can more easily make healthier choices. RW on the other hand insisted that there are still small choices everyone can make, despite the social pressure for fatty and sugary treats (this difference came about while discussing children, and children’s parties’ catering in particular).

Hmm. I will now make a horrible generalisation, and probably libel many good people (including BG who I have a lot of time for). But there is probably something here, between a stereotypical doctor and a humble psychologist. An “I will tell you what is best for you and you will do exactly so” versus an “I will try to help you to make better (but not perfect) decisions for yourself”? I would take the latter any day myself.

A few bad apples or rotten to the core?

An article in Nature on the ethics of bankers has been widely reported. The researchers asked bankers and non-bankers to toss a coin in private, and then report the numbers of heads and tails – the higher the number of one of the two, the bigger the payout they received. This is a commonly used setting to study honesty. Even though the researchers will never know how honest each participant was individually, overall the split between heads and tails should be close to 50/50 if everyone reports their results honestly. Any major deviation from the expected figures indicates some foul play. And on the whole, people tend to be remarkably honest in studies like this; definitely more honest (and therefore poorer as a result) than standard economic theory would predict.
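To make the logic of the aggregate check concrete, here is a minimal sketch of an exact two-sided test against a fair coin. The function name and the example figures (1,000 tosses, 580 reported winning outcomes) are my own illustrative assumptions, not numbers from the Nature article, and this is not necessarily the test the researchers used.

```python
from math import comb


def fair_coin_p_value(reported_wins: int, tosses: int) -> float:
    """Exact two-sided p-value for a fair coin.

    Sums the probabilities of every outcome that is no more likely
    than the one actually reported. Integer arithmetic keeps it exact
    until the final division.
    """
    observed_weight = comb(tosses, reported_wins)
    tail_weight = sum(
        comb(tosses, k)
        for k in range(tosses + 1)
        if comb(tosses, k) <= observed_weight
    )
    return tail_weight / 2 ** tosses


# Hypothetical group: 100 people, 10 private tosses each.
honest_looking = fair_coin_p_value(500, 1000)  # exactly the expected split
suspicious = fair_coin_p_value(580, 1000)      # 58% winning outcomes reported

print(f"500/1000 winning tosses: p = {honest_looking:.3f}")
print(f"580/1000 winning tosses: p = {suspicious:.2e}")
```

Nobody in the group can be singled out as a cheat, but a tiny p-value for the group as a whole says that honest reporting is a very poor explanation of the totals.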

An interesting twist in this particular study, however, was that half of the bankers were primed by asking them questions about their work before the coin tossing; in effect they were reminded what they did and where they worked. And that changed the results significantly. Without the priming, the bankers were as honest as everyone else. But with the nudge about their livelihood, they became much less honest. The effect was clearly significant (in all senses of the word) in the overall average cheating figures.

But what hasn’t been noted in the newspaper articles I’ve seen is the change in the distribution of the results. Did all bankers become a bit more dishonest, or did some become a lot more crooked? Well, the original article shows the entire distributions for the control and treatment groups, and it seems to me that the answer is: a bit of both. The entire distribution of reported results shifts slightly, resulting in maybe 10% more in rewards than the participants should have received. But there is also a massive increase in the number of people who claim to have got nothing but heads (or tails, whichever gave them the reward).

In conclusion then, based on this study, most bankers in their natural habitat are probably a little bit naughty, but some of them are really quite seriously bad.

Or maybe most people just don’t understand statistics (p<0.01?)

Statistical hypothesis testing has always been close to my heart. I’ve long been critical of the use of p values, especially as most people seem to misunderstand their interpretation. I may even have failed a job interview once due to my stance on this. I suspect my interviewers didn’t believe that I knew what I was talking about when the subject came up.

This week, I read another two papers on this theme, one in finance, one in psychology. The first, Evaluating Trading Strategies by Campbell Harvey and Yan Liu looks at empirical evaluation of stock trading strategies. It is a nice illustration of the usual pitfalls of data mining – the bad kind where you do so many tests that some are bound to be statistically significant by chance alone – and has a useful discussion of ways of correcting for such multiple testing.
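As a rough illustration of the multiple testing problem, the sketch below computes the chance of at least one false positive across many independent tests, and applies two standard corrections (Bonferroni and Holm). The function names and the example p-values are made up for illustration, and I am not claiming these are the specific adjustments the paper recommends.

```python
def family_wise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among independent tests
    when every null hypothesis is actually true."""
    return 1 - (1 - alpha) ** num_tests


def bonferroni_significant(p_values, alpha=0.05):
    """Bonferroni: compare every p-value against alpha / number of tests."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def holm_significant(p_values, alpha=0.05):
    """Holm step-down: test p-values from smallest to largest against
    alpha / (tests remaining), stopping at the first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = [False] * m
    for rank, index in enumerate(order):
        if p_values[index] <= alpha / (m - rank):
            significant[index] = True
        else:
            break
    return significant


# Hypothetical p-values from five candidate trading strategies.
p_values = [0.001, 0.012, 0.04, 0.2, 0.5]

print(f"Chance of a fluke in 100 tests: {family_wise_error_rate(100):.3f}")
print("Uncorrected:", [p <= 0.05 for p in p_values])
print("Bonferroni: ", bonferroni_significant(p_values))
print("Holm:       ", holm_significant(p_values))
```

With these made-up numbers, three strategies look significant before any correction, two survive Holm, and only one survives Bonferroni.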

The second, Kuhberger, Fritz & Scherndl’s Publication Bias in Psychology: A Diagnosis Based on the Correlation between Effect Size and Sample Size samples a thousand published psych articles and finds a negative correlation between sample size and effect size, and the usual clustering of results at just within the statistically significant level. In other words, the findings suggest massive publication bias problems.
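Why would publication bias produce a negative correlation between sample size and effect size? A toy simulation shows the mechanism. Everything in it is an assumption of mine rather than anything from the paper: the true effect of 0.2, the sample sizes between 10 and 200 per group, the normal approximation for the observed effect, and the rule that only results with z > 1.96 get "published".

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def simulate_studies(num_studies=5000, true_effect=0.2, seed=1):
    """Simulate two-group studies that all share the same true effect.

    The observed standardised effect is drawn from a normal
    approximation with standard error sqrt(2 / n), where n is the
    per-group sample size.
    """
    rng = random.Random(seed)
    studies = []
    for _ in range(num_studies):
        n = rng.randint(10, 200)
        standard_error = sqrt(2 / n)
        observed_effect = rng.gauss(true_effect, standard_error)
        studies.append((n, observed_effect, observed_effect / standard_error))
    return studies


studies = simulate_studies()
# Assumed publication filter: only "significant" positive results appear.
published = [(n, d) for n, d, z in studies if z > 1.96]

r_all = pearson([n for n, _, _ in studies], [d for _, d, _ in studies])
r_published = pearson([n for n, _ in published], [d for _, d in published])

print(f"All studies:      r = {r_all:+.2f} ({len(studies)} studies)")
print(f"Published subset: r = {r_published:+.2f} ({len(published)} studies)")
```

Across all the simulated studies, sample size and effect size are essentially unrelated. Among the "published" ones, a small study can only clear the significance bar by reporting a large effect, so the correlation turns negative even though the true effect never changed.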

These two papers aren’t anything revolutionary, as this stuff has been known and talked about for ages. And yet, lots of people still fall into these traps, especially when it comes to publishing their work, and evaluating previous findings (for example doing a meta-analysis). My first thought was that it must be due to institutional factors: academics must publish, so they will fish and mine and p-hack until they find something statistically significant to publish, and journals need to fill their pages somehow so they stick to the old rules, even though everyone knows that it’s all a bit of a sham and not to be trusted.

But then I came across Hoekstra, Morey, Rouder & Wagenmakers’s Robust misinterpretation of confidence intervals. This one interviewed 120 psychology researchers and 442 students about their understanding of confidence intervals for a simple hypothesis test. Both groups were equally misinformed about the interpretation of the test results, such as confidence intervals and p values. And what’s more (quoting from the abstract): “Self-declared experience with statistics was not related to researchers’ performance, and, even more surprisingly, researchers hardly outperformed the students, even though the students had not received any education on statistical inference whatsoever.”

So maybe there’s more to this than just publish or perish?