Irony is still alive

It shouldn’t come as a surprise that psychological studies on “priming” may have overstated their effects. It sounds plausible that thinking about words associated with old age might make someone walk slower afterwards, for example, but as has been shown for many effects like this, such findings are nearly impossible to replicate.

Now Ulrich Schimmack, Moritz Heene, and Kamini Kesavan have dug a bit deeper into this, in a post at Replicability-Index titled “Reconstruction of a Train Wreck: How Priming Research Went off the Rails”. They analysed all the studies cited in Chapter 4 of Daniel Kahneman’s book “Thinking Fast and Slow”. I’m a big fan of the book myself, so this was interesting to read.

I’d recommend everyone with even a passing interest in these things to go and read the whole fascinating post. I’ll just note the authors’ conclusion: “…priming research is a train wreck and readers […] should not consider the presented studies as scientific evidence that subtle cues in their environment can have strong effects on their behavior outside their awareness.”

The irony is pointed out by Kahneman himself in his response: “there is a special irony in my mistake because the first paper that Amos Tversky and I published was about the belief in the “law of small numbers,” which allows researchers to trust the results of underpowered studies with unreasonably small samples.”
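The “law of small numbers” problem is easy to see in a quick simulation (my own sketch, not from the post; all the numbers are illustrative). With a small true effect, tiny studies rarely reach significance, and the ones that do report wildly exaggerated effect sizes:

```python
import random
import statistics

random.seed(42)

TRUE_EFFECT = 0.2  # small true difference between group means (in SD units)
N_SIMS = 2000      # number of simulated studies per sample size

def significant_effects(n):
    """Simulate many two-group studies of size n per group; return the
    observed mean differences of only those reaching roughly p < 0.05."""
    kept = []
    for _ in range(N_SIMS):
        a = [random.gauss(0.0, 1.0) for _ in range(n)]
        b = [random.gauss(TRUE_EFFECT, 1.0) for _ in range(n)]
        diff = statistics.fmean(b) - statistics.fmean(a)
        se = (statistics.variance(a) / n + statistics.variance(b) / n) ** 0.5
        if abs(diff / se) > 2.0:  # approximately two-sided p < 0.05
            kept.append(diff)
    return kept

for n in (10, 250):
    sig = significant_effects(n)
    print(f"n={n:>3}: {len(sig)/N_SIMS:.0%} of studies significant, "
          f"mean 'significant' effect = {statistics.fmean(sig):.2f}")
```

With 10 people per group, only a handful of studies cross the significance threshold, and those that do show observed effects several times larger than the true one. Trusting those lucky studies is exactly the mistake the “law of small numbers” paper described.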

So nobody, absolutely nobody, can avoid biases in their thinking.

Guest blog on “Academic collaboration – a startup point of view”

I was asked to write a guest blog for the University College London Centre for Behaviour Change’s Digi-Hub. My brief was to talk about collaboration between businesses and academia, in particular from the point of view of a small startup company like Club Soda.

My post, which is part of a longer series of guest blogs, deals with evidence, evaluation, and the tension that working across organisational boundaries can create.

You can read the post here.

Guest blog on “Behaviour change for pubs and bars”

I was asked to write something for the Society for the Study of Addiction about our Nudging Pubs work in changing the behaviour of pubs and bars.

My guest post was on the two theoretical foundations of our project: a taxonomy of behaviour change tools, and a typology of nudges. The first is a UCL-led project, the second is from Cambridge University’s Behaviour and Health Research Unit.

Read the post at SSA’s website.

A typology of nudges

We’re working on an assessment tool to use with pubs and bars. The tool is meant to measure how welcoming the venues are to their non-drinking (or “less-drinking”) customers. We have been pondering all the various factors we could include in the tool, and how to classify them.

I recently met some people from the Behaviour and Health Research Unit (BHRU) at Cambridge, and they pointed me to their paper “Altering micro-environments to change population health behaviour: towards an evidence base for choice architecture interventions” in BMC Public Health. It could just help us get some of our ideas in order too.

The article has a nice typology for “choice architecture interventions in micro-environments”; I’ll just call them nudges from now on. There are nine types of nudges in this scheme:

    • Ambience (aesthetic or atmospheric aspects of the environment)
    • Functional design (design or adapt equipment or function of the environment)
    • Labelling (labelling or endorsement info on a product or at point-of-choice)
    • Presentation (sensory properties & visual design)
    • Sizing (product size or quantity)
    • Availability (behavioural options)
    • Proximity (effort required for options)
    • Priming (incidental cues to alter non-conscious behavioural response)
    • Prompting (non-personalised info to promote or raise awareness)

The first five types change the properties of “objects or stimuli”, the next two the placement of them, and the final two both the properties and placement.

I can see how we could use this as a basis for our thinking on the factors we want to measure pubs and bars on. For example, some basics like the choice of non-alcoholic / low-alcohol drinks would be about Availability; the display of non-alcoholic drinks could be Presentation, Proximity and also Priming; drinks promotions would be Prompting and Labelling; and staff training could perhaps be about Prompting too?
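As a thought experiment, the tagging of assessment factors with nudge types could be sketched in code (the factor names and tags here are my own illustrations, not the actual assessment tool):

```python
# The nine nudge types from the BHRU typology.
NUDGE_TYPES = {
    "Ambience", "Functional design", "Labelling", "Presentation",
    "Sizing", "Availability", "Proximity", "Priming", "Prompting",
}

# Hypothetical assessment factors, each tagged with one or more types.
factors = {
    "range of non-alcoholic drinks": {"Availability"},
    "display of non-alcoholic drinks": {"Presentation", "Proximity", "Priming"},
    "drinks promotions": {"Prompting", "Labelling"},
    "staff training": {"Prompting"},
}

# Sanity-check the tags, then see which nudge types the tool covers so far.
for factor, types in factors.items():
    assert types <= NUDGE_TYPES, f"unknown nudge type tagged on {factor!r}"

covered = set().union(*factors.values())
print("Covered so far:", sorted(covered))
print("Not yet covered:", sorted(NUDGE_TYPES - covered))
```

Even this toy version makes the gaps visible: a quick check shows which of the nine types the draft factors leave untouched, which is exactly the kind of exercise mentioned below.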

I can’t instantly think of anything that we couldn’t fit into the typology (although we might need some flexibility of interpretation!). Interestingly, when the Cambridge researchers reviewed the existing literature, they could only find alcohol-related nudges of the ambience, design, labelling, priming and prompting types. And not many studies overall, especially compared to research on diet, which was the most popular topic for these types of nudges.

On the other hand, we could probably also find at least one metric for every one of the nine types of nudges, but they might not be the most interesting or important ones for this project. But it could still be a useful exercise to go through.

Progress with p values – perhaps

The American Statistical Association (ASA) has published their “statement” about p values. I have long held fairly strong views about p values, also known as “science’s dirtiest secret”, so this is exciting stuff for me. The process of drafting the ASA statement involved 20 experts, “many months” of emails, one two-day meeting, three months of draft statements, and was “lengthier and more controversial than anticipated”. The outcome is now out, in The American Statistician, with no fewer than 21 discussion notes to accompany it (mostly people involved from the start as far as I can gather).

The statement is made up of six principles, which are:

  1. P-values can indicate how incompatible the data are with a specified statistical model.
  2. P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
  3. Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
  4. Proper inference requires full reporting and transparency.
  5. A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
  6. By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.

I don’t think many people would disagree with much of this. I was expecting something a bit more radical – the principles seem fairly self-evident to me, and don’t really address the bigger issue of what to do about statistical practice. That question is addressed in the 21 comments though.
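Principles 3 and 5 are easy to illustrate with a toy simulation (my own sketch, not part of the ASA statement; the sample sizes and effect are invented for illustration). The same tiny effect is “highly significant” in a huge sample and “non-significant” in a modest one, so the p-value on its own says little about size or importance:

```python
import math
import random
import statistics

random.seed(1)

def p_value_two_sample(a, b):
    """Approximate two-sided p-value for a difference in means,
    using a normal approximation to the t statistic."""
    n_a, n_b = len(a), len(b)
    se = math.sqrt(statistics.variance(a) / n_a + statistics.variance(b) / n_b)
    z = (statistics.fmean(b) - statistics.fmean(a)) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

tiny_effect = 0.03  # a trivially small true difference (0.03 SD)

# Measured on 100,000 people per group, it becomes "highly significant"...
big_a = [random.gauss(0.0, 1.0) for _ in range(100_000)]
big_b = [random.gauss(tiny_effect, 1.0) for _ in range(100_000)]
# ...while the same effect in a modest sample is "not significant".
small_a = [random.gauss(0.0, 1.0) for _ in range(50)]
small_b = [random.gauss(tiny_effect, 1.0) for _ in range(50)]

print(f"huge n:  p = {p_value_two_sample(big_a, big_b):.6f}")
print(f"small n: p = {p_value_two_sample(small_a, small_b):.3f}")
```

Nothing about the effect changed between the two comparisons, only the sample size, which is precisely why a significance threshold alone shouldn’t drive conclusions.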

It probably says something about the topic that it needs 21 comments. And that’s also where the disagreements come in. Some note that the principles are unlikely to change anything. Some point out that the problem isn’t with p-values themselves, but the fact that they are misunderstood and abused. The Bayesians, predictably, advocate Bayes. About half say updating the teaching of statistics is the most urgent task now.

So a decent statement as far as it goes, in acknowledging the problems. But not much in the way of constructive ideas on where to go from here. Some journals have banned p-values altogether, which sounds like a knee-jerk reaction at the other extreme. I’d just like to see poor old p’s downgraded to one of the many statistical measures to consider when analysing data. Never the main one, and definitely not the deciding factor on whether something is important or not. I may have to wait a bit longer for that day.

Digital health & wellbeing conferencing

Last week saw the second of UCL’s behaviour change conferences, this year subtitled Digital Health & Wellbeing. And quite a bit bigger than last year’s first one. I spoke on a panel on “Challenges to creating sustainable, high impact interventions” (see below), and also had a poster on Club Soda’s Month Off Booze programme (a “prize-nominated poster” no less, though the prize went to someone else…).

[Image: UCL panel tweet]

Some of the themes that I picked up on over the two days were:

Tailoring of messages – e.g. app prompts, emails, social media messages and so on. The more personalised these can be made, the better the engagement. This may also include personalising the tools by the users themselves (e.g. adding bookmarks and notes).

Importance of good design – nobody likes an ugly app. Some features divide opinion (e.g. cartoon talking heads), some are not liked by anyone, and sometimes people take you by surprise. For example, German youth much prefer factual information about alcohol harms to “fun” factoids. Then again, perhaps it’s not so surprising that teens don’t find funny the things public health officials think they should…

Communities/social support – several interesting projects included some elements of this, and with good results too.

Not just apps – this is one of my personal bugbears, but I did hear other people as well talk about the fact that apps are no longer the only game in town. They may be a part of a bigger intervention, or they may not be included at all. And sometimes the preferred medium is not what you expect at all: in one example, people much preferred text messages to emails, as emails “reminded them of work”(!).

Not just RCTs – a few critical comments on these too. There are alternatives available, which can be much quicker and easier to do.

New recruitment avenues – GumTree was mentioned several times as the source of study participants!

Evaluation of eHealth/mHealth interventions – this research is making progress. A Cochrane review of digital alcohol reduction interventions is nearing completion, with some interesting findings on what seems to work and what doesn’t. I’m really looking forward to reading the full study soon.

Poor engagement levels – an oft-cited figure was that 20% of apps are only ever used once and then ignored. And very few are used anything close to frequently. This creates problems for evaluation as well, as the drop-out rates in some studies can be over 90%.

Dose – again, several speakers mentioned this as an open issue. What is the “dose” of a digital intervention, can it be altered, how to measure it, and does it make a difference?

Qualitative data too! – A fascinating comment by Nikki Newhouse: when she interviewed people about their use of a website, the stories completely contradicted the researchers’ conclusions from the quantitative data. For example, people had seemingly spent lots of time on one page, but had in fact found it so confusing that they had often “gone to make tea instead” and not actually read it at all!

All in all a stimulating two days again, with lots to take away and ponder.