Agile evaluation of digital behaviour change interventions
By Carmen E Lefevre, on 2 November 2016
By Robert West; Centre for Behaviour Change, University College London
Digital behaviour change interventions (DBCIs) are typically apps and websites that aim to achieve lasting behaviour change in users, for example stopping smoking, reducing alcohol consumption, increasing levels of physical activity or reducing calorie intake.
We want to know whether DBCIs are effective, and if so how effective and for whom, but establishing this presents a major challenge.
On the one hand, we can’t tell whether there has been lasting change unless we follow up a sufficiently large sample for at least several months after they have started using the DBCI. Effect sizes are usually small, so the sample typically needs to be of the order of hundreds. We also need a suitable reference against which to compare whatever change has been observed, to be confident that the change would not have happened anyway.
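To see why samples of this order are needed, a standard power calculation for comparing two proportions is instructive. The quit rates below are illustrative figures chosen for the example, not taken from any particular trial; this is a sketch using the normal-approximation formula, not a substitute for a proper statistical plan.

```python
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.8):
    """Approximate sample size per arm for detecting a difference
    between two proportions (normal approximation, two-sided test)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_b = NormalDist().inv_cdf(power)          # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_a + z_b) ** 2 * variance / (p1 - p2) ** 2

# e.g. detecting an improvement in 6-month quit rates from 5% to 10%
print(round(n_per_group(0.05, 0.10)))  # ~430 participants per arm
```

Even a doubling of the quit rate from a low baseline demands several hundred participants per arm, which is why long-term RCTs on every DBCI variant are impractical.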
On the other hand, we need to be more agile in our evaluations of DBCIs. Rapid changes in the digital landscape mean that DBCIs that are effective in one year may not be effective a couple of years later. Moreover, DBCIs have many different components and we cannot do large scale RCTs with long-term follow up on all the different permutations to come up with a combination that works.
So what is the best we can do when faced with this challenge? Probably the starting point is to adjust our expectations about what is achievable, particularly when we are first developing the DBCIs. We are unlikely to be able to achieve the same level of confidence in the lasting effect of a DBCI as we can with less context-sensitive clinical interventions. Once we have accepted that, we can look for more agile evaluation methods which can give an acceptable degree of confidence in the robustness and generalisability of the findings. Each situation will be different but a few key approaches suggest themselves.
One is to use short-term outcome measures that are known to predict long-term outcomes quite well (e.g. short-term smoking abstinence rates or self-reported craving).
Another is to use designs such as A/B testing to compare different variants of DBCIs (e.g. here).
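In its simplest form, an A/B comparison of two DBCI variants can be analysed with a two-proportion z-test on a short-term outcome such as sign-up or early abstinence. The counts below are made up for illustration:

```python
from math import sqrt
from statistics import NormalDist

def ab_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference between two rates,
    using the pooled standard error."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# hypothetical: variant B of an app's onboarding vs the current version
z, p = ab_test(120, 1000, 155, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

The attraction for DBCIs is that variants can be randomised and compared within the live service itself, rather than in a separate trial infrastructure.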
A third is to move away from classical statistical methods with a fixed ‘significance level’ (typically p<0.05) and sample size to a Bayesian decision making approach in which we accumulate data until we reach a threshold of confidence in effectiveness or lack of effectiveness.
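One simple version of this Bayesian approach models each variant's success rate with a Beta-Binomial posterior and keeps recruiting until the probability that one variant beats the other crosses a pre-agreed threshold. The sketch below uses uniform Beta(1,1) priors and Monte Carlo sampling; the counts and the 0.95 threshold are illustrative assumptions, not prescriptions:

```python
import random

def prob_b_beats_a(a_succ, a_fail, b_succ, b_fail, draws=100_000, seed=1):
    """Monte Carlo estimate of P(rate_B > rate_A) under independent
    Beta(1, 1) priors updated with the observed successes and failures."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(1 + b_succ, 1 + b_fail)
        > rng.betavariate(1 + a_succ, 1 + a_fail)
        for _ in range(draws)
    )
    return wins / draws

# accumulate data and stop once the probability crosses a pre-agreed
# threshold, e.g. > 0.95 ("effective") or < 0.05 ("not effective")
p = prob_b_beats_a(30, 170, 50, 150)  # A: 30/200, B: 50/200 (illustrative)
print(f"P(B better) = {p:.2f}")
```

Unlike a fixed-sample significance test, this lets the evaluation stop as soon as the evidence is strong enough in either direction, which suits the fast-moving digital context.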
Finally, and perhaps most importantly, from the very start of any evaluation we should monitor engagement and early markers of possible effectiveness for signs that the DBCI is not working. If, among the first 20 users, only a tiny proportion are getting past the first screen of an application that requires extensive engagement, there is little point in continuing with the evaluation – something will need to be changed.
For a guide on the development and evaluation of DBCIs, click here.