By Mira Vogel, on 20 August 2015
When asked for evidence of the effectiveness of digital education I often find it hard to respond, even though this is one of the best questions you can ask about it. Partly this is because digital education is not a single intervention but a portmanteau of different applications interacting with the circumstances and practices of staff and students – in other words, it’s situated. Another reason is that evaluation by practitioners tends not to be well resourced or rewarded, leading to a shortage of well-designed and well-reported evaluation studies to synthesise into theory. For these reasons I was interested to see a paper by Tuan Nguyen titled ‘The effectiveness of online learning: beyond no significant difference and future horizons’ in the latest issue of the Journal of Online Learning and Teaching. Concerned with the generalisability of research which compares ‘online’ to ‘traditional’ education, it offers a critique and proposes improvements.
Nguyen directs attention to nosignificantdifference.org, a site which indicates that 92% of studies of distance or online education find it at least as effective as, or more effective than, what he terms ‘traditional’ (i.e. in-person, campus-based) education. He proceeds to examine this statistic, raising questions about the studies included and a range of biases within them.
Because the studies include a variety of interventions in a variety of contexts, it is impossible to define an essence of ‘online learning’ (and the same is presumably true for ‘traditional learning’). It follows that no constant effect is found for online learning; most of the studies had mixed results attributed to heterogeneity effects. For example, one found that synchronous work favoured traditional students whereas asynchronous work favoured online students. Another found that, as we might expect, its results were moderated by race/ethnicity, sex and ability. One interesting finding was that fixed timetabling can enable traditional students to spend more time on task than online students, with correspondingly better outcomes. Another was that improvements in distance learning may only be identifiable if we exclude what Nguyen tentatively calls ‘first-generation online courses’ from the studies.
A number of the studies contradict each other, leading some researchers to argue that much of the variation in observed learning outcomes is due to research methodology. Where the researcher was also responsible for running the course, there was concern about vested interests in the results of the evaluation. The validity of quasi-experimental studies is threatened by confounding effects such as students in a control group being able to use friends’ accounts to access the intervention. One major methodological concern is endogenous selection bias: where students self-select their learning format rather than being randomly assigned, there are indications that the online students are more able and confident, which in turn may mask the effectiveness of the traditional format. Also related to sampling, most data comes from undergraduate courses, and Nguyen wonders whether graduate students with independent learning skills might fare better with online courses.
Lest all of this feed cynicism about bothering to evaluate at all, it is worth remembering that only evaluation research can support good decisions about where to put our resources and energies. What this paper indicates is that it is possible to design out, or control for, some of the confounding factors it raises. Nguyen makes a couple of suggestions for the ongoing research agenda. The first he terms the “ever ubiquitous” more-research-needed approach to investigating heterogeneity effects.
“In particular, there needs to be a focus on the factors that have been observed to have an impact on the effectiveness of online education: self-selection bias, blended instruction, active engagement with the materials, formative assessment, varied materials and repeatable low-stake practice, collaborative learning communities, student maturity, independent learning skills, synchronous and asynchronous work, and student characteristics.”
He points out a number of circumstances which are under the direct control of the teaching team, such as opportunities for low-stakes practice, occasions for synchronous and asynchronous engagement, and varied materials, which are relatively straightforward to adjust and to relate to student outcomes. He also suggests how to approach weighting and measuring these. Inevitably, thoughts turn to individualising student learning, and it is this, particularly in the form of adaptive learning software, that Nguyen proposes as the most likely way out of the No Significant Difference doldrums. Determining the most effective pathways for different students in different courses promises to inform those courses’ ongoing design. This approach puts big data in the service of individualisation based on student behaviour or attributes.
This dual emphasis of Nguyen’s research agenda avoids an excessively data-oriented approach. When evaluation becomes diverted into trying to relate clicks to test scores, not only are some subject areas under-researched, but the benefits of online environments are liable to be conceived narrowly, in terms of how far they yield enough data to individualise student pathways. This in itself is an operational purpose which overlooks the educational qualities of environments as design spaces in which educators author, exercise professional judgment, and intervene contingently.

I had a bit of a reverie about vast repositories of educational data such as LearnSphere and the dangers of allowing them to over-determine teaching (though I don’t wish to diminish their opportunities, either). I wished I had completed Ryan Baker’s Big Data in Education Mooc on edX (this will run again, though whether I’ll be equal to the maths is another question). I wondered if the funding squeeze might conceivably lead us to adopt paradoxically homogeneous approaches to coping with the heterogeneity of students, where everyone draws similar conclusions from the data and acts on them in similar ways, perhaps buying off-the-shelf, black-box algorithmic solutions from increasingly monopolistic providers. Then I wondered if I was indulging dystopian flights of fancy, because in order for click-by-click data to inform the design of learning activities you need to triangulate it with something less circumstantial – you need to know the whys as well as the whats and the whens. Click data may provide circumstantial evidence about what does or doesn’t work, but on its own it can’t propose solutions. Trying out speculative solutions is a luxury – A/B testing on students may be acceptable in Moocs and other courses where nobody’s paying, but it’s a more fraught matter in established higher education cohorts.
Moreover, Moocs are currently outside many institutions’ quality frameworks, and this is probably why their evaluation questions often seem concerned with engagement rather than learning. In other words, Mooc evaluations oriented mainly towards click and test data may shed limited light outside those Mooc contexts.
Evaluating online learning is difficult because evaluating learning is difficult. To use click data and test scores in a way which avoids unnecessary trial and error, we will need to carry out qualitative studies as well. Nguyen’s two approaches should be treated as symbiotic.
Video HT Bonnie Stewart.