
Centre for Education Policy and Equalising Opportunities (CEPEO)


We create research to improve the education system and equalise opportunities for all.


Too Much, Too Little? Finding the ‘Goldilocks’ Level of Assessment to Advance Personalized Approaches to Education for Everyone

By Blog editor, on 24 October 2023

By Dr Dominic Kelly

This article was first published by UNESCO MGIEP as part of The Blue Dot 17: Reimagining Assessments.

I like to think my teachers would say I was a relatively good child, but I am not sure they would say I was the most consistent one. In school, my concentration on my studies was too often broken by Pokémon cards, Arsenal F.C., and elaborate daydreams. As inconsistent as I was, I knew from experience that some of my classmates could be even less consistent – how they behaved yesterday could be radically different to how they behaved today, regardless of how clever they could be at their best. Given the challenges that many children face at home, there were many reasons for these inconsistencies. Did they get a good night’s sleep, despite a noisy, overcrowded house (Hershner, 2020)? Did they even have breakfast that morning (Hoyland et al., 2009)? Therefore, if you had entered our classroom with a clipboard and a page of arithmetic on a random Wednesday afternoon, I am unsure that you would have caught all of us at our best – or even at our most typical. Likewise, whether our typical selves happened to be present on the day that standardized tests were administered was certainly not a given. Perhaps this all seems obvious to you. If so, why are single assessments of children so often assumed to be representative or reliable?

In recent years, there have been understandable worries that we assess children too much (Hopkinson, 2022). Students and parents have reported that schools put too much focus on ‘high stakes’ testing, potentially to the detriment of children’s ‘love of learning’ (More Than a Score, 2020) and to their mental health (Newton, 2021) – although it should be noted that recent empirical research in a British sample found no relation between children’s wellbeing or happiness in school and participating in standardized testing (Jerrim, 2021). Either way, there is a distinct possibility that high-stakes standardized assessments are not the most representative way of assessing children’s educational capabilities (e.g., Morgan, 2016). Furthermore, I would also suggest that the most vulnerable children from the least consistent home settings are often those assessed the least fairly. Increasing evidence suggests that our cognitive performance in any given moment is affected by many contextual factors (e.g., Chaku et al., 2021). If so, given the variability that we know all children, but especially the most disadvantaged, show, conclusions about academic behaviours drawn from single measurements may not be as representative of a student’s capabilities as once thought, because these measurements are affected by external factors such as sleep, stress or nutrition. Given this, there should be a real concern that single assessments, whether standardized or not, could be a format that works to the advantage of children from affluent backgrounds, while being particularly unfair to children from disadvantaged ones. For this reason, I would argue that education experts and developmental psychologists typically assess children too little.

Instead of having occasional high-stakes assignments which potentially disrupt learning and increase tension in the classroom, I argue that there is a need for more frequent, low-stress assessments that occur in the background of the learning environment without disrupting instruction. Such assessments are not only more representative of achievement but also allow us to really engage with what makes a child’s classroom experience so variable from day to day.

Advances in educational technology (EdTech) offer us the potential to fundamentally change how interventions are developed for students, by capturing their variability in a manner which is much closer to “real time”, especially in high-income countries where many classrooms might have these technologies already available. Largely because of the substantial amount of labour and expenditure required to administer assessments, longitudinal educational studies have traditionally had long measurement intervals – for example, months or years apart. But what might appear to be relative stability in educational behaviours when assessed infrequently may in fact be a highly dynamic process with substantial fluctuations between days. Modern technology in the classroom setting provides the opportunity to dramatically reduce costs, increase the number of assessments, and shorten the intervals between them – for example, to days, hours or even minutes. This latter approach to assessment can be considered prototypical of intensive longitudinal designs (also known as micro-longitudinal designs), which involve the collection of many repeated observations per person. Data for these studies are often collected by measuring individuals’ thoughts and behaviours in familiar environments (e.g., the classroom, at home) rather than unfamiliar laboratory environments, using relatively non-intrusive smartphones, tablets, wearable technology, and so on. These assessments go beyond traditional continuous assessments because contextual, non-cognitive factors can be collected too. These studies may also be more accurate because of the shorter interval between when thoughts and behaviours occur and when they are reported (Trull & Ebner-Priemer, 2014).
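For readers who think in code, here is a minimal sketch of why a one-off assessment can hide day-to-day fluctuation. Everything in it is an assumption made for illustration: the two pupils, their scores, and the helper names `summarise` and `single_snapshot` are invented, and the numbers are not drawn from any study.

```python
from statistics import mean, stdev

# Hypothetical daily quiz scores (0-100) for two pupils over ten school days.
# These numbers are invented purely for illustration.
daily_scores = {
    "pupil_a": [70, 71, 69, 70, 70, 71, 69, 70, 70, 70],
    "pupil_b": [70, 45, 92, 60, 85, 50, 95, 55, 78, 70],
}


def summarise(scores):
    """Return the within-person mean and standard deviation of daily scores."""
    return {"mean": mean(scores), "sd": stdev(scores)}


def single_snapshot(scores, day_index):
    """Mimic a one-off assessment: only the score on one chosen day is seen."""
    return scores[day_index]


# Intensive view: every day contributes to each pupil's summary.
summaries = {name: summarise(s) for name, s in daily_scores.items()}

# Single-assessment view: both pupils are observed on the first day only.
snapshots = {name: single_snapshot(s, 0) for name, s in daily_scores.items()}

for name in daily_scores:
    print(name, "snapshot:", snapshots[name],
          "mean:", summaries[name]["mean"],
          "sd:", round(summaries[name]["sd"], 1))
```

In this made-up case the two pupils look identical on the snapshot day and share the same average, yet one of them swings far more from day to day. Only the repeated measurements reveal that difference.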

One of the most important benefits of collecting intensive longitudinal data is the potential to adapt instruction to the needs and variability of each child. Instead of generalising broad conclusions across students, we have the potential to utilize previously unfathomable amounts of data collected from EdTech to create highly personalized models for every child. To date, intensive longitudinal studies have disproportionately featured adults (e.g., Kelly & Beltz, 2021) and have rarely been set in the classroom. Yet, compared to data collected much less frequently, intensively collected data on the variability of students’ experiences can be sought regularly in the classroom – learning behaviours and outcomes, wellbeing, peer interactions, and so on. Personalized education is a burgeoning field focused on leveraging ‘big data’ to develop complex but parsimonious models based on students’ needs and nuances, which can lead to effective interventions, but there is still relatively little known about what factors in children’s daily lives are important for their academic achievement and wellbeing. Intensive longitudinal studies can inform this and facilitate potentially powerful personalized interventions. This personalization is particularly important given the diversity we see in the classroom. Many intervention efforts for equalizing educational outcomes have been designed for the ‘average student’. Yet, no student is average: students’ learning processes are contextualized by the intersections and interactions of each element of their identity, background, and history, which may not be consistent in how they manifest in the classroom every day. Rather than apply broad educational practices across students, leveraging intensive longitudinal data offers enormous potential for developing highly personalized models and interventions tailored to each student’s unique needs.

Given that personalized approaches to education require a greater number of assessments than other approaches, there is some concern that administering regular assessments could be burdensome for teachers and potentially disrupt learning. Innovative applications of EdTech, in which assessment goes relatively unnoticed while providing the most benefit, are therefore essential. Many classrooms in high-income countries already have some relevant technological infrastructure in place, even if it isn’t intended for that purpose yet. Daily educational data are already being collected: namely, the formative assessments that educational professionals currently use to monitor progress. Although continuous forms of assessment can potentially be useful for reducing the pressure on students on specific occasions, their potential is being underutilized: these data also allow for a more fine-grained understanding of what predicts and what is predicted by students’ daily variability. The thoughtful measurement and modelling of these data could be elucidating, but there is still a lack of suitable methods, leaving the field “data rich but information poor”. If these data could be complemented by other short-form, easy-to-administer surveys about behaviour or cognition, it would be possible to address questions about children’s individual progress and setbacks in the classroom, without placing extra stress on teachers. To ensure this, thoughtful teacher training will need to be developed and provided, which itself will likely need to be tailored to teachers’ existing knowledge of EdTech. An important challenge will be determining the right number of assessments: enough to provide the fine-grained detail needed to understand the complexity of a child, but not so many that they impede the classroom – in fact, that ‘Goldilocks’ number of assessments may itself be unique to each child.

Of course, there are continued inequities in access to these opportunities as it is mostly high-income economies that have embedded technology in their classrooms, and there are also notable differences in opportunities within those economies. As EdTech decreases in cost and hopefully spreads to more diverse settings, an important challenge will be designing and administering assessments which are culturally specific to local educational needs and resources.

Another potential limitation of intensive longitudinal designs is that they track fluctuations over short periods of time, but do not alone allow for plotting long-term changes. Therefore, there is a clear need for studies and interventions that combine traditional and intensive longitudinal assessments – what are called ‘burst designs’ (Stawski et al., 2015). For example, one could measure children’s academic performance and experience in the classroom every day for two weeks, every year for five years. Such a design would have the potential to address unique questions about how short-term fluctuations become long-term change. Are there specific times in a child’s life where they are the least consistent in their behaviours, and does that matter? Is a child’s lack of consistency in daily assessments indicative of problem behaviours in later life? Only by integrating intensive longitudinal data and traditional longitudinal data can these questions be addressed.
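The example schedule above (daily assessments for two weeks, repeated every year for five years) can be sketched in a few lines of code. This is only an illustration under stated assumptions: two weeks is taken to mean ten school days, the function names `burst_schedule` and `summarise_bursts` are hypothetical, and the scores are generated by a made-up rule rather than observed.

```python
from statistics import mean, stdev


def burst_schedule(n_years=5, days_per_burst=10):
    """List (year, day) occasions: one burst of daily assessments per year.

    Assumes "two weeks" means ten school days.
    """
    return [(year, day)
            for year in range(1, n_years + 1)
            for day in range(1, days_per_burst + 1)]


def summarise_bursts(records):
    """Group (year, day, score) records by year.

    The per-year mean tracks long-term change; the per-year SD tracks
    short-term, within-burst fluctuation.
    """
    by_year = {}
    for year, _day, score in records:
        by_year.setdefault(year, []).append(score)
    return {year: {"mean": mean(scores), "sd": stdev(scores)}
            for year, scores in sorted(by_year.items())}


schedule = burst_schedule()

# Invented scores for one hypothetical child: the yearly level rises while
# the day-to-day swings shrink. This rule is an assumption, not a finding.
pattern = [-2, -1, 0, 1, 2, -2, -1, 0, 1, 2]
records = [(year, day, 60 + 2 * year + pattern[day - 1] * (6 - year))
           for year, day in schedule]

summary = summarise_bursts(records)
for year, stats in summary.items():
    print(year, stats["mean"], round(stats["sd"], 2))
```

Keeping the two summaries separate is the point of the design: the yearly means describe the long-term trajectory, while the within-burst spread describes how consistent the child was during each burst.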

In sum, we have good reason to question whether single assessments can truly represent the variability of a child’s experiences in the classroom. Contextual factors can lead to substantial fluctuations in cognitive performance. Intensive longitudinal studies to measure these fluctuations have previously been used primarily with adults, and this work has generally not yet been translated to the classroom or to children, despite the potential that the thoughtful use of this type of assessment offers for our understanding of variability and the future of personalized education. Furthermore, there is a distinct need for research that suitably assesses both short-term fluctuations and long-term change together, to determine how the former becomes the latter in ways that are potentially unique to each child. I believe this to be a worthy endeavour – individualized approaches to education which fully engage with the heterogeneity of the disparities that students face have the potential to reduce barriers, equalize outcomes, and improve social mobility. The inconsistency of a child’s cognition or behaviours should not be treated as error, noise or inconvenience but as a vital, and long overlooked, aspect of their development.

I’d like to thank my doctoral dissertation committee – Drs. Adriene Beltz, Pam Davis-Kean, Robin Edelstein, and Ioulia Kovelman – for their insight in developing this line of research with me.

