In September 2019 Ofsted will introduce a new Framework of Inspection and it has already begun to work on revising the current version. That timescale and the arrival of a new Chief Inspector offer the chance of change, but whether that change becomes merely cosmetic or genuinely radical will in part depend upon the amount of pressure we, the teaching profession, are prepared to apply. The danger is that Ofsted will settle for a superficial respray to make it look fresh and up-to-date, but what we need is a model designed anew from first, educational principles – a recovery vehicle rather than a war chariot.
I want Ofsted, for instance, to abandon its grading scale, which attaches a single label (“inadequate” or “outstanding”) to the new mega Further Education Colleges (with 40,000 students and 30+ departments). This is just one egregious example of the growing evidence that the current model needs to be redesigned from first principles.
Over the last 20 years, secondary school performance measures have had an enormous impact on schools’ behaviour, parental preference and, indeed, local house prices. Published in local and national league tables, they have been based on the proportion of a school’s students who secure 5 A*-C GCSEs in English, mathematics and three other subjects.
Just before Christmas, in what appeared to be a leak of the Government’s review of secondary school accountability, the Daily Telegraph reported that this measure will go, to be replaced with an average points score: 8 points for an A*, 7 for an A, and so on. Unfortunately – and hopefully not on the basis of leaks from the DfE – the Telegraph spoilt its story with two schoolboy howlers: first by suggesting that this provided a “more precise calculation of achievement”, rather than a measure of examination attainment, and second by arguing that it would prevent “over-generous marking” by teachers – clearly, the way scores are reported for accountability purposes says nothing about the assessment methods which produce them.
In principle, it makes good sense to hold schools to account for the progress made by all their pupils rather than the sub-set who achieve the highest grades. On this principle alone, the Telegraph-reported proposal is welcome and a big advance on current arrangements. At present, the grade C threshold encourages schools to focus attention on that threshold to the detriment of other thresholds, and certainly to the detriment of reporting the progress made by all children. Effectively, we have been using a secondary school performance measure which relates to the performance of only three fifths of young people, which may be not unrelated to our ingrained problem of the long tail of low performance. The idea of calculating a points score is an advance.
As ever, though, in matters of assessment policy, things are not that simple. In the first place, the measure as reported treats grade boundaries as single steps up an equally calibrated staircase – the step from a G to an F would be treated as the same size (one step) as the step from a D to a C, or from an A to an A*. But this is misleading on several grounds. As any grade boundary archive makes plain, not all grade boundaries are the same size. Normally, the critical C boundary is set first, and the other boundaries are then derived statistically from it. Some boundaries are set as equal steps between marks, but not all are. The conversion of grades to numbers for the purpose of deriving an average score assumes a statistical pattern which is simply not there in pupils’ performance: not all grades are equal steps up in marks.
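The conversion the Telegraph describes can be sketched in a few lines (the point values follow the reported scheme; the pupil’s grades are invented for illustration). The sketch makes the hidden assumption visible: every grade step is treated as worth exactly one point, however wide the underlying mark gap between boundaries actually is.

```python
# Points mapping as reported: A* = 8 down to G = 1 (illustrative).
GRADE_POINTS = {"A*": 8, "A": 7, "B": 6, "C": 5, "D": 4, "E": 3, "F": 2, "G": 1}

def average_points(grades):
    """Average points score across one pupil's GCSE grades."""
    return sum(GRADE_POINTS[g] for g in grades) / len(grades)

# A single "point" is assumed to be the same size everywhere on the scale,
# even though the mark distances between grade boundaries differ.
pupil = ["A*", "B", "C", "C", "D", "E"]  # hypothetical pupil
print(average_points(pupil))  # (8 + 6 + 5 + 5 + 4 + 3) / 6
```

The point is not that the arithmetic is hard; it is that the arithmetic treats a dictionary of equal-interval numbers as if it described an equal-interval scale.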
It gets more complex: although it is important to hold schools to account for the progress made by all pupils, in practice the C/D borderline matters for post-16 progression. Getting a C in maths allows a pupil to progress in ways that getting a D does not, whereas a B opens very few additional progression possibilities that a C does not. Whether the C boundary should be as important as it has become can be debated, but it does matter – and internationally, the idea of thresholds for functional numeracy and facility in the national language is gaining currency. The C/D borderline measures this – crudely and ineffectively in all sorts of ways – as a measure across the attainment range does not. American schools report their graduation rates – and some students take longer than others to graduate; a graduation rate reports a threshold, and schools are often fairly incurious about performance above it. High school graduation, as too many teen movies bear witness, is simply graduation: the threshold is either crossed or it is not.
There is a further difficulty. Most observers argue that schools should be held to account for the progress made by the pupils they teach, and it has long been pointed out that the performance of some schools is flattered by the focus on the proportion securing 5 A*-C GCSEs: there are schools which should be doing much better than they are given their intake. The proposed calculation, although it is a step forward, is still not a progress measure. For schools, the key indicator is not the measure of the overall attainment of a cohort, but the measure of levels of progress from entry to exit. That is a much more genuinely inclusive measure. But even that is complicated as the education system is gently tilted back towards norm- rather than criterion-referenced assessment methods, so that not all pupils may be able to make three levels of progress.
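A progress measure of the kind described would look quite different from an attainment average. As a hedged sketch (the entry and exit levels, the cohort data, and the three-levels benchmark are all illustrative of the convention, not real figures):

```python
# Sketch of a progress measure: compare each pupil's exit level with their
# entry level, rather than averaging attainment at exit alone.
def levels_of_progress(entry, exit_level):
    return exit_level - entry

# (entry, exit) level pairs for a hypothetical cohort.
cohort = [(4, 7), (3, 6), (5, 6), (2, 5)]

# Proportion of pupils making at least three levels of progress.
share = sum(1 for e, x in cohort if levels_of_progress(e, x) >= 3) / len(cohort)
print(share)  # 3 of 4 pupils in this invented cohort
```

Even this toy version shows why such a measure is more inclusive: every pupil contributes, wherever on the attainment range they start and finish.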
And there is one more complexity, which matters if you accept that accountability measures can drive perverse behaviours. The focus on the C threshold may encourage schools to invest considerable resources at the C/D borderline, but a concern with the number of students reaching a threshold does at least force schools to attend to the performance of individuals. Basing accountability on average scores shifts the focus from individual pupils to aggregate results. Most of us want schools to be concerned with outcomes for individuals.
The Telegraph report reminds us that “accountability” for performance in education is complex. Developing measures which genuinely allow schools to demonstrate what they have achieved with young people is complex. Translating those measures into a readily understood format which can be communicated clearly is perhaps even more complex. At root, society needs clarity about what it wants to hold schools to account for: the progress made by individual pupils, in which case we should worry less about thresholds; their ability to move all pupils to an agreed threshold, in which case we should worry less about above-threshold performance; or their ability to push the most able to elite levels of performance, in which case we need to reflect on how to map the performance of all. Until we clarify that, we will struggle with inadequate measures in which we vest too much confidence.
If nothing else, today’s Ofqual report into this year’s GCSE reminds us that few things in education are more technically complex than assessment. The controversial report itself is a difficult document to navigate. The differences between marks, grades and awards, between syllabus content and specification structure, between coursework, controlled assessment and terminal assessment and the different things they can tell us are all a reminder that to construct, develop and manage an assessment regime is an enormous challenge. Ofqual picks its way through this complexity and has come up with a clear view: GCSEs went wrong in 2012 because the highly regulated system is overburdened. We expect too much of our assessment system, and as a result, our system drives perverse behaviour.
As Ofqual remind their readers, the English GCSEs which year 11 students completed this year were new: the GCSEs they replaced in 2010 had been in place for eight years and teachers and schools had become used to them. The replacement of coursework by controlled assessment – assessments completed in schools under controlled conditions – had been designed to address perceived problems of external help and plagiarism (para 1.48) – but threw up new challenges about the management of controlled assessment in school. For Ofqual, the results this year were a crisis of regulation and of complexity. They point out that the reliance on controlled assessment – 60% of the marks in English GCSE – placed a big emphasis on the role of schools and “we do not regulate schools” (para 1.49). The report heaps some blame on the now (perhaps, in the circumstances, conveniently) abolished Qualifications and Curriculum Authority for failing to grasp the “difficulties of maintaining standards in a set of new qualifications of such complexity” (para 1.48 again).
These are devastating conclusions. Ofqual claim that regulation failed at the point of specification design, and introduced a major unregulated component into the assessment system. For Ofqual’s numerous critics, this is a whitewash, shifting the blame for the crisis onto teachers who over-marked controlled assessments, and diverting attention away from Ofqual’s own regulation of key aspects of the system – including the moderation of controlled assessment: essentially, examination board moderators did not cavil at schools’ marks. No-one reading the report from a dispassionate perspective can feel satisfied about the regulation and management of a complex examination system.
Tucked away in the report is perhaps the most important sentence: “We have found evidence that this [the use of examination thresholds at grade C] can lead to undue pressure on schools in the way they mark controlled assessments. A recurring theme in our interviews with schools was the pressure exerted by the accountability arrangements, and the extent to which it drives teachers to predict and manage grade outcomes” (para 6.3).
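The arithmetic behind that pressure is simple to sketch. The 60% weighting is the report’s figure for controlled assessment in English GCSE; the marks and the boundary value below are hypothetical. When the school-marked component carries most of the weight, a one-mark difference in its marking can move a pupil across a grade boundary.

```python
# How a heavily weighted, school-marked component interacts with a hard
# grade boundary. Weighting from the report (60%); marks are hypothetical.
def final_mark(controlled, exam, controlled_weight=60):
    # Integer-percentage weights keep the arithmetic exact.
    return (controlled_weight * controlled + (100 - controlled_weight) * exam) / 100

GRADE_C_BOUNDARY = 66  # hypothetical boundary

below = final_mark(controlled=69, exam=60)  # 65.4, just below the boundary
above = final_mark(controlled=70, exam=60)  # 66.0, on the boundary
print(below < GRADE_C_BOUNDARY <= above)   # one controlled-assessment mark decides
```

When a teacher knows both the weighting and the likely boundary, predicting and managing grade outcomes – the behaviour the report describes – is an entirely rational response to the incentives.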
Over the last 30 years, we have placed greater and greater weight on grade boundaries: they determine not only children’s futures, but also the fate of schools and, increasingly, individual teachers’ career progression. Schools below threshold are subject to intervention strategies and may be taken over. For teachers, the mooted possibility of performance-related pay systems would simply lay greater emphasis on the importance of examination results.
I blogged earlier this year about the infamous Atlanta testing scandal in the United States, where cheating became endemic because of the rewards for “success”. We have, collectively, to reflect now on the school accountability system, and on whether a crude examination-led accountability system will always lead us into difficulty. Once again, Campbell’s law is vindicated: “The more any quantitative indicator is used for decision-making, the more subject it is to corruption pressures and the more apt it will be to distort the processes it monitors”.
If nothing else, the Ofqual report might put another nail in the coffin of the current school accountability system. Schools need to be held accountable, and the highest standards of attainment matter – but we appear to have created a system which drives the most perverse behaviour – “cheating” as one highly respected journalist puts it.
Teachers are angry about the Ofqual report. They believed that they were acting not only professionally and morally but also with great technical accuracy. No-one who has examined the extraordinary sophistication of schools’ data tracking systems can fail to be impressed. They believed that they were doing what they were expected to do: using all their data, internal and external, to map progress, to monitor performance, to predict outcomes and to design interventions. I’m lucky: I get to talk to teachers, school leaders and policymakers from around the world. They are in awe of the technical abilities in monitoring performance which are routine in English schools. They understand that our information and performance systems are exceptional and our schools highly skilled.
Informed commentators in England, such as John Dunford, have argued that the time has come to move away from a system of external assessment to one based on internal assessment led by chartered assessors. Implicitly, the Ofqual report appears to make this more difficult. Its strong undercurrent – and another reason for the widespread professional anger – is that regulated assessment cannot be left to schools. That feels a disappointment, because properly conducted internal assessment can be much richer and more productive than most external examinations.
The Ofqual report is technically complex, and fascinating reading for those absorbed in the complexities of assessment, but it fails to pose really tough questions about the long-term future of assessment in England. It sets out the challenges of running a modern assessment system without really making the point that complexity is inevitable; it accurately highlights the consequences for schools of the over-emphasis on single accountability measures, but it does not yet pursue the logic of this for the long-term development of assessment systems in England.
Perhaps this is because of a structural flaw in the makeup of Ofqual: it is, after all, a regulator. But there is enough in the report documenting the systemic failures of regulation and the perverse behaviour driven by the overlap between our assessment and accountability systems to be clear that something needs to be done. We need a full-scale, politically neutral review of our education accountability framework.