The great algorithm fiasco
When A-level examinations were cancelled in March 2020, the government instructed the Office of Qualifications and Examinations Regulation (Ofqual) to issue ‘calculated grades’ instead, based on teacher predictions moderated by an algorithm (Kelly, 2021). The education secretary required Ofqual to ‘ensure, as far as possible, that… the distribution of grades followed a similar profile to previous years’ (Adams et al., 2020). The process had three stages. First, schools provided a list of ‘centre assessed grades’ (CAGs) in each subject, with the students ranked in order. Second, the historical distribution of actual grades for the previous three years was tested against predicted grades for each school to check the accuracy of its past predictions. Finally, the predicted distribution of grades at each school, based on school-level GCSE attainment, was examined, and CAGs were raised or lowered to fit that distribution. For cohorts with fewer than 15 students, the algorithm was not applied and the raw CAGs were used instead.
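The moderation stage can be sketched in a few lines of Python. This is a minimal, hypothetical illustration of rank-based distribution fitting, not Ofqual’s actual model: the function name `calculated_grades`, the grade list and the input format are invented here, and the target distribution (which Ofqual derived from each school’s history and GCSE attainment) is simply passed in as a set of proportions. Only the fewer-than-15 threshold comes from the description above.

```python
from typing import Dict, List

# Grade scale from best to worst (hypothetical representation).
GRADES: List[str] = ["A*", "A", "B", "C", "D", "E", "U"]

# Cohorts below this size keep their CAGs, as described above.
SMALL_COHORT_THRESHOLD = 15


def calculated_grades(
    ranked_students: List[str],
    cags: Dict[str, str],
    target_distribution: Dict[str, float],
) -> Dict[str, str]:
    """Hypothetical sketch: assign grades by rank so that the cohort
    matches target_distribution.

    ranked_students: student IDs, best first, as ranked by the school.
    cags: centre assessed grade for each student.
    target_distribution: grade -> expected proportion (assumed to sum to 1).
    """
    n = len(ranked_students)
    if n < SMALL_COHORT_THRESHOLD:
        # Too few students for statistical adjustment: use the raw CAGs.
        return {s: cags[s] for s in ranked_students}

    result: Dict[str, str] = {}
    cumulative = 0.0
    position = 0
    for grade in GRADES:
        cumulative += target_distribution.get(grade, 0.0)
        # Number of students who should be at this grade or better.
        cutoff = round(cumulative * n)
        while position < min(cutoff, n):
            result[ranked_students[position]] = grade
            position += 1
    # Any students left over because of rounding receive the lowest grade.
    for s in ranked_students[position:]:
        result[s] = GRADES[-1]
    return result
```

For example, a cohort of 20 students all given an A by their teachers, with a target distribution of 20% A*, 30% A, 30% B and 20% C, comes out as four A*s, six As, six Bs and four Cs, assigned strictly by rank, while the same CAGs for a cohort of 10 pass through unchanged. Note that in this sketch the CAGs of a large cohort are ignored entirely: the rank order carries all the weight.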
Algorithms are normally tested against known outcomes from previous years. Ofqual did this, but the test was misleading because, although there were teacher-predicted grades in 2019, there were no teacher-predicted rankings. Ofqual generated these retrospectively for the trial by ordering students according to the marks they actually got in their exams, but this weakened the test because the real grades and the estimated grades were based on the same set of marks, so that a high level of agreement was inevitable. And yet, even then, Ofqual had to admit that there was large variance in the algorithm’s accuracy.
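Why the retrospective test was bound to look good can be shown with a small simulation. In this hypothetical Python sketch (the grade boundaries, simulated marks and helper names are all invented for illustration), each student’s actual grade comes from their exam mark; students are then ranked by those same marks and grades are handed out in rank order to match the actual grade counts. To isolate the ranking step, the sketch uses the actual grade counts rather than a predicted distribution.

```python
import random
from collections import Counter
from typing import Dict, List

GRADES = ["A*", "A", "B", "C", "D", "E", "U"]  # best to worst


def grade_from_mark(mark: int) -> str:
    """Hypothetical grade boundaries on a 0-100 mark scale."""
    boundaries = [(90, "A*"), (80, "A"), (70, "B"), (60, "C"), (50, "D"), (40, "E")]
    for lower, grade in boundaries:
        if mark >= lower:
            return grade
    return "U"


def fit_by_rank(ranked: List[str], grade_counts: Counter) -> Dict[str, str]:
    """Hand out grades in rank order, best grade first, until each quota is used up."""
    result: Dict[str, str] = {}
    position = 0
    for grade in GRADES:
        for _ in range(grade_counts[grade]):
            result[ranked[position]] = grade
            position += 1
    return result


def agreement(predicted: Dict[str, str], actual: Dict[str, str]) -> float:
    """Share of students whose predicted grade equals their actual grade."""
    return sum(predicted[s] == actual[s] for s in actual) / len(actual)


rng = random.Random(2019)
marks = {f"s{i}": rng.randint(30, 100) for i in range(200)}  # simulated exam marks
actual = {s: grade_from_mark(m) for s, m in marks.items()}
counts = Counter(actual.values())

# Circular test: rank students by the very marks that produced their actual grades.
ranked_by_marks = sorted(marks, key=marks.get, reverse=True)
print(agreement(fit_by_rank(ranked_by_marks, counts), actual))  # 1.0, by construction

# Stand-in for a teacher's ranking: the same marks blurred by random noise.
ranked_by_teacher = sorted(marks, key=lambda s: marks[s] + rng.gauss(0, 8), reverse=True)
print(agreement(fit_by_rank(ranked_by_teacher, counts), actual))  # typically below 1.0
```

The point lies in `fit_by_rank`: once the grade quotas are fixed, the outcome depends only on the ordering, so a back-test that borrows its ordering from the real exam marks says very little about how well teachers’ rankings would perform.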
The 2020 A-level grades were announced on 13 August. Students at small schools and in subjects with fewer than 15 students, both more common in the fee-paying sector, were more likely to receive raised calculated grades than students at state schools, who saw their grades plummet. There was uproar. Ofqual’s deputy chief regulator, Michelle Meadows, confirmed that pupils from poor backgrounds were ‘more likely to have seen a bigger downward adjustment’, but attributed this to the ‘tendency for more generosity in the predictions for students from lower socioeconomic status backgrounds’ (cited in Hazell, 2020). It was a disconcerting explanation, but there was a simpler one that pointed to a flaw in the algorithm: statistical adjustment is methodologically unsound for small cohorts, so small cohorts kept their unadjusted CAGs, and small cohorts are much more common at private schools.
Following the publication of results, serious political pressure was applied to reverse the decision to use calculated grades, but the education secretary, Gavin Williamson, had been in office for only just over a year and understandably had to rely on advice from Ofqual. On 15 August the prime minister weighed in, stating that the results were ‘robust and dependable’, but later the same day Ofqual suspended the system. Two days later, Ofqual announced that students would be awarded their CAGs instead, despite a whopping 12.5 per cent grade inflation. On 25 August 2020, the head of Ofqual resigned. The following day, the most senior civil servant at the Department for Education, the permanent secretary, followed suit. It later emerged that the Royal Statistical Society (RSS) had offered to help Ofqual with the construction of the algorithm, but Ofqual delayed replying for nearly two months (Murray, 2020). There is no evidence that the education secretary was aware of this offer. It also emerged that Oxford, Cambridge and RSA Examinations had warned that the algorithm was producing rogue results, but Williamson and his department were assured by Ofqual that these would be corrected by the appeals procedure.
On the stakeholder front, universities complained about the turmoil (Fazackerley, 2020) but, strangely, had not themselves proposed skipping calculated grades altogether and using GCSE results instead. That would at least have reflected students’ own actual attainment. If only the effort and resources that went into the algorithm had been put instead into training schools in how to take ‘mock’ examinations into account and how to adjust predicted grades based on prior attainment and trajectory. Algorithms are by their nature a sign of mistrust: the supposedly objective replacement of supposedly subjective judgement. What the policy collapse seems to have exposed was not incompetence but the longstanding mistrust that policymakers have of teachers’ capacity for impartial professional judgement. The upcoming 2021 examination cycle is an opportunity for the teaching profession to put this right by accepting ownership of assessment and independent moderation. It will be the first time since the William Tyndale controversy of the 1970s that the profession can take such a lead. Let’s hope it turns out better this time.
This blog is based on the article ‘A tale of two algorithms: The appeal and repeal of calculated grades systems in England and Ireland in 2020’ by Anthony Kelly, published in the British Educational Research Journal on an open-access basis.
Adams, R., Elgot, J., Stewart, H., & Proctor, K. (2020, August 19). Ofqual ignored exams warning a month ago amid ministers’ pressure. Guardian. https://www.theguardian.com/politics/2020/aug/19/ofqual-was-warned-a-month-ago-that-exams-algorithm-was-volatile
Fazackerley, A. (2020, August 21). ‘We were already full’: Universities face nightmare of exams chaos and Covid-19. Guardian. https://www.theguardian.com/education/2020/aug/21/universities-warn-of-pandemic-risk-if-upgraded-a-level-students-admitted
Hazell, W. (2020, August 13). A-level results 2020: 39% of teacher predicted grades downgraded by algorithm amid calls for U-turn. iNews. https://inews.co.uk/news/education/a-level-results-2020-grades-downgraded-algorithm-triple-lock-u-turn-result-day-578194
Kelly, A. (2021). A tale of two algorithms: The appeal and repeal of calculated grades systems in England and Ireland in 2020. British Educational Research Journal. https://doi.org/10.1002/berj.3705
Murray, J. (2020, August 24). RSS hits back at Ofqual in exams algorithm row. Guardian. https://www.theguardian.com/education/2020/aug/24/royal-statistical-society-hits-back-at-ofqual-in-exams-algorithm-row