
Blog post

This is not a one-year blip: If we have to have a national assessment system, it shouldn’t be this one

Oliver Belas, University of Bedfordshire

Although calls for root-and-branch reform of our national examinations systems aren’t new, they’ve been re-energised by the controversy of this year’s – possibly illegal (Elgot & Adams, 2020) – results by algorithm. No one thought calculated grades desirable; and it’s sadly no surprise that the most disadvantaged students were the most adversely affected in the first instance (Adams, 2020), and continue – following the late decision to delay the release of BTEC results (Dickens, 2020) – to face uncertainty. But this isn’t a story of a one-year blip, caused by the Covid-19 crisis. Rather, it’s one of a fundamentally flawed and inequitable examinations system (Sodha, 2020; Richardson, 2020a; Torrance, 2018).

Ofqual rationalised standardisation as a corrective to centre-assessed grades (CAGs) that over- and underpredicted students’ grades – the former especially, which, research shows, is more common (Ofqual, 2020, p. 15, and fn. 18 for the research cited therein; see also Harlen, 2005; Torrance, 2018). Days before Ofqual decided to follow Scotland, Wales and Northern Ireland, Laura McInerney (2020) acknowledged that, ‘in a world of terrible options, calculating a rough grade was not itself a stupid idea,’ as ‘granting teacher-predicted scores would have almost doubled top grades, meaning some universities and employers would have needed to give places by lotteries, or defer large numbers of students.’ With the government’s U-turn allowing students to keep their best grades – CAG or calculated – this is exactly the position many universities and their prospective students are now in (see Weale & Adams, 2020; Weale, 2020), while lower-tariff institutions will, especially with the sudden lifting of the cap, likely struggle to recruit (TES, 2020). We’ve shifted from a system known to be flawed (Adams, Elgot, Stewart, & Proctor, 2020) but intended to minimise ‘grade-inflation’ (a politically freighted idea) to a forced acceptance of maximal inflation, and the root of the problem seems to be a pernicious mistrust in teachers (Richardson, 2020a, 2020b).


We needn’t be in this situation. The research (Ofqual, 2020, fn. 18) that shows teacher overprediction is more common than underprediction also shows accurate prediction to be more common than both; it suggests, too, a correlation between the end of modular courses and the decoupling of AS and A-levels, on the one hand, and a dip in the accuracy of teacher predictions, on the other. Teachers ‘accurately’ predict students’ final grades around half the time, though that accuracy rate is really a matching score; it says nothing about whether teachers or exam boards grade students’ work ‘better’. The correlation of teachers’ and exam boards’ rank-ordering of students, however, tends to be much stronger (Ofqual, 2020; Harlen, 2005). (A case in point: I know of one departmental head who analysed three years of her department’s predicted and actual A-level grades, and then used that data to predict this year’s outcomes. She was remarkably close. Crucially, though, under her analysis, several students sitting at the border between two grades would have been given the benefit of the doubt.) There is, moreover, research which indicates high validity and internal consistency among rigorous systems of teacher assessment, as well as lower-than-assumed reliability of external assessment (not to mention the psychological impact of the latter on teachers and students) (Harlen, 2005).

Not only are modular course-design and devolved, teacher-based assessment common internationally, they are the current norm in higher education in this country, and were once the norm in secondary education (Harlen, 2005; Black, 1998). Bearing this in mind: given that secondary schools and exam boards alike are geared up for moderation of (a much-reduced proportion of) centre-assessed work, and given that we’ve ended up with a version of centre-assessed final awards anyway, one wonders why a plan for rigorous centre-based assessment, subject to moderation, wasn’t Plan A. It would have been preferable this year. It would also have been preferable to the system to which we look set to return. Again, though it can’t be proved, it’s hard not to think that the issue is one of trust: were schools and teachers involved in students’ summative assessments and awards, and were courses modular in design, some of this year’s difficulties would have been avoided. Richer data would have been available upon which fairer judgements might have been based. Things wouldn’t have been perfect (could they ever be?); but they’d likely have been much better.

This year’s mess should remind us of the basic inadequacies of the current examinations system. We need to redesign the mechanisms of national assessment, with teacher assessment – properly supported and resourced – at the centre. This is neither a new argument nor practice (see Harlen, 2005 and references therein). But such change would require, among other things, a re-centring of our educational culture around a presumption of trust in teachers.

References


Adams, R. (2020, August 19). Disadvantaged pupils will be biggest winners from GCSE results. Guardian. Retrieved from

Adams, R., Elgot, J., Stewart, H., & Proctor, K. (2020, August 19). Ofqual ignored exams warning a month ago amid ministers’ pressure. Guardian. Retrieved from

Black, P. (1998). Testing: Friend or foe? The theory and practice of assessment and testing. London: Falmer Press.

Dickens, J. (2020, August 19). Pearson announces eleventh-hour grading U-turn on BTECs – telling schools NOT to issue results tomorrow. Schools Week. Retrieved from

Elgot, J., & Adams, R. (2020, August 19). Ofqual exam results algorithm was unlawful, says Labour. Guardian. Retrieved from

Harlen, W. (2005). Trusting teachers’ judgement: Research evidence of the reliability and validity of teachers’ assessment used for summative purposes. Research Papers in Education, 20(3), 245–270.

McInerney, L. (2020, August 15). A-level students are victims of a farce Gavin Williamson had five months to prevent. Guardian. Retrieved from

Office of Qualifications and Examinations Regulation [Ofqual]. (2020). Awarding GCSE, AS, A level, advanced extension awards and extended project qualifications in summer 2020: Interim report. Retrieved from

Richardson, M. (2020a, August 18). A-level debacle has shattered trust in educational assessment. The Conversation. Retrieved from

Richardson, M. (2020b, May 3). Teacher trust and educational assessment in the wake of COVID-19 [Podcast] Ed. Space (episode 2). Retrieved from–Space-Episode-2-Mary-Richardson—Teacher-trust-and-educational-assessment-in-the-wake-of-COVID-19-edib1v/a-a23fr8m

Sodha, S. (2020, August 18). The fake meritocracy of A-level grades is rotten anyway – universities don’t need them [Opinion]. Guardian. Retrieved from

Times Educational Supplement [TES]. (2020). A levels: Higher-tariff universities ‘eat the sandwiches’ of others. Retrieved from

Torrance, H. (2018). The return to final paper examining in English national curriculum assessment and school examinations: Issues of validity, accountability and politics. British Journal of Educational Studies, 66(1), 3–27.

Weale, S. (2020, August 19). Durham University offers students money to defer entry. Guardian. Retrieved from

Weale, S., & Adams, R. (2020, August 17). Not all UK students will get first-choice place, universities warn. Guardian. Retrieved from

