A-Level Results 2020: why data shouldn’t speak for themselves.

Note: since this blog was published, the UK Government has made a U-turn: students in England, Wales and Northern Ireland will now receive their teachers’ estimated grades or their moderated grades, whichever is higher.

If knowledge is power, as the adage goes, then education is the key to acquiring the knowledge needed to understand, wield and stand up to power in society. Education opens doors to career opportunities, new social and cultural networks, and ultimately positions of influence that allow individuals to shape the lifestyles and life chances of others. Yet educational provision and the benefits accrued from it (or, more specifically, from exam results in an age of test metrics) are not evenly distributed. Last year, students in England and Wales from the poorest third of postcodes were almost half as likely to acquire basic school-level qualifications as those in the richest third. 

In this context, the UK’s 2020 A-Level results fiasco is particularly damaging. Rather than earning the results they deserve through their own efforts, students have been given grades that reflect the aggregate trajectories of those who went before them. This has, in turn, widened the attainment gap between the least and most advantaged, with a significant ripple effect to come if left unaddressed. Indeed, Prime Minister Boris Johnson’s promises to ‘level up’ all parts of the country are currently sounding particularly hollow.  

So how did this happen?

It all comes down to how the Office of Qualifications and Examinations Regulation – otherwise known as Ofqual – and the Department for Education decided to calculate grades this year in light of the coronavirus pandemic and the consequent interruptions to ordinary schooling. In this instance, they relied on three main components. 

The first is the centre assessment grade (CAG), provided by teachers and meant to represent the grade each student would most likely have achieved if teaching and learning had continued and exams had been taken as planned.  

The second is the rank order grade (ROG): Ofqual asked teachers to provide a rank order of students within each grade for each subject. This was meant to improve accuracy when making decisions about students who would otherwise have been hovering at grade boundaries, a logic grounded in existing research showing that people’s relative judgements tend to be more accurate than their isolated, absolute predictions. Indeed, initial analysis of over five million CAGs testified to a general level of teacher optimism that would have led to ‘implausibly high’ grades (A* grades would have risen from 7% of all grades to 14% nationally).

In light of this, a third component was applied: a standardisation method known as the Direct Centre Performance (DCP) model, which predicts the distribution of grades for each individual school or college based on that institution’s historical performance. That distribution was then used to downweight or upweight CAGs to make them, supposedly, more realistic. 
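To make the mechanism concrete, here is a minimal sketch (in Python) of how a DCP-style standardisation could work: the school’s historical grade distribution sets a quota for each grade, and this year’s students are slotted into those quotas according to their teacher-supplied rank order. The grade scale, function names and rounding rule here are illustrative assumptions on my part, not Ofqual’s actual implementation.

```python
from collections import Counter

GRADES = ["A*", "A", "B", "C", "D", "E", "U"]  # best to worst

def historical_distribution(past_grades):
    """Share of each grade previously awarded by this school in this subject."""
    counts = Counter(past_grades)
    total = sum(counts.values())
    return {g: counts.get(g, 0) / total for g in GRADES}

def standardise(rank_ordered_students, past_grades):
    """Impose the school's historical grade distribution on this year's cohort,
    slotting students into grade quotas by their teacher-supplied rank order (ROG)."""
    dist = historical_distribution(past_grades)
    n = len(rank_ordered_students)
    results, assigned = {}, 0
    for grade in GRADES:
        quota = round(dist[grade] * n)          # expected number of this grade
        for student in rank_ordered_students[assigned:assigned + quota]:
            results[student] = grade
        assigned += quota
    for student in rank_ordered_students[assigned:]:
        results[student] = GRADES[-1]           # rounding leftovers: lowest grade
    return results

# Note how the CAGs themselves never appear: only the rank order and the
# school's past results determine who gets which grade.
print(standardise(["Asha", "Ben", "Chloe", "Dev"], ["A", "B", "B", "C"] * 5))
```

The striking design choice, even in this toy version, is that the individual student’s predicted grade is irrelevant once the rank order is fixed: the school’s past performance does the rest.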

What does this mean for student grades?

In reality, the standardisation process will have privileged students at previously high performing schools and disadvantaged those at previously poor performing schools. As such, the biggest winners in this scenario would be underachieving students at high performing schools (whose grades would be pulled up) and the biggest losers would be high achieving students at poor performing schools (whose grades would be pulled down). This process effectively writes out the abilities of individual students (and especially those at either end of the attainment spectrum).

Let’s put it another way that might be more relatable for, say, our elected representatives. Imagine that there was a general election planned for 2020 that was subsequently called off because of the coronavirus pandemic. Imagine that the Electoral Commission then offered an algorithmic alternative that used prior data on party success and polling scores to simulate the election. The algorithm *may* produce the same aggregate result that would have occurred in reality, but the efforts (successful or otherwise) of individual candidates and campaigns would be eliminated. Aspiring politicians with particularly good new ideas, strong local or regional followings, and effective campaign strategies would be [even more] hamstrung by the past performance of their predecessors. 

These hypothetical problems are paralleled in the A-Level results debacle. In the same way as elections, universal education and school examinations are [ideally] supposed to be a meritocratic tool for rewarding individual talent and hard work regardless of a child’s circumstances. Those young people who see neither reflected in the grades awarded to them by a computer will be justifiably aggrieved. 

To make matters worse - and here's the real rub - the DCP cannot reliably predict grades for particularly small groups of students. This meant (by Ofqual’s own admission) that more weight was placed on CAGs where subject cohorts in a school had fewer than 15 submissions. And where numbers were particularly small (say just five or so candidates), the CAGs were used without any standardisation at all. 
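To see how that small-cohort rule plays out, here is a hypothetical sketch in the same vein. The cut-offs of roughly 5 and 15 entries follow the description above; the linear blend between the CAG and the standardised grade is an assumption of mine rather than Ofqual’s published formula.

```python
def blended_grade_points(cag_points, standardised_points, cohort_size):
    """Weight the teacher's CAG against the DCP-standardised grade by cohort size."""
    if cohort_size <= 5:
        weight_on_cag = 1.0                      # tiny cohorts: CAG used unchanged
    elif cohort_size >= 15:
        weight_on_cag = 0.0                      # large cohorts: statistical model dominates
    else:
        weight_on_cag = (15 - cohort_size) / 10  # weight slides towards the model
    return weight_on_cag * cag_points + (1 - weight_on_cag) * standardised_points

# A class of 6 keeps most of its (typically optimistic) CAG; a class of 30 keeps none.
print(blended_grade_points(cag_points=50, standardised_points=40, cohort_size=6))   # 49.0
print(blended_grade_points(cag_points=50, standardised_points=40, cohort_size=30))  # 40.0
```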

Given that independent schools teach in much smaller cohorts regardless of subject, this meant that student grades in those schools were less likely to be downweighted by the DCP. Therefore, the grades of students at independent schools are (a) higher this year than they otherwise might have been and (b) further removed from the grades of high performing state school students, who were taught in larger classes and were therefore more likely to have their CAGs downweighted by standardisation. In the context of the UK’s already drastic levels of socio-economic inequality, this is totally unacceptable.

The venerable John Keane (a professor of political theory at the University of Sydney) once told me, as a naïve young PhD student immersed in quantitative analysis, that ‘data do not speak for themselves and never will.’ In that single line is an important lesson about the dangers of decoupling statistical analysis and model-based actions from the broader principles of social science. In many ways, the young people receiving their A-Level grades this week were not failed by an algorithm, but rather by those who (a) designed it without any circumspection as to whether it could be applied equally or fairly, and (b) analysed the results, saw the disparities therein, and decided to go ahead anyway without stopping to reconsider. 

What are the options going forward? 

Education Secretary Gavin Williamson has promised a triple lock to provide security for those students who did not receive the grades they were predicted. Students may, according to this announcement, take the grades from prior mock results (where an official process of verification is approved) or resit exams in the autumn (or rather sit them for the first time!). Neither is a particularly suitable option. 

As a former secondary school teacher, I would suggest that mock exams are at least as imperfect a measure of student ability as weighted CAGs, if not more so. They are often taken less seriously by students; they are often marked harshly by teachers to encourage diligence in the final months of the school year; and students often improve considerably in the run-up to an exam (often crossing predicted grade boundaries in the process). As for sitting exams in the autumn, this will further penalise those students who had hoped to go to university this year but have now missed out on their place. It will also require students to take difficult examinations at short notice after missing months of school-based learning and preparation.

In retrospect, it would have made far more sense to delay the announcement of A-Level results and to release grades to schools in private. A further moderation process based on negotiation and appeals between school teachers/leaders and Ofqual could then have taken place to iron out particularly unreasonable, unfair or unfathomable results. 

Writing in The Telegraph, Gavin Williamson has also argued that using unstandardised CAGs ‘would devalue the results for the class of 2020.’ However, there is an important trade-off to be considered here between diluting top grades across the board and ensuring that no student is undervalued in a discriminatory fashion. It would seem far better and more equitable to use unstandardised CAGs for all students, thereby requiring universities and employers to work harder as arbiters of future success (in its myriad forms) across a wider pool of possible candidates, than to deny many young people the chance of success at all. 
