One issue raised in the debate over evidence-based policymaking and practice has been what should count as evidence. I was prompted to think further about this by a recent paper in the British Educational Research Journal (Connolly et al., 2019). In it, the authors compare the allocations of students to maths sets with their individual results on key stage 2 (KS2) tests. They label discrepancies as ‘misallocations’, and document variation in these as regards gender and ethnicity (there was no variation as regards social class). They conclude that their findings ‘lend credence to the argument that, if setting is to be used, it must be applied purely on the basis of prior attainment, to avoid misallocation and the creeping prejudice that our findings suggest applies in misallocated cases’ (p23).

These are weighty conclusions. While I would not be surprised if the charge of discrimination were true, it is still necessary to ask: How well does this research support them? There are two aspects to this: whether the factual conclusions produced by the research are sufficiently cogent, and whether those findings warrant the evaluations and recommendations made. I cannot make a full assessment here (see Gomm, 2020a, 2020b) but I will mention some concerns.

  • The authors do not document how the teachers actually allocated students to sets, though they mention information that could have been relied upon to do so. Instead, they infer ‘misallocation’, and indeed ‘prejudice’ on the part of teachers, solely on the basis of discrepancies between KS2 results and allocations. But how reliably did the test results measure the attainment of individual children at the time of the test? And how reliable were they as a measure of the children’s level of attainment at the later point of set allocation? At the very least there are margins of error here that need to be taken into account.
  • The authors claim general validity for their findings, but the schools sampled were those in the ‘control’ arm of a randomised controlled trial. (This trial did not find any effect on students’ progress resulting from allocating students to sets solely on the basis of their KS2 scores.) While the sample was quite large (46 schools), we do not know how representative the findings are across schools in the UK or over time.
  • The third concern is a more fundamental one: that no empirical facts can, on their own, provide sufficient support for practical evaluations and recommendations, since these necessarily involve value assumptions. In this case these relate to what counts as equity, and to how much weight should be given to equity compared with other values involved in educational decisions. I don’t believe that researchers are in a superior position to decide such matters, and therefore to make authoritative evaluations of teachers’ decisions or practical recommendations. What could legitimately be done is to draw value conclusions conditionally. So, the authors might argue that the set allocations studied were inequitable in relation to gender and ethnicity if we take ‘equity’ to mean ‘equal chance of allocation to a maths set that corresponds to KS2 results in maths’. However, alongside this it must be made explicit that there are other reasonable ways of defining ‘equity’ in this context: for example, on the basis of other measures of attainment; or we may want to define ‘equity’ in terms of capability. More subtly, if also more vaguely, we could interpret it as ‘equal chances for students of being allocated to a set in which their mathematical abilities would flourish’. We also need to recognise that, in practical terms, there are other considerations that teachers could reasonably take into account in allocating children to sets. And not only can empirical research not tell us, on its own, what conception of equity to employ, it also cannot tell us what weight should be given to various other considerations that might be relevant in allocating students to sets.

It seems to me that what these authors report are findings that indicate a starting point for further investigation. The findings do not provide a basis for the authors’ claim that there are gender and ethnic differences in set allocation that contribute to aggregate-level inequalities. Even less do they warrant the evaluative conclusions the authors draw.

The article I have discussed is hardly unique in prompting thought about these issues. That makes it all the more important for us to consider how we are to judge what is adequate evidence, and to reflect on the responsibility we have, individually and collectively, to avoid going beyond the evidence. There is also a more worrying thought: Can educational researchers even agree about how to determine what is sufficient evidence? If not, what are the implications of this for the use of their work as a basis for policymaking?

Read Martyn Hammersley’s companion piece to this blog, published the following day: ‘Going beyond the evidence: Research and the media’.


Connolly, P., Taylor, B., Francis, B., Archer, L., Hodgen, J., Mazenod, A., & Tereschenko, A. (2019). The misallocation of students to academic sets in maths: A study of secondary schools in England. British Educational Research Journal, 45(4).

Gomm, R. (2020a). Misunderstanding setting: Ethnicity. Retrieved from

Gomm, R. (2020b). Misunderstanding setting: Gender. Retrieved from
