In education, we’ve put a straitjacket around our notion of experiment. We adhere steadfastly to what Parlett and Hamilton (1972) famously called the ‘agricultural-botany paradigm’. But the agricultural-botany experiment isn’t the way that ‘experiment’ is thought about across the broad spectrum of the natural and applied sciences. Everywhere else, scientists are much more open-minded about what an experiment might be.
In my new article in the British Educational Research Journal (Thomas, 2020), I examine our loyalty to the agricultural-botany experiment. I take a brief look at the history of experiment in education and conclude that there have been three incarnations of enthusiasm for experiment-based inquiry in education research – all beginning with zeal and gusto, but each ending in disappointment.
The first incarnation was in the interwar years, which followed the successes of experimental psychology in the 1920s but ended in pessimism about what experiment could tell us. The next – the second coming – began in the 1960s, lasted until the 1980s, and followed Campbell and Stanley’s 1963 exegesis on experimental design, which taxonomised experiment forms. This spawned a new enthusiasm and a fresh tranche of large-scale projects. These again ended in disappointment, though, as it appeared that major, multimillion-dollar interventions such as Head Start were having little effect. Indeed, in 1981, Gene Glass – perhaps the leading quantitative researcher of education in the 20th century, and one who had been heavily involved in the post-60s tranche of experiments – came to the conclusion that ‘…the deficiencies of quantitative, experimental evaluation are thorough and irreparable’ (Glass & Camilli, 1981, p.23).
The third coming happened around the turn of the millennium, after the disappointments of the second tranche were attributed to inadequate randomisation (see Cook, 2001). The solution? Randomisation. And off we went again with a third tranche – of randomised experiments emerging from work funded by the What Works Clearinghouse in the US and the Education Endowment Foundation in the UK.
We’re now at a point where the findings of this third, post-millennium tranche have themselves been evaluated. These evaluations repeat the findings from the first and second tranches of experiment: the impact of interventions, they tell us, is routinely low (Malouf & Taymans, 2016; Lortie-Forgues & Inglis, 2019). The comments of the evaluators of the new experiments mirror those of the evaluators of the tranche that came before: US evaluators concluded that their findings about the 21st century tranche of What Works studies painted a dim picture of the evidence base on education interventions, while in the UK evaluators concluded that the current tranche is yielding ‘small and uninformative effects’.
All this instils an odd sense of déjà vu.
What should we learn from these repeated disappointments? I argue in my paper (Thomas, 2020) that a potential explanation for the frailties of experiment (that is to say, experiment in the agricultural-botany tradition) in social research rests in the power law principle – more commonly called Pareto’s principle or, in different domains of study, Zipf’s law or the ‘law of the vital few’. The underlying idea here is that the stability-of-effect assumptions that dominate much social research using experiment are misplaced. The influence of particular facets of social life is pervasive, unstable and disproportional, such that they will always overwhelm the influence of interventions of interest.
These influences cannot be dismissed merely as ‘noise’. They are a fundamental part of the social landscape and they will have their effect not simply by virtue of their value, but via their interaction with other variables, activating or deactivating the potency of other potential determinants of change. It follows that a few highly significant variables may determine the ultimate effectiveness of most interventions. More than this, the influence of these variables may increase, decay or fluctuate with time, making interaction effects complex and unpredictable. Pareto’s principle offers a means of understanding the apparently nugatory and/or short-lived impact of much education innovation, as well as the inadequacies of formal experiment (in the agricultural-botany tradition) to assess any such impact.
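To make the argument concrete, here is a minimal simulation sketch. It is my illustration for this post rather than an analysis from the paper, and everything in it is an assumption: the function names, the 20 hypothetical contextual factors, the Pareto shape of 1.16 (the value conventionally associated with an 80/20 split), and the rule that the intervention only takes effect when the single most influential factor happens to be 'on' for a pupil. It shows how a handful of heavy-tailed influences can hold most of the total weight, and it prints how much the estimated effect of the same intervention varies across repeated randomised trials.

```python
import random
import statistics


def pareto_weights(n_factors, alpha, rng):
    """Draw heavy-tailed weights for hypothetical contextual factors."""
    return [rng.paretovariate(alpha) for _ in range(n_factors)]


def top_share(weights, fraction=0.2):
    """Share of the total weight held by the largest `fraction` of factors."""
    ranked = sorted(weights, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)


def run_trial(n_pupils, weights, effect, rng):
    """Simulate one randomised trial; return the difference in group means."""
    # Index of the single most influential contextual factor.
    dominant = max(range(len(weights)), key=weights.__getitem__)

    # Randomly allocate half the pupils to the intervention arm.
    arms = [True] * (n_pupils // 2) + [False] * (n_pupils - n_pupils // 2)
    rng.shuffle(arms)

    treated, control = [], []
    for is_treated in arms:
        # Each factor is either present (1) or absent (0) for this pupil.
        switches = [rng.choice((0, 1)) for _ in weights]
        outcome = sum(w * s for w, s in zip(weights, switches))
        if is_treated:
            # Assumed interaction: the intervention only takes effect
            # when the dominant factor is present for this pupil.
            outcome += effect * switches[dominant]
            treated.append(outcome)
        else:
            control.append(outcome)
    return statistics.mean(treated) - statistics.mean(control)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    weights = pareto_weights(n_factors=20, alpha=1.16, rng=rng)
    print(f"Share of influence held by top 20% of factors: {top_share(weights):.2f}")

    # Repeat the 'same' trial 50 times and look at the spread of estimates.
    estimates = [run_trial(200, weights, effect=1.0, rng=rng) for _ in range(50)]
    print(f"Mean estimated effect: {statistics.mean(estimates):.2f}")
    print(f"Spread (SD) across trials: {statistics.stdev(estimates):.2f}")
```

The interesting comparison is between the size of the intervention effect built into the simulation and the spread of the estimates across replications. Changing `alpha` or the number of factors alters how concentrated the contextual influences are, which is a simple way to explore the stability-of-effect assumption.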
It’s time to get out of the straitjacket and to take a more catholic view of what experiment might be in education research.
This blog is based on the article ‘Experiment’s persistent failure in education inquiry, and why it keeps failing’ by Gary Thomas, published in the British Educational Research Journal on an open-access basis.
Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi‐experimental designs for research. Boston, MA: Houghton Mifflin Co.
Cook, T. D. (2001). Reappraising the arguments against randomized experiments in education: An analysis of the culture of evaluation in American schools of education. Chicago: Northwestern University, Chicago.
Glass, G. V., & Camilli, G. A. (1981). ‘Follow Through’ Evaluation. Viewpoints (120). Washington DC: National Institute of Education.
Lortie-Forgues, H., & Inglis, M. (2019). Rigorous large-scale educational RCTs are often uninformative: should we be concerned? Educational Researcher, 48(3), 158-166.
Malouf, D. B., & Taymans, J. M. (2016). Anatomy of an evidence base. Educational Researcher, 45(8), 454-459.
Parlett, M., & Hamilton, D. (1972). Evaluation as illumination: A new approach to the study of innovatory programs. Occasional paper. Edinburgh: Edinburgh University Centre for Research in the Educational Sciences.
Thomas, G. (2020). Experiment’s persistent failure in education inquiry, and why it keeps failing. British Educational Research Journal. Advance online publication. https://doi.org/10.1002/berj.3660