Why primary school SATs exams and start-ups might help us with testing impact, randomised trials and nurturing innovation.
Over 30,000 parents backed a boycott of England's primary school SATs exams last month. They argued that the tests were not age-appropriate for six- and seven-year-olds and caused undue stress. The Department for Education said the tests were needed to raise standards. Who’s right? And what does this have to do with innovation?
There’s another set of tests taking place across UK schools at the moment, albeit less controversially: randomised controlled trials (RCTs). Instead of testing pupils, RCTs are being used to test education interventions. This is a good thing. Put simply, RCTs tell us ‘what works’ by giving a ‘treatment’ group an intervention (e.g. a textbook) and comparing their progress against that of a ‘control’ group (e.g. no textbook).
One of the reasons RCTs are attractive to decision-makers is that, like the SATs, they give a final grade of sorts. Not an SATs-style 2a or 3c (or 'beyond expected'), but an ‘effect size’ figure. In education, this is often converted into the extra months of progress a child makes as a result of the programme (e.g. +3 months).
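To make the ‘effect size’ idea concrete, here is a minimal sketch of one common calculation (Cohen's d: the difference in group means divided by the pooled standard deviation). The pupil scores are invented for illustration, and this is not the EEF's exact methodology or months-of-progress conversion:

```python
import statistics

def effect_size(treatment, control):
    """Cohen's d: difference in group means divided by the
    pooled standard deviation of the two groups."""
    mean_t = statistics.mean(treatment)
    mean_c = statistics.mean(control)
    n_t, n_c = len(treatment), len(control)
    var_t = statistics.variance(treatment)  # sample variance
    var_c = statistics.variance(control)
    pooled_sd = (((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2)) ** 0.5
    return (mean_t - mean_c) / pooled_sd

# Hypothetical test scores: pupils who got the textbook vs those who didn't
treatment = [68, 72, 75, 70, 74, 71, 69, 73]
control   = [65, 67, 70, 66, 68, 64, 69, 66]
print(round(effect_size(treatment, control), 2))
```

A positive d means the treatment group outperformed the control group; the bigger the value, the bigger the gap relative to the natural spread of scores.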
Experts recognise that qualitative or other methods can be more appropriate than RCTs. But for many, RCTs remain the ideal, the ‘gold standard’. And like a pupil's report card or a school league table, people often ignore the comments and focus on that final grade.
Personally, I am a fan of RCTs (and other methods) and am glad that evidence-based education is gaining momentum, thanks to the work of organisations like the Education Endowment Foundation (EEF). I am also a fan of a degree of standardised national testing (if used well). We need to know ‘what works’.
But I do think that we need a simpler language around when not to test, and the SATs can help us here as a cautionary tale. Testing too early, too narrowly and too publicly can stifle innovation and innovators, just as labelling a child on the basis of a test can hinder their development.
Sometimes a programme (for example, Brain Gym) needs to be publicly called out. But equally we need to be aware that as the number of RCTs grows, we may be writing off some innovators too soon.
These are innovators who have nurtured their idea (‘their baby’, you might say) through thick and thin. Many are valued by schools, and they learn from and improve their programme each year. We owe it to them to recognise that though they may not have passed this test, they may still have great potential.
I now want to tell you a story from the startup world that features another type of randomised trial, one which I think is more constructive. A few years back, two friends who couldn’t pay their rent decided to set up a website to rent out their flat when they were away on holiday. They quickly made money, and it got them thinking: what if more people rented out their flats this way?
So they built a bigger website based on this exciting idea. The problem was that while some people loved it, it wasn’t taking off. It wasn’t making the grade as far as their investors were concerned.
So they ran a series of their own tests. First they replaced people’s own photos of their flats with professional photos: the number of rentals went up two to three times. Next they conducted a little randomised trial. Visitors were randomly allocated to two otherwise identical websites with one difference: on one, people clicked a ‘star’ to show they liked a property; on the other, they clicked a ‘heart’.
The heart triggered a 30 per cent increase in clicks. They ran countless other ‘randomised trials’ to optimise their site. And bit by bit it improved, until eventually their idea had a critical mass of traction and took off. It is now used by millions of people across the world as Airbnb.
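The star-versus-heart experiment is, in essence, a simple A/B test: randomly split your visitors, show each group one variant, and compare the outcomes. As a rough sketch (the visitor and click counts below are invented to echo the 30 per cent figure, not Airbnb's actual data), it might look like this:

```python
import random

def assign_variant(user_id, seed="42"):
    """Reproducibly allocate a visitor to the 'star' or 'heart' variant.
    Seeding with a string derived from the user id keeps the split
    random across users but stable for any one user."""
    rng = random.Random(f"{seed}-{user_id}")
    return rng.choice(["star", "heart"])

def click_rate(clicks, visitors):
    return clicks / visitors if visitors else 0.0

# Invented counts for illustration only
visitors = {"star": 1000, "heart": 1000}
clicks = {"star": 100, "heart": 130}

rates = {v: click_rate(clicks[v], visitors[v]) for v in visitors}
uplift = (rates["heart"] - rates["star"]) / rates["star"]
print(f"heart uplift over star: {uplift:.0%}")
```

A real product team would also check the result for statistical significance before acting on it, but the core mechanic is just this: random allocation plus a comparison of rates.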
There are a number of interesting things about the way the randomised trials were used by Airbnb, in contrast to RCTs in education. Their tests were used to make specific programme improvements, not overall programme judgements. The findings were not shared publicly, but confidentially and internally to drive practical improvements. And perhaps most interestingly, Airbnb’s backers were not worried by their initially poor results.
Airbnb's backers were interested in the potential. There were promising signs that were measurable (loyalty among initial customers), but some that weren’t (a resilient team). And at the heart, they knew that there was a kernel of something exciting there that needed to be nurtured.
So back to the RCT movement. Many of the teachers or social entrepreneurs who put themselves forward for RCTs have that kernel. They have initial traction. Schools, pupils and teachers know there is something special there, and often the more special, the harder to measure. And like Airbnb, many founders have a strong, resilient team who are eager to learn and improve.
But rather than give constructive feedback, RCTs sometimes give these teams a public grade too early. From a system point of view, this is useful - policymakers need to know what works now, so they can scale it. But from the innovators’ point of view (and longer term) there are downsides.
From my experience, it is often the smaller innovators and founders who are most at risk. They need the funding, so they apply for RCT grants. By contrast, large educational publishers use evaluations but are less likely to risk a high-stakes RCT on their flagship programmes. This is different from medicine, where RCTs help keep large pharmaceutical companies in check.
I can’t speak for the SATs, but the situation with RCTs and evidence-based education is still positive. The fact that I can quibble here about interpreting RCT results is a good thing. I can only do this because evidence-based education is increasingly mainstream.
Organisations like the EEF, the Alliance for Useful Evidence and the Innovation Growth Lab (among others) are also nuanced. They acknowledge the benefits but also the limitations of RCTs. Last Friday the EEF published a blog post which emphasised this. They argued that while RCTs are primarily designed to find out whether a particular programme is successful, it is just as important (and useful) for RCTs to find out why and how.
Now that the RCT movement is gaining momentum, I do think there’s room for simpler language. I think this could help educate and support innovators.
I also think it's time to look at implementation challenges for RCTs specifically in education. Are there certain organisations that are more likely to put themselves forward for RCTs? Are certain types of education intervention better suited to RCTs?
Are there ways we can fund randomised trials that teach rather than test - trials that are formative not summative? Many social entrepreneurs would benefit from funding for the randomised testing Airbnb used, or support from organisations like BIT.
By doing this, I think that we are more likely to nurture game-changing innovations and see early potential fulfilled.
Image courtesy of Malindine E G (Lt), War Office official photographer, and the Imperial War Museum (public domain)