Once your data has been cleaned and formatted correctly, you’re ready to analyse it.

You can find a video and a written guide below to walk you through the process of analysing your data step by step. There are two important concepts you’ll need to know for your analysis – statistical significance and regression. Most of the section is dedicated to explaining these. If you already feel familiar with them, feel free to skip straight to the analysis.

Conducting your analysis in Excel

Analysing your data involves the following steps:

  1. Conducting your regression analysis
  2. Generating your results graph

We have produced two short guides to walk you through this process. You can download them below. Begin with the Word guide, which explains the process, and then practise using the Excel guide:

  1. Word guide
  2. Excel guide

You can also watch the short video below, which walks you through how to do it.

Statistical significance: a primer

We usually study a sample to estimate the likely effect of our policy on a wider population. For example, we might test a new policy for increasing childcare take-up with a few thousand parents to estimate how big an impact it would have on all the parents eligible for free childcare.

Because of chance, every sample you take will be slightly different. If you take lots of different samples of parents, each sample is likely to have slightly different rates of childcare take-up.

Imagine that the actual rate of take-up of childcare was 75%. If you took lots of different samples and plotted them on a graph, the graph might look something like this:
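To see this sampling variation for yourself, here is a minimal Python sketch (the sample size and number of samples are illustrative assumptions, not figures from the guide). It repeatedly draws samples from a population whose true take-up rate is 75% and records each sample's observed rate:

```python
import random

random.seed(42)

TRUE_TAKE_UP = 0.75   # assumed true population take-up rate
SAMPLE_SIZE = 1000    # parents per sample (illustrative)
N_SAMPLES = 200       # number of repeated samples (illustrative)

# Draw repeated samples and record each sample's observed take-up rate.
rates = []
for _ in range(N_SAMPLES):
    took_up = sum(random.random() < TRUE_TAKE_UP for _ in range(SAMPLE_SIZE))
    rates.append(took_up / SAMPLE_SIZE)

# Each sample gives a slightly different rate, clustered around 75%.
print(f"lowest sample rate: {min(rates):.3f}, highest: {max(rates):.3f}")
```

Plotting `rates` as a histogram would produce the kind of bell-shaped graph described above: most samples land near 75%, with a few further out purely by chance.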

This natural randomness generates some uncertainty when you introduce a policy change. In this example, if you ran a trial and one version of your policy resulted in 76% rather than 75% take-up, it might be that your change made a small difference – but given the natural randomness, this might just be down to chance.

The bigger the change, the less likely it is to just be down to chance. If your change resulted in 82% rather than 75% take-up, that’s unlikely to be due to chance.

That is the basic idea behind statistical significance. A slightly simplified definition is that a result is statistically significant if it’s very unlikely to have occurred just by chance.

Testing for statistical significance

There are three main steps to find out whether a result is statistically significant:

  1. Set up a null hypothesis. This is always the hypothesis that you haven’t found anything. For example, that there is no difference between your control group and your intervention group. We always assume the null hypothesis is true until proven otherwise – equivalent to innocent until proven guilty in a court case.
  2. Set up an alternative hypothesis. This is the opposite of the null hypothesis – for example, that there is a difference between your control group and your treatment group. It’s equivalent to a guilty verdict in a court case.
  3. Calculate the probability of getting the results from your study if the null hypothesis was true.

The final stage of the process gives you a p value, or the probability of getting the effect from your study if the null hypothesis was true. This is a number between 0 and 1, which is equivalent to a percentage between 0% and 100%.

The conventional threshold for statistical significance is p = 0.05 or lower. In other words, we call a result statistically significant if there is a 5% probability or less that it would have occurred by chance if, in reality, there were no effect.
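As a rough illustration of these three steps, here is a Python sketch of a two-proportion z-test using the 75% vs 82% take-up figures from the earlier example (the group sizes are made-up assumptions). It computes the probability of seeing a difference this large if the null hypothesis of no difference were true:

```python
import math

# Illustrative trial results (group sizes are assumptions, not from the guide):
control_n, control_took_up = 1000, 750        # 75% take-up
treatment_n, treatment_took_up = 1000, 820    # 82% take-up

p1 = control_took_up / control_n
p2 = treatment_took_up / treatment_n

# Step 1 (null hypothesis): no difference between the groups.
# Under the null, pool both groups to estimate a common take-up rate.
pooled = (control_took_up + treatment_took_up) / (control_n + treatment_n)
se = math.sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / treatment_n))

# Step 3: z statistic and two-sided p value (normal approximation).
z = (p2 - p1) / se
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"difference = {p2 - p1:.3f}, p = {p_value:.5f}")
```

With these numbers the p value comes out far below 0.05, matching the intuition above that a jump from 75% to 82% in samples of this size is very unlikely to be down to chance alone.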

The analysis video and guide below will walk you through these stages in practice and teach you how to calculate the p value for your study.

A few things about p values to bear in mind

  • Even if your result looks positive, if it isn’t statistically significant it might have happened by chance and shouldn’t be thought of as reliable evidence.
  • The statistical significance threshold is slightly arbitrary. It’s important to have a threshold in place so that everyone is on the same page, but there isn’t really that much difference between a p value of 0.049 (significant!) and 0.051 (not significant).
  • Statistically significant doesn’t mean significant in practice. If you have a big enough sample size, a result can be statistically significant but practically negligible.

Regression: a primer

Regression is a statistical technique that you can use to analyse the data from your trial. It lets you estimate the strength of the relationships between different variables. Importantly for our purposes, it automatically produces a p value and tells you whether your results are statistically significant.

Imagine that you want to study the link between two different variables – for example, homework time and exam results in a school. You could plot how much homework time each student does and their exam score on a graph like this.

You can draw a line that captures the relationship between the two variables. Here, the line slopes upwards which means that students who do more homework tend to get better exam results (although note that this doesn’t mean homework causes better exam results – just that they are linked in some way).

The steepness of the line tells us how strong the relationship is. A steeper line would mean that every extra hour of homework is associated with a bigger boost in exam results.

Regression looks at data like this and determines how much a change in one variable is linked to a change in the other. In this example, it tells us what exam result change is associated with one extra hour of homework.
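The idea can be sketched in a few lines of Python. The homework and exam figures below are made up for illustration; the code fits the least-squares regression line and reports its slope, i.e. the exam-score change associated with one extra hour of homework:

```python
# Made-up illustrative data: weekly homework hours and exam scores.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 55, 61, 60, 68, 70, 75, 79]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# Least-squares slope: score change associated with one extra hour.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores)) \
        / sum((x - mean_x) ** 2 for x in hours)
intercept = mean_y - slope * mean_x

print(f"score is roughly {intercept:.1f} + {slope:.1f} x hours")
```

Here the slope is positive (a bit under 4 points per hour with this made-up data), which is what an upward-sloping regression line means: more homework is associated with higher scores, without implying causation.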

Now let’s switch examples. Let’s imagine that you’re looking at the relationship between exam results and extra tutoring. There are two groups – one that gets extra tutoring and the other that doesn’t. As above, each student is represented by a single blue dot. You can plot their exam results on another graph like this.

Just like the first example, you can draw a regression line which calculates the relationship between the two variables (exam results and whether or not the students receive tutoring). You can think of the regression line as telling you the average impact of getting tutoring compared to not getting it.

This is exactly how you can use regression to analyse your trial data. The regression analysis will calculate the impact of your policy change, just as the example above calculates the average impact of tutoring.
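A useful fact behind this second example: when the only predictor is a binary group indicator, the regression slope is exactly the difference between the two group means. This minimal sketch (with made-up scores) shows the equivalence:

```python
# Made-up illustrative data: 1 = received tutoring, 0 = did not.
tutored = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [58, 62, 60, 64, 70, 74, 68, 72]

# With a single binary predictor, the regression slope equals the
# difference between the treated and control group means.
control = [s for t, s in zip(tutored, scores) if t == 0]
treated = [s for t, s in zip(tutored, scores) if t == 1]

effect = sum(treated) / len(treated) - sum(control) / len(control)
print(f"average impact of tutoring: about {effect:.1f} points")
```

This is the same calculation a regression would perform on your trial data, with the added benefit (covered below) that regression also gives you a p value for the estimated effect.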

Why do you need to do this?

You might ask why you need regression at all. Can’t you just compare take-up between the groups in your trial and see which one is higher?

In fact, you can do this. But regression has three advantages, including one important practical advantage when conducting your first trial:

  1. Regression automatically calculates your p value and tells you whether your result is statistically significant.
  2. In more complex trials, regression allows you to control for other factors that might affect the outcome (eg, the relationship between exam results and homework, controlling for gender and school type).
  3. Regression is more versatile and can deal with different types of outcomes. Again, this is an advantage for more complex trials.

Authors

Louise Bazalgette

Deputy Director, fairer start mission

Louise works as part of a multi-disciplinary innovation team focused on narrowing the outcome gap for disadvantaged children.
Dave Wilson

Advisor

Dave is an Advisor in the Education team at the Behavioural Insights Team (BIT) with a focus on early years projects.
Fionnuala O’Reilly

Lead Behavioural Scientist, fairer start mission

Fionnuala is the lead behavioural scientist in the fairer start mission and is currently seconded from the Behavioural Insights Team (BIT) until March 2023.