An introduction to statistical power

Statistical power is the probability that your study will find a statistically significant effect if one is really there. It is probably the most complex concept in this toolkit, but it is worth taking the time to understand because it influences one of the big decisions in any trial: how many participants you need.

In order to understand statistical power, there are two other concepts that you need to be familiar with first:

  1. Sample size: how many people are in your study
  2. Effect size: how big an effect your policy change has

There’s one other concept that is important too: statistical significance. This is discussed in detail in step six. For now, it is enough to know that it is about how confident you can be that the results of your study are real and not a fluke.

Key concept 1: sample size

Sample size means the number of participants taking part in a study. The general rule is that more is better and gives you more confidence that your result is reliable. It is often referred to using the letter N (so a study with 1,000 participants would be N = 1,000).

An intuitive way to think about it is to imagine a coin flip. If you flip a coin twice and it lands heads both times (N = 2), you wouldn’t think anything was strange. If you flipped it 100 times and it landed heads every time (N = 100), you would know the coin is biased.
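The coin-flip intuition can be checked with a couple of lines of Python. This is a simple probability calculation added for illustration, not part of the original toolkit:

```python
# Probability that a fair coin lands heads on every one of n flips
def p_all_heads(n: int) -> float:
    return 0.5 ** n

print(p_all_heads(2))    # 0.25 -- entirely plausible by chance
print(p_all_heads(100))  # about 8e-31 -- effectively impossible for a fair coin
```

With N = 2 the all-heads result happens a quarter of the time by chance; with N = 100 it is so unlikely that a biased coin is the only sensible explanation. That is the value of a larger sample.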

You can also think about this using a visual metaphor. As the picture below shows, larger samples = clearer findings.

Key concept 2: effect size

The effect size refers to how big an effect your policy change has.

There are lots of different ways to express an effect size, depending on the study. For childcare take-up, the easiest way is in percentage points.

Module 1 discussed a previous Behavioural Insights Team project to increase take-up of the tax-free childcare scheme. In this trial, the behaviourally informed letter increased take-up from 3.4% to 3.8%. This is an effect size of 0.4 percentage points, or a relative increase of about 12%.
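As a quick check, both ways of expressing that effect can be computed directly. The 3.4% and 3.8% figures are from the trial described above; the rest is plain arithmetic:

```python
baseline = 0.034  # take-up with the standard letter
treated = 0.038   # take-up with the behaviourally informed letter

pp_effect = (treated - baseline) * 100      # difference in percentage points
relative = (treated - baseline) / baseline  # relative increase

print(f"{pp_effect:.1f} percentage points")  # 0.4 percentage points
print(f"{relative:.0%} relative increase")   # 12% relative increase
```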

Both concepts influence statistical power

To return to the definition from earlier, statistical power is the probability of your study finding a statistically significant effect (we’ll come back to this concept in step 6) if one is really there.

Conventionally, scientific studies aim for at least 80% power:

  • This means that a study has an 80% probability of finding a statistically significant effect if there really is an effect.
  • For example, a trial comparing two two-year-old offer letters with 80% power would have an 80% chance of finding a statistically significant difference between them if one letter really was more effective than the other.

A key thing about statistical power is that it depends on both your effect size and your sample size:

  • If you already know your sample size, you can use this to calculate the smallest effect size you are able to detect in your study – this is called your Minimum Detectable Effect Size (see the video below for instructions).
  • If you already know roughly what effect size you are likely to find, you can estimate the minimum sample size you need in your study.
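For a trial comparing two proportions (like take-up rates), the Minimum Detectable Effect Size can be approximated in a few lines of Python using the standard normal-approximation formula. This is a hedged sketch, not a substitute for a proper power calculator: the 3.4% baseline comes from the example above, while the sample size of 1,000 per group is an illustrative assumption.

```python
import math
from statistics import NormalDist

def mdes_two_proportions(n_per_group: int, baseline: float,
                         alpha: float = 0.05, power: float = 0.80) -> float:
    """Approximate minimum detectable difference between two proportions,
    assuming equal group sizes (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for 5% significance
    z_power = NormalDist().inv_cdf(power)          # 0.84 for 80% power
    return (z_alpha + z_power) * math.sqrt(
        2 * baseline * (1 - baseline) / n_per_group)

# With 1,000 parents per group and a 3.4% baseline take-up rate:
mdes = mdes_two_proportions(1000, 0.034)
print(f"MDES is about {mdes * 100:.1f} percentage points")  # about 2.3
```

In other words, with 1,000 parents per group you could only reliably detect an effect of roughly 2.3 percentage points or more; a smaller true effect would likely go undetected.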

How do you know what effect size you are likely to find?

  • The best way is to look at previous similar studies. Communications trials often have impacts of around two or three percentage points, so this is a good place to start.
  • Generally speaking, you should expect communications trials (eg, testing different versions of a letter) to have smaller impacts than more intensive and costly policy changes.

Calculating power for your study

You can conduct your own power calculations using an online power calculator. The video below walks you through how to do it.
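If an online calculator is not to hand, the same calculation can be sketched in a few lines of Python using the standard two-proportion sample-size formula. The 3.4% and 3.8% figures are from the childcare example above; 80% power and 5% significance are the usual conventions:

```python
import math
from statistics import NormalDist

def n_per_group(p1: float, p2: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Sample size per group needed to detect a change from p1 to p2
    (normal approximation, equal group sizes)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(z ** 2 * variance / (p2 - p1) ** 2)

# Detecting a rise in take-up from 3.4% to 3.8%:
print(n_per_group(0.034, 0.038))  # roughly 34,000 parents per group
```

Small effects on low baseline rates require very large samples, which is why checking power before committing to a trial matters so much.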

What if my sample size is too small?

When you conduct your power calculations, you might find that your sample is too small to detect the kind of effect size you think is likely. For example, you might be limited by the number of parents eligible for the two-year-old offer in your area. This means your trial is underpowered and you are unlikely to find a statistically significant result.

This is a common problem, but there are two main things you can do:

  1. Run several waves of the trial, so that in total your sample size is large enough (if you do this you will need to do a statistical adjustment to reflect that there were several waves).
  2. Go ahead anyway as a pilot trial. This means you are unlikely to find a statistically significant result, but it might be worthwhile if the trial is low cost, there are few risks, and you are going to implement the policy in any case (eg, if it’s a letter you regularly send out). If you do this, it’s important not to put too much weight on findings that aren’t statistically significant.

Authors

Louise Bazalgette

Deputy Director, fairer start mission

Louise works as part of a multi-disciplinary innovation team focused on narrowing the outcome gap for disadvantaged children.

Dave Wilson

Advisor

Dave is an Advisor in the Education team at the Behavioural Insights Team (BIT) with a focus on early years projects.

Fionnuala O’Reilly

Lead Behavioural Scientist, fairer start mission

Fionnuala is the lead behavioural scientist in the fairer start mission and is currently seconded from the Behavioural Insights Team (BIT) until March 2023.
