Using Energy Performance Certificates to predict heat pump uptake

As part of our mission to help make households more environmentally friendly by increasing the uptake of low-carbon heating, we recently embarked on a project in partnership with the Energy Saving Trust. The work centred on using machine learning and statistical methods to gain insights into the adoption of heat pumps across the UK. We learned a lot from this project even though it didn’t quite deliver the results we hoped for.

Heat pumps are still relatively rare in the UK, with only around 1% of homes having one installed. This led us to ask the question: what can we learn about the early adopters of heat pumps, and can we use these learnings to understand who is likely to install a heat pump in the future? By applying methods from data science we set out to:

  • understand the profile of households that have already installed a heat pump
  • predict which households or areas could form the next wave of heat pump adopters
  • make some progress towards predicting the shape and rate of heat pump growth in the UK over the coming years

These predictions could be used to understand where installers are likely to be most in demand over the coming years, to prioritise regions that may need additional support to encourage heat pump adoption and to help mitigate the risk that widespread heat pump adoption could widen economic inequality.

Our primary data source was the Energy Performance Certificate (EPC) register, which provides data on characteristics of domestic properties such as structure, size and energy efficiency measures. Since 2008 in England and Wales, and since 2012 in Scotland, an EPC has been registered whenever a property has been built, sold or rented; the full dataset is openly available and consists of 22 million records relating to 17 million properties in Great Britain (as of March 2022).

What we did

We approached the project through a combination of machine learning and statistical modelling, technical details of which are provided below.

The general idea was to try and learn how heat pump uptake varies depending on different property characteristics. We would then try to predict the pattern of future heat pump uptake by considering the geographic distribution of properties with these characteristics. By using a combination of different methods in parallel, we aimed to strike a balance between the predictive performance and the interpretability of our models, and to increase the reliability of our predictions by comparing the results produced by each approach.

The machine learning approach uses supervised learning to predict heat pump adoption based on the characteristics of current heat pump adopters and their properties. One model learns which factors are most informative for predicting heat pump uptake from historical data about individual properties. An alternative model takes the slightly broader approach of predicting the growth in heat pump installations at a postcode level instead of for individual households, indicating which areas are more likely to adopt heat pumps in the future.

The statistical modelling approach uses a geostatistical framework to model heat pump uptake at a postcode level. First, we aggregate the household EPC data to summarise the characteristics of the properties in each postcode (for instance, their average floor area). We then model the number of heat pumps in a postcode based on the characteristics of its properties and on the level of adoption in nearby postcodes. This allows us to uncover patterns specifically related to the spatial distribution of heat pump adoption, which can be represented on maps in an accessible way.
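To make the aggregation step more concrete, here is a minimal sketch of how household records might be summarised by postcode and paired with a measure of adoption in nearby postcodes. It only prepares inputs of the kind a geostatistical model could use; it is not the model itself. The field names, the toy records and the `neighbours` lookup are all illustrative assumptions rather than actual EPC column names or our real definition of "nearby".

```python
from collections import defaultdict
from statistics import mean

# Hypothetical household-level records; field names are illustrative,
# not the actual EPC register column names.
records = [
    {"postcode": "AB1", "floor_area": 120.0, "has_heat_pump": True},
    {"postcode": "AB1", "floor_area": 80.0, "has_heat_pump": False},
    {"postcode": "AB2", "floor_area": 95.0, "has_heat_pump": False},
    {"postcode": "AB3", "floor_area": 150.0, "has_heat_pump": True},
]

# Hypothetical lookup of which postcodes count as "nearby" each other.
neighbours = {"AB1": ["AB2"], "AB2": ["AB1", "AB3"], "AB3": ["AB2"]}


def summarise_by_postcode(rows):
    """Aggregate household records into one summary per postcode."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["postcode"]].append(row)
    return {
        postcode: {
            "n_properties": len(group),
            "mean_floor_area": mean(r["floor_area"] for r in group),
            "n_heat_pumps": sum(r["has_heat_pump"] for r in group),
        }
        for postcode, group in grouped.items()
    }


def neighbour_uptake(summary, adjacency):
    """Share of properties with a heat pump across each postcode's neighbours."""
    result = {}
    for postcode in summary:
        nearby = [summary[n] for n in adjacency.get(postcode, []) if n in summary]
        total = sum(s["n_properties"] for s in nearby)
        pumps = sum(s["n_heat_pumps"] for s in nearby)
        # Postcodes with no neighbouring data get 0.0 in this sketch.
        result[postcode] = pumps / total if total else 0.0
    return result


summary = summarise_by_postcode(records)
spatial_lag = neighbour_uptake(summary, neighbours)
```

Here `summary` holds the per-postcode characteristics and `spatial_lag` holds the neighbouring adoption level; a count model could then take both as inputs.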

We began by exploring the data and experimenting with these methods in order to determine how to proceed. During this exploratory stage we found that, historically, heat pump adoption has been positively correlated with property size, whether a property is detached, and whether the property is located in an area without access to the gas grid or an area where the residents have comparatively high incomes. Predicting which households would install heat pumps in the future, however, was more challenging.

Challenges

We faced a number of challenges in obtaining reliable predictions of future heat pump adoption. These ranged from practical considerations about the data to more conceptual problems.

Missing data

The EPC register contains a large amount of detailed data, but there were several challenges to overcome before we could use it to predict heat pump adoption. First was the issue of missing data: only around half of all domestic buildings in Great Britain appear on the EPC register and the dataset does not include any Northern Irish properties. While this is still a high proportion of British properties, it means that several postcodes have very little data associated with them and therefore predictions of their heat pump uptake are prone to error.

There are also some missing values within EPC records; for instance, not all newly built properties are labelled as such in the data. We were primarily concerned with identifying properties that have had a heat pump retrofitted, but this lack of data made it difficult to separate these properties from ones that were built with a heat pump, which are potentially a very different type of property.

Questions of time

EPCs are snapshots of properties at particular points in time, and they cannot tell us exactly when a heat pump installation took place. We only know that a property has a heat pump if it is recorded on its EPC at some point, and if that property also has an earlier EPC stating that it does not have a heat pump, then all we know is that a heat pump installation took place between the dates of these two inspections. This makes predicting exactly when a household is likely to install a heat pump particularly challenging. On top of this, we cannot know with certainty that properties with no record of a heat pump in EPC data do not actually have a heat pump; all we know is that they did not have a heat pump at the time of their last EPC inspection.
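The reasoning above can be sketched in a few lines. Assuming each property's EPC history is available as a list of (inspection date, has heat pump) pairs, which is a simplified and hypothetical representation, the most we can recover is a window in which the installation must have happened:

```python
from datetime import date


def installation_window(epcs):
    """Return (earliest, latest) bounds on the installation date.

    epcs: list of (inspection_date, has_heat_pump) tuples for one property.
    'earliest' is the last inspection without a heat pump, or None if the
    first EPC already records one. Returns None if no EPC records a heat pump.
    """
    last_without = None
    for inspected, has_heat_pump in sorted(epcs):
        if has_heat_pump:
            return (last_without, inspected)
        last_without = inspected
    return None


window = installation_window(
    [
        (date(2019, 5, 1), True),
        (date(2012, 3, 1), False),
        (date(2016, 1, 1), False),
    ]
)
# window is (date(2016, 1, 1), date(2019, 5, 1)): the installation happened
# at some unknown point within these three years or so.
```

A return value of `None` only means that no heat pump was recorded at any inspection, not that the property definitely lacks one.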

Care is also required when deciding which property characteristics to use for modelling and at what point they were measured. If we naively looked at properties that already have a heat pump and predicted that similar properties are more likely to install one, then we would run into problems when considering characteristics such as electricity usage: a property with a heat pump uses more electricity than a comparable property with a gas boiler, but it would be incorrect to assume that properties that use a lot of electricity are more likely than others to get a heat pump installed. To achieve more reliable predictions, we need to consider the characteristics of properties with a heat pump as they were before the installation took place.
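One way to respect this in practice is to take each property's features from its last EPC before a heat pump first appears. The sketch below assumes EPC records are dictionaries with hypothetical keys (`inspection_date`, `has_heat_pump`, `annual_electricity_kwh`); it illustrates the idea rather than reproducing our actual pipeline.

```python
from datetime import date


def pre_installation_snapshot(epcs):
    """Pick the EPC whose characteristics are safe to use as model features.

    For a property that gained a heat pump, this is the last EPC before the
    first one recording a heat pump. For a property with no recorded heat
    pump, it is simply the latest EPC. Returns None when the very first EPC
    already shows a heat pump, since no earlier snapshot exists.
    """
    previous = None
    for epc in sorted(epcs, key=lambda e: e["inspection_date"]):
        if epc["has_heat_pump"]:
            return previous
        previous = epc
    return previous


# Toy history with made-up values: electricity use jumps after installation.
history = [
    {"inspection_date": date(2014, 6, 1), "has_heat_pump": False,
     "annual_electricity_kwh": 3200},
    {"inspection_date": date(2020, 9, 1), "has_heat_pump": True,
     "annual_electricity_kwh": 7100},
]

snapshot = pre_installation_snapshot(history)
# snapshot is the 2014 record, so the model sees 3200 kWh rather than 7100.
```

Properties for which this returns `None` have no pre-installation record at all, so they would need to be handled separately or excluded.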

Missing context

Over the course of our exploration it became clear that physical building characteristics would not be sufficient to predict heat pump adoption. We found that we would need to supplement EPC data with data relating to householders’ socio-economic status and environmental values, which research has shown to be correlated with heat pump adoption. This data (justifiably!) does not exist openly at the individual household level and we would therefore need to use estimates or proxies on a broader geographic scale.

We were also conscious that historical heat pump installations were heavily influenced by the government support that was available at the time and any predictions we made about the future would be missing this vital context. For instance, a household choosing to install a heat pump when funding was available through the Renewable Heat Incentive may be different to one choosing to do so in 2022 after the launch of the Boiler Upgrade Scheme, and we had no way of definitively quantifying this difference.

Reflections

These challenges cast doubt on the accuracy of our predictions. We also began to question the value of predicting which households would “naturally” install heat pumps when there were several alternative perspectives to consider: we could instead try to identify households that stood to benefit most from a heat pump, properties whose emissions could be reduced most significantly, or groups that are unlikely to adopt “naturally” but towards whom resources could be targeted in order to increase heat pump adoption most significantly. These framings could potentially have a greater impact than our original aim.

Ultimately, at the end of the exploratory phase, we decided not to continue with the project in its original form. Instead, we developed some more specific research questions and activities that we may pursue in the future. These included:

  • identifying locations with low heat pump uptake but with similar characteristics to high-uptake areas, to obtain a better understanding of the barriers to heat pump adoption
  • creating a broader set of indicators related to heat pump adoption, such as the concentration of early adopters, availability of installers and potential reduction in emissions, to help inform where local interventions may be needed
  • developing our understanding of which physical property characteristics make heat pump installations particularly easy or difficult, then using EPC data to extrapolate this across the UK to identify regions that may need more investment in optimising properties to prepare for heat pump installation
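As an illustration of the first of these ideas, the sketch below looks for low-uptake areas whose characteristics closely resemble those of a high-uptake area. This is one possible approach rather than something we have implemented: the areas, the two features (mean floor area and share of off-gas properties), the uptake thresholds and the distance cut-off are all made-up assumptions.

```python
from math import dist
from statistics import mean, pstdev

# Hypothetical areas: features are (mean floor area, share of off-gas homes).
areas = {
    "A": {"features": (140.0, 0.8), "uptake": 0.06},
    "B": {"features": (135.0, 0.7), "uptake": 0.005},
    "C": {"features": (70.0, 0.1), "uptake": 0.004},
    "D": {"features": (150.0, 0.9), "uptake": 0.05},
}


def standardise(area_data):
    """Rescale each feature to zero mean and unit variance across areas."""
    columns = list(zip(*(a["features"] for a in area_data.values())))
    stats = [(mean(c), pstdev(c) or 1.0) for c in columns]
    return {
        name: tuple((v - m) / s for v, (m, s) in zip(a["features"], stats))
        for name, a in area_data.items()
    }


def lookalikes(area_data, high=0.03, low=0.01, max_distance=1.0):
    """Map each low-uptake area to its nearest high-uptake area, if close."""
    scaled = standardise(area_data)
    high_names = [n for n, a in area_data.items() if a["uptake"] >= high]
    matches = {}
    for name, area in area_data.items():
        if area["uptake"] > low:
            continue
        nearest = min(
            high_names, key=lambda h: dist(scaled[name], scaled[h]), default=None
        )
        if nearest is not None and dist(scaled[name], scaled[nearest]) <= max_distance:
            matches[name] = nearest
    return matches


matches = lookalikes(areas)
# With this toy data, only area "B" is flagged, paired with area "A".
```

Areas flagged in this way look like high-uptake areas on paper but have not adopted, which makes them natural places to investigate barriers that the building data does not capture.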

What we learned

Even though the project did not lead to a definitive prediction of heat pump adoption, it still generated some useful insights. Our findings about the characteristics of households that are most associated with heat pump adoption corroborate those from other studies and complement our work in developing personas of early adopters of the technology.

Our time spent exploring and building tools to process EPC data has given us a much richer understanding of the potential and limitations of this dataset, which we will use as part of many other analytical projects. In particular it highlighted the need for a sufficiently clean version of EPC data to enable better analyses. Our code for processing EPC data and modelling heat pump adoption is hosted publicly on GitHub and we hope that others can find value in it for their own work.

Author

Christopher Williamson

Junior Data Scientist, Data Analytics Practice

Chris was a junior data scientist in the Data Analytics Practice, embedded in the sustainable future mission team.
