
Linking skills to occupations: Using big data to build a new occupational taxonomy for the UK

In our recent ESCoE discussion paper, we’ve developed the first data-driven, skills-based taxonomy, or classification, of UK occupations. By linking skills to occupations, we hope the taxonomy will be of direct use to policymakers and employers.

The skills we need for work are changing

From automation to climate change, and from globalisation to our ageing population, myriad factors are changing the nature of work in the UK. These factors mean the majority of workers are in occupations with highly uncertain futures (Bakhshi, Downing, Osborne and Schneider, 2017).

Amidst this changing landscape, policymakers, educators, businesses and individuals need timely information on how occupations are changing and how they can help workers to transition out of at-risk occupations where skills are becoming redundant. To generate these insights we need a framework that links skills to occupations.

The current occupational taxonomy does not map cleanly to skills

The main grouping (or hierarchical taxonomy) of occupations in the UK is called the Standard Occupational Classification (SOC). The SOC taxonomy assigns jobs to nine major occupation groups and then repeatedly splits each group three further times. The fourth layer contains 369 different occupation groups.
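
To make the four layers concrete, here is a minimal sketch. It assumes the four-digit SOC coding convention, in which each additional digit identifies the next layer down (major, sub-major, minor and unit group). The function name soc_layers is hypothetical and the code is used purely as an example.

```python
def soc_layers(unit_code):
    """Split a four-digit SOC unit code into its four nested group codes.

    Assumes the digit-prefix convention: 1 digit = major group,
    2 = sub-major group, 3 = minor group, 4 = unit group.
    """
    code = str(unit_code)
    if len(code) != 4 or not code.isdigit():
        raise ValueError("expected a four-digit SOC unit code")
    # Each extra digit moves one layer down the hierarchy.
    return {
        "major": code[:1],
        "sub_major": code[:2],
        "minor": code[:3],
        "unit": code,
    }


layers = soc_layers("2136")
print(layers)
```

Because the first digit is fixed by skill level, two jobs that share most of their skills can still differ in every layer of this code.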

While the stability of SOC makes it ideal for reporting labour market statistics, the grouping is not particularly well suited for understanding skills. That’s because the initial split of jobs into nine major groups is based on differences in education and training levels (i.e. skill level), rather than on differences in the types of skills these jobs require (skill specialisation).

The initial emphasis of SOC on skill level means jobs that require very similar skills can appear in completely different major groups. In turn, it can be difficult to map skill domains onto SOC and to understand how changes in skill demands are affecting occupations. The structure of SOC also means workers may jump between different major groups over their careers as their skill levels rise.

Expert-curated taxonomies can be slow to adapt

Most taxonomies of occupations, like SOC (for the UK), ESCO (Europe) and O*NET (USA), are created through a process of consultation with experts. Keeping these taxonomies up-to-date can be resource-intensive and, as a result, they are often only updated periodically. At present, SOC is revised once every 10 years. Over such a period, the landscape for some occupations may change significantly, as it did for IT professionals between 2000 and 2010, which necessitated the addition of new occupations to the UK SOC. We need a more timely way of capturing information on occupational dynamics.

Big data, in the form of job adverts, can help

Online job adverts, and the skills mentioned within these, can help us to develop an alternative taxonomy of occupations. Adverts provide detailed information on the skills required in different jobs. We can then group jobs into occupations that require similar skills.
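
As a rough illustration of what "similar skills" could mean in practice, the sketch below scores pairs of adverts by the share of skills they have in common (Jaccard similarity). The adverts, the skill names and the choice of Jaccard similarity are all assumptions made for the example, not the measure used in the paper.

```python
from itertools import combinations

# Hypothetical adverts, each reduced to the set of skills it mentions.
adverts = {
    "data analyst": {"sql", "excel", "statistics", "python"},
    "data scientist": {"python", "statistics", "machine learning", "sql"},
    "staff nurse": {"patient care", "clinical skills", "record keeping"},
    "care assistant": {"patient care", "record keeping", "first aid"},
}


def jaccard(a, b):
    """Shared skills divided by all skills mentioned (0 = none, 1 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def ranked_pairs(adverts):
    """Return (similarity, job_a, job_b) for every pair, most similar first."""
    pairs = [
        (jaccard(adverts[a], adverts[b]), a, b)
        for a, b in combinations(sorted(adverts), 2)
    ]
    return sorted(pairs, reverse=True)


for score, a, b in ranked_pairs(adverts)[:2]:
    print(f"{a} / {b}: {score:.2f}")
```

Pairs with a high score are candidates for the same occupation group, whatever their job titles say.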

The strength of job adverts is that they provide a near real-time source of information on skills. Compared to skill surveys, the skills in job adverts are also typically more granular, because adverts give employers the freedom to describe their skill needs directly. That said, adverts do have limitations, such as imperfect representativeness of the underlying occupations and a bias towards high-skilled professional occupations.

An emphasis on openness

We’re not the first to investigate the potential of online vacancy data. But to date, efforts have been concentrated largely in the private sector, by the likes of labour analytics companies, job search engines and recruitment agencies. While their research provides useful insights on methodology, the resulting occupational classifications remain proprietary. We are committed to sharing our methodology and, once finalised, the resulting taxonomy will also be shared publicly along with the algorithm used to generate it.

Our new data-driven taxonomy of UK occupations

The taxonomy we have built is based on 37 million UK online job adverts provided by Burning Glass Technologies. To cluster the job adverts into groups we used a range of machine learning methods such as document clustering and word embeddings.
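
For readers unfamiliar with document clustering, here is a deliberately simple sketch of the general idea. It swaps in word-count vectors, cosine similarity and a single-link merge rule with a fixed threshold; the paper's actual pipeline, including its use of word embeddings, is not reproduced here. The advert texts and the threshold value are invented for the example.

```python
import math
from collections import Counter


def vectorise(text):
    """Turn an advert's text into a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster(texts, threshold=0.3):
    """Single-link clustering: merge the groups of any two adverts whose
    similarity reaches the (assumed) threshold. Returns one label per advert."""
    vectors = [vectorise(t) for t in texts]
    labels = list(range(len(texts)))  # every advert starts in its own group
    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                old, new = labels[j], labels[i]
                labels = [new if lab == old else lab for lab in labels]
    return labels


texts = [
    "python sql statistics machine learning",
    "sql python data visualisation statistics",
    "patient care clinical record keeping",
    "patient care first aid record keeping",
]
print(cluster(texts))
```

Comparing every pair of adverts is fine for a handful of examples but would not scale to millions of adverts, which is one reason a real pipeline needs more efficient methods.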

Like SOC, our taxonomy contains four hierarchical layers. But unlike SOC, our first three layers group jobs that require similar types of skills. This allows us to automatically recommend occupations to individuals based on their skill capabilities. The fourth layer of the hierarchy distinguishes between jobs based on the offered salary and indicates skill level. Incorporating skill level allows us to measure an individual’s career progression within the same skill domain.
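
The sketch below shows one way a four-layer structure like this could support recommendations. The Occupation class, the layer labels, the salary bands and the coverage score are all hypothetical; they illustrate the idea rather than describe the published taxonomy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Occupation:
    skill_path: tuple   # labels for the three skill-based layers (hypothetical)
    salary_band: str    # fourth layer: salary as a proxy for skill level
    skills: frozenset   # skills typical of this occupation


def recommend(person_skills, occupations, top_n=2):
    """Rank occupations by the share of their skills the person already has."""
    person = set(person_skills)
    scored = []
    for occ in occupations:
        if not occ.skills:
            continue  # nothing to compare against
        coverage = len(person & occ.skills) / len(occ.skills)
        if coverage > 0:
            scored.append((coverage, occ))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [occ for _, occ in scored[:top_n]]


OCCUPATIONS = [
    Occupation(("digital", "data", "analytics"), "junior",
               frozenset({"sql", "excel", "statistics"})),
    Occupation(("digital", "data", "analytics"), "senior",
               frozenset({"sql", "statistics", "python", "machine learning"})),
    Occupation(("health", "care", "nursing"), "junior",
               frozenset({"patient care", "record keeping"})),
]

matches = recommend({"sql", "statistics", "python"}, OCCUPATIONS)
for occ in matches:
    print(" > ".join(occ.skill_path), "|", occ.salary_band)
```

In this toy data the two matches share the same three skill layers and differ only in the fourth, which is the kind of within-domain progression the salary layer is meant to capture.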

Using the taxonomy

Over the next six months, we’ll be working to show how the taxonomy, which links occupations to skills, can be applied to learn more about skill needs in the UK. As one example, we’ll be showing how the methodology can help us to identify new sets of skills and new occupations.

At the same time, we’re also building a skills taxonomy based on skill co-occurrence in job adverts. The UK doesn’t currently have a taxonomy of skills and the new skills taxonomy could be used to produce timely information on the demand for, and the return on (i.e. salary), different groups of skills. These insights can then be used by policymakers to prioritise investment in skill development.
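
Skill co-occurrence can be illustrated with a few lines of code. This is a minimal sketch, assuming each advert has already been reduced to a list of skill names; the example adverts are invented, and turning the resulting counts into a taxonomy would take further steps not shown here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(adverts):
    """Count how many adverts mention each pair of skills together."""
    counts = Counter()
    for skills in adverts:
        # Sort and de-duplicate so each pair has one canonical ordering
        # and is counted at most once per advert.
        for pair in combinations(sorted(set(skills)), 2):
            counts[pair] += 1
    return counts


# Hypothetical adverts, each given as a list of skills.
example_adverts = [
    ["python", "sql"],
    ["sql", "python", "excel"],
    ["excel", "sql"],
]

pair_counts = cooccurrence(example_adverts)
for pair, n in pair_counts.most_common():
    print(pair, n)
```

Skills that frequently appear together in the same adverts are natural candidates for the same group in a skills taxonomy.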

More broadly, we hope our work shows how naturally occurring big data, such as online job adverts, can be used to build a smarter labour market.


Jyldyz Djumalieva

Data Science Research Fellow

Jyldyz is the Data Science Research Fellow at Nesta, working in the Policy and Research team. Jyldyz is interested in exploring large complex datasets, network analysis and machine l...

Cath Sleeman

Quantitative Research Fellow

Cath is the Quantitative Research Fellow at Nesta, working in the Policy and Research team. She is interested in scraping, analysing and visualising complex data.
