This article is one in a series that walks through the key algorithms in the Open Jobs Observatory. Previous articles in this series have described how we created algorithms to assign adverts to occupation groups and to detect skills mentioned within the text of the adverts.
All the aggregate data series in the Observatory can be downloaded from GitHub. We aim to update the data monthly. Unfortunately, we are unable to share the job adverts that we have collected.
This article describes the method that we use to extract standardised locations from job adverts, which is a key step in providing insights in our Open Jobs Observatory (OJO).
The Open Jobs Observatory is the UK’s first-ever open repository of insights about the skills requested by employers in job adverts. We began collecting job adverts in January 2021, and have already amassed several million job adverts. We created the Observatory to provide free and timely access to information on skill demands. Collecting locations from job adverts is particularly important because it allows us to examine how skill demands vary across the UK, and to then identify the ‘skill-specialities’ of any given region. Having a localised view of skill demands may also enable job seekers, educators and local authorities to tailor activities to suit their local skills landscape.
Through the Observatory we are also aiming to fill a methodological gap, which is the lack of open resources for analysing job adverts. We have published the code that we use to extract insights from job adverts and have written this series of articles that walk through our methodology. We hope that this will enable other users of job adverts to benefit from, and build upon, our efforts.
There are a number of difficulties in identifying the location of a job from a job advert:
A broader and less tractable challenge is that the location mentioned in a job advert may not correspond to the location of the person appointed to the role. The COVID-19 pandemic has led to a sharp rise in remote working. Moreover, the location specified in an advert may refer to the company's head office or to the location of a recruitment company. We will continue to monitor this challenge as the Observatory grows.
The key component of the methodology is an 'index' of locations against which the locations mentioned in job adverts can be looked up. We have adopted the ONS’ Index of Place Names, as it provides a large number of location names (almost 90,000) and gives a latitude and longitude for each. With this information, we can use Nesta's open source Python package nuts_finder to extract further details for each place in the index. NUTS (Nomenclature of Territorial Units for Statistics) is a geographical nomenclature that subdivides the European Union and the UK into regions at several levels. Given a latitude and longitude, nuts_finder returns the level 1, 2 and 3 NUTS regions that contain that point.
Our 'index' can be supplemented with additional sources of location data, provided they include a latitude and longitude from which we can derive a location hierarchy. A location hierarchy lets us group locations at each level, from small areas up to large regions, and link locations across levels. For example, a very granular location may fit into a location hierarchy like so:
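To make the index-building step concrete, here is a minimal sketch that turns (name, latitude, longitude) rows into a lookup table keyed by place name. It is not the Observatory's code: the names `build_location_index`, `IndexEntry` and `toy_nuts_lookup` are hypothetical, and the NUTS lookup is passed in as a plain function. In the Observatory that role is played by nuts_finder, whose API is not reproduced here; `toy_nuts_lookup` simply hard-codes the NUTS codes for Sheffield inside a rough bounding box, for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple

# (NUTS1, NUTS2, NUTS3) codes for a point.
NutsCodes = Tuple[str, str, str]
# Any function mapping (latitude, longitude) to NUTS codes. This is an
# assumption standing in for a real point-in-polygon lookup such as nuts_finder.
NutsLookup = Callable[[float, float], NutsCodes]


@dataclass(frozen=True)
class IndexEntry:
    name: str
    lat: float
    lon: float
    nuts1: str
    nuts2: str
    nuts3: str


def build_location_index(
    places: Iterable[Tuple[str, float, float]], nuts_lookup: NutsLookup
) -> Dict[str, IndexEntry]:
    """Map each lowercased place name to its coordinates and NUTS regions."""
    index: Dict[str, IndexEntry] = {}
    for name, lat, lon in places:
        nuts1, nuts2, nuts3 = nuts_lookup(lat, lon)
        # Place names are not unique (there is more than one Hillsborough in
        # the UK); this sketch simply keeps the first entry for each name.
        index.setdefault(name.lower(), IndexEntry(name, lat, lon, nuts1, nuts2, nuts3))
    return index


def toy_nuts_lookup(lat: float, lon: float) -> NutsCodes:
    """Hypothetical stand-in for a real NUTS lookup, for illustration only."""
    # Rough bounding box around Sheffield; codes hard-coded for this example.
    if 53.30 <= lat <= 53.45 and -1.60 <= lon <= -1.35:
        return ("UKE", "UKE3", "UKE32")
    return ("unknown", "unknown", "unknown")


places = [("Hillsborough", 53.40, -1.51), ("Sheffield", 53.38, -1.47)]
index = build_location_index(places, toy_nuts_lookup)
print(index["hillsborough"].nuts3)
```

Keeping the lookup pluggable means that any additional source of place names with coordinates can be fed through the same function.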
Hillsborough → Sheffield → South Yorkshire → Yorkshire and the Humber → England
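One simple way to represent such a hierarchy is a table of child-to-parent links. The sketch below, a hypothetical illustration rather than the Observatory's data structure, walks from a location up to the root and counts locations under the members of a chosen level. Locations that sit above the requested level (such as "England" when aggregating to South Yorkshire) have no matching ancestor and are skipped, which mirrors how coarse locations drop out of finer-grained aggregations. The names `PARENT`, `ancestors` and `aggregate_to_level` are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical child -> parent links for the example hierarchy above.
PARENT: Dict[str, Optional[str]] = {
    "Hillsborough": "Sheffield",
    "Sheffield": "South Yorkshire",
    "South Yorkshire": "Yorkshire and the Humber",
    "Yorkshire and the Humber": "England",
    "England": None,
}


def ancestors(location: str, parent: Dict[str, Optional[str]]) -> List[str]:
    """Return the chain from `location` up to the root, inclusive."""
    chain: List[str] = []
    current: Optional[str] = location
    while current is not None:
        if current in chain:
            raise ValueError(f"cycle in hierarchy at {current!r}")
        chain.append(current)
        current = parent.get(current)  # unknown locations have no parent
    return chain


def aggregate_to_level(
    locations: Iterable[str], level: Set[str], parent: Dict[str, Optional[str]]
) -> Dict[str, int]:
    """Count locations falling under each member of `level`.

    Locations above the requested level have no ancestor in `level`
    and are skipped.
    """
    totals: Counter = Counter()
    for loc in locations:
        for node in ancestors(loc, parent):
            if node in level:
                totals[node] += 1
                break
    return dict(totals)


print(" -> ".join(ancestors("Hillsborough", PARENT)))
print(aggregate_to_level(["Hillsborough", "Sheffield", "England"], {"South Yorkshire"}, PARENT))
```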
Before we use the index, the locations mentioned in adverts are lightly cleaned by removing punctuation and converting all letters to lowercase. We then attempt to match each cleaned location to our index and save the results of the match.
An alternative approach to locating job adverts would be to build a set of location names from the job adverts themselves and then manually match each of these to a standard location hierarchy. We decided not to pursue this ‘bottom-up’ approach because many freely available location datasets can already be easily matched to standard location hierarchies. This allowed us to incorporate location extraction into the Observatory rapidly. A ‘bottom-up’ approach might also have tailored the algorithm to the job board on which the prototype was built, which could have meant a large amount of manual location matching every time a new job board was added to the Observatory.
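The cleaning and matching step can be sketched in a few lines of Python. This is an illustrative sketch, assuming an exact dictionary lookup after cleaning; the Observatory's matching logic may differ. `clean_location`, `match_location` and the three-entry index are hypothetical. Note that `string.punctuation` covers ASCII punctuation only, and that collapsing repeated whitespace is an extra assumption beyond what the text describes.

```python
import string
from typing import Dict, Optional

# Translation table that deletes ASCII punctuation characters.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def clean_location(raw: str) -> str:
    """Remove punctuation, lowercase and collapse repeated whitespace."""
    return " ".join(raw.translate(_STRIP_PUNCT).lower().split())


def match_location(raw: str, index: Dict[str, str]) -> Optional[str]:
    """Return the canonical index name for `raw`, or None if there is no exact match."""
    return index.get(clean_location(raw))


# A toy index keyed by cleaned names, so both sides are cleaned identically.
index = {clean_location(n): n for n in ["Sheffield", "King's Lynn", "Stoke-on-Trent"]}

for raw in ["  SHEFFIELD. ", "Kings Lynn", "Stoke-on-Trent", "Stoke on Trent", "Czech Republic"]:
    print(repr(raw), "->", match_location(raw, index))
```

Because punctuation is deleted rather than replaced with a space, "Stoke-on-Trent" cleans to "stokeontrent" and the unhyphenated variant "Stoke on Trent" finds no match in this toy index. Exact matching is simple and predictable, but variants like this would need extra handling.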
There are two key strengths in our preferred ‘top-down’ approach:
Alongside these benefits there are also limitations to our method that we will need to carefully monitor as we collect more job adverts:
Our first application of the location extraction method matched 91.3% of adverts to a location. However, it also highlighted immediate areas for improvement. Of the remaining 8.7% of job adverts, almost 40% referred to broad regions, such as South East England, and Yorkshire and Humberside. When we aggregate to finer location levels (such as NUTS3 regions), these adverts will be excluded from the aggregation. We also discovered a significant number of non-UK based job adverts (0.7% of adverts have a location of 'Czech Republic'). These adverts are also excluded from any aggregations.
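The sketch below shows how a match rate can be computed and why broad-region adverts drop out of finer aggregations. It assumes, hypothetically, that each advert's match is stored as a record of NUTS codes (or `None` if unmatched), and that a broad region such as South East England could at best be matched at NUTS1 level (UKJ), leaving its NUTS2 and NUTS3 codes empty. The names `Match`, `match_rate` and `count_by_level` and the four toy adverts are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Literal, NamedTuple, Optional


class Match(NamedTuple):
    nuts1: Optional[str]
    nuts2: Optional[str]
    nuts3: Optional[str]


def match_rate(matches: List[Optional[Match]]) -> float:
    """Share of adverts matched to any location."""
    if not matches:
        return 0.0
    return sum(m is not None for m in matches) / len(matches)


def count_by_level(
    matches: List[Optional[Match]], level: Literal["nuts1", "nuts2", "nuts3"]
) -> Dict[str, int]:
    """Count adverts per region at one NUTS level."""
    counts: Counter = Counter()
    for m in matches:
        if m is None:
            continue  # unmatched, e.g. a non-UK location
        code = getattr(m, level)
        if code is None:
            continue  # location too broad for this level
        counts[code] += 1
    return dict(counts)


# Toy data: two Sheffield adverts, one broad-region advert, one unmatched advert.
matches = [
    Match("UKE", "UKE3", "UKE32"),
    Match("UKE", "UKE3", "UKE32"),
    Match("UKJ", None, None),
    None,
]
print(match_rate(matches))
print(count_by_level(matches, "nuts3"))
print(count_by_level(matches, "nuts1"))
```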
The chart below shows the growth in the number of adverts between the first and second quarters of this year, broken down by the region matched to each advert. These figures should be read with caution: the dataset is not particularly large, we are still improving the quality of the algorithm, and we do not yet apply seasonal adjustment. Growth rates vary widely between regions, and our current algorithm indicates that every region except Wales saw an increase in the volume of job adverts between the first and second quarters. However, for the reasons mentioned above, we need to investigate these results further to ensure that they are not caused by inaccuracies in our algorithm. We are now beginning a period of data quality analysis for our location extraction algorithm, so that we can be confident in the results that it provides.
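The growth figures are simple percentage changes in advert counts per region. As a minimal sketch (the function name `quarterly_growth` and the counts are hypothetical, and the counts are made up rather than Observatory results), the calculation without seasonal adjustment looks like this. Regions with no adverts in the first quarter are skipped, since their growth rate is undefined.

```python
from typing import Dict


def quarterly_growth(q1: Dict[str, int], q2: Dict[str, int]) -> Dict[str, float]:
    """Percentage change in advert counts per region from q1 to q2.

    Regions missing from either quarter, or with zero adverts in q1, are
    skipped. No seasonal adjustment is applied.
    """
    growth: Dict[str, float] = {}
    for region in sorted(q1.keys() & q2.keys()):
        if q1[region] > 0:
            growth[region] = 100.0 * (q2[region] - q1[region]) / q1[region]
    return growth


# Made-up counts, for illustration only.
q1_counts = {"Region A": 2000, "Region B": 1000, "Region C": 0}
q2_counts = {"Region A": 2500, "Region B": 950, "Region C": 40}
for region, pct in quarterly_growth(q1_counts, q2_counts).items():
    print(f"{region}: {pct:+.1f}%")
```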
As we collect more job adverts, there will be further opportunities to improve the performance of the location matching methodology within the Observatory:
As the Observatory grows, we will also have opportunities to extract additional insights:
The Observatory is a pilot project and we welcome your feedback and suggestions for future improvements. We are also seeking funding to keep the Observatory running. If you have suggestions or are interested in supporting the work of the Observatory, please reach out to us by emailing [email protected].