Open data set: Who are the 2017 General Election candidates?
Last week, we published a blog on the low number of STEM-graduates among candidates this General Election (only 9 per cent). Now, we publish the data behind this research as an open data set, and provide more context to the numbers.
Our open data set contains detailed information about main party candidates running in this general election, including data on university degrees, professional background and basic demographic information (see the description of all variables below).
Because our research on STEM-graduates only looked at candidates representing parties that had at least one seat at the time of dissolution of parliament, and who are running in seats in which their party had at least 8 per cent of the vote last time the seat was contested (usually the 2015 general election), we did not devote as much resource to candidates that did not satisfy these conditions.
So, while we explored all Labour, SNP and Conservative candidates, because of time constraints we only looked at a subset of LibDem, Green, Plaid Cymru, and Northern Ireland candidates (only those above the 8 per cent threshold).
Despite this limitation, we think there are many interesting insights to be gathered from the data set.
We encourage anyone who would like to expand the current set, or is working on a similar project, to get in touch.
Data was manually collected in the weeks since parties released their definitive candidate lists, and is based on publicly available information from Wikipedia, official party websites, LinkedIn and local news sources. In many cases (about 30 per cent), very little information was available on an individual candidate, often not more than a picture or a name on a party website.
This surprising lack of transparency is problematic, and hampers the democratic process: how can voters make an informed decision about choosing their local representative with so little information available?
It furthermore makes it hard to conduct the kind of analysis we have undertaken in this blog series, and to address important questions about the underrepresentation of candidates of, for example, a certain gender, ethnicity or socioeconomic background. There are interesting initiatives out there addressing this issue, such as Democracy Club, which collects the CVs of candidates. But there should be greater expectations of openness from the parties representing us in Westminster.
We strongly encourage parties to be more proactive in releasing information, such as the professional and academic background of their candidates, in future elections.
When using the dataset, it is important to keep in mind not only that we are missing data on many candidates because of lack of availability, but also that we rely on unverified information in the public domain, such as that sourced from Wikipedia. Consequently, there may be mistakes in the data, and we encourage readers to get in touch and help us correct fields based on imperfect or incorrect information.
Feel free to use this data for further analysis. We would be keen to hear about any interesting question readers have attempted to answer using the data, or discuss how we can make this data easier to use. Nesta is not responsible for, nor does Nesta endorse, any conclusions about individuals or political parties that may be made and publicised based on our data.
Description of fields in the dataset:
Constituency: Constituency candidate is running in (e.g. “Aberavon”, or “York Central”)
Region: Region constituency is in (e.g. “Wales”, “West Midlands” or “Yorkshire & Humber”)
Country: Country constituency is in (i.e. England, Northern Ireland, Scotland, Wales)
Current holder: Party that held constituency seat at time of dissolution of parliament (May 3, 2017)
Margin: Margin by which the current holder won seat the last time it was contested (generally the 2015 General Election, or otherwise the most recent by-election)
Full name: Full name of the candidate, as released by respective party
Party: Party the candidate is running for (e.g. Labour or Plaid Cymru)
Sitting MP: Whether or not a candidate was a member of Parliament at the time of the dissolution of Parliament
MP Since: The year a candidate became a member of parliament
Gap in term: In some cases, MPs did not serve all years since first entering parliament. Those years are marked (e.g. did not serve between 2010-2012)
Date of Birth: Date of birth of candidate. When only the year is available, DoB are listed as January 1st (e.g. 1-1-1973)
Place of Birth: City or town of birth of candidate, or the wider area if more detailed information is unavailable
Gender: Gender of candidate. This is often based on limited information available in the public domain. We apologise for any mistakes and encourage you to get in touch if this information is incorrect (or no longer correct)
Secondary School: Last secondary school a candidate attended (e.g. Eton College). In some cases schools may have changed names since
University undergraduate degree: University where candidate completed undergraduate degree (e.g. University of Leicester). In some cases, colleges within universities are also listed (e.g. University of Oxford- Magdalen College). Candidates that did not attend university are listed as “None”, and as “NA” when information was not available
Field undergraduate degree: Field of the undergraduate degree; exact names have been copied over (e.g. law, Spanish or PPE). Candidates that did not attend university are listed as “None”, and as “NA” when information was not available
STEM Undergraduate Degree: Lists whether a candidate’s undergraduate degree is a STEM-degree, based on the official HASE definition.
University postgraduate degree: University where candidate completed post-graduate degree (e.g. University of Leicester). In some cases, colleges within universities are also listed (e.g. University of Oxford- Magdalen College). Candidates that did not complete a postgraduate degree are listed as “None”, and as “NA” when information was not available
Field postgraduate degree: Field of the postgraduate degree; exact names have been copied over (e.g. law, Spanish or PPE). We do not list the type of post-graduate degree (e.g. PhD, MSc, etc.) as these were unavailable in too many cases. Candidates that did not complete a postgraduate degree are listed as “None”, and as “NA” when information was not available.
STEM Post-graduate Degree: Lists whether a candidate’s postgraduate degree is a STEM-degree, based on the official HASE definition.
Degree category: We have manually classified each candidate’s degrees within a smaller set of categories, using the following rules.
- Preference is given to postgraduate degrees over undergraduate degrees (e.g., a PPE undergraduate and law graduate will be categorised as “law”).
- If a candidate completed similar-level degrees in multiple fields (e.g. Philosophy and Mathematics), both fields are listed in the degree category field, separated by a comma.
Because of the preference given to postgraduate degrees, some predominantly undergraduate fields (such as PPE) would be undercounted when aggregating by degree category alone. When such statistics are of interest, we advise using the undergraduate and postgraduate columns instead.
Merged STEM: If either a candidate’s undergraduate or postgraduate degree is in a STEM field (or both), this field is marked “Yes”, and a candidate is considered a STEM-candidate for our analysis.
Current or last job: This field lists a candidate’s current job (when not currently a member of parliament) or last job before joining parliament. This field has not been standardised (similar jobs may be listed under a variety of different names). When a candidate has had a long career with many different previous jobs, the most recent was selected.
Previously worked in parliament: Marks whether a candidate previously worked in parliament (Westminster, the European Parliament, the Scottish Parliament, etc.) in a non-elected role. Frequent careers include assistant to a member of parliament or special adviser. This list is likely underestimating rather than overestimating the number of former parliament-workers, given the paucity of data.
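For readers working with the data programmatically, the rules behind the “Merged STEM” and “Degree category” fields can be sketched roughly as follows. This is a minimal illustration rather than the code used in our analysis: it assumes the two STEM columns contain the string “Yes” for STEM degrees, and the function names and sample values are hypothetical.

```python
from collections import Counter


def is_stem_candidate(undergrad_stem, postgrad_stem):
    """Return True if either degree is marked as a STEM degree.

    Assumption: the STEM columns hold the string "Yes" for STEM degrees.
    Any other value (for example "No", "None" or "NA") is treated as
    not STEM here.
    """
    return "Yes" in (undergrad_stem.strip(), postgrad_stem.strip())


def count_degree_categories(categories):
    """Count candidates per degree category.

    A degree category entry may list several fields separated by a comma
    (e.g. "Philosophy, Mathematics"); each listed field is counted once.
    Entries of "None" and "NA" are skipped.
    """
    counts = Counter()
    for value in categories:
        for field in value.split(","):
            field = field.strip()
            if field and field not in ("None", "NA"):
                counts[field] += 1
    return counts


# Hypothetical sample values, for illustration only
sample = ["Law", "Philosophy, Mathematics", "NA", "Mathematics"]
print(is_stem_candidate("No", "Yes"))
print(count_degree_categories(sample))
```

Because degree categories give preference to postgraduate degrees, counts produced this way will understate predominantly undergraduate fields such as PPE; counting over the undergraduate field column directly avoids this.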
Other notes and further caveats:
- For some of the smaller parties, we have included the “8 per cent” threshold field, to help visualise why some candidates were analysed where others weren’t.
- If a field is listed “NA”, it does not necessarily mean data was not available. For the smaller parties, we did not analyse candidates running in seats below the 8 per cent threshold, so empty fields for these candidates are also marked NA, even though detailed data may be available. NA may also mean “not applicable”: for example, in the case of non-current MPs, the “MP since” field is marked NA. Though this approach makes it harder to distinguish between these different categories of “NA”, it simplifies the data analysis process.
- Candidates are split into different sheets by party, with the exception of the candidates from Northern Ireland, which are all listed in the same sheet, as there are so few for each party.
- Though UKIP did not have a member of parliament at the time of the dissolution of parliament, we did include their full candidate list, and analysed a small set of prominent UKIP members. If you do have more detailed information available, please get in touch to help us generate a more detailed dataset.
- We do not distinguish between different schools within a university. E.g., a degree from Harvard Business School will be listed as “Harvard University”.
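Two of the conventions described above are easy to trip over in analysis: “None” and “NA” mean different things in the degree columns, and dates of birth on 1 January may stand in for a known year only. The sketch below shows one way to handle both. It is an illustration under stated assumptions: the function names are hypothetical, and the day-month-year order with hyphens is inferred from the “1-1-1973” example, so check it against the actual file.

```python
from datetime import date


def degree_status(value):
    """Classify an entry from a university or degree-field column.

    "None" means the candidate did not attend university (or did not
    complete a postgraduate degree). "NA" means the information was not
    available, was not collected, or is not applicable.
    """
    value = value.strip()
    if value == "None":
        return "no degree"
    if value in ("NA", ""):
        return "unknown"
    return "has degree"


def parse_date_of_birth(value):
    """Parse a date string such as "1-1-1973".

    Assumption: the order is day-month-year, separated by hyphens.
    Returns (date, maybe_year_only). Dates on 1 January are flagged
    because the dataset uses that day when only the year is known;
    genuine 1 January birthdays are flagged as well.
    """
    day, month, year = (int(part) for part in value.split("-"))
    return date(year, month, day), (day == 1 and month == 1)


print(degree_status("None"))
print(degree_status("NA"))
print(parse_date_of_birth("1-1-1973"))
```

Some data tools convert strings like “NA” to missing values on import, so it is worth reading these columns as plain text first if the distinction matters for your analysis.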