A challenge fuelled by Open Data – but what exactly do we mean by that?
Nesta and The Open Data Institute are running a series of seven open data challenge prizes in which we ask entrepreneurs, innovators and designers to solve a major social challenge using open data. But what constitutes ‘open data,’ what does it mean to ‘use’ it, and what are we doing to help people with this ambitious task? Let’s take these questions one at a time:
What do we mean by ‘open data’ for the open data challenges?
For the purpose of the challenges, we draw on the widely accepted Open Knowledge Foundation definition: ‘Open data is data that is freely available for anyone to use, reuse or redistribute for any purpose’*. Such data is published under open licences; examples include Land Registry’s Price Paid data relating to house prices and the Met Office’s hourly weather observation data. In our work, the open data definition excludes datasets with an associated cost, regardless of the organisation or body that owns the data.
What are the challenges associated with using open data in this way?
As with any dataset, open data presents some notable challenges when it comes to driving value for a new business or social enterprise. It calls for different business models from those you would consider if you were simply using proprietary or closed data, and there are further considerations related to the nature of open data itself. We encourage participants in our Challenges to consider:
- Is the source of the data reliable?
- How frequently will the data be released?
- Is the data provided in a machine-readable format?
- What gaps are there in the data?
- How responsive is the data owner if you need help?
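One way to keep track of these questions for each candidate dataset is a small record per dataset. The sketch below is purely illustrative: the `DatasetCheck` class, its field names and the example values are hypothetical and are not part of any Nesta or ODI tooling.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetCheck:
    """Answers to the five questions above for one dataset (hypothetical structure)."""
    name: str
    source_reliable: Optional[bool] = None    # None means "not yet answered"
    release_frequency: Optional[str] = None   # free text, e.g. "monthly"
    machine_readable: Optional[bool] = None
    known_gaps: Optional[List[str]] = None    # an empty list means "no gaps found"
    owner_responsive: Optional[bool] = None

    def unanswered(self) -> List[str]:
        """Return the names of the questions that still have no answer."""
        answers = {
            "source_reliable": self.source_reliable,
            "release_frequency": self.release_frequency,
            "machine_readable": self.machine_readable,
            "known_gaps": self.known_gaps,
            "owner_responsive": self.owner_responsive,
        }
        return [question for question, answer in answers.items() if answer is None]


# Made-up example values, for illustration only.
check = DatasetCheck(
    name="Example dataset",
    release_frequency="monthly",
    machine_readable=True,
)
print(check.unanswered())
```

Listing the unanswered questions makes it easy to see what still needs to be raised with the data owner before building on a dataset.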
We recommend that participants share their plans and intended social impact with the data owner, as this will encourage the owner to continue releasing data and provide them with useful case studies for future releases.
What does it mean to ‘use’ open data in my competition entry?
Firstly, we are looking for products and solutions that use open data as a key resource. Entries will not be considered unless there is a clear elaboration of how at least one specific, existing, public sector, open dataset is used in the business. This does not mean that the product needs to be a website or app - it could be an in-person service that is driven by open data.
Secondly, participants may also want to generate a new open dataset as part of the product. In isolation, this would not be sufficient: we would hope and expect that any data generated through the Open Data Challenge is shared openly, and we require that this new data is generated alongside the use of existing open data. And finally, participants may have a product or solution in mind that could help make the case for opening up a new dataset. We are very supportive of this approach and would welcome participation from such teams; however, they will still need to use an existing dataset to be eligible for entry.
Our previous winners and their use of data:
- Crime and Justice: Check that Bike uses a variety of open data to reconnect victims of bike theft with their bikes. They have also invested a great deal of time and effort in encouraging the police force to release the frame numbers of bikes reported to them as stolen.
- Energy + Environment: Community Energy Manager is our latest winner and a predominantly in-person service that helps community groups to support their local area by brokering energy efficiency improvements and generating savings for their community, helping to reduce carbon emissions, fuel bills and fuel poverty.
What are we doing to help people to use open data?
The ODI and Nesta have a comprehensive challenge methodology which starts with a broad theme, such as Housing, and ends with one team winning a £40,000 prize. The process is explained simply in this short animation; however, the focus of this blog is the open data and what we’re doing to help people use it in their entry to the challenge. Behind the scenes, our approach is twofold:
- Desk research - in the early stages of the challenge definition, we research the available open data. By speaking to colleagues and data owners and trawling the internet, particularly data.gov.uk, we gradually build a picture of the data that’s available, and what may become available in the lifetime of the challenge (c. 8 months). At this stage, we do some initial analysis - collecting together key information about the datasets and their ‘openness’ and categorising them against (fairly simplistic) sub-categories, such as social housing and private rental in the case of housing.
- Share the data - next we work together with data scientists to do some more detailed analysis of the quality of the datasets. We have developed a ‘quality indicator’ which considers facets of the data such as whether you have to register for access, how frequently it is released and the availability of metadata. For the last couple of challenges, this information has been presented in two different ways:
a. Data guide - a Google document designed to be conversational and to help people understand the content of the dataset without needing significant technical skill.
b. Data quality indicator - a spreadsheet with quantitative and qualitative descriptions of the quality of the dataset. It draws on a variety of existing measures of data quality, such as 5 star open data and the Open Data Institute’s Certificates. It doesn’t give insight into the content of the data, so it is best used in combination with the data guide.
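To make the idea of a quality indicator concrete, a single row could be represented as below. This is a minimal sketch under stated assumptions: only the three facets (registration, release frequency and metadata availability) come from the description above, while the `QualityRow` class, the scoring rule in `simple_score`, the quarterly threshold and the example rows are invented for illustration and are not taken from the actual spreadsheet.

```python
from dataclasses import dataclass


@dataclass
class QualityRow:
    """One dataset's entry in a hypothetical quality indicator."""
    dataset: str
    requires_registration: bool   # do you have to register for access?
    releases_per_year: int        # how frequently is it released?
    has_metadata: bool            # is metadata available?
    notes: str = ""               # qualitative description


def simple_score(row: QualityRow) -> int:
    """Illustrative 0-3 score: one point per favourable facet (an assumed rule)."""
    score = 0
    if not row.requires_registration:
        score += 1
    if row.releases_per_year >= 4:  # assumed threshold: at least quarterly
        score += 1
    if row.has_metadata:
        score += 1
    return score


# Made-up rows, for illustration only.
rows = [
    QualityRow("Dataset A", False, 12, True, "well documented"),
    QualityRow("Dataset B", True, 1, False, "annual release, no data dictionary"),
]

for row in sorted(rows, key=simple_score, reverse=True):
    print(f"{row.dataset}: {simple_score(row)}/3 ({row.notes})")
```

Keeping the free-text notes next to the numeric score mirrors the mix of quantitative and qualitative descriptions; a single number on its own would hide why a dataset scored as it did.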
We have actively decided not to reproduce the data on our own platform, as this would remove the direct connection with the data owner and ultimately interrupt the cycle of data release and improvement over time.
We hope that these data guides and quality indicators will provide a useful resource for the open data community over time and will increase use of some of the great data which is available at the moment. Note that participants are not restricted to using only the data that we make available - if you can find alternative sources of open data, we would encourage you to share them with the community.
In some unusual instances, we may also work with third-party organisations to clean specific datasets so they are more usable for the challenge. This is a service provided by the ODI and will often involve the completion of an open data certificate. At this stage we will work with data owners to improve the quality of their data and to advocate for the further release of data.
The open data ecosystem is constantly changing, so it is almost impossible for us to include all the relevant and related datasets in our analysis at any one time. We often include some supplementary data (which is less directly related to the challenge question) in the data guide to try to overcome this issue. However, we would also welcome any suggested additions - please email [email protected]. Do please also get in touch if you have any ideas to improve the presentation of the data, as we’re iterating each time we create a guide and want to make sure these guides are as useful as possible.
- Housing data guide / Housing data quality indicator
- Food data guide / Food data quality indicator
- Energy + Environment data guide
- Education data guide
- Crime and Justice data blog and github repository
Photo credit: Mikael Altemark at Flickr CC
* Open Definition. Available from: <http://opendefinition.org/>. [16 July 2014].