People with a passion for data: An interview with Lucy Knight
There are pockets of data innovation all around the country where people with a passion for data are making things happen for their communities. Get in touch if you're one of them.
Last week, I spoke to Lucy Knight, artist, technologist and connector, with one foot in Devon County Council and the other in the data activist and third sector communities. Here's what she had to say about her role as the council's Data Lead.
What is your role, and how do you work with the open data community in Devon?
I describe my role as ‘Data Lead’, and it’s essentially what the private sector would call a Chief Data Officer. I get involved in any project where the use of data is a potential issue, particularly where we need to think about how to use our data better or connect up our systems and information to improve how we deliver a service.
Typical examples from my current to-do list are:
- Building a prototype information dashboard for our social care market position statement
- Testing how to open up the datasets we generate for the Joint Strategic Needs Assessment of the health and wellbeing of the local community
- Delivering training to colleagues
- Writing our strategy on data asset management
As a co-founder of the Open Data Institute Node for Devon, I also keep in touch with developments in technical and ethical disciplines around open data publishing, so I act as a liaison between the council and a much wider network of experts and activists. This has helped us to get in touch with local developers and data consumers, through our own events and other meetup groups, so I’m lucky to have a wide range of skills on the doorstep.
I think it is great that Devon has the imagination to have a role like yours. Should other councils think about having such a role?
The things I do in my day-to-day role at the council are just about getting the most out of our data, but that doesn’t appear in my job description or title on paper. It’s just something I’ve moved towards over the last three or four years as a result of my specific skills and interests; and my line manager and head of service have supported it because they could see the value. I’m sure a lot of councils already do have people doing what I do, but it may not be visible or explicit.
In every authority there will be people who are the super-connectors, who participate in events, who use social media and collaboration tools, who play with data and code and see what can be done with it.
The benefit comes when it’s clear that the organisation values those skills and connections, as it sends a signal that the authority is genuinely interested in being open and learning from feedback.
We had some great discussions at the recent Open Data Camp in Cardiff, with several of the UK Nodes represented; it was interesting to hear how much it helped other Nodes to have at least one of their members be part of the local authority.
Even though in some cases, and this is true in Devon, the Node is a separate legal entity (not formally attached to or supported by the council), there’s a huge benefit in having someone with connections to the data owners and publishers. Working inside the local government boundary, I can ask around and help people outside form relationships with the services they want to work with on opening or consuming data.
What are some of the win-wins about the arrangement?
The whole thing is an upward spiral of positive reinforcement, for me. The more I make new connections, the more people form a positive impression of the council; the more favourable an impression we create, the more interesting people want to work with us on really useful projects. It just keeps building itself.
There’s also a huge benefit to being approachable and open to people outside the council, because the feedback the developer and activist community have given us over this time has been invaluable.
For instance, if I want to release a particular dataset because I think it might be useful to someone, I can publish it on GitHub in the simplest possible way, and I'll frequently use Twitter to put the word out and ask for people's thoughts.
Even if the feedback is about what we’ve done wrong (and it frequently is!) or the mistakes we’ve made with the data format or content, at least we found that out before we committed large amounts of resource to an unstoppable waterfall project plan. I’m all in favour of robust planning, but we need to be responsive to the needs of the potential users of our information if we’re going to make our data useful.
What are some of your success stories?
We’re beginning to see third sector organisations looking to use our open data, which is a sector not traditionally associated with technical innovation in this part of the world. Devon Communities Together, which is a support organisation for local charity and community groups, invited us to be part of their open data working group last year, with membership from all sectors and districts in Devon.
The group was set up specifically to look at how we can all make use of open data in our planning and service provision, and that’s resulted in some stronger relationships for the council and the other organisations. It's also delivered a shared project to create an open data web map identifying combined impacts on vulnerable people in rural areas, and to then think about creating information tools and services that will support them to get help.
Our internal use of other people’s open data is also helping us to streamline our own data collection and analysis work.
We are beginning to cut out duplication of things like national performance benchmarking, lists of schools, lists of care providers etc. because we can use open registers and the Application Programming Interfaces (APIs) from the Care Quality Commission and Local Government Inform to pull in the very latest data.
Our own Pinpoint directory has an API that developers can use to build information sites and services, and we’re looking into how we use that ourselves in our internal reporting and tools like the Market Position Statement dashboard.
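Swapping a locally maintained list for a pull from an open register or API can be quite small in code. The sketch below is illustrative only: it assumes the register is served as a JSON array of objects carrying an `id` field, and the URL, field name and response shape are hypothetical. Real services such as the CQC API or Pinpoint have their own endpoints and formats, and some may require registration or an API key.

```python
import json
from typing import Dict, List
from urllib.request import urlopen

# HYPOTHETICAL endpoint: a stand-in for whichever open register or API is
# being consumed. This is not a real service URL.
REGISTER_URL = "https://example.org/api/care-providers.json"


def fetch_register(url: str = REGISTER_URL, timeout: float = 10.0) -> List[dict]:
    """Download the latest copy of a register published as a JSON array."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def index_by_id(records: List[dict], key: str = "id") -> Dict[str, dict]:
    """Build a lookup table keyed on the register's identifier field.

    Records missing the (assumed) identifier field are skipped rather than
    guessed at; IDs are stored as strings so 2 and "2" land on the same key.
    """
    return {str(r[key]): r for r in records if key in r}
```

Calling `index_by_id(fetch_register())` each time a report runs means the report always reflects the publisher's current list, rather than a copy someone had to keep up to date by hand.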
Does it need to be expensive? What tools do the open data community use - particularly free and open source tools?
Keeping up with technical networks means I have a good idea of what’s available and relevant to new projects; and we’ve found that working with free and cheap tools lets us test ideas quickly at next to no cost. If something has the potential to be adopted on a wider scale we can look at which corporate systems can cope with the task, or what we may need to procure or build. But we keep in mind the lessons we learned with the quick and cheap prototypes, and get the major mistakes out of the way (hopefully) while the stakes are still low.
Everyone has their favourites, but my experience has been that there’s a lot of use of GitHub for sharing data and code, where a full data portal would be overkill. If people do want a data portal then CKAN, which is open source, is a popular choice. I also see a lot of people using the free tiers of services like Heroku and Zapier to get prototype data apps up and running.
For data visualisation and sharing dashboards, you’ve got Tableau Public, plot.ly and d3, and for mapping there’s Leaflet, CartoDB and QGIS, all free/open source or with a free tier.
If you asked around the network you’d probably hear another 10 different apps and software libraries named! There’s a huge range out there.
What about data quality - how do you deal with data that's not cross referenced?
There are constant discussions in the open data community around standards, schemas and linked data - that would be a whole other blog post. If we’re using a dataset for the first time, just to evaluate it, we can do a lot with text matching and lookups if the standard reference columns we’d need don’t exist.
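The text matching described above, for a first-look evaluation where no shared reference column exists, might look like the following minimal sketch. It uses Python's standard `difflib` similarity ratio; the normalisation rules and the 0.8 cutoff are illustrative assumptions, not the council's actual method, and any fuzzy match should be treated as a candidate to check rather than a confirmed link.

```python
import difflib
import re
from typing import Dict, List, Optional


def normalise(name: str) -> str:
    """Lower-case, strip punctuation and collapse whitespace so trivial
    differences such as "St. Mary's" vs "St Marys" don't block a match."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    return " ".join(cleaned.split())


def match_names(left: List[str], right: List[str],
                cutoff: float = 0.8) -> Dict[str, Optional[str]]:
    """For each name in `left`, find the closest name in `right`.

    Returns None where nothing scores at or above `cutoff` (difflib's
    similarity ratio, 0..1), so those rows can be checked by hand.
    If two names in `right` normalise identically, the later one wins.
    """
    lookup = {normalise(r): r for r in right}
    candidates = list(lookup)
    matches: Dict[str, Optional[str]] = {}
    for name in left:
        best = difflib.get_close_matches(normalise(name), candidates,
                                         n=1, cutoff=cutoff)
        matches[name] = lookup[best[0]] if best else None
    return matches
```

For example, matching `["St. Mary's Primary School", "Exeter College"]` against `["St Marys Primary School", "Tiverton High School"]` pairs the two St Mary's entries and leaves Exeter College unmatched for manual review.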
It’s different when you need to use the data in a product or a service, and it has to be usable without any manual intervention; developers will use whatever communication channels they have, including Freedom of Information requests if necessary, to ask the data publisher for a specific field to be added to the dataset next time.
Examples we’ve had would be adding registered company numbers to the Spend Over £500 dataset, so it can be matched up to Companies House information. Another pet peeve I hear is that the public sector uses easting/northing coordinates rather than longitude/latitude for geographic point datasets, and developers always ask us to change that.
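For publishers wondering what that change involves, here is a minimal sketch of the Ordnance Survey's published inverse Transverse Mercator formulas, converting British National Grid eastings/northings to latitude/longitude in standard-library Python. The function names are illustrative. Note that the result is on the OSGB36 datum: GPS devices and most web maps use WGS84, which can differ by around 100 metres, so a further datum transformation (for example via a library such as pyproj) would be needed before the points line up on a web map.

```python
import math
from typing import Tuple

# Airy 1830 ellipsoid and National Grid projection constants (OSGB36).
A, B = 6377563.396, 6356256.909          # semi-major / semi-minor axes (m)
F0 = 0.9996012717                        # central meridian scale factor
PHI0, LAM0 = math.radians(49.0), math.radians(-2.0)  # true origin
E0, N0 = 400000.0, -100000.0             # false easting / northing (m)
E2 = 1 - (B * B) / (A * A)               # eccentricity squared
N_ = (A - B) / (A + B)


def _meridional_arc(phi: float) -> float:
    """Distance along the central meridian from the origin latitude to phi."""
    n, n2, n3 = N_, N_ ** 2, N_ ** 3
    dphi, sphi = phi - PHI0, phi + PHI0
    return B * F0 * (
        (1 + n + 1.25 * n2 + 1.25 * n3) * dphi
        - (3 * n + 3 * n2 + 21 / 8 * n3) * math.sin(dphi) * math.cos(sphi)
        + (15 / 8 * n2 + 15 / 8 * n3) * math.sin(2 * dphi) * math.cos(2 * sphi)
        - (35 / 24 * n3) * math.sin(3 * dphi) * math.cos(3 * sphi)
    )


def en_to_latlon(easting: float, northing: float) -> Tuple[float, float]:
    """Convert National Grid easting/northing (metres) to OSGB36 latitude
    and longitude in decimal degrees. NOTE: OSGB36, not WGS84."""
    # Iterate on latitude until the meridional arc matches to 0.01 mm.
    phi = (northing - N0) / (A * F0) + PHI0
    m = _meridional_arc(phi)
    while abs(northing - N0 - m) >= 1e-5:
        phi += (northing - N0 - m) / (A * F0)
        m = _meridional_arc(phi)

    sin_phi, tan_phi = math.sin(phi), math.tan(phi)
    sec_phi = 1 / math.cos(phi)
    nu = A * F0 / math.sqrt(1 - E2 * sin_phi ** 2)   # transverse radius
    rho = A * F0 * (1 - E2) / (1 - E2 * sin_phi ** 2) ** 1.5  # meridional radius
    eta2 = nu / rho - 1
    t2, t4, t6 = tan_phi ** 2, tan_phi ** 4, tan_phi ** 6

    # Series coefficients, named as in the OS guide (VII to XIIA).
    vii = tan_phi / (2 * rho * nu)
    viii = tan_phi / (24 * rho * nu ** 3) * (5 + 3 * t2 + eta2 - 9 * t2 * eta2)
    ix = tan_phi / (720 * rho * nu ** 5) * (61 + 90 * t2 + 45 * t4)
    x = sec_phi / nu
    xi = sec_phi / (6 * nu ** 3) * (nu / rho + 2 * t2)
    xii = sec_phi / (120 * nu ** 5) * (5 + 28 * t2 + 24 * t4)
    xiia = sec_phi / (5040 * nu ** 7) * (61 + 662 * t2 + 1320 * t4 + 720 * t6)

    de = easting - E0
    lat = phi - vii * de ** 2 + viii * de ** 4 - ix * de ** 6
    lon = LAM0 + x * de - xi * de ** 3 + xii * de ** 5 - xiia * de ** 7
    return math.degrees(lat), math.degrees(lon)
```

Publishing both coordinate pairs side by side, rather than replacing one with the other, would keep the dataset usable for GIS teams working in the National Grid as well as for web developers.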
Does anyone have any further insights? I am wondering why so few data scientists make use of free tiers of cognitive computing and machine learning tools to deal with matching or cross-referencing dirty address data, for example. Any ideas?
Please comment below or on Twitter.