Drawing red lines in a grey area

At Nesta we have become a leading figure in applying data science methods to build evidence in innovation policy, health policy, the creative economy and the arts. But as our capacity grows, do we need to start questioning how far we can go with these methods? The ethics of data science is a grey area, but that doesn't mean we should shy away from engaging with it.

I was recently in a meeting with a client who was concerned that using data science methods might tarnish their organisation’s image, or even fall foul of EU data protection law. Such concerns are increasingly common in the wake of the Cambridge Analytica scandal. Beyond talk of foul play in the Brexit referendum and the US election, other stories of algorithmic mismanagement, such as the ‘flash crash’ of the pound in 2016, have sown public mistrust in data science methods and in the use of public data.

What does data science have to do with it?

Triple-helix of data science

Modern data science has emerged from the triple-helix of the World Wide Web, computational methods, and Artificial Intelligence. The World Wide Web, as an infrastructure, has enabled the rapid sharing and merging of large volumes of ‘Big’ data, facilitated by computational methods. Artificial Intelligence, in turn, allows information to be disentangled from data, with applications including the identification of spam emails, Google Translate, and finding important people in networks.

These seemingly uninterpretable methods can be combined and applied, unscrupulously or naively, to misuse data on a colossal scale. Even though it would never be our intention to misuse data against the public interest, Nesta needs to build a strategy for “doing data science” without violating our principles, namely:

  • Working for the common good;
  • Taking responsibility for the people whose lives are affected by our work;
  • Acting with integrity;
  • Treating everyone fairly and equally;
  • Being open about how we work and how we spend our money.

Why does Nesta “do data science”?

We do data science so that we can achieve Nesta’s core goals: making sense of the World Wide Web, generating new insights from old resources, and ultimately broadening the evidence base for policymakers and society. In our context, concerns arise about how personal (or sensitive) data might be used. The kinds of sensitive information we might encounter include academic histories, ethnicities, ages and social networks. One set of ethical issues arises because genuine biases present in this information can produce biased AI. Separately, it is important to question whether an individual would object to their personal data being used in unforeseen ways.

It should also be said that much of our work doesn’t involve any personal data, such as our work on job classification with the ONS. However, in other work we have used large numbers of academic papers, including authorships; patent data, including inventors; meetup data, including member IDs; and Twitter data, including Twitter handles. But should the ethical alarm bells start ringing already? The analyses built on these data include Arloesiadur, Creative Nation and The Westminster Twitterverse - and nobody would claim that these were ethically contentious.

It seems, therefore, that if we’re transparent about what data we hold, ethical dilemmas arise more from what we might do with the data than from the data itself.

Where do we start?

From the monoliths to the minnows, organisations have been churning out ethical frameworks left, right and centre. Take Google’s AI charter, which sets out very broad principles to guide its future selection of projects, but which doesn’t come close to explaining what data, methods and AI they will use - presumably so that they can accommodate a broad business portfolio. At the other end of the spectrum, DataKind has developed a very specific set of questions which its volunteer data scientists should ask themselves before making decisions. This is great but (and I caveat this “but” with a nod to the resource constraints of charities) it is not obvious to me that the data scientist in question is necessarily best placed to give their own work the green light. In contrast to Google’s approach, this kind of charter perhaps shifts too much corporate responsibility onto individual data scientists.

Constructive criticism aside, I acknowledge that coming into the race a little later means that we benefit from hindsight. Building on the work of others, I therefore propose the following: we will draw up a very concise, plain-language data science charter which will both guide our work and provide clarity to our stakeholders, including members of the public.

Thoughts from our public dialogue

The latter point particularly resonates with my experience of a public dialogue, commissioned by Nesta’s inclusive innovation team, which I attended last week. My personal takeaways from this dialogue are as follows:

  • What data is used for appears to cause more concern than the data itself.
  • Targeted AI products, i.e. those which make recommendations to or about individuals, are the most contentious applications of AI. People strongly object to being labelled, or even vilified, by algorithms.
  • Using data science as a tool for policymaking is significantly more agreeable than automated policymaking.
  • There needs to be public confidence-building around the data science ‘black box’. Personally I feel there is a lot of fearmongering on this subject, which dismisses the genuine logic behind algorithmic decisions, based on real patterns in data (to the nerds: think feature importance, decision trees, or model extraction; see the sketch after this list). The black box only truly arises when organisations are not transparent about their methods and data.
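
To illustrate what I mean by genuine, inspectable logic, here is a minimal sketch of how a decision tree's reasoning can be opened up. The example is my own, using Python, scikit-learn and its toy iris dataset (none of which feature in the projects discussed above): both the feature importances and the learned decision rules can simply be printed and read.

```python
# A minimal sketch, assuming scikit-learn and its bundled iris dataset,
# of inspecting a trained model rather than treating it as a black box.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(iris.data, iris.target)

# Feature importance: how much each input contributes to the model's decisions.
for name, importance in zip(iris.feature_names, clf.feature_importances_):
    print(f"{name}: {importance:.2f}")

# The learned decision rules themselves, printed as plain if/else logic.
print(export_text(clf, feature_names=list(iris.feature_names)))
```

Running this prints the relative weight of each input and the rules the model actually follows - exactly the kind of transparency that dispels the black box.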

It was particularly interesting to note that, whilst data seems to be the primary concern among experts, at first glance the public appears to be more concerned by methods. There may be good reasons for this, possibly because it is more immediately obvious how algorithms can directly impact our day-to-day lives, or perhaps simply because experts are more actively engaged in the wider discussion. I think that this needn’t be a sticking point, since these are two sides of the same coin: if you can’t produce an output for ethical reasons, then the data you can use will also be restricted.

Thoughts from our internal dialogue

Nesta has been discussing the ethics of AI for some time. In a recent non-technical workshop, it became apparent that being clear on our ethical boundaries is increasingly important as Nesta leads the way in using data science to address sensitive aspects of human behaviour.

There was a strong feeling that Nesta should retain some corporate responsibility for data science, even though the total number of data scientists at Nesta is relatively small. One approach which we are considering is the creation of a small non-technical data science ethics panel. The panel would evaluate risks and opportunities for Nesta, in terms of impact and reputation. Once a data source or analytical technique has been approved, it would be added to a public-facing list, which would offer public accountability.

A key consideration will be applying ‘everyday ethics’, where possible, to data science projects. Let’s take an everyday example: it would be ethically dubious to stand in a restaurant guessing the ethnicities of customers. The data science equivalent might be predicting the ethnicity of customers from their names on Twitter. I acknowledge that the debate is more nuanced than this, but this approach at least applies a relatable ethical baseline.

Towards a data science charter

As we move towards a data science charter, it is clear to me that Nesta should offer a charter that balances corporate and individual responsibility, whilst remaining publicly accountable. In short: if our work affects how democratically elected representatives make decisions, then we must accept responsibility for our algorithms and interpretations. Furthermore, we must acknowledge genuine concerns that our increasingly algorithmic society is becoming opaque. On this, Nesta must play its part in building public confidence.

In the next two months we will draft a charter by drawing upon the work of others, our public dialogue, our internal discussions and also from your comments. In August we will present the draft charter to our peers in other organisations (if that’s you then please get in touch!). After this, we aim to have the charter ratified by our board of trustees.

Culture and ethics are always evolving, and so the debate on data science ethics will never be settled. The red lines in this grey area are contextual and personal and, with that in mind, we would like to know what you think! What would you like to see from a data science charter from us, Google or anyone else?

Other non-linked references:

  • 10 principles for public sector use of algorithmic decision making
  • Me, my data and I: The future of the personal data economy
  • Algorithmic Transparency for the Smart City
  • Ten simple rules for responsible big data research
  • A Unified Ethical Frame for Big Data Analysis
  • Cabinet Office Data Science Ethical Framework

Author

Joel Klinger, Data Engineering Senior Lead, Data Analytics