Mind the gap between the truth and data

Over the past couple of weeks, I’ve been lucky enough to represent Nesta at a number of events and meetings focused on exploring the power of data and technology to address social challenges.

In New York, I attended the 3rd Global Data Commons workshop. I also met with Stefaan Verhulst from GovLab to talk about his work on data collaboratives and the 100 Questions initiative. And I met with Alexandra Mateescu from Data & Society to discuss her report, AI in Context.

Closer to home, I spent last Thursday at the anthropology + technology conference in Bristol.

In trying to synthesise everything I’ve heard, debated and discussed, one overarching observation stands out: in thinking about how to use data and technology to solve social challenges, the most important thing to consider is not what is there, but what is not.

What data is missing? Whose voices are missing? What untested assumptions are we making, and how do these obscure other truths?

I have identified six common gaps that are critical to consider when working on any data or tech-for-good project.

The six gaps

1. Data gaps

In any project involving data, it is important to consider which datasets might have been overlooked. While certain datasets might be very clearly needed to answer a particular question, thinking beyond those obvious datasets is likely to paint a broader picture, both of the challenge and of the potential solutions.

For example, in trying to reduce A&E presentations, what datasets would you look at? Labour market statistics might not be highest on the list. Yet we recently heard from a local council that a key driver of increased A&E presentations is people working in gig economy jobs, who find it much easier to visit A&E than to book a GP appointment. Understanding this led the council to offer different solutions.

In thinking about missing datasets, it is important to recognise two additional points:

  1. Quantitative data cannot answer everything. To add the necessary depth to the insights, qualitative data has a role to play too. I’ve written about that previously.
  2. Datasets capture only a representation of reality. Think of GDP, which purports to reflect economic output, but completely excludes women’s unpaid work which, if accounted for, would add $10 trillion to global GDP. It is important to recognise that the composition of datasets - and decisions around which data streams are included and excluded - may be shaping your understanding of reality. With that in mind, it is critical to interrogate what information might be needed to paint a more complete picture.

It is also important to consider not just what datasets are missing, but also what might be missing from the datasets you are working with. This slightly blurry photo taken on the New York subway sums it up pretty well:

Photo of data gaps

This poster, spotted on the New York Subway, is a call to arms to help close gender, racial and age gaps in medical research

In addition, in the book Automating Inequality, Virginia Eubanks notes that some datasets may have data gaps because the government cannot capture data about private services. When professional middle-class people pay for private support services like drug and alcohol counselling, the government has no way of knowing they are accessing those services. In contrast, when working-class people rely on the government to provide those services, that data is captured (as a black cross against their name). This creates a serious imbalance in the datasets, and again means they paint a skewed picture of reality.

The Open Data Institute’s (ODI) Data Ethics Canvas offers some helpful prompts to help identify these data gaps.

2. Unclear purpose

Another gap that arises in these projects is a lack of clear purpose - in other words, what specific question are we trying to answer? And what would be done differently as a result of gathering this data, or developing this technology?

For example, at one of the events that I went to, someone proposed using data to create a poverty map.

What is the specific question that we would be trying to answer in creating a poverty map? As highlighted by the 100 Questions initiative, “unless we define the questions well… how can we provide answers that matter?” Questions need to be specific and actionable. The NOLAlytics site has some great suggestions around how to frame appropriate questions for a data analytics project.

In addition, what would be done differently as a result of the poverty map? What real-world outcome are we working to enable? For data insights to lead to tangible action, it’s important to define who will act, and where and when they will do so. This is often overlooked in favour of an approach which seems to assume that “if you build it, they will come.” But we know, from observing patterns of use of open government data, that this isn’t true.

To help clearly define the change the project is trying to achieve, and the action needed to get there, Nesta’s theory of change template is very helpful.

3. Missing voices

It is tempting, when working on complex data and tech projects, to only involve people with data and technical expertise. But these are not the only voices needed around the table.

It is critical to also hear the voices of those who are expected to act on the insights, or use the technology, as well as those who will be impacted by the project.

Let’s take the poverty map example again. In order to understand whether it will, in fact, have a real impact in the world, the charities, NGOs and community partners who would be expected to do something differently as a result of that map need to be asked whether it is what they actually need and want. In addition, where feasible, the intended beneficiaries of the project - in this case, those experiencing poverty - should be included as part of the conversation.

A co-design approach improves the chances that the project will result in real change.

If you’re not sure how to begin bringing different voices into the room effectively, there is a plethora of ‘human-centred design’ toolkits and courses available online. Ideo.org offers this guide, and 18F offers a collection of Human-Centred Design Tools.

Beyond just including those who are expected to execute the project, it is - more generally - important to include diverse voices in the project. The profound effects of a homogeneous group of people designing answers to problems that affect diverse groups are brought starkly to light in Caroline Criado-Pérez’s book “Invisible Women”, which calls out how gender-blindness results in products, medications and services that don’t work well for women.

How we collect and understand data, and design solutions to social challenges, is generally framed from the standpoint of the dominant racial, social and cultural majority. Introducing voices which challenge and disrupt this thinking is critical.

One approach to guarantee this diversity would be to introduce a diversity quota for each project steering and working group. Another, softer, approach might be to introduce a diversity and inclusion strategy which relates not just to the organisation’s internal hiring and practices, but also to all projects pursued by that organisation.

4. Ethical gaps

At the anthropology + technology conference, Miranda Marcus from the ODI gave a fantastic PechaKucha talk, which explored the ethical implications of data collection, sharing and use (the name of this blog was actually inspired by one of her slides).

The ethical challenges associated with safely gathering, storing and using people’s data are now well-known. What is less clear is how to identify and manage those ethical challenges.

There are useful tools for this purpose, such as the ODI’s Data Ethics Canvas (mentioned above), the UK Government’s Data Ethics Framework, and the UK Statistics Authority’s Data Ethics Self-assessment.

5. Unintended consequences

Another important thing to think about is the unintended consequences of projects. What could happen, for example, if this data or technology landed in the wrong hands and was used for bad, rather than for good?

At the anthropology + technology conference, Julien Cornebise spoke about his involvement in a project which used machine learning to automatically analyse satellite imagery, at country-wide scale, to map the extent of destruction and conflict in Darfur. He recognised that the data he and his collaborators had collected could, in the wrong hands, be used for harm; for example, the maps could be used to target further violence. On balance, however, they decided that the possible upside justified the risk, and so they continued with the project.

To help identify unintended consequences, Doteveryone has developed a consequence scanning kit, designed to support organisations to mitigate or address potential harms or disasters before they happen.

6. Context gaps

Data is often collected for one purpose, and then recycled and used for another. What contextual information about the original data needs to be understood to ensure it is fit for purpose?

In this article, Nathan Lau suggests that it is important to understand the following:

  1. Who collected the data, and is it a reputable source?
  2. How was the data collected?
  3. When was the data collected, and is it current enough?
  4. Where was the data collected, and are the conditions in the place of origin sufficiently similar for the findings to translate?
  5. Why was the data collected, and might there be any biases baked in?

Understanding context is important because, as Nathan explains: “Using data without knowing anything about it, other than the values themselves, is like hearing an abridged quote secondhand and then citing it as a main discussion point in an essay. It might be okay, but you risk finding out later that the speaker meant the opposite of what you thought.”

A data scientist, analyst or statistician should be able to support an interrogation of the data’s context. In addition, MIT’s Data Nutrition Label project - once it moves beyond prototype - should help make interrogating the data’s context more straightforward.
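
To make this a little more concrete, those five context questions lend themselves to being recorded in a simple, structured way alongside the dataset itself. Here is a minimal sketch in Python (the field names and example values are hypothetical, and this is not the Data Nutrition Label’s actual schema):

```python
from dataclasses import dataclass, field


@dataclass
class DatasetContext:
    """A lightweight 'context record' kept alongside a reused dataset,
    capturing the who/how/when/where/why questions above."""
    who: str    # Who collected the data? Is the source reputable?
    how: str    # How was it collected (survey, sensors, scraping, ...)?
    when: str   # When was it collected? Is it still current enough?
    where: str  # Where was it collected? Do those conditions translate?
    why: str    # Why was it collected? What biases might be baked in?
    known_gaps: list[str] = field(default_factory=list)

    def unanswered(self) -> list[str]:
        """List the context questions that are still blank - flags to
        resolve before the dataset is recycled for a new purpose."""
        questions = {"who": self.who, "how": self.how, "when": self.when,
                     "where": self.where, "why": self.why}
        return [q for q, answer in questions.items() if not answer.strip()]


# Example: recording what is (and isn't) known about a recycled dataset
context = DatasetContext(
    who="Local council housing team",
    how="Paper survey, digitised by volunteers",
    when="2016; may understate current gig-economy work",
    where="",  # unknown - needs chasing before reuse
    why="Collected for eligibility checks, not research; selection bias likely",
    known_gaps=["Private renters under-represented"],
)
print(context.unanswered())  # ['where']
```

Even a sketch this simple makes the gaps visible: a blank field is a prompt to go and ask, rather than an assumption silently carried into the analysis.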

Why gaps matter

Why do these gaps matter? To make this all a bit less abstract, let’s use an analogy based on the image below.

Imagine you had to climb this staircase. What would you do?

Man on staircase into clouds

You could take a flying leap and hope for the best. But there’s a good chance that the gap is too large, and you will fail in your attempt to make it to the other side.

Alternatively, you could walk back down the stairs, gather the materials needed to fill the gaps, and then attempt again; this time, with a much higher likelihood that you’ll reach the desired destination.

Working out how to fill the gaps isn’t easy. It can be time consuming. Expensive. Complex. But… as shown above, the investment means you will almost certainly get to where you want to go. Pushing ahead to save time and money, by contrast, creates a real risk that you’ll end up somewhere far less desirable.

But this all assumes that you know the gaps you’re confronting. Imagine climbing this same staircase blindfolded. It is inevitable that you will fall through the gap.

It is this second scenario that is often more reflective of the reality. The gaps aren’t known. People aren’t deliberately leaping across the void. It’s just that they assume that the staircase will continue. That assumption is built on their life experience - they’ve never walked up a staircase with a big hole in the middle.

Assumptions arise when information is incomplete, context isn't accounted for, or voices aren't heard. This is why assumptions can be dangerous and need to be interrogated. Imagine how valuable it would be if, while climbing the staircase blindfolded, someone was there who had fallen through a staircase before. They would likely alert you to that risk - which you hadn’t even appreciated - and also offer ideas around how they successfully filled the gap and made it to the other side.

Having the gap pointed out to you, by having different voices in the room challenging your assumptions, would bring to light the gaping hole, and allow you to make an informed decision about how to proceed.

Addressing the gaps

So, what can be done to both identify and address these gaps?

I would suggest that each project should begin with a reflection on absence - who, or what, is absent, and what are the implications? The gaps identified above can be used as an anchor for this discussion:

  1. What data is missing?
  2. Do we have a clear purpose?
  3. Whose voices are missing?
  4. What are the ethical risks?
  5. What might be the unintended consequences of this work?
  6. What is the data context?

Reflecting on this list, it seems like it could comfortably apply beyond data and tech projects, to policy development, service design, and more. Across many fields, we should be using questions like this to support rigorous thinking about who and what is missing from the project, and what might be done to address those gaps.

There are also structural ways to support the identification of these gaps. Adopting, for example, a cross-disciplinary approach to projects introduces different disciplines and fields of expertise into the room, which helps to avoid group-think. Ensuring that the group is sufficiently diverse is also important for this reason.

There are also some great tools to work with; these are linked under each of the gaps identified above.

Acknowledging the gaps is the most important step. Without knowing the gaps, you cannot calculate the risks. It is only once the gaps are identified - once you know how large they are, and how deep the fall is - that informed decisions can be made about whether to take the leap or walk back down the stairs and come back with the materials needed to build the bridge to get you safely where you want to go.

I’d also like to finish by offering my own acknowledgment that there are no doubt gaps within my own list of gaps. I’d love to hear what you think should be added to this list.

A big thank you to Jack Orlik for his input and ideas, which made this blog much better than it was. And to Camilla Bertoncin and Rosalyn Old for their feedback.

***

Since writing this blog, I have worked with Kelly Duggan, a Learning Experience Designer in Nesta’s Innovation Skills team, to create a practical tool that people can use to put the ideas outlined in this blog into practice. It’s called Map the Gap, and it’s available to download here.

Author

Thea Snow

Senior Programme Manager, Government Innovation

Thea was a Senior Programme Manager in the Government Innovation team.
