About Nesta

Nesta is a research and innovation foundation. We apply our deep expertise in applied methods to design, test and scale solutions to some of the biggest challenges of our time, working across the innovation lifecycle.


Common Voice

Crowdsourcing voices to train speech recognition software

The challenge

Most of the software and voice data that powers the personal assistants in our smart devices is locked up in privately owned systems. Getting access to good-quality data takes time and money. As a result, the cost of developing speech recognition and other software that relies on voice data is prohibitively high, giving a few companies a monopoly on these services. There is also little transparency about what data has been used to develop smart assistants, meaning that certain populations can remain underserved. These limitations make the technology less effective for some groups, such as non-native speakers with accents, or for languages spoken by small populations.

The AI and CI solution

Common Voice is a Mozilla initiative that addresses this challenge by developing the world’s first open-source voice dataset and a speech recognition engine called Deep Speech. The concept is simple. Common Voice crowdsources voice contributions through an online platform where users are invited to record themselves reading sentences. All sentences are sourced from texts under a Creative Commons license, to ensure they can be freely reused by researchers and entrepreneurs in the future. Users can also listen to and validate contributions from others, to ensure that the data is of high enough quality to train an AI algorithm. The market’s leading voice technologies are powered by deep learning algorithms, which can require up to 10,000 hours of validated data to train.
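The record-then-validate loop described above can be pictured with a small sketch. It is illustrative only and is not Mozilla's implementation: the `Clip` fields, the `REQUIRED_UP_VOTES` threshold and both helper functions are assumptions made for this example, since the text does not say how many listeners must agree before a clip counts as validated.

```python
from dataclasses import dataclass
from typing import Iterable

# Assumption: a hypothetical threshold chosen for illustration only.
REQUIRED_UP_VOTES = 2


@dataclass
class Clip:
    """One recorded sentence plus the votes cast by other contributors."""
    sentence: str
    duration_seconds: float
    up_votes: int = 0
    down_votes: int = 0


def is_validated(clip: Clip) -> bool:
    """A clip counts as validated once enough listeners approve it
    and approvals outnumber rejections (hypothetical rule)."""
    return clip.up_votes >= REQUIRED_UP_VOTES and clip.up_votes > clip.down_votes


def validated_hours(clips: Iterable[Clip]) -> float:
    """Total duration, in hours, of the clips that pass validation."""
    return sum(c.duration_seconds for c in clips if is_validated(c)) / 3600
```

Summing only the validated clips mirrors the point made in the text: it is validated hours, not raw recordings, that matter for training.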

So what?

As of January 2020, users had recorded almost 2,500 hours of their voices in 29 different languages for Common Voice. The aim of the project is to ensure that the data used to train voice recognition tools represents the full diversity of real people’s voices. Each data entry contains an audio file with the linked text, as well as any associated metadata about the contributor, if it is available. By making the datasets open, Mozilla is creating opportunities for more researchers, developers and public sector actors to develop voice technologies that can benefit a wider range of people. This accessibility can help to incentivise innovation and healthy competition for better tools. Mozilla released the first version of Deep Speech in 2017.
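To make the shape of a data entry concrete, here is a minimal sketch of how one might be represented: an audio file, the sentence that was read, and contributor metadata that may be missing. All class and field names (`DataEntry`, `ContributorMetadata`, `age_group`, `accent` and so on) are hypothetical and chosen for illustration; they are not the dataset's actual schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class ContributorMetadata:
    """Optional, self-reported details about the speaker (hypothetical fields)."""
    age_group: Optional[str] = None
    gender: Optional[str] = None
    accent: Optional[str] = None


@dataclass
class DataEntry:
    """One entry: an audio file linked to the text that was read aloud."""
    audio_path: str
    sentence: str
    language: str
    metadata: Optional[ContributorMetadata] = None  # None when not provided


def entries_per_language(entries: Iterable[DataEntry]) -> Counter:
    """Count entries by language, a simple way to see which languages are underrepresented."""
    return Counter(entry.language for entry in entries)
```

Keeping the metadata optional reflects the wording above: it is attached only when the contributor has chosen to provide it.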

Common Voice is an example of how a collective intelligence (CI) approach to data collection, one that emphasises diversity and open access, can improve the development of AI, which can in turn be used for other CI purposes.

