The challenge

Languages hold the key to unique cultures and communities, and preserving them helps to maintain diverse perspectives in the world. Rare languages with small populations of speakers, such as Quechua in Peru, face the threat of extinction through lack of use and visibility. Digital technology has amplified the popularity and reach of languages such as English and Spanish, but material written in native South American languages remains difficult to find on the internet. In total, more than 40 native languages in Peru are in danger of extinction.

Siminchikkunarayku interface

The AI and CI solution

Siminchikkunarayku is a Peruvian foundation whose mission is to protect indigenous languages such as Quechua through crowdsourcing, digital technology and AI. The foundation gathers multimedia content from artists, researchers, engineers and linguistic communities, with the aim of assembling an online corpus of the endangered language that can be used to develop natural language processing (NLP) tools, such as automated machine translation. This relies on mobilising communities who have access to this rare cultural knowledge to create a dataset large enough to train the AI models.

So what?

Siminchikkunarayku is a unique example of native language preservation through a combination of CI and AI. The initiative is still in its early stages, building the Quechua corpus through direct engagement with, and crowdsourcing from, Quechua speakers. As of January 2020, the foundation had worked with 1,200 volunteers to collect audio recordings and translations of the language. The project aims to create a Quechua module on Duolingo, the global language-learning platform, to help the language achieve interoperability with the most common languages worldwide.


Similar initiatives

COCOHub.cc
Niger-Volta Language Technologies Institute GitHub for West African languages
Wikipedia Scribe