Community-Led African Language AI


KivuLingua AI is an initiative building the first open infrastructure for speech data and AI models for Eastern Congo's Bantu languages, beginning with Mashi (1.9M speakers). We build ethical automatic speech recognition (ASR) and text-to-speech (TTS) systems anchored in real-world use cases: literacy education via Teaching at the Right Level (TaRL) and community health communication. The goal is sustained creation of language data and digital sovereignty for the communities that speak these languages.

200+

Native Mashi speakers and contributors already mobilized for community validation, linguistic preservation and participatory data governance.

8

Underrepresented Bantu languages across Eastern Congo (North Kivu, South Kivu) prioritized for sustained ASR, TTS and multilingual AI development.

250h+

High-quality aligned speech corpus in Mashi collected across diverse genres: read speech, spontaneous conversations, oral narratives and functional content.

Apache 2.0

All datasets, models, and tools published under open licenses on Hugging Face, GitHub and Zenodo as African digital public goods.

Project Overview

Explore Our Work

KivuLingua AI is building open infrastructure for Eastern Congo's Bantu languages.

Ready to Get Involved?

We welcome linguists, ML engineers, educators, health workers, and community advocates. Whether you want to contribute data, collaborate on research, or support our mission, join us.