KivuLingua AI is an initiative to create the first open infrastructure for speech data and AI models for Eastern Congo's Bantu languages, beginning with Mashi (1.9M speakers). We build ethical ASR and TTS systems anchored in two real-world use cases: literacy education via Teaching at the Right Level (TaRL) and community health communication. The goal is sustained language data collection and digital sovereignty for these communities.
200+
Native Mashi speakers and contributors already mobilized for community validation, linguistic preservation and participatory data governance.
8
Underrepresented Bantu languages across Eastern Congo (North Kivu, South Kivu) prioritized for sustained ASR, TTS and multilingual AI development.
250h+
High-quality aligned speech corpus in Mashi collected across diverse genres: read speech, spontaneous conversations, oral narratives and functional content.
Apache 2.0
All datasets, models, and tools published under open licenses on Hugging Face, GitHub and Zenodo as African digital public goods.
Project Overview
Explore Our Work
KivuLingua AI is building open infrastructure for Eastern Congo's Bantu languages. Each area of our work is summarized below.
Education Impact
Teaching at the Right Level (TaRL) literacy programs in mother-tongue Mashi for 500+ students across rural Bushi schools.
Community Health
Voice-based mobile app that lets 30+ health workers deliver prevention messages and record patient data offline in Mashi.
AI Models
Edge-optimized ASR and TTS models for Mashi (CPU-only, offline deployment on Android and low-cost devices).
Community Team
Interdisciplinary team of AI engineers, linguists, and community leaders. Consortium with African STEM Resources Hub & Kwetu Best.
8 Bantu Languages
Scalable approach: pilot with Mashi, document Kinande, Kihunde, and Kifuliru in Phase 1, then expand to 8 languages in Phase 2.
12-Month Timeline
Structured implementation from July 2026 to June 2027: Foundation → Collection → Processing → Deployment & Publication.
Ethics First
Community sovereignty before openness. Informed consent, gender balance, conflict-sensitive protocols, data governance.
Open Science
All resources (corpus, models, tools) published under Apache 2.0 on Hugging Face, GitHub, and Zenodo as African digital public goods.
Ready to Get Involved?
We welcome linguists, ML engineers, educators, health workers, and community advocates. Whether you want to contribute data, collaborate on research, or support our mission, join us.