Language Ecosystem
Building datasets for underrepresented Bantu languages
KivuLingua builds community-driven speech and language datasets for indigenous languages across Eastern Congo, beginning with Mashi as the flagship pilot.
Mashi (Shi)
ISO 639-3: shr
Speakers: 1.9M
Region: Bukavu, Walungu, Kabare, Kalehe
Culturally foundational Bantu language of the Bushi region, with a relatively stable orthography and an initial body of written resources. Priority language for ASR and TTS system development.
Nande (Kinande)
ISO 639-3: nnb
Speakers: 2.6M
Region: North Kivu, Ituri
One of the largest regional Bantu languages, spoken by Yira/Nande communities. High sociolinguistic vitality, but only a limited digital corpus for machine learning.
Hunde (Kihunde)
ISO 639-3: hke
Speakers: 800K-950K
Region: Masisi, Rutshuru, Walikale
Historically significant language with extremely scarce digital resources. Priority for documentation and for building a corpus suitable for neural model development.
Fuliru (Kifuliru)
ISO 639-3: flr
Speakers: 250K-400K
Region: Uvira, Fizi
Language with existing community-led preservation initiatives. Structured speech data remains insufficient for robust neural model performance.
Tembo (Kitembo)
ISO 639-3: tbt
Speakers: 500K+
Region: Kalehe, Masisi, Rutshuru
Upland Bantu language facing displacement pressure, with minimal contemporary digital documentation. Requires a prioritized preservation strategy.
Havu (Kihavu)
ISO 639-3: hav
Speakers: 1.1M
Region: Idjwi Island, Kalehe
Strategically important language with a geographically concentrated speaker base and a vibrant oral heritage. High priority for cultural preservation initiatives.
Nyanga (Kinyanga)
ISO 639-3: nyj
Speakers: 150K+
Region: Walikale
Minority forest language increasingly vulnerable to Swahili dominance. Critical focus for linguistic preservation and digital inclusion.
Rega (Kirega/Lega)
ISO 639-3: lea (Lega-Shabunda), lgm (Lega-Mwenga)
Speakers: 250K-450K
Region: Mwenga, Shabunda, Pangi
Forest-region language at heightened risk of erosion. Specialized preservation strategies are essential for digital documentation and archiving.
Congo Swahili (Kingwana)
ISO 639-3: swc
Speakers: 11M
Region: Regional lingua franca
Major regional lingua franca. Essential for building robust multilingual models and for cross-lingual transfer to the lower-resource languages in this ecosystem.
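
Each entry above carries the same fields (name, ISO 639-3 code, speaker estimate, regions), so the list could be held as a small metadata registry for dataset tooling. The following is a minimal sketch only: `LanguageProfile`, `LANGUAGES`, `BY_CODE`, and `languages_in_region` are hypothetical names, not part of any KivuLingua codebase, and only three of the entries are copied in for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageProfile:
    """One language card. All names here are illustrative assumptions."""
    name: str
    iso_639_3: str
    speakers: str              # kept as text because some estimates are ranges, e.g. "800K-950K"
    regions: tuple[str, ...]


# Three entries copied from the list above; the others follow the same shape.
LANGUAGES = (
    LanguageProfile("Mashi (Shi)", "shr", "1.9M", ("Bukavu", "Walungu", "Kabare", "Kalehe")),
    LanguageProfile("Nande (Kinande)", "nnb", "2.6M", ("North Kivu", "Ituri")),
    LanguageProfile("Havu (Kihavu)", "hav", "1.1M", ("Idjwi Island", "Kalehe")),
)

# Index by ISO 639-3 code so corpus files can be tagged and looked up consistently.
BY_CODE = {lang.iso_639_3: lang for lang in LANGUAGES}


def languages_in_region(region: str) -> list[str]:
    """Return the names of registered languages listed for a given region."""
    return [lang.name for lang in LANGUAGES if region in lang.regions]


print(BY_CODE["shr"].name)            # Mashi (Shi)
print(languages_in_region("Kalehe"))  # ['Mashi (Shi)', 'Havu (Kihavu)']
```

Keying the registry on the ISO 639-3 code rather than the language name sidesteps spelling variants such as Shi/Mashi or Kinande/Nande.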