Events and Workshops

June 2021

KenCorpus Inaugural Workshop

A research team of investigators from Maseno University’s Departments of Computer Science and of Kiswahili and Other African Languages, including the Principal Investigator, was awarded research funds by the Lacuna Fund for a project titled "KenCorpus: Kenyan Languages Corpus". The project is a collaboration with investigators from the University of Nairobi and Africa Nazarene University. Its aims include collecting naturally occurring language texts in Kiswahili, Dholuo and Luhya; collecting speech data for the Kiswahili, Dholuo and Luhya languages; translating the Dholuo and Luhya texts into Kiswahili for machine translation; collecting question and answer pairs for the Kiswahili texts for machine comprehension; and annotating the Kiswahili, Dholuo and Luhya texts with part-of-speech tags.

November 2021

KenCorpus Capacity Building Workshop

KenCorpus (Kenyan Languages Corpus) held its capacity building workshop at the Kisumu Hotel. The workshop brought together language experts from Kenyan universities to tackle the language challenges experienced daily across dialects.

March 2022

KenCorpus Closing Workshop

The Kenyan Languages Corpus project came to a close with the following outputs. Collected data: 1,152 speech data items, 4,442 texts and 176h 29min 46sec of audio. Translations: 1,500 Dholuo-Kiswahili pairs and 10,900 Luhya-Kiswahili pairs. Kiswahili transcriptions: 27h 31min 50sec of audio by 7 males and 19 females, with 31,759 dictionary words. Part-of-speech tagged data: about 143,000 words in total, i.e. 50,000 Dholuo, 27,900 Marachi, 34,300 Lugooli and 30,900 Marachi. Question and answer data: 7,526 pairs from 1,445 documents, with about five pairs per text. The project also developed an Android application and a web app for data collection, various algorithms for question-and-answer testing, and a Java-based Kiswahili speech-to-text API.