Join our research assistants in collecting text and audio files

Tools & Resources

From collocation and n-grams to frequency analysis and tagging, process your text right now!
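As a rough illustration of what n-gram and frequency analysis involve, here is a minimal Python sketch. It is not the KenCorpus tooling: the function names `tokenize` and `ngrams` and the sample sentence are assumptions made for this example, and whitespace splitting stands in for a proper tokenizer.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a simplification; hypothetical helper)."""
    return text.lower().split()


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple (hypothetical helper)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Illustrative Kiswahili sample, not taken from the corpus.
sample = "habari ya asubuhi habari ya jioni"
tokens = tokenize(sample)

word_freq = Counter(tokens)               # word frequency counts
bigram_freq = Counter(ngrams(tokens, 2))  # bigram (2-gram) frequency counts

print(word_freq.most_common(2))
print(bigram_freq.most_common(1))
```

Frequent bigrams such as `("habari", "ya")` are a simple starting point for spotting collocations; a fuller analysis would also weigh how often each word occurs on its own.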

Our Community

Linguists, Researchers, Local Authorities, Developers


Kenya Language Corpus

The Kenya Language Corpus (KenCorpus) was founded early in 2021 by Maseno University, the University of Nairobi and Africa Nazarene University. The three universities are jointly building a language corpus and, using machine learning and natural language processing, are working toward tomorrow's African-language chatbot. Although natural language processing has been modernized considerably over the years, KenCorpus aims to take it a step further and process our own African languages on our own devices.

Here's a brief on the KenCorpus Project

Our Activities

Project Phases

  • Data Collection
  • Transcription & Annotation
  • Speech To Text Q&A


Access Language Resources

  • Access Corpus Data
  • Perform text processing on input texts
  • Ask questions about an input text and receive answers

Join Our Community

  • Researchers
  • Linguistics Analysis Team
  • Developers
  • African Language Enthusiasts

Know your Project Investigators

Principal Investigator
Dr. Lilian Wanzare

Dr. Lilian D.A. Wanzare's research interest is artificial intelligence, in particular natural language processing and building text processing tools for low-resource languages. She has experience in collecting and annotating data for building NLP tools. In the project, she will be involved in data collection, data management and question-and-answer annotation of the collected stories. She will also serve as the principal investigator of the project.

Prof. Florence N. Indede

She is an Associate Professor of Kiswahili Studies and Chair of the Department of Kiswahili and Other African Languages at Maseno University, Kenya. Her publications and research interests include thematic contexts in pragmatics and discourse analysis of Kiswahili language and literary texts, as well as research, knowledge management and quality assurance in higher education. She is an active participant in the Dialogue on Innovative Higher Education Strategies (DIES). In the project, she will be involved in data collection, management, annotation and translation.

Dr. Edward Ombui

Dr. Edward Ombui is a Senior Lecturer in the School of Science and Technology at Africa Nazarene University. His previous research projects include machine translation between the Ekegusii and Swahili languages, the development of annotation tools for African languages, and text classification of code-switched data generated by social media users in Kenya. His current research interest is the application of deep learning to automatically process African languages in cyberspace. In the project, he will work on data collection, transcription and management of speech data.

Dr. Owen McOnyango

His research interests are the application of African languages in projects targeting African communities for language development, the use of indigenous knowledge as a resource, African orature and literature, and specialized lexicography (for example, for farming, health and life skills). In the project, he will be involved in data collection, management, annotation and translation.

Dr. Lawrence Muchemi Githiari

He is a Senior Lecturer at the School of Computing & Informatics, University of Nairobi, with over 20 years of experience. He is also a resource person and member of the Pan-African organization ACALAN-AU, working on its Corpus Development Initiative. Besides teaching, he is currently involved in corpus building for Kiswahili under ACALAN-AU. He chairs the board spearheading the creation of a Pan-African platform for hosting linguistic resources such as African language corpora, human language technology tools and publications for many cross-border languages in Africa. In the project, he will work on data collection and management of speech data.

Mr. Barack Wanjawa

Mr. Barack Wanjawa's research interest is artificial intelligence, specifically machine learning, natural language processing and modelling low-resource languages. He has experience in formulating and coding tools for NLP tasks and will be responsible for developing the models used to create the project's deliverables (QA, NER). In the project, he will work on data collection, data management and part-of-speech tagging, and will spearhead prototype development for the machine comprehension system.