This project annotated a total of 7,526 QA pairs based on 1,445 Swahili story texts, with at least 5 QA pairs annotated for each story text. The source story texts from which the QA pairs were derived are referenced under ‘Story_ID’ and can be downloaded individually or in bulk from the links at the bottom of the page. The QA dataset itself is also available as a single CSV file from the links at the bottom of the page.
Each row of the CSV file represents one Swahili story and the QA pairs annotated for it. The ‘Story_ID’ column holds the unique ID number that identifies the source story text. The ‘Num_QA_pairs’ column gives the number of QA pairs that were set for that story. The remaining columns describe the individual QA pairs: ‘Q01’ is the first question on the story, ‘A01’ is the answer to Q01, and ‘P_A01’ is the number of the paragraph where the answer can be found. The paragraph number is an optional field; an ‘x’ in this column indicates that the paragraph number is not applicable. This series of Q, A and P columns is repeated for each further QA pair on the story, up to the total given in ‘Num_QA_pairs’.
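The column layout described above can be reshaped into one record per QA pair, which is often easier to work with. The following is a minimal sketch, not an official loader. It assumes the numbering continues with two-digit, zero-padded suffixes (‘Q02’, ‘A02’, ‘P_A02’ and so on), which the description implies but does not state outright. The function name `wide_to_long` and the sample row are hypothetical and are not taken from the dataset.

```python
import csv
import io


def wide_to_long(rows):
    """Yield one dict per QA pair from rows in the wide CSV layout.

    Assumes columns Story_ID, Num_QA_pairs, then Q01/A01/P_A01,
    Q02/A02/P_A02, ... (the zero-padded pattern is an assumption).
    """
    for row in rows:
        num_pairs = int(row["Num_QA_pairs"])
        for i in range(1, num_pairs + 1):
            idx = f"{i:02d}"
            question = (row.get(f"Q{idx}") or "").strip()
            answer = (row.get(f"A{idx}") or "").strip()
            if not question:
                continue  # skip a slot that is unexpectedly empty
            paragraph = (row.get(f"P_A{idx}") or "").strip()
            yield {
                "story_id": row["Story_ID"],
                "question": question,
                "answer": answer,
                # 'x' (or an empty cell) means no paragraph number applies
                "paragraph": None if paragraph.lower() in ("", "x") else paragraph,
            }


# Made-up sample row for illustration only; not real dataset content.
sample = io.StringIO(
    "Story_ID,Num_QA_pairs,Q01,A01,P_A01,Q02,A02,P_A02\n"
    "1001,2,Swali la kwanza?,Jibu la kwanza,3,Swali la pili?,Jibu la pili,x\n"
)
pairs = list(wide_to_long(csv.DictReader(sample)))
for pair in pairs:
    print(pair)
```

To run this against the downloaded file, replace the in-memory sample with `open(path, newline="", encoding="utf-8")`, where `path` points to the CSV. The UTF-8 encoding is an assumption about the file.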
To cite this dataset: