International Journal For Multidisciplinary Research

E-ISSN: 2582-2160     Impact Factor: 9.24

A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal


Modeling Cholera Outbreaks In Kenya Using Machine Learning Algorithms

Author(s) Paul Njuguna Theuri
Country Kenya
Abstract Kenya continues to grapple with recurrent cholera outbreaks driven by environmental, socio-economic and behavioral factors. Traditional surveillance systems are largely reactive, which limits their capacity to anticipate outbreaks. This study examines how machine learning can be used to forecast cholera outbreaks from environmental, epidemiological and socio-economic inputs. A Random Forest classification model was trained on weekly cholera case counts, lagged case counts capturing recent trends, and rainfall, temperature and sanitation variables. Data preprocessing included data cleaning, encoding, and feature engineering, such as creating lag variables to represent temporal dynamics. The data set was divided into training and test sets (80:20), and the Synthetic Minority Oversampling Technique (SMOTE) was used to address the class imbalance between outbreak and non-outbreak events. The results revealed temporal patterns in cholera transmission, with temporal features, especially lagged case counts, being the most important predictors. The baseline model achieved good overall accuracy but low sensitivity in identifying outbreak cases. After SMOTE was applied, the model detected outbreaks better at the cost of a slight decrease in overall accuracy, illustrating the trade-off between precision and recall on imbalanced data sets. Environmental factors (temperature and rainfall) also contributed to predictive performance. The study concludes that machine learning models have the potential to predict cholera outbreaks, but their performance is constrained by limited data, class imbalance and poorly represented features. Future research should integrate multiple data sources and more sophisticated models in real time to improve predictive capability and provide tools for early warning.
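The pipeline summarized in the abstract (lagged case-count features, an 80:20 split, SMOTE on the minority outbreak class, and a Random Forest classifier) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the author's code or data. The weekly series, the outbreak definition (weekly cases above the 90th percentile), the three-week lag window, the use of previous-week covariates, and the chronological (rather than random) split are all assumptions made for the example. imbalanced-learn's `SMOTE` class is the usual off-the-shelf choice; here the interpolation step is written out by hand so the sketch depends only on NumPy and scikit-learn. Oversampling is applied to the training set only, so synthetic points never leak into evaluation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score

LAGS = (1, 2, 3)  # assumed lag window for weekly case counts (hypothetical)


def make_lag_features(cases, rainfall, temperature, sanitation, lags=LAGS):
    """Row t holds cases from weeks t-1, t-2, ... and previous-week covariates.

    The first max(lags) weeks are dropped because they lack a full history.
    """
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(cases)):
        lagged_cases = [cases[t - k] for k in lags]
        covariates = [rainfall[t - 1], temperature[t - 1], sanitation[t - 1]]
        rows.append(lagged_cases + covariates)
    return np.asarray(rows, dtype=float), max_lag


def smote(X_min, n_new, k=5, seed=None):
    """Minimal SMOTE: each synthetic point lies on the segment between a
    minority sample and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    k = min(k, len(X_min) - 1)  # requires at least two minority samples
    # Pairwise Euclidean distances within the minority class; exclude self.
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_new)
    partner = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[partner] - X_min[base])


# Hypothetical synthetic weekly series (five years), for illustration only.
rng = np.random.default_rng(0)
n_weeks = 260
rainfall = rng.gamma(shape=2.0, scale=20.0, size=n_weeks)  # mm/week
temperature = 25 + 3 * np.sin(2 * np.pi * np.arange(n_weeks) / 52) + rng.normal(0, 1, n_weeks)
sanitation = rng.uniform(0.4, 0.9, size=n_weeks)  # coverage share
cases = np.zeros(n_weeks, dtype=int)
for t in range(1, n_weeks):
    rate = 2 + 0.6 * cases[t - 1] + 0.05 * rainfall[t - 1] - 3 * (sanitation[t - 1] - 0.65)
    cases[t] = rng.poisson(max(rate, 0.1))

X, offset = make_lag_features(cases, rainfall, temperature, sanitation)
# Hypothetical outbreak label: weekly cases above the 90th percentile.
threshold = np.quantile(cases, 0.9)
y = (cases[offset:] > threshold).astype(int)

# Chronological 80:20 split (assumed), so test weeks follow training weeks.
split = int(0.8 * len(X))
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

# Oversample the outbreak class in the training set only.
X_min = X_train[y_train == 1]
n_new = int((y_train == 0).sum() - (y_train == 1).sum())
X_syn = smote(X_min, n_new, k=5, seed=1)
X_bal = np.vstack([X_train, X_syn])
y_bal = np.concatenate([y_train, np.ones(n_new, dtype=int)])

feature_names = [f"cases_lag{k}" for k in LAGS] + ["rain_lag1", "temp_lag1", "sanitation_lag1"]
results, importances = {}, {}
for name, (X_fit, y_fit) in {"baseline": (X_train, y_train), "smote": (X_bal, y_bal)}.items():
    model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_fit, y_fit)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
    }
    importances[name] = dict(zip(feature_names, model.feature_importances_))

for name, metrics in results.items():
    print(name, {m: round(float(v), 3) for m, v in metrics.items()})
```

Comparing the `baseline` and `smote` entries in `results` on the held-out weeks is one way to observe the precision/recall trade-off the abstract describes, and `importances` shows how a fitted forest ranks lagged counts against the environmental covariates. The numbers produced by this synthetic series say nothing about the study's actual findings.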
Field Computer > Artificial Intelligence / Simulation / Virtual Reality
Published In Volume 8, Issue 3, May-June 2026
Published On 2026-05-18
DOI https://doi.org/10.36948/ijfmr.2026.v08i03.78379
