International Journal For Multidisciplinary Research

E-ISSN: 2582-2160     Impact Factor: 9.24

A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal

Bridging Healthcare and AI: An Interpretable and Robust Framework for Early Diabetes Prediction

Author(s) Ms. Girisha Arora, Ms. Keerti Mamgain, Ms. Namami Khantwal, Ms. Manisha Sharma, Dr. Yatu Rani
Country India
Abstract The rapid increase in diabetes cases worldwide underscores the urgent need for diagnostic tools that support early detection and prompt treatment. This study presents an iterative and interpretable machine learning framework for predicting diabetes risk using the PIMA Indian Diabetes Dataset. A major challenge with medical datasets is the presence of noisy data and missing values. To address this, we designed a data preprocessing pipeline that preserves the biological plausibility of the data. For example, physiologically implausible zero values in important medical attributes such as glucose level and diastolic blood pressure were treated as missing data and imputed using the K-Nearest Neighbours (KNN) method. This approach estimates missing values from similar patient records rather than from simple averages, which helps preserve important patterns within the dataset.
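The imputation idea described above can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the helper names `zeros_to_missing` and `knn_impute`, the choice of k = 2, the unscaled distance, and the five patient records are all assumptions made for illustration.

```python
import math

def zeros_to_missing(rows, columns):
    """Replace zeros in the given column indices with None (missing)."""
    return [[None if (j in columns and v == 0) else v
             for j, v in enumerate(row)] for row in rows]

def knn_impute(rows, k=2):
    """Fill each None with the mean of that column over the k nearest rows.

    Distance is Euclidean over the columns both rows have observed,
    averaged over the number of shared columns (an assumed convention).
    """
    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                # A neighbour must have column j observed.
                if m == i or other[j] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed

# Hypothetical records: (glucose, diastolic blood pressure, BMI).
records = [
    [148, 72, 33.6],
    [85, 66, 26.6],
    [0, 64, 23.3],    # glucose of 0 is treated as missing
    [89, 66, 28.1],
    [137, 0, 43.1],   # blood pressure of 0 is treated as missing
]
cleaned = knn_impute(zeros_to_missing(records, columns={0, 1}), k=2)
```

In this sketch the missing glucose value is filled from the two records with the most similar blood pressure and BMI. In practice, features would typically be scaled before distances are computed so that no single attribute dominates.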

In addition to data cleaning, feature scaling was applied to keep the variables on a consistent scale, and new interaction features were created to better capture complex relationships between different health indicators. Class imbalance was a major problem, since diabetic cases are fewer than non-diabetic cases; synthetic oversampling techniques were used to balance the dataset. This helps the model learn the patterns of the diabetic class and improves its ability to correctly identify patients at higher risk of diabetes.
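As a rough illustration of synthetic oversampling, the sketch below creates new minority-class points by interpolating between a minority sample and one of its nearest minority neighbours, which is the core idea behind SMOTE. The function name `smote_like`, the value k = 2, the fixed seed, and the toy feature vectors are assumptions; the abstract does not specify the exact configuration used.

```python
import math
import random

def smote_like(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each on the line segment between
    a random minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

# Hypothetical scaled feature vectors for each class.
majority = [[-0.5, -0.2], [-0.8, 0.1], [-0.3, -0.6],
            [-1.0, -0.4], [-0.6, 0.3], [-0.2, -0.1]]
minority = [[0.9, 1.2], [1.1, 0.8], [1.4, 1.0]]

# Generate just enough synthetic points to balance the two classes.
synthetic = smote_like(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + synthetic
```

Oversampling of this kind is normally applied only to the training portion of each split, so that synthetic records never leak into the data used for evaluation.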

Rather than relying on a single algorithm, this research uses an ensemble learning approach that combines multiple models, such as kernel-based methods and boosting techniques. The predictions from these models are refined through a probability calibration step to make the outputs more reliable and accurate in a clinical context. An uncertainty estimation mechanism identifies predictions where the model is less confident, allowing such cases to be reviewed by medical professionals.
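One simple way to combine ensemble outputs and flag low-confidence cases is sketched below. It assumes each base model already returns a calibrated probability of diabetes; the function name `triage`, the averaging rule, and the thresholds (`low`, `high`, `max_spread`) are hypothetical choices, not values reported in the study.

```python
from statistics import mean, pstdev

def triage(model_probs, low=0.35, high=0.65, max_spread=0.15):
    """Average per-model probabilities and flag uncertain predictions.

    A case is flagged when the averaged probability falls inside the
    (low, high) band or when the models disagree strongly.
    """
    avg = mean(model_probs)
    spread = pstdev(model_probs)  # disagreement between models
    needs_review = (low < avg < high) or (spread > max_spread)
    label = "diabetic" if avg >= 0.5 else "non-diabetic"
    return {"probability": avg, "label": label, "needs_review": needs_review}

# Hypothetical calibrated probabilities from three base models.
confident_case = triage([0.90, 0.85, 0.95])
borderline_case = triage([0.55, 0.45, 0.50])
disputed_case = triage([0.90, 0.30, 0.85])
```

Here the borderline and disputed cases would be routed to a clinician, while the confident case would be reported directly with its probability.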

To make the model more trustworthy, SHAP (SHapley Additive exPlanations) is used to explain how each feature contributes to the final prediction. This makes it possible to understand how factors like glucose level, BMI, or age affect the diabetes risk score for each individual patient. The model was evaluated using k-fold cross-validation to ensure robustness and reliability. Special attention was given to improving recall, so that the chance of missing actual diabetic patients is minimized.
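The idea behind per-patient feature attributions can be shown with a brute-force Shapley value computation. This is only a sketch of the underlying concept: the SHAP library uses more efficient estimators, and here a single baseline record stands in for a background dataset. The scoring function `toy_risk`, its coefficients, and both records are invented for illustration.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one record.

    Features outside a coalition are set to their baseline values.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_risk(features):
    """Hypothetical risk score with a glucose x BMI interaction term."""
    glucose, bmi, age = features
    return 0.02 * glucose + 0.05 * bmi + 0.01 * age + 0.0005 * glucose * bmi

patient = [160, 35, 50]    # glucose, BMI, age (hypothetical)
baseline = [120, 32, 33]   # reference record (hypothetical)
contributions = shapley_values(toy_risk, patient, baseline)
```

The three contributions sum to the difference between the patient's score and the baseline score, which is the property that lets each prediction be decomposed into per-feature effects.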

The proposed model therefore acts as an interpretable and reliable decision-support system that not only predicts diabetes risk but also provides meaningful explanations for each prediction. Such a system can support healthcare professionals in making informed, evidence-based clinical decisions.
Keywords Diabetes Prediction, Machine Learning, Ensemble Learning, Data Preprocessing, Class Imbalance, SMOTE, ADASYN, KNN Imputation, Explainable AI, SHAP, K-fold Cross-Validation, Healthcare Analytics.
Field Engineering
Published In Volume 8, Issue 3, May-June 2026
Published On 2026-05-03
DOI https://doi.org/10.36948/ijfmr.2026.v08i03.76897
