
International Journal For Multidisciplinary Research
E-ISSN: 2582-2160
A Comparative Study of Different Data Pre-processing Methods for Machine Learning
| Author(s) | Ms. Ramya S, Dr. B Kumaraswamy, Mr. Vishal Agarwal, Dr. Anushka Gkl Jain |
|---|---|
| Country | India |
| Abstract | In machine learning, the quality of data often determines the success of predictive models, and data pre-processing is a crucial step in ensuring reliability, accuracy, and generalizability. This study presents a comparative evaluation of common pre-processing methods, including missing value imputation, feature scaling and normalization, categorical encoding, outlier detection, and feature engineering techniques. Using three benchmark datasets across classification, regression, and multiclass tasks, we applied these methods in combination with machine learning models such as logistic regression, decision trees, support vector machines (SVM), random forests, and gradient-boosted trees. Results show that imputation methods such as iterative multivariate imputation improve predictive performance on datasets with moderate to high missingness, while scaling significantly benefits linear and gradient-based models but remains unnecessary for tree-based models. Target encoding proves most effective for high-cardinality categorical features, though it requires careful leakage prevention. Outlier handling benefits linear models but has limited impact on tree-based algorithms. Feature engineering techniques such as polynomial expansion and principal component analysis (PCA) provide gains in specific contexts but involve trade-offs in interpretability and runtime. Overall, the study underscores the importance of tailoring pre-processing strategies to both data characteristics and model families, offering practical guidelines for optimizing machine learning pipelines. |
| Keywords | Data Pre-processing; Machine Learning; Missing Value Imputation; Feature Scaling; Categorical Encoding; Outlier Detection; Feature Engineering; Principal Component Analysis (PCA); Model Performance; Comparative Study |
| Field | Computer |
| Published In | Volume 7, Issue 4, July-August 2025 |
| Published On | 2025-08-04 |
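
The abstract's comparison of imputation and scaling choices against scaling-sensitive and tree-based models can be illustrated with a small scikit-learn sketch. This is not the authors' pipeline: the dataset, the simulated missingness rate, and the model settings below are illustrative assumptions only.

```python
# Hedged sketch of the kind of comparison described in the abstract:
# iterative multivariate imputation + scaling, evaluated with a linear model
# (scaling-sensitive) and a tree-based model (scaling-insensitive).
# Dataset, missingness rate, and hyperparameters are placeholders, not the study's setup.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer, SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Simulate moderate missingness (~20%) so the imputers have something to do.
rng = np.random.default_rng(0)
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.2] = np.nan

pipelines = {
    "mean impute, no scaling, logistic regression": Pipeline([
        ("impute", SimpleImputer(strategy="mean")),
        ("model", LogisticRegression(max_iter=5000)),
    ]),
    "iterative impute, scaling, logistic regression": Pipeline([
        ("impute", IterativeImputer(random_state=0)),
        ("scale", StandardScaler()),
        ("model", LogisticRegression(max_iter=5000)),
    ]),
    "iterative impute, random forest": Pipeline([
        ("impute", IterativeImputer(random_state=0)),
        ("model", RandomForestClassifier(random_state=0)),  # tree-based: scaling not needed
    ]),
}

for name, pipe in pipelines.items():
    scores = cross_val_score(pipe, X_missing, y, cv=5, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

The leakage prevention mentioned for target encoding follows the same principle: the encoder must be fitted only on training folds. For instance, scikit-learn's TargetEncoder (available from version 1.3) applies internal cross-fitting in fit_transform for exactly this reason.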