A Comparative Study of Different Data Pre-processing Methods for Machine Learning

Ramya S; B Kumaraswamy; Vishal Agarwal; Anushka Gkl Jain

Author(s)	Ms. Ramya S, Dr. B Kumaraswamy, Mr. Vishal Agarwal, Dr. Anushka Gkl Jain
Country	India
Abstract	In machine learning, the quality of data often determines the success of predictive models, and data pre-processing is a crucial step in ensuring reliability, accuracy, and generalizability. This study presents a comparative evaluation of common pre-processing methods, including missing value imputation, feature scaling and normalization, categorical encoding, outlier detection, and feature engineering techniques. Using three benchmark datasets across classification, regression, and multiclass tasks, we applied these methods in combination with machine learning models such as logistic regression, decision tree, support vector machine (SVM), random forest, and gradient-boosted trees. Results show that imputation methods like iterative multivariate imputation improve predictive performance in datasets with moderate to high missingness, while scaling significantly enhances linear and gradient-based models but remains unnecessary for tree-based models. Target encoding proves most effective for high-cardinality categorical features, though it requires careful leakage prevention. Outlier handling benefits linear models but has limited impact on tree-based algorithms. Feature engineering techniques such as polynomial expansion and principal component analysis (PCA) provide gains in specific contexts but involve trade-offs in interpretability and runtime. Overall, the study underscores the importance of tailoring pre-processing strategies to both data characteristics and model families, offering practical guidelines for optimizing machine learning pipelines.
Keywords	Data Pre-processing; Machine Learning; Missing Value Imputation; Feature Scaling; Categorical Encoding; Outlier Detection; Feature Engineering; Principal Component Analysis (PCA); Model Performance; Comparative Study
Field	Computer
Published In	Volume 7, Issue 4, July-August 2025
Published On	2025-08-04
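
The abstract notes that target encoding requires careful leakage prevention. One common safeguard is out-of-fold encoding: each row's category is encoded using target statistics computed only from rows in other folds, so a row never sees its own label. The sketch below illustrates that idea in plain Python. It is not the authors' implementation; the function name, the interleaved fold assignment (row index modulo the fold count), and the smoothing parameter are assumptions made for illustration.

```python
from collections import defaultdict


def out_of_fold_target_encode(categories, targets, n_folds=3, smoothing=1.0):
    """Encode each row's category by the smoothed mean target of the OTHER folds.

    Hypothetical helper for illustration: folds are assigned by index modulo
    n_folds, and `smoothing` shrinks rare categories toward the fold's prior.
    """
    if len(categories) != len(targets):
        raise ValueError("categories and targets must have the same length")
    if n_folds < 2 or n_folds > len(categories):
        raise ValueError("n_folds must be between 2 and the number of rows")
    if smoothing <= 0:
        raise ValueError("smoothing must be positive")

    n = len(categories)
    encoded = [0.0] * n
    for fold in range(n_folds):
        # Rows in this fold are held out; statistics come from the rest.
        held_out = [i for i in range(n) if i % n_folds == fold]
        fit_rows = [i for i in range(n) if i % n_folds != fold]

        # Global mean of the fitting rows, used as the shrinkage prior.
        prior = sum(targets[i] for i in fit_rows) / len(fit_rows)

        sums = defaultdict(float)
        counts = defaultdict(int)
        for i in fit_rows:
            sums[categories[i]] += targets[i]
            counts[categories[i]] += 1

        for i in held_out:
            c = categories[i]
            # Categories unseen in the fitting rows have count 0 and
            # therefore fall back to the prior.
            encoded[i] = (sums.get(c, 0.0) + smoothing * prior) / (
                counts.get(c, 0) + smoothing
            )
    return encoded


if __name__ == "__main__":
    cats = ["a", "a", "b", "b", "a", "b"]
    ys = [1, 0, 1, 1, 0, 0]
    print(out_of_fold_target_encode(cats, ys, n_folds=3, smoothing=1.0))
```

At prediction time, a separate encoding fitted on the full training set would typically be applied to new data; the out-of-fold scheme matters only for the rows whose labels are used during fitting.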

E-ISSN 2582-2160


All research papers published on this website are licensed under Creative Commons Attribution-ShareAlike 4.0 International License, and all rights belong to their respective authors/researchers.


International Journal For Multidisciplinary Research

A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal
