
International Journal For Multidisciplinary Research
E-ISSN: 2582-2160
Impact Factor: 9.24
A Widely Indexed Open Access Peer Reviewed Multidisciplinary Bi-monthly Scholarly International Journal
Machine Learning-Based Self-Healing Systems: Automated Failure Detection and Recovery in Microservices
Author(s) | Ravikanth Konda |
---|---|
Country | United States |
Abstract | The rapid adoption of microservices architecture has changed how software is built, enabling scalable, modular, and resilient systems. The added complexity of distributed systems, however, makes fault detection, isolation, and recovery harder, and traditional fault management is increasingly inadequate for the dynamic, decentralized nature of microservices. This paper explores a machine learning-based approach to self-healing systems that detect and recover from failures automatically in microservices environments. We review state-of-the-art methods, including anomaly detection, predictive analytics, and reinforcement learning, and propose a novel architecture that integrates them to improve system robustness. The proposed methodology uses log and metric data for real-time anomaly detection, root cause analysis, and proactive recovery. It combines supervised and unsupervised learning in a continuous learning loop that improves accuracy over time, and applies reinforcement learning to policy-based recovery decisions that adapt to evolving failure patterns. These capabilities are essential in modern systems, where manual intervention is neither scalable nor reliable for maintaining service quality. The architecture and algorithms are validated through controlled experiments on a simulated Kubernetes-based microservices platform. The experiments show significant improvements in fault detection precision, diagnostic speed, and system recovery time compared with traditional rule-based systems. The paper also discusses the implications of these findings for operational environments, addressing potential overhead, integration challenges, and scalability concerns. Overall, the results indicate that machine learning can serve as a foundational technology for autonomous, resilient microservices. |
Field | Engineering |
Published In | Volume 3, Issue 5, September-October 2021 |
Published On | 2021-10-09 |
DOI | https://doi.org/10.36948/ijfmr.2021.v03i05.43946 |
Short DOI | https://doi.org/g9hm27 |
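
The abstract describes two cooperating pieces: real-time anomaly detection on metric data, and a learned policy that chooses recovery actions. The Python sketch below illustrates that loop in its simplest form and is not the paper's implementation. It assumes a rolling z-score detector in place of the paper's unspecified models, and an epsilon-greedy bandit as a stateless simplification of reinforcement learning. The class names, the action names (`scale_out`, `restart_pod`, `rollback`), the thresholds, and the reward signal are all hypothetical.

```python
import random
import statistics
from collections import deque


class RollingZScoreDetector:
    """Flags a metric sample as anomalous when it lies far from the recent mean."""

    def __init__(self, window=30, threshold=3.0, min_samples=10):
        self.samples = deque(maxlen=window)  # sliding window of normal samples
        self.threshold = threshold           # z-score cutoff (assumed value)
        self.min_samples = min_samples       # warm-up before any flagging

    def observe(self, value):
        """Return True if `value` is anomalous relative to the current window."""
        anomalous = False
        if len(self.samples) >= self.min_samples:
            mean = statistics.fmean(self.samples)
            stdev = statistics.pstdev(self.samples)
            if stdev > 0 and abs(value - mean) / stdev > self.threshold:
                anomalous = True
        # Learn only from normal samples so an outage does not shift the baseline.
        if not anomalous:
            self.samples.append(value)
        return anomalous


class EpsilonGreedyRecovery:
    """Chooses a recovery action and tracks each action's average reward."""

    def __init__(self, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {a: 0 for a in self.actions}
        self.values = {a: 0.0 for a in self.actions}

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)  # explore a random action
        return max(self.actions, key=lambda a: self.values[a])  # exploit the best

    def update(self, action, reward):
        # Incremental running mean of the rewards observed for this action.
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]


# Hypothetical latency stream (ms): steady traffic followed by one spike.
detector = RollingZScoreDetector()
latencies = [100 + (i % 5) for i in range(20)] + [900]
flags = [detector.observe(x) for x in latencies]

# Hypothetical environment in which only restarting the pod fixes the fault.
policy = EpsilonGreedyRecovery(["scale_out", "restart_pod", "rollback"], epsilon=0.3)
for _ in range(1000):
    action = policy.choose()
    policy.update(action, 1.0 if action == "restart_pod" else 0.0)
best_action = max(policy.values, key=policy.values.get)
```

In this toy run only the final spike is flagged, and the policy settles on the action that was rewarded. A production system would replace the z-score with trained models over logs and metrics, and would condition the policy on system state rather than treating every failure alike.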

CrossRef DOI is assigned to each research paper published in our journal. IJFMR DOI prefix is 10.36948/ijfmr.
All research papers published on this website are licensed under Creative Commons Attribution-ShareAlike 4.0 International License, and all rights belong to their respective authors/researchers.
