Reducing Leader Recovery Time in Distributed Architectures Using Zookeeper Atomic Broadcast

Naveen Srikanth Pasupuleti

doi:10.36948/ijfmr.2023.v05i04.46802

Author(s)	Naveen Srikanth Pasupuleti
Country	United States
Abstract	Virtual Replication (VR) is a distributed system architecture that provides fault tolerance and high availability by maintaining copies of data across multiple nodes. If one or more nodes fail, the system continues to function by promoting one of the remaining replicas to leader. VR systems typically use leader-based replication: a single node, the leader, handles write operations and propagates the changes to the follower nodes. When the leader fails, the replicas must communicate and agree on which of them takes over. This consensus process preserves consistency and availability, but it introduces delay, and leader failure recovery time grows as the number of nodes increases. One primary cause is synchronization: the system must ensure that the replicas are up to date before electing a new leader, which is time-consuming in larger clusters. In addition, the number of messages exchanged between replicas grows with cluster size, and the election itself takes multiple rounds of communication and coordination. In systems with many nodes, this coordination overhead can become a bottleneck.
Another contributing factor is the time required to validate the state of the cluster after a leader failure. A large-scale system often has many follower nodes, and ensuring that they all agree on the new leader can take considerable time. Heavy load exacerbates the problem, because the election process becomes more resource-intensive. Because the consensus process, communication overhead, and leader election mechanism all add to recovery delay, optimizing leader failure recovery in VR systems is essential for maintaining performance and minimizing downtime in large-scale deployments. This paper addresses the issue using ZooKeeper Atomic Broadcast (ZAB).
Published In	Volume 5, Issue 4, July-August 2023
Published On	2023-08-05
DOI	https://doi.org/10.36948/ijfmr.2023.v05i04.46802
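
The election step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a simplified, single-round model in which each reachable replica advertises its latest epoch and last transaction id (zxid), a quorum of replicas must respond, and the most up-to-date replica wins, ordered by (epoch, zxid, server id) in the style of ZooKeeper's leader election. The names Vote, elect_leader, and messages_per_round are hypothetical, and the message count assumes an all-to-all exchange in which every replica sends one message to every other replica.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Vote:
    """State a replica advertises during an election (hypothetical names)."""
    server_id: int
    epoch: int  # highest leader epoch this replica has accepted
    zxid: int   # id of the last transaction in this replica's log


def quorum_size(cluster_size: int) -> int:
    """Smallest strict majority of the cluster."""
    return cluster_size // 2 + 1


def elect_leader(votes: List[Vote], cluster_size: int) -> Optional[int]:
    """Return the id of the most up-to-date replica, or None without a quorum.

    Assumption: votes are compared by (epoch, zxid, server_id), so a replica
    with a newer epoch beats one with a longer log from an older epoch.
    """
    if len(votes) < quorum_size(cluster_size):
        return None  # too few replicas reachable to elect safely
    best = max(votes, key=lambda v: (v.epoch, v.zxid, v.server_id))
    return best.server_id


def messages_per_round(cluster_size: int) -> int:
    """Messages in one all-to-all round (assumed communication pattern)."""
    return cluster_size * (cluster_size - 1)


if __name__ == "__main__":
    votes = [Vote(1, 3, 100), Vote(2, 3, 120), Vote(3, 2, 500)]
    print(elect_leader(votes, cluster_size=5))
    for n in (3, 5, 9):
        print(n, messages_per_round(n))
```

Under these assumptions the per-round message count grows quadratically with cluster size, which is one way to read the abstract's point that coordination overhead becomes a bottleneck in larger clusters.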

E-ISSN 2582-2160

All research papers published on this website are licensed under Creative Commons Attribution-ShareAlike 4.0 International License, and all rights belong to their respective authors/researchers.

International Journal For Multidisciplinary Research
