Document Type: Original Article
Authors
1
Computer Department, Zabol Branch, Islamic Azad University, Zabol, Iran.
2
Computer Department, Birjand Branch, Islamic Azad University, Birjand, Iran.
Abstract
Distributed systems, as the backbone of modern information technology, provide the critical infrastructure for cloud services, the Internet of Things, digital financial networks, and large-scale computing. Despite their central role, such systems continually face challenges such as hardware and software failures, communication latency, message asynchrony, and the inherent dynamism of the execution environment. These challenges intensify the need for robust solutions that ensure fault tolerance, data consistency, and functional integrity.
Adopting a structured analytical review approach, the present study systematically examines failure models, fault-detection mechanisms, and consensus algorithms in distributed systems. The findings indicate that the design of adaptive and intelligent failure detectors plays a pivotal role in enhancing system stability and reliability. Moreover, the results show that integrating adaptive detectors with lightweight consensus algorithms such as Raft and PBFT provides an effective pathway toward achieving resilient distributed systems in dynamic environments.
In addition, two emerging research directions are proposed for improving security, efficiency, and scalability: the use of machine learning algorithms for intelligent fault prediction and detection, and the integration of blockchain technology with classical consensus mechanisms. The outcomes of this research can serve as both a theoretical and a practical foundation for designing and implementing distributed infrastructures that are highly resilient, self-regulating, and fault tolerant, and that continue to perform effectively under environmental fluctuations.
Keywords
Subjects