Reaching Consensus for Async Distributed Systems: A Guide to Harmonized Data Decision-Making
Keywords:
Consensus, Distributed Systems, Fault Tolerance, Paxos, Raft, Blockchain, Consistency, Byzantine Fault Tolerance (BFT)
Abstract
Consensus algorithms are widely used in asynchronous distributed systems to provide fault tolerance and data consistency, so they must be highly reliable. These systems require that multiple nodes, often geographically dispersed, agree on a common view or value even in the presence of hardware failures, network failures, or Byzantine failures, in which nodes behave arbitrarily or maliciously. This paper discusses the consensus mechanisms essential to cloud environments, blockchains, and real-time data management. It reviews Paxos, Raft, and Byzantine Fault Tolerance (BFT), covering their working models, advantages, and challenges. Paxos is safe under crash failures but can be difficult to implement. Raft simplifies leader election and log replication, which makes it practical for real-world applications, while BFT protocols limit the influence of malicious actors in security-sensitive settings. Issues that can hinder the consensus process include network partitions, leader elections, and security threats. The paper provides a comprehensive analysis of consensus techniques, including quorum-based decision-making, conflict resolution, and observability practices. It also traces developments in consensus to show its importance to the integrity and availability of distributed applications such as distributed databases, blockchain systems, and microservices orchestration. Emerging trends such as HCM, Layer 2 solutions like Rollups and State Channels, and serverless infrastructure point to the continued evolution of the field. This guide is intended for engineers, architects, and researchers who want to use consensus to build systems that meet the operational requirements of distributed environments.
License
Copyright (c) IJSRCSEIT

This work is licensed under a Creative Commons Attribution 4.0 International License.