Agreeing Under Chaos

System Design Infrastructure

January 26, 2026

I’ve always been fascinated by how distributed systems maintain a single consistent truth despite crashes, network partitions, and message loss. That curiosity led me to distributed consensus protocols and a desire to understand both their theory and implementation.

To understand it in practice, I took on a self-driven project to implement the Raft consensus protocol in Go, covering core features such as leader election, log replication, persistence, and snapshotting, and used it to power a strongly consistent replicated KV store. I also designed an observability pipeline to monitor the system.

This post walks through the system’s architecture, the design decisions behind it, and the challenges it exposed.

Introduction

Before diving into what I built, we first need to look at the problem Raft solves.

In a distributed environment where nodes can fail or disconnect at any time, agreeing on the state of the system is notoriously difficult. Raft cuts through this complexity by aligning the cluster around a single shared log, which serves as the one timeline of operations everyone agrees on. As long as a majority of the nodes are healthy, the system remains available and consistent.

The protocol relies on these invariants:

- Election Safety: at most one leader can be elected in any given term.
- Leader Append-Only: a leader never overwrites or deletes entries in its own log; it only appends.
- Log Matching: if two logs contain an entry with the same index and term, the logs are identical up through that entry.
- Leader Completeness: once an entry is committed, it appears in the log of every leader of later terms.
- State Machine Safety: if a server has applied an entry at a given index, no other server will ever apply a different entry at that index.

System Overview

The system is a replicated state machine built on a leader-based, event-driven consensus core. The cluster runs with an odd number of nodes to ensure a clear majority quorum for committing entries. Each node hosts the same two components (a Raft instance and a KV server), and nodes communicate internally over RPC.

The system runs as a cluster of multiple nodes where:

- Each node runs a Raft instance paired with a KV server.
- At any given time, one node acts as the leader and handles client operations; the rest are followers.
- An operation takes effect only after a majority of nodes have replicated it.
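To make the quorum arithmetic concrete, here is a small, self-contained Go snippet (not part of the project) showing why an odd cluster size gives a clean majority:

```go
package main

import "fmt"

// majority returns the quorum size for a cluster of n nodes: more than
// half the nodes must acknowledge an entry before it can be committed.
func majority(n int) int {
	return n/2 + 1
}

func main() {
	for _, n := range []int{3, 5, 7} {
		fmt.Printf("cluster of %d: quorum = %d, tolerates %d failures\n",
			n, majority(n), n-majority(n))
	}
}
```

An even-sized cluster tolerates no more failures than the odd size below it (a 4-node cluster still tolerates only one failure), which is why odd cluster sizes are the norm.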

We’ll now break these down across two planes that make the system easier to reason about.

Architecture Overview

With the system’s components introduced, the next step is to see how they interact end-to-end. To simplify the mental model, I organize the architecture into two complementary planes:

Consensus Plane (Raft)

The consensus plane is the reliability layer. It stores a replicated log of operations and guarantees that all nodes agree on the same committed order.
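Concretely, the consensus plane reduces to a term-tagged log plus a small amount of per-node state. The sketch below uses illustrative field names drawn from the Raft paper rather than the project’s actual code:

```go
package raft

// LogEntry is one slot in the replicated log: a client command tagged with
// the term in which the leader received it. The (index, term) pair is what
// followers use to check that their logs match the leader's.
type LogEntry struct {
	Term    int
	Command interface{}
}

// raftState is the core per-node bookkeeping described in the Raft paper.
type raftState struct {
	currentTerm int        // latest term this node has seen (persisted)
	votedFor    int        // candidate voted for in currentTerm, or -1 (persisted)
	log         []LogEntry // the replicated log itself (persisted)

	commitIndex int // highest index known to be committed (majority-replicated)
	lastApplied int // highest index applied to the local state machine
}
```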

Application Plane (KV Store)

The application plane turns Raft’s committed log into a usable service. The KV store is implemented as a deterministic state machine: given the same sequence of committed commands, every replica reaches the same state. The key responsibilities of the application plane are:

- Consuming committed entries from the ApplyMsg channel and applying them to the in-memory key/value map.
- Responding to clients once their operations have been applied.
- Triggering snapshots so the Raft log does not grow without bound.
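As a rough sketch of that apply loop, with Op, ApplyMsg, and the KVServer fields standing in for the project’s actual types:

```go
package kvserver

// Op is the command a KV server proposes to Raft (illustrative fields).
type Op struct {
	Kind  string // "Put", "Append", or "Get"
	Key   string
	Value string
}

// ApplyMsg mirrors the message Raft emits for each committed entry.
type ApplyMsg struct {
	CommandValid bool
	Command      interface{}
	CommandIndex int
}

// KVServer holds the deterministic state machine: an in-memory map that is
// mutated only by committed entries.
type KVServer struct {
	applyCh chan ApplyMsg
	store   map[string]string
}

// applyLoop consumes committed entries in log order. Because every replica
// applies the same sequence with the same deterministic logic, all replicas
// converge on the same state.
func (kv *KVServer) applyLoop() {
	for msg := range kv.applyCh {
		if !msg.CommandValid {
			continue // snapshot and other control messages handled elsewhere
		}
		op := msg.Command.(Op)
		switch op.Kind {
		case "Put":
			kv.store[op.Key] = op.Value
		case "Append":
			kv.store[op.Key] += op.Value
		case "Get":
			// reads do not modify state; the value is read from the map
		}
		// ...notify the RPC handler waiting on msg.CommandIndex...
	}
}
```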

Request Flow

  1. Client sends a KV operation to a server.
  2. If the server is the leader, it proposes the operation to Raft, which appends it to its log.
  3. Leader replicates the log entry to followers via AppendEntries.
  4. Once a majority acknowledges, Raft marks the entry committed.
  5. Committed entries are emitted on the ApplyMsg channel and applied by the KV state machine.
  6. The KV server responds to the client once the operation has been applied to its in-memory map.
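To tie the steps together, here is a rough sketch of the leader-side handler for a write, continuing the illustrative types from the apply-loop sketch above. The rf.Start call, the waitApplied helper, and the args/reply types are all assumptions, not the project’s actual API:

```go
// PutArgs and PutReply are illustrative RPC argument types.
type PutArgs struct{ Key, Value string }
type PutReply struct{ Err string }

// Put handles a client write, covering steps 2-6 above: propose to Raft,
// let Raft replicate and commit, then reply once the entry is applied.
func (kv *KVServer) Put(args *PutArgs, reply *PutReply) error {
	op := Op{Kind: "Put", Key: args.Key, Value: args.Value}

	// Step 2: hand the operation to Raft. A follower rejects the request so
	// the client can retry against the current leader.
	index, _, isLeader := kv.rf.Start(op)
	if !isLeader {
		reply.Err = "not leader"
		return nil
	}

	// Steps 3-5 happen inside Raft: the entry is replicated to followers,
	// acknowledged by a majority, and delivered on the apply channel.

	// Step 6: wait until the apply loop has applied the entry at `index`,
	// then acknowledge the client.
	if !kv.waitApplied(index, op) {
		reply.Err = "leadership lost, retry"
	}
	return nil
}
```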

Design Decisions

Let’s talk about a few of the core architectural choices behind this system and, more importantly, why they were made. Each one was driven by practical needs uncovered during implementation.

Observability

The system integrates observability through a pluggable MetricsSink interface that the Raft core calls at key lifecycle points. This keeps the consensus logic clean while making it easy to export metrics to any backend. A few of the instrumented metrics:

- Leader elections and term changes
- Persistence (disk write) latency
- Client operations applied and their end-to-end latency
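To show the shape this can take, here is a minimal sketch of such a sink. The method set is an assumption for illustration, not the project’s actual interface:

```go
package raft

import "time"

// MetricsSink is the hook the Raft core calls at key lifecycle points.
type MetricsSink interface {
	LeaderElected(term int)          // a node won an election for this term
	EntryCommitted(index int)        // an entry reached commit
	PersistDuration(d time.Duration) // how long a persistence (disk) write took
}

// NoopSink satisfies MetricsSink with no overhead, so the consensus code can
// call the sink unconditionally and a real exporter can be swapped in later.
type NoopSink struct{}

func (NoopSink) LeaderElected(int)             {}
func (NoopSink) EntryCommitted(int)            {}
func (NoopSink) PersistDuration(time.Duration) {}
```

Because the core depends only on this interface, exporters can be swapped without touching consensus code, which is exactly the decoupling described above.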

Load Testing & Evaluation

I benchmarked the system on a single server, starting each node as a separate process; the processes communicate over Go’s rpc package.

This section focuses on a few test cases to illustrate the system’s behavior under load.

Baseline Tests

- clients=20 snapshot_size=4MB read_ratio=0.0

- clients=56 snapshot_size=4MB read_ratio=0.0
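For context on how these parameters drive load, a closed-loop client worker might look roughly like the sketch below. KVClient and its methods are placeholders for the project’s actual client, not its real API:

```go
package bench

import (
	"fmt"
	"math/rand"
	"time"
)

// KVClient stands in for the project's client; the benchmark only needs
// the two operations it exercises.
type KVClient interface {
	Get(key string) (string, error)
	Put(key, value string) error
}

// worker runs one closed-loop client: it issues operations back to back,
// picking reads vs. writes according to readRatio (0.0 means write-only,
// as in the baseline runs above), and reports per-operation latency.
func worker(cli KVClient, readRatio float64, latencies chan<- time.Duration, stop <-chan struct{}) {
	rng := rand.New(rand.NewSource(time.Now().UnixNano()))
	for i := 0; ; i++ {
		select {
		case <-stop:
			return
		default:
		}
		key := fmt.Sprintf("key-%d", rng.Intn(1000))
		start := time.Now()
		if rng.Float64() < readRatio {
			cli.Get(key)
		} else {
			cli.Put(key, fmt.Sprintf("value-%d", i))
		}
		latencies <- time.Since(start)
	}
}
```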

Figure 2: Number of client operations and latency with 20 clients
Figure 3: Persistence latency with 20 clients
Figure 4: Number of client operations and latency with 64 clients
Figure 5: Persistence latency with 64 clients