
How to Get Your Friends to Agree: The Raft Way

Posted on: March 2, 2025 (8 min read)

I’m a sucker for explaining complex topics with simple analogies, and I’ve always wanted to try my hand at writing something like your average post on r/explainlikeimfive. Back in college, I really enjoyed reading the Raft consensus algorithm paper as part of a distributed systems class, so here’s my attempt at simplifying it. While this entire post might not be suitable for every 5-year-old out there, I hope the core concepts translate well.

Why should we agree on the same thing?

When you use your favorite website or app, you’re probably interacting with a collection of independent computers (let’s call them nodes) that are spread across different locations but work together to appear as a single system to the user. So, what’s the catch? Well, in a system like this each node has its own copy of the data and its own sense of time, since there’s no single shared clock to keep them perfectly in sync. The only way for them to stay in sync is by communicating over a network that can be unreliable at times.

The need for consensus stems from the fact that these nodes can fail too. Failures can happen in many ways - hardware malfunctions, networks introducing delays or losing messages entirely, or some unexpected software bug causing a server to temporarily go down. Despite all this chaos, the system still needs to function as one single reliable unit. If the system isn’t prepared for this, it could lose data permanently. To prevent this, systems replicate the data by keeping copies of it on several nodes. This way, if one node goes down, another has a backup.

But just having copies isn’t enough. These copies must stay consistent, meaning all nodes must agree on the same updates in the same order. If different nodes apply updates in different orders, they could end up with conflicting versions of the data. Consensus algorithms make sure that the nodes continue to agree on a shared state and that the system keeps operating as if nothing happened when some of them fail (as long as a majority is still running).

The Distributed Piggy Bank

Now that we’ve established why consensus is important, allow me to spin up a completely hypothetical scenario.

Let’s say that you and your friends from the neighborhood are running a system to keep track of each other’s pocket money balances. To do this, each of you maintains a notebook where you record every transaction (deposits and withdrawals). Additionally, each person has a balance sheet that keeps track of everyone’s final balance after all transactions are applied. The goal is to keep the notebooks and balance sheets synchronized. Now imagine that one of you deposits your weekly allowance, the update isn’t reflected in everyone’s notebooks in time, and another person makes a conflicting withdrawal. Without a reliable way to track these changes, some of you might end up with incorrect balances, causing confusion. To avoid such situations, you decide to use Raft to help you reach consensus.
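To make the notebook and balance sheet idea a bit more concrete, here is a minimal Python sketch. Everything in it (the `Transaction` type, the `apply_notebook` function, and the rule that an overdrawing withdrawal is skipped) is my own assumption for illustration and not something Raft itself defines. It only shows why the order of entries matters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    person: str
    amount: int  # positive = deposit, negative = withdrawal


def apply_notebook(notebook):
    """Replay every notebook entry in order and return the balance sheet."""
    balances = {}
    for tx in notebook:
        new_balance = balances.get(tx.person, 0) + tx.amount
        if new_balance < 0:
            # Assumed rule for this example: a withdrawal that would
            # overdraw the account is simply skipped.
            continue
        balances[tx.person] = new_balance
    return balances


deposit = Transaction("you", 10)
withdrawal = Transaction("you", -8)

# Same two transactions, two different orders, two different balance sheets.
in_order = apply_notebook([deposit, withdrawal])      # {'you': 2}
out_of_order = apply_notebook([withdrawal, deposit])  # {'you': 10}
print(in_order, out_of_order)
```

Two friends holding the same transactions in a different order end up disagreeing about the balance, which is exactly the situation the rest of this post tries to prevent.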

Here are some basic rules to follow:

The Election

Everyone starts off as a follower and waits for the leader to give instructions. The leader is supposed to send out regular heartbeats, which are like quick “I’m still here” signals to reassure everyone that the leader is still active. Each person has a limit (election timeout) for how long they can go without hearing from the leader. If the leader stops sending heartbeats (maybe they’re busy with school), the first follower whose election timeout runs out steps up and becomes a candidate, starts the next term, votes for themselves and requests votes from everyone else in the group.
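Here is a rough sketch of that timeout logic in Python. The `Member` class, its field names and the 150 to 300 ms timeout range are assumptions I picked for the example (the range is in the ballpark the Raft paper suggests), so treat it as an illustration and not a reference implementation.

```python
import random
from dataclasses import dataclass, field
from typing import Optional

FOLLOWER, CANDIDATE, LEADER = "follower", "candidate", "leader"


def random_election_timeout(rng, low_ms=150, high_ms=300):
    """Pick a random timeout so members rarely time out together."""
    return rng.randint(low_ms, high_ms)


@dataclass
class Member:
    name: str
    rng: random.Random = field(default_factory=random.Random)
    role: str = FOLLOWER
    term: int = 1
    voted_for: Optional[str] = None
    silence_ms: int = 0   # time since we last heard from a leader
    timeout_ms: int = 0   # set in __post_init__

    def __post_init__(self):
        self.timeout_ms = random_election_timeout(self.rng)

    def on_heartbeat(self, leader_term):
        """A heartbeat from a current leader resets the timer."""
        if leader_term < self.term:
            return  # stale leader, ignore
        if leader_term > self.term:
            self.voted_for = None  # new term, vote is free again
        self.term = leader_term
        self.role = FOLLOWER
        self.silence_ms = 0

    def tick(self, elapsed_ms):
        """Advance the clock; start an election if the leader went quiet."""
        if self.role == LEADER:
            return
        self.silence_ms += elapsed_ms
        if self.silence_ms >= self.timeout_ms:
            self.start_election()

    def start_election(self):
        self.role = CANDIDATE
        self.term += 1              # start the next term
        self.voted_for = self.name  # vote for yourself
        self.silence_ms = 0
        self.timeout_ms = random_election_timeout(self.rng)


you = Member("you", rng=random.Random(0))
you.tick(100)        # leader still within the limit
you.on_heartbeat(1)  # heartbeat arrives, timer resets
you.tick(300)        # long silence: you become a candidate in term 2
print(you.role, you.term)
```

Sending the actual vote requests is left out here; the sketch only covers when a follower decides to run.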

To become the new leader, the candidate must gather votes from a majority of the group by convincing them that their records are up to date. The candidate sends out a Request Vote message containing the term of the last entry in their notebook along with their latest page number. The latest page number (log index) indicates the position of the last entry in the notebook. This helps the followers decide whether the candidate’s data is at least as up to date as their own, so that they only vote for a candidate who won’t cause inconsistencies if elected. When a follower compares the term and page number with their own, they might think “Wait a second, your notebook entries only go up to a term from last month and I’ve got everything up to today’s lunch money (which is part of a different term)”. In that case the follower doesn’t vote for the candidate. A follower only grants their vote if the candidate’s last entry is from a more recent term than their own, or from the same term with a page number at least as high.

If multiple candidates send out vote requests at the same time and none of them collects a majority, a split vote occurs. In this case, no one can claim leadership, so a new term is started and the election is repeated. Since each person’s election timeout is random, the chances of a second split vote are low.
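The voting rule can be sketched in a few lines of Python. The class and field names below (`VoteRequest`, `Voter`, `wins_election`) are made up for this post. The comparison itself follows the rule described above: compare the term of the last entry first, and the page number only if the terms tie.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoteRequest:
    candidate: str
    term: int            # the term the candidate is running in
    last_log_term: int   # term of the last entry in their notebook
    last_log_index: int  # their latest page number


@dataclass
class Voter:
    term: int
    last_log_term: int
    last_log_index: int
    voted_for: Optional[str] = None

    def handle_vote_request(self, req):
        """Return True if this member grants their vote."""
        if req.term < self.term:
            return False                 # candidate is behind the times
        if req.term > self.term:
            self.term = req.term         # move to the newer term
            self.voted_for = None        # and forget any older vote
        if self.voted_for not in (None, req.candidate):
            return False                 # one vote per term
        # Tuples compare left to right: term first, then page number.
        candidate_log = (req.last_log_term, req.last_log_index)
        my_log = (self.last_log_term, self.last_log_index)
        if candidate_log < my_log:
            return False                 # their notebook is older than mine
        self.voted_for = req.candidate
        return True


def wins_election(votes, group_size):
    """A candidate needs strictly more than half of the group."""
    return votes > group_size // 2


friend = Voter(term=3, last_log_term=3, last_log_index=7)
stale = VoteRequest("friend 1", term=4, last_log_term=2, last_log_index=9)
fresh = VoteRequest("friend 2", term=4, last_log_term=3, last_log_index=7)
print(friend.handle_vote_request(stale))  # False: last entry is from an older term
print(friend.handle_vote_request(fresh))  # True: notebook is just as up to date
print(wins_election(2, 3), wins_election(2, 4))  # True False (2 of 4 is a split)
```

Notice that the stale candidate has more pages (9 versus 7) but still loses the vote, because the term of the last entry is compared before the page number.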


As usual, here’s a mini simulator to help you visualize the election process in a Raft cluster (or in our case, you and your group of friends managing the distributed piggy bank)

  • Click the Play button to start the simulation
  • Click on any member to pause/resume their activity.
  • Colors are shown based on the role: blue (Follower), orange (Candidate), green (Leader)
  • The red bar shows how long a member waits before trying to become the new leader (election timeout)
  • Use the Select Group Size buttons to change between 3 or 5 members in the group
  • Colored dots represent messages (see legend below)
[Interactive simulator: members You, Friend 1 and Friend 2, each starting as a Follower in Term 1. Message types in the legend: AppendEntries (Heartbeat), RequestVote, VoteResponse]

Keeping everyone in sync (Log replication)

Life’s good and you’ve successfully become the leader of your little piggy bank group. Now the challenge lies in making sure the entire group is on the same page (no pun intended). You’ve received a request to record a new transaction. Writing it only in your own notebook is not enough: if you get sick tomorrow and someone else becomes the leader, they might not know about this transaction at all. So, you send out an Append Entries message with the following information to your followers:

term: your current term number
previous log index: the page number of the entry right before the new ones in your notebook
previous log term: the term in which that previous entry was recorded
entries: new transaction(s) to note down (empty if the message is used as a heartbeat)
...

Once the followers receive your message, one of these could happen
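As a rough sketch based on how the Raft paper describes the follower’s side, the outcomes boil down to three cases: reject a leader from an older term, reject when the notebook doesn’t line up with the leader’s, or accept and write the entries down. The Python below models only the four message fields listed above, and the names (`Entry`, `AppendEntries`, `Follower`) are assumptions for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    term: int
    transaction: str


@dataclass
class AppendEntries:
    term: int
    prev_log_index: int  # page just before the new entries (0 = empty notebook)
    prev_log_term: int
    entries: list = field(default_factory=list)


@dataclass
class Follower:
    term: int = 1
    notebook: list = field(default_factory=list)  # page 1 lives at notebook[0]

    def handle_append_entries(self, msg):
        """Return True if the entries were accepted."""
        # Case 1: the sender is a leader from an older term.
        if msg.term < self.term:
            return False
        self.term = msg.term
        # Case 2: my notebook has no matching page right before the new ones.
        if msg.prev_log_index > len(self.notebook):
            return False
        if msg.prev_log_index > 0:
            if self.notebook[msg.prev_log_index - 1].term != msg.prev_log_term:
                return False
        # Case 3: the notebooks line up, so write the new entries down,
        # tearing out any pages of mine that conflict with the leader's.
        for offset, entry in enumerate(msg.entries):
            slot = msg.prev_log_index + offset  # 0-based position in the list
            if slot < len(self.notebook) and self.notebook[slot].term != entry.term:
                del self.notebook[slot:]
            if slot >= len(self.notebook):
                self.notebook.append(entry)
        return True


friend = Follower(term=1, notebook=[Entry(1, "you +10")])
accepted = friend.handle_append_entries(
    AppendEntries(term=2, prev_log_index=1, prev_log_term=1,
                  entries=[Entry(2, "friend 1 -3")])
)
rejected = friend.handle_append_entries(
    AppendEntries(term=2, prev_log_index=5, prev_log_term=2)
)
print(accepted, rejected, len(friend.notebook))  # True False 2
```

When a follower says no because of a mismatch, the leader retries with an earlier page until the two notebooks agree, and then fills in everything after that point.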

Once a majority of the group (counting yourself) has written the new entries down, they’re considered “committed”. Now you can respond to the parent who requested the change that their request was successfully handled, and all of you update your balance sheets to reflect the new transaction.
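Here is one way the leader could work out which page is committed. It is a sketch under my own naming (`committed_page` and its parameters are hypothetical). It also includes one detail from the Raft paper that this post doesn’t go into: a leader only commits by counting copies when the page was written in their own current term.

```python
def committed_page(pages_written, notebook_terms, current_term, already_committed=0):
    """Return the highest page the leader may treat as committed.

    pages_written:  highest page each member has written, leader included
    notebook_terms: term of each page in the leader's notebook (page 1 first)
    """
    # Sort from highest to lowest; the value in the middle is a page that
    # a majority of the group has reached.
    majority_page = sorted(pages_written, reverse=True)[len(pages_written) // 2]
    # Walk back to the newest majority-held page from the current term.
    for page in range(majority_page, already_committed, -1):
        if notebook_terms[page - 1] == current_term:
            return page
    return already_committed


# Group of three: you (leader) and one friend have page 3, the other has page 1.
print(committed_page([3, 3, 1], [1, 1, 2], current_term=2))  # 3
# Only you have page 3 so far, so nothing new is committed yet.
print(committed_page([3, 1, 1], [1, 1, 2], current_term=2))  # 0
```

Because earlier pages are committed along with the one returned, the group can then apply every page up to that number to their balance sheets, in order.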

Wrap Up

Raft is not just an abstract concept from academic research. It’s actively used in popular projects like etcd (used for storing state and config in Kubernetes clusters) and Kafka’s newer KRaft mode, which replaces ZooKeeper with a built-in Raft-based consensus system for managing metadata. That said, the original Raft paper covers many more edge cases during replication, as well as how safety is guaranteed during cluster membership changes, and I highly recommend reading it. For a simple and interactive guide, also check out Secret Lives of Data.

Happy learning!
