I’m a sucker for explaining complex topics using simpler analogies and have always wanted to try my hand at writing something similar to your average post on r/explainlikeimfive. Back in college, I really enjoyed reading the Raft consensus algorithm paper as part of a distributed systems class, so here’s my attempt at simplifying it. While this entire post might not be suitable for every 5-year-old out there, I hope the core concepts translate well.
Why should we agree on the same thing?
When you use your favorite website or app, you’re probably interacting with a collection of independent computers (let’s call them nodes) that are spread across different locations but work together to appear as a single system to the user. So, what’s the catch? Well, in a system like this each node has its own copy of the data and its own sense of time, since there’s no single shared clock to keep them perfectly in sync. The only way for them to stay in sync is by communicating over a network that can be unreliable at times.

The need for consensus stems from the fact that these nodes can fail too. Failures can happen in many ways: hardware malfunctions, networks introducing delays or losing messages entirely, or some unexpected software bug causing a server to temporarily go down. Despite all this chaos, the system still needs to function as one single reliable unit. If the system isn’t prepared for this, it could lose data permanently. To prevent this, systems replicate the data by keeping multiple copies of it across different nodes. This way, if one node goes down, another has a backup.

But just having copies isn’t enough. These copies must stay consistent, meaning all nodes must agree on the same updates in the same order. If different nodes apply updates in different orders, they could end up with conflicting versions of the data. Consensus algorithms make sure that this system of nodes continues to agree on a shared state and operates as if nothing happened when one or more of them fail.
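To make the "same updates in the same order" point concrete, here is a tiny sketch. It assumes a made-up rule that a withdrawal is rejected when it would overdraw the balance; the function names and numbers are purely illustrative.

```python
def apply(balance, update):
    """Apply one update; a withdrawal that would overdraw is rejected."""
    kind, amount = update
    if kind == "deposit":
        return balance + amount
    if kind == "withdraw" and balance >= amount:
        return balance - amount
    return balance  # rejected withdrawal leaves the balance unchanged


def replay(start, updates):
    """Apply a list of updates one after another, in the given order."""
    balance = start
    for update in updates:
        balance = apply(balance, update)
    return balance


updates = [("deposit", 5), ("withdraw", 8)]
node_a = replay(5, updates)                  # deposit first, then withdraw
node_b = replay(5, list(reversed(updates)))  # withdraw first, then deposit
print(node_a, node_b)  # 2 10
```

Both nodes saw exactly the same two updates, yet they disagree on the final balance because they applied them in a different order.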
The Distributed Piggy Bank
Now that we’ve established why consensus is important, allow me to spin up a completely hypothetical scenario.
Let’s say that you and your friends from the neighborhood are running a system to keep track of each other’s pocket money balances. To do this, each of you maintains a notebook where you record every transaction (deposits and withdrawals). Additionally, each person has a balance sheet that keeps track of the final balance for everyone after all transactions are applied. The goal is to have synchronized notebooks and balance sheets. Now imagine a situation where one of you deposits your weekly allowance, the update isn’t reflected in everyone’s notebooks in time, and another person makes a conflicting withdrawal. Without a reliable way to track these changes, some of you might end up with incorrect balances, causing confusion. To avoid such situations, you decide to use Raft to help you with consensus.
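Here is a rough sketch of the notebook and balance sheet idea, assuming each transaction is just a (person, amount) pair with negative amounts for withdrawals. The names are hypothetical; the point is that the balance sheet is nothing more than the notebook replayed from the first page.

```python
from collections import defaultdict


def balance_sheet(notebook):
    """Replay every transaction in the notebook to get the final balances."""
    balances = defaultdict(int)
    for person, amount in notebook:
        balances[person] += amount
    return dict(balances)


shared = [("ana", 10), ("ben", 5), ("ana", -3)]
your_notebook = list(shared)
friend_notebook = list(shared)
late_notebook = shared[:2]  # this friend never heard about the withdrawal

print(balance_sheet(your_notebook) == balance_sheet(friend_notebook))  # True
print(balance_sheet(late_notebook))  # {'ana': 10, 'ben': 5}
```

If everyone’s notebook has the same entries in the same order, everyone’s balance sheet agrees for free. That is why the rest of this post only worries about keeping the notebooks in sync.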
Here are some basic rules to follow:
- Roles: At any given time, each member of the group is in one of three roles (Leader, Candidate or Follower).
  - Leader - The person in charge of writing transactions and making sure everyone copies them.
  - Follower - A person who just copies whatever the leader writes.
  - Candidate - An intermediate role used when the leader disappears and someone steps up to take their place.
- When a parent wants to deposit or withdraw pocket money, the request to record the transaction is always made to a single person (the Leader). If the request is sent to a follower, they redirect it to the leader.
- Terms: Time is broken up into “terms”, each identified by an increasing number and denoting a period during which a particular leader was in control. Each term begins with an election. Whenever the leader is unreachable or stops responding, a new term begins and the others go through a process to elect a new leader.
- Everyone communicates using a fancy walkie-talkie that supports two kinds of messages:
  - Append Entries: The leader sends these messages to all followers, telling them what to write in their notebooks.
  - Request Vote: When someone wants to become the leader, they send this message to the other members, asking for their vote.
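The roles and the "always go through the leader" rule can be sketched in a few lines. This is only an illustration: `Member`, `known_leader` and `handle_request` are names I made up, and a real implementation tracks a lot more state.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Role(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


@dataclass
class Member:
    name: str
    role: Role = Role.FOLLOWER          # everyone starts as a follower
    current_term: int = 0               # goes up with every election
    known_leader: Optional[str] = None  # who this member believes is leading
    notebook: list = field(default_factory=list)

    def handle_request(self, transaction: str) -> str:
        """Record the transaction if we lead, otherwise point at the leader."""
        if self.role is Role.LEADER:
            self.notebook.append((self.current_term, transaction))
            return f"recorded by {self.name}"
        if self.known_leader is None:
            return "no leader known yet, try again later"
        return f"redirect to {self.known_leader}"


alice = Member("alice", role=Role.LEADER, current_term=1, known_leader="alice")
bob = Member("bob", current_term=1, known_leader="alice")

print(bob.handle_request("deposit 5 for bob"))    # redirect to alice
print(alice.handle_request("deposit 5 for bob"))  # recorded by alice
```

Note that each notebook entry is stored together with the term it was written in. That pairing becomes important during elections and replication.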
The Election
Everyone starts off as a follower and waits for the leader to give instructions. The leader is supposed to send out regular heartbeats, which are quick “I’m still here” signals to reassure everyone that they are still active. Each person has a limit (election timeout) for how long they can go without hearing from the leader. If the leader stops sending heartbeats (maybe they’re busy with school), any follower whose election timeout runs out steps up and becomes a candidate, starts the next term, votes for themselves and requests votes from everyone else in the group.
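A minimal sketch of the timeout logic, assuming time is measured in milliseconds and timeouts are drawn from an arbitrary 150 to 300 range. The class and method names are hypothetical, and sending the actual vote requests is left out.

```python
import random
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    timeout: float        # how long this member tolerates silence
    role: str = "follower"
    term: int = 0
    silence: float = 0.0  # time since the last heartbeat was heard

    def on_heartbeat(self) -> None:
        """The leader checked in, so the waiting starts over."""
        self.silence = 0.0

    def tick(self, elapsed: float) -> None:
        """Advance time; a follower that waited too long becomes a candidate."""
        self.silence += elapsed
        if self.role == "follower" and self.silence >= self.timeout:
            self.role = "candidate"
            self.term += 1      # a new term begins with this election
            self.silence = 0.0


rng = random.Random(7)  # fixed seed so the run is repeatable
members = [Member(n, timeout=rng.uniform(150, 300)) for n in ("ana", "ben", "cara")]

# The leader has gone silent: advance time to the shortest timeout.
first = min(members, key=lambda m: m.timeout)
for m in members:
    m.tick(first.timeout)

print([(m.name, m.role, m.term) for m in members])
```

Only the member with the shortest timeout has become a candidate at this point; the others are still waiting as followers.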
In order to become the new leader, the candidate must gather votes from a majority of the group members by convincing them that their records are at least as up to date as each voter’s own. To do this, the candidate sends out a Request Vote message containing the term of the most recent entry in their notebook along with their latest page number. The latest page number (log index) indicates the position of the last entry in the notebook. This helps the followers determine whether the candidate has up-to-date data and ensures that they vote only for a candidate who won’t cause inconsistencies if elected. When a follower compares the term and page number with their own, they might think “Wait a second, your notebook entries only go up to a term from last month and I’ve got everything up to today’s lunch money (which is part of a different term)”. In that case the follower wouldn’t vote for the candidate. They only vote if the candidate’s last entry is from a more recent term than their own, or from the same term with a page number at least as high as theirs.

If multiple candidates send out vote requests at the same time and none of them receives a majority, a split vote occurs. In this case, no one can claim leadership, so a new term is started and the election is repeated. Since each person’s election timeout is randomized, the chances of a second split vote are low.
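The voting decision can be sketched as follows. The field and class names are my own. Two of the checks (ignoring requests from an older term, and voting at most once per term) come from the Raft paper rather than from the story above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestVote:
    term: int            # the candidate's new term
    candidate: str
    last_log_index: int  # the candidate's latest page number
    last_log_term: int   # the term of the entry on that page


@dataclass
class Voter:
    current_term: int
    last_log_index: int
    last_log_term: int
    voted_for: Optional[str] = None

    def grant_vote(self, req: RequestVote) -> bool:
        if req.term < self.current_term:
            return False  # request comes from a stale term
        if req.term > self.current_term:
            self.current_term = req.term  # new term, so the vote is free again
            self.voted_for = None
        if self.voted_for not in (None, req.candidate):
            return False  # already voted for someone else in this term
        # Compare the last entry's term first, then the page number.
        up_to_date = (req.last_log_term, req.last_log_index) >= (
            self.last_log_term, self.last_log_index)
        if up_to_date:
            self.voted_for = req.candidate
        return up_to_date


def wins(votes: int, group_size: int) -> bool:
    """A candidate needs strictly more than half of the whole group."""
    return votes > group_size // 2


voter = Voter(current_term=3, last_log_index=7, last_log_term=3)
stale = RequestVote(term=4, candidate="ben", last_log_index=9, last_log_term=2)
fresh = RequestVote(term=4, candidate="cara", last_log_index=7, last_log_term=3)

print(voter.grant_vote(stale))   # False
print(voter.grant_vote(fresh))   # True
print(wins(2, 4), wins(3, 5))    # False True
```

Notice that "ben" has more pages than the voter but still loses the vote, because his last entry was written in an older term. In a group of four, two votes each is a split vote and nobody wins.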
[Interactive simulator: the election process in a Raft cluster of 3 or 5 members (or in our case, you and your group of friends managing the distributed piggy bank). Members are colored by role: blue (Follower), orange (Candidate), green (Leader). A red bar shows each member’s election timeout, and colored dots represent messages.]
Keeping everyone in sync (Log replication)
Life’s good and you’ve successfully become the leader of your little piggy bank group. Now the challenge lies in making sure the entire group is on the same page (no pun intended). You’ve received a request to record a new transaction, but writing it only in your notebook is not enough: if you get sick tomorrow and someone else becomes the leader, they might not know about this transaction at all. So, you send out an Append Entries message with the following information to your followers:
term: your current term number
previous log index: the page number of the entry in your notebook just before the new one(s)
previous log term: the term when that entry was recorded
entries: new transaction(s) to note down (empty if message is used as a heartbeat)
...
Once the followers receive your message, one of these could happen:
- Follower is up to date: If the follower is on the same term as you and the entry on the page you pointed to (previous log index) was written in the term you sent (previous log term), they’ll write down the new transaction(s) in their notebook and respond confirming the update.
- Follower is behind or outdated: If the follower is lagging behind (because they were AFK during an earlier term or are missing entries), they reject the message. You’ll then need to step back to the most recent entry where both of you agree and have them copy your entries from that point on, overwriting any conflicting entries and filling in any missing ones.
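Both cases can be sketched with one follower-side check and a leader-side retry loop. This is a simplified illustration with names of my own choosing: pages are numbered from 1, a previous log index of 0 means "nothing before this", and details such as commit indexes and network messages are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    term: int
    transaction: str


@dataclass
class AppendEntries:
    term: int
    prev_log_index: int  # page just before the new entries (0 = none)
    prev_log_term: int   # term of the entry on that page
    entries: list = field(default_factory=list)


def handle_append(notebook: list, current_term: int, msg: AppendEntries) -> bool:
    """Follower side: return True if the entries were accepted."""
    if msg.term < current_term:
        return False  # message from an outdated leader
    if msg.prev_log_index > len(notebook):
        return False  # we are missing pages
    if msg.prev_log_index > 0 and \
            notebook[msg.prev_log_index - 1].term != msg.prev_log_term:
        return False  # the page exists but was written in another term
    for offset, entry in enumerate(msg.entries):
        pos = msg.prev_log_index + offset  # 0-based position in the list
        if pos < len(notebook) and notebook[pos].term != entry.term:
            del notebook[pos:]  # conflict: drop this page and all later ones
        if pos >= len(notebook):
            notebook.append(entry)
    return True


def sync_follower(leader_notebook: list, follower_notebook: list, term: int) -> int:
    """Leader side: step back page by page until the follower accepts."""
    prev_index = len(leader_notebook)
    while True:
        prev_term = leader_notebook[prev_index - 1].term if prev_index > 0 else 0
        msg = AppendEntries(term, prev_index, prev_term, leader_notebook[prev_index:])
        if handle_append(follower_notebook, term, msg):
            return prev_index  # last page on which both notebooks agreed
        prev_index -= 1


leader = [Entry(1, "ana +10"), Entry(1, "ben +5"), Entry(3, "ana -3")]
follower = [Entry(1, "ana +10"), Entry(2, "cara +1")]  # page 2 conflicts

agreed = sync_follower(leader, follower, term=3)
print(agreed, follower == leader)  # 1 True
```

The loop always terminates because a previous log index of 0 is accepted by any follower on the same term. At worst, the follower ends up copying the leader’s whole notebook.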
Once a majority of the group (counting yourself) has written down the new entries, they’re considered “committed”. Now you can respond to the parent who requested the change that their request was successfully handled, and all of you update your balance sheets to reflect the new transaction.
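A small sketch of the commit step, assuming the leader counts its own copy towards the majority and that transactions are (person, amount) pairs. The function names are hypothetical.

```python
def majority(group_size: int) -> int:
    """Smallest number of members that is more than half the group."""
    return group_size // 2 + 1


def apply_committed(notebook, balances, last_applied, commit_index):
    """Apply every committed entry that is not yet on the balance sheet."""
    for page in range(last_applied, commit_index):
        person, amount = notebook[page]
        balances[person] = balances.get(person, 0) + amount
    return commit_index


group_size = 5
notebook = [("ana", 10), ("ben", 5)]
balances = {}
last_applied = 0

acks = 1   # the leader's own notebook already has the entries
acks += 2  # two followers confirmed they wrote them down

commit_index = len(notebook) if acks >= majority(group_size) else last_applied
last_applied = apply_committed(notebook, balances, last_applied, commit_index)
print(balances)  # {'ana': 10, 'ben': 5}
```

With five members, three copies are enough, so the group keeps working even if two friends are away.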
Wrap Up
Raft is not just an abstract concept from academic research. It’s actively used in popular projects like etcd (used for storing state and config in Kubernetes clusters) and Kafka’s newer KRaft mode, which replaces ZooKeeper with a built-in Raft-based consensus system for managing metadata. That said, the original Raft paper, which I highly recommend reading, covers many more edge cases during replication as well as how safety is guaranteed during cluster membership changes. For a simple and interactive guide, also check out Secret Lives of Data.
Happy learning!