Skip to main content

Generalized Consensus: Consistent Reads

· 5 min read
Sugu Sougoumarane
Creator of Multigres, Vitess

This section discusses various approaches to achieving consistent reads in a consensus protocol.

Most official publications of consensus protocols have paid lip service to the issue of consistent reads. Implementors of these protocols have each developed their own methods for achieving consistency, and they all involve trade-offs.

The reason for this avoidance is that no solution is both perfect and performant. These properties determine the trade-offs.

  • Consensus is a replicated system. There is no guarantee that a follower has the latest data.
  • The current leader is guaranteed to have the latest data, but there is no guarantee that you know who the current leader is.

One important factor to keep in mind is that leader terms are expected to last a long time in the order of days. A planned leader change typically happens once a week. Unplanned leader changes might be even less common. This is a key factor to consider when choosing your solution.

At this point, we have the opportunity to reintroduce observers. These nodes are not part of the cohort, but they receive completed requests and can be used for reads. We will refer to the combination of followers and observers as replicas.

Here are a few approaches:

Leader lease

The lease approach involves giving a leader a lease once appointed. During this period, the system will not revoke its leadership. The leader can renew its lease either by completing requests or through heartbeats. If a leader cannot renew its lease, it will stop serving reads before the lease expires.

There are a few disadvantages:

  • We trust the clock.
  • If the leader becomes unreachable, you have to wait till the lease expires before appointing another leader.
  • We lose the opportunity to distribute reads across the replicas.

For reference, Spanner supposedly uses this approach with a lease period of ten seconds.

Leader heartbeat read

In this approach, the leader sends out heartbeats for every read. If a valid quorum of followers respond with the same term number, then it knows the leadership has not been revoked yet. It can respond to the read request.

Downsides:

  • The cost of a read is as high as the network cost of completing requests.
  • We lose the opportunity to distribute reads across the replicas.

Log index based read

This method works for a single client. For each successful write request, the leader returns the log position of the request. The client can request a read from any replica by requiring it to wait until it reaches that position before serving the read.

An advantage of this approach is that reads can be load-balanced across multiple replicas.

Downsides:

  • Replica lag or network partitions can impact read performance.
  • Only the client that wrote the last request knows the latest position of its request.

Replica heartbeat read

This is a combination of the leader heartbeat read and the log index-based read. In this case, the read is sent to a replica, which sends out a heartbeat to the current leader and its quorum nodes. For its response, it must receive matching term numbers as well as the latest apply index from the leader. The replica waits until its own apply index reaches that of the leader, and then it can serve the read.

This allows reads to be load-balanced across multiple replicas. Also, this read works even if the client did not perform the last write.

Downsides:

  • The cost of a read is as high as the network cost of completing requests.
  • Replica lag or network partitions can impact read performance.

Quorum read

This approach was pointed out by @huesflash. It was presented at the hostrage '19 conference by Aleksey Charapko, Ailidani Ailijiang, and Murat Demirbas. In this approach, the client multicasts the quorum-read request to a majority quorum of replicas and waits for their replies. At this point, it knows the highest accepted slot. It then waits for any of the replicas to apply that slot. Following this, the client can read from that replica.

This approach can be mapped to the generalized consensus model by multicasting the reads to enough replicas to ensure that they cover the revocation requirements of all possible primaries. After that, the approach described above can be used to fetch a consistent read.

We have not seen this approach used in practice. Theoretically, we believe it has the following trade-offs:

Upsides

  • It has the same advantages of the log index based read.
  • Unlike the log index based read, it can be used by any client.

Downsides

  • Network packet amplification: The number of packets sent is multiplicative of the number of replicas that need to be reached. Some cloud providers have a limit on packets sent per second, and you may hit this limit sooner than other methods.
  • There is a possibility that the highest accepted slot may later be abandoned due to a failover. If so, the client will need to detect this and take corrective action.

Eventually consistent reads

If the application can tolerate stale reads, those reads can be directed to any replica. There are many use cases where a certain level of staleness is acceptable. Based on this, we recommend setting a staleness tolerance and having the system reject reads that exceed this limit.

In the next part, we'll cover topics we deferred, like health checks, term numbers, etc.

If you have comments or questions, please start a discussion on the Multigres GitHub repository.