Zyzzyva: Speculative Byzantine Fault Tolerance
Motivation & Introduction
Zyzzyva builds on BFT state machine replication.
Replicas in Zyzzyva speculatively execute requests immediately, without first running an expensive agreement protocol to establish the requests' final order (speculation reduces the cost).
There are many BFT protocols, and system designers must choose the right technique to achieve the best performance. Zyzzyva simplifies the design space of BFT replicated services by approaching the lower bounds in almost every key metric.
Under favorable conditions (especially when all replicas are correct), Zyzzyva reaches these lower bounds.
System Model
partially synchronous, similar to PBFT: safety holds even under asynchrony; liveness requires eventual synchrony.
Protocol
keywords: 3f+1 replicas; 3 subprotocols (agreement, view change, checkpoint).
Agreement
order requests for execution by the replicas

Cases 1, 2, and 3 are distinguished by the number of matching responses the client receives after speculative execution. Case 4 (fill hole) concerns replica behavior during execution; cases 2 and 3 are not exclusive of case 4. A classifier sketch for cases 1-3 follows below.
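To make the case split concrete, here is a minimal Python sketch (the name `classify_outcome` is my own, not the paper's) mapping the number of matching speculative responses a client has gathered to the protocol case it falls into:

```python
# Illustrative only: how a Zyzzyva client classifies the outcome of a request
# based on how many matching speculative responses it has gathered.
# f is the maximum number of faulty replicas; the system has 3f + 1 replicas.

def classify_outcome(matching_responses: int, f: int) -> str:
    """Map a count of matching SPEC-RESPONSE messages to a protocol case."""
    assert 0 <= matching_responses <= 3 * f + 1
    if matching_responses == 3 * f + 1:
        return "fast"          # case 1: complete immediately
    if matching_responses >= 2 * f + 1:
        return "two-phase"     # cases 2/3: gather a commit certificate
    return "view-change"       # otherwise: retransmit to all replicas

if __name__ == "__main__":
    f = 1  # tolerate one Byzantine fault => 4 replicas
    for n in (4, 3, 2):
        print(n, "matching responses ->", classify_outcome(n, f))
```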
fast case (the client receives 3f+1 matching responses)
all nodes act correctly and no timeouts occur.
process:
(1) the client sends the request ⟨REQUEST, o, t, c⟩ to the primary
Symbol | Meaning |
---|---|
o | operation |
t | timestamp (ensures exactly-once semantics of request execution) |
c | client_id |

(2) the primary receives the request, assigns a sequence number n, and forwards the ordered request ⟨ORDER-REQ, v, n, h_n, d, ND⟩ together with m to the replicas
Symbol | Meaning |
---|---|
v | current view |
n | sequence number |
h_n | history including the current request: h_n = H(h_{n-1}, d) |
d | digest of the current request: d = H(m) |
ND | non-deterministic variables required for execution |
m | current request message from the client |

If n = max_n + 1 (the next expected sequence number), execute the step. Otherwise, ignore.
(3) the replicas receive the ORDER-REQ, speculatively execute the request, and send ⟨SPEC-RESPONSE, v, n, h_n, H(r), c, t⟩ together with i, r, and OR to the client.
Symbol | Meaning |
---|---|
r | reply |
i | replica_id |
OR | the ORDER-REQ message |
c | client |

!!! Replicas only accept and speculatively execute requests in sequence-number order. What if a message is lost or the primary is faulty? → the Fill Hole phase.
process:
① Replica i discards the ORDER-REQ msg if n ≤ max_n (max_n is the maximum sequence number it has accepted)
② if n > max_n + 1, replica i sends ⟨FILL-HOLE, v, max_n + 1, n, i⟩ to the primary, and starts a timer
③ the primary receives it and re-sends the ORDER-REQ msgs with sequence numbers from max_n + 1 to n for the current view
④ if replica i does not receive them before the timer expires, it suspects the primary (and broadcasts the FILL-HOLE to the other replicas); see the sketch after this case.
(4) the client receives 3f+1 matching SPEC-RESPONSEs and completes the request.
This is safe: with 3f+1 matching responses, the established order is guaranteed to survive even a view change; the history will not change.
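A hedged sketch of the replica-side checks above, assuming SHA-256 for the digest function H and an invented `OrderReq` record (this is an illustration, not the paper's code). It shows the hash-chained history h_n = H(h_{n-1}, d) and where the discard and fill-hole branches fire:

```python
import hashlib
from dataclasses import dataclass

def H(*parts: bytes) -> bytes:
    """Digest function; SHA-256 is an assumption, not mandated by the paper."""
    return hashlib.sha256(b"".join(parts)).digest()

@dataclass
class OrderReq:
    v: int        # view
    n: int        # sequence number
    h: bytes      # history digest including this request
    d: bytes      # digest of the client request m
    m: bytes      # the client request itself

class Replica:
    def __init__(self) -> None:
        self.max_n = 0
        self.history = H(b"genesis")   # h_0, an assumed starting digest

    def on_order_req(self, req: OrderReq) -> str:
        if req.n <= self.max_n:
            return "discard"              # already accepted this sequence number
        if req.n > self.max_n + 1:
            return "fill-hole"            # ask primary for max_n+1 .. n, start timer
        expected = H(self.history, req.d)  # h_n = H(h_{n-1}, d)
        if req.d != H(req.m) or req.h != expected:
            return "ignore"               # malformed or inconsistent history
        self.max_n, self.history = req.n, expected
        return "speculatively-execute"    # then send SPEC-RESPONSE to the client

if __name__ == "__main__":
    r = Replica()
    m = b"op"
    d = H(m)
    req = OrderReq(v=0, n=1, h=H(r.history, d), d=d, m=m)
    print(r.on_order_req(req))   # speculatively-execute
    print(r.on_order_req(req))   # discard (duplicate)
```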
two phase case (the client receives between 2f+1 and 3f matching responses)
a non-primary replica is faulty or some timeouts occur.
=> the network, the primary, or some replicas may be slow or faulty.
process:
Steps (1)-(3) are the same as in the fast case.
(4) the client assembles a commit certificate (CC) from 2f+1 matching responses and sends it to all replicas.
The CC is a cryptographic proof that a majority of correct servers agree on a linearizable order for the request.
(5) a replica receives the COMMIT message from the client containing the commit certificate and acknowledges with a LOCAL-COMMIT message.
(6) the client receives LOCAL-COMMIT messages from 2f+1 replicas and completes the request.
!!! The second phase makes the speculative order durable: it links the client's completion to knowledge held by the replicas (see the sketch below).
The problem: after the first phase, each replica only knows its own local decision; there is no guarantee that the request completes in this order.
Moreover, a view change requires VIEW-CHANGE messages from 2f+1 replicas. In that quorum, f may be faulty and f may be correct but misled (they never saw the request's order), leaving only one replica that is both correct and aware of the order, so the new view may execute the wrong request. (Example with f = 1: 4 replicas; a quorum of 3 can contain 1 faulty, 1 unaware, and only 1 knowledgeable replica.)
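A rough sketch of the client's second phase under the 2f+1 thresholds above; `try_two_phase`, `send_commit`, and `recv_local_commits` are hypothetical names standing in for the client's networking layer, not the paper's API:

```python
def try_two_phase(spec_responses: list, f: int, send_commit, recv_local_commits):
    """spec_responses: matching SPEC-RESPONSE records from distinct replicas."""
    if len(spec_responses) < 2 * f + 1:
        return False                      # fall through to the view-change case
    cc = spec_responses[: 2 * f + 1]      # 2f+1 matching responses form the CC
    send_commit(cc)                       # COMMIT(c, CC) to all replicas
    acks = recv_local_commits()           # LOCAL-COMMIT msgs, one per replica
    return len(acks) >= 2 * f + 1         # request completes at 2f+1 acks

if __name__ == "__main__":
    f = 1
    fake = ["resp"] * 3                   # 2f+1 = 3 matching responses
    ok = try_two_phase(fake, f, send_commit=lambda cc: None,
                       recv_local_commits=lambda: ["ack"] * 3)
    print("completed:", ok)
```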
view change case (the client receives fewer than 2f+1 matching responses)
the primary is faulty or more severe timeouts occur.
(1) the client retransmits the request to all replicas (for liveness it does not rely on the primary alone, since the primary may be faulty)
(2) the replicas forward the request to the primary (CONFIRM-REQ) and begin waiting for the primary to order the retransmitted request;
the primary (re)sends the corresponding ORDER-REQ msg.
(3) if the primary answers inconsistently or too slowly, a correct replica starts suspecting the primary is faulty.
A replica that doesn't receive the order from the primary can still obtain it indirectly: it receives the CONFIRM-REQ relayed by another replica j, and the corresponding ORDER-REQ from a replica that has it.
(4) if a sufficient number of replicas suspect the primary, a view change occurs and a new primary is elected.
Alternatively, the client receives responses indicating inconsistent ordering by the primary and sends a proof of misbehavior to the replicas, which initiate a view change to oust the faulty primary.
inconsistency -> Proof Of Misbehavior (POM) against the primary (see the sketch below).
The client sends the POM message to all replicas.
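One way a client can mechanically detect the inconsistency that yields a POM: two speculative responses binding different request digests to the same view and sequence number can only come from a faulty primary. The `SpecResponse` record below is a simplified stand-in for the signed message:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SpecResponse:
    v: int       # view
    n: int       # sequence number
    d: bytes     # digest of the ordered request

def find_pom(responses: list[SpecResponse]):
    """Return a conflicting pair usable as a POM, or None."""
    seen: dict[tuple[int, int], SpecResponse] = {}
    for r in responses:
        prev = seen.setdefault((r.v, r.n), r)
        if prev.d != r.d:
            return (prev, r)   # send this pair to all replicas -> view change
    return None

if __name__ == "__main__":
    a = SpecResponse(v=0, n=5, d=b"x")
    b = SpecResponse(v=0, n=5, d=b"y")
    print(find_pom([a, b]))    # conflicting pair -> a POM against the primary
```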
View Change
coordinate the election of a new primary when the primary is faulty or the system is running slowly
How can Zyzzyva avoid the third phase?
=> a replica does not abandon the current view unless it is guaranteed that every other correct replica will eventually do the same.
Zyzzyva maintains the overall structure of the traditional protocol, but departs from it in two ways:
1. ensure liveness: add a new 'I hate the primary' phase, which
strengthens the conditions under which a replica stops participating in the current view.
(1) a correct replica that suspects the primary of view v continues to participate in the view.
(2) it expresses its vote of no confidence in the primary by multicasting ⟨I-HATE-THE-PRIMARY, v⟩ to all replicas.
(3) a replica that receives f+1 votes of no confidence in view v's primary commits to a view change and multicasts a VIEW-CHANGE msg.
2. guarantee safety: weaken the condition under which a request appears in the history included in the NEW-VIEW msg.
two cases:
(1) the client completed via 2f+1 LOCAL-COMMIT msgs -> already handled by traditional protocols (the commit certificate carries over).
(2) the client completed via 3f+1 matching speculative responses:
cause: there is no commit certificate, so replicas cannot tell on their own whether their speculative executions match the completed order.
① correct replicas include in their VIEW-CHANGE msg all ORDER-REQ messages received since the latest stable checkpoint
② a correct new primary extends the history to be adopted in the new view to include every request with an ORDER-REQ msg whose sequence number exceeds the largest one in any CC appearing in at least f+1 VIEW-CHANGE msgs (a sketch of this rule follows below)
side effect: the new primary may find different requests for a given sequence number; this is benign, because a conflicting request cannot have completed.
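A rough sketch of one plausible reading of rule ② (the data shapes and the exact f+1 counting are my simplification, not the paper's specification): pick the highest CC sequence number vouched for by at least f+1 VIEW-CHANGE messages, then extend the history with ORDER-REQ'd requests above it.

```python
from collections import Counter

def build_new_history(view_changes: list[dict], f: int) -> dict[int, bytes]:
    """view_changes: each has 'cc_n' (CC sequence number) and 'order_reqs'
    (a mapping: sequence number -> request digest). Invented shapes."""
    # Highest CC sequence number vouched for by at least f+1 messages.
    counts = Counter(vc["cc_n"] for vc in view_changes)
    base = max((n for n, c in counts.items() if c >= f + 1), default=0)

    history: dict[int, bytes] = {}
    for vc in view_changes:
        for n, d in vc["order_reqs"].items():
            if n > base:
                # Different digests may compete for the same n; picking one is
                # benign, since a conflicting request cannot have completed.
                history.setdefault(n, d)
    return history

if __name__ == "__main__":
    vcs = [
        {"cc_n": 2, "order_reqs": {3: b"a", 4: b"b"}},
        {"cc_n": 2, "order_reqs": {3: b"a"}},
        {"cc_n": 1, "order_reqs": {2: b"z", 3: b"a"}},
    ]
    print(build_new_history(vcs, f=1))   # base = 2 -> {3: b'a', 4: b'b'}
```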
process:
(1) replicas initiate the view change by sending an accusation against the primary (I-HATE-THE-PRIMARY) to all other replicas.
(2) a replica that receives f+1 accusations that the primary is faulty commits to the view change and multicasts a VIEW-CHANGE msg containing (a vote-tracking sketch appears after this process):
Symbol | Meaning |
---|---|
s | sequence number of the last stable checkpoint |
C | proof of the last stable checkpoint, consisting of 2f+1 checkpoint msgs |
O | replica i's ordered request history since the last stable checkpoint |
(3) the new primary p receives 2f+1 VIEW-CHANGE msgs and constructs the NEW-VIEW msg:
Symbol | Meaning |
---|---|
P | the set of valid VIEW-CHANGE msgs received by the new primary |
G | the ordered request history computed by the new primary using P |
if a backup does not receive the NEW-VIEW msg from the new primary before its timer expires, it initiates a view change to view v+2.
(4) a replica receives a valid NEW-VIEW msg, reconciles its local state with it, and changes to the new view.
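A minimal bookkeeping sketch for the 'I hate the primary' phase; the class and method names are invented. A replica commits to a view change only once f+1 distinct replicas have accused the primary of the current view:

```python
from collections import defaultdict

class ViewChangeTracker:
    def __init__(self, f: int) -> None:
        self.f = f
        self.haters: dict[int, set[int]] = defaultdict(set)  # view -> replica ids

    def on_hate_primary(self, view: int, replica_id: int) -> bool:
        """Returns True when the replica should commit to the view change."""
        self.haters[view].add(replica_id)
        return len(self.haters[view]) >= self.f + 1  # then multicast VIEW-CHANGE

if __name__ == "__main__":
    t = ViewChangeTracker(f=1)
    print(t.on_hate_primary(0, 1))   # False: only one accusation so far
    print(t.on_hate_primary(0, 3))   # True: f+1 = 2 accusations for view 0
```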
Checkpoint
limit the state that must be stored by replicas and reduces the cost of performing view changes
Each replica i maintains an ordered history of the requests it has executed and a copy of the max commit certificate (the CC covering the largest prefix of the executed history seen by i).
committed history: requests executed up to and covered by the max commit certificate.
speculative history: requests executed but not yet committed.
a replica constructs a checkpoint and a corresponding stable application state snapshot every CP_INTERVAL requests.
the process for a tentative checkpoint and the corresponding application snapshot to become stable is similar to traditional BFT protocols.
To bound the size of the history (see the sketch below):
1. truncate the history before the committed checkpoint (stable checkpoint)
2. block new requests once the speculative history grows too long, until the next checkpoint becomes stable
Each replica also keeps a response cache: a copy of the latest ordered request from, and the corresponding response to, each client.
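An illustrative sketch of the two bounding rules; the CP_INTERVAL value and the MAX_SPECULATIVE policy below are assumed numbers, not from the paper:

```python
CP_INTERVAL = 128                   # assumed checkpoint interval
MAX_SPECULATIVE = 2 * CP_INTERVAL   # assumed bound on the speculative suffix

class HistoryLog:
    def __init__(self) -> None:
        self.stable_n = 0       # sequence number of the last stable checkpoint
        self.entries: dict[int, bytes] = {}   # n -> request digest

    def on_stable_checkpoint(self, n: int) -> None:
        """Rule 1: truncate the history prefix covered by the stable checkpoint."""
        self.stable_n = n
        self.entries = {k: v for k, v in self.entries.items() if k > n}

    def can_accept(self, n: int) -> bool:
        """Rule 2: block new requests once the speculative suffix is too long."""
        return n - self.stable_n <= MAX_SPECULATIVE

if __name__ == "__main__":
    log = HistoryLog()
    log.entries = {n: b"d" for n in range(1, 200)}
    log.on_stable_checkpoint(128)
    print(len(log.entries), log.can_accept(300))   # 71 entries kept, True
```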
Safety: if a request with sequence number n and history h_n completes, then any request that completes with a later sequence number n' follows it, and h_n is a prefix of h_{n'}.
Liveness: Any request issued by a correct client eventually completes.