Byzantine Broadcast in Dolev-Strong Protocol

In this article, we study a classic result from 1983 by D.Dolev and H.Strong on reaching an agreement in a distributed system between a set of nodes $N$ in the presence of faulty (Byzantine) nodes $F$ .

Motivation: Understanding the Byzantine General Problem and Dolev-Strong Protocol is the gateway to getting a feel for how consensus protocol works, including more complicated models used in the blockchain space and distributed systems. So in the spirit of building up your muscles, understanding this protocol will make life easier for you.

1. The Byzantine General Problem

The Problem

How to make absolutely sure that multiple entities, which are separated by distance, are in absolute full agreement before an action is taken?

In other words, how can individual parties find a way to guarantee full consensus?

Example

Imagine you're a General in a Byzantine army and you're planning to attack an enemy city.

You have the city surrounded by several other generals, each of them separated several miles from the other and you must decide as a group whether to attack or retreat. Some generals may prefer to attack while others prefer to retreat. The important thing is that all generals agree on a common decision.

A coordinated attack will lead to successful victory. An uncoordinated attack will lead to defeat.

Byzantine generals: coordinated attack or retreat

Approach

You have decided to attack the city but you have no means of communication to tell other generals your decision. How do you make sure then, that other generals reach a consensus and all attack together at the same time?

You could send messengers on horseback but you need also to have a reply from each of your generals confirming that they have received your message, which means they have to send messenger again to you on horseback.

Messengers relaying decisions between generals

Questions

What if other generals are Traitors, and they send back to you a message that they're willing to attack while their intention is to Retreat?
How do other generals know that the messages that they received from you are genuine and haven't been intercepted or altered by enemies? (Solved by digital signatures)
What if a messenger is captured by the enemy and an imposter messenger with a fake reply is sent back to you?
What if one of your messengers is killed by other armies before or after delivering the message?

This problem has remained unsolvable for years, and this is the core of consensus used in Bitcoin and distributed systems in general.

2. Goals And Assumptions

Goals

Consistency (Agreement): All nodes agree on the same value $v^*$ .
Liveness (Validity): Every submitted value to at least one node is eventually added to all nodes' history.

Assumptions

Permissioned Settings: We assume we have a priori known set of nodes $\{1, 2, ..., n\}$ before the start of the protocol.
Public Key Infrastructure: Each node $i$ has a distinct public-private key pair ( $pk_i$ / $sk_i$ ). $pk_i$ is known to all nodes at the start of the protocol.
Synchronous: Every message sent by one node at time $t$ arrives to recipient by time step $t+1$ . All nodes share a global clock.
All honest nodes: We will assume at the beginning that we don't have Byzantine or faulty nodes (we'll relax this assumption later).

Plan

Solve the problem under these assumptions and then relax the last assumption of having all honest nodes.

3. Solving Via Round-Robin Leaders

A Lazy Protocol

If every value submitted by a node always arrives at exactly the same time at every node, then the protocol is valid. But if a node only submits a transaction to a subset of the nodes, or if the network delays because values arrive at different nodes in different time steps, then consistency will generally be violated.

Coordination via Rotating Leaders

The protocol above highlights the need for nodes to coordinate. A solution is to use a rotating leader for each step $t$ . At time step $t = t_1$ , Node 1 sends value $v_1$ to both nodes $\{2, 4\}$ . At time step $t = t+1$ , Node 2 will be the new sender and sends value $v_1$ to the nodes it's connected to.

We keep repeatedly iterating through the nodes in a round-robin order. In the last time step, all the nodes will have the same value $v_1$ , with both consistency and validity achieved.

Consistency and validity: nodes agreeing on value

Under the 4 previous assumptions (Permissioned, PKI, Synchronous, and All honest nodes) Rotating leader seems to be a good protocol to achieve consistency and validity.

4. Faulty/Byzantine Nodes

A node that is not honest is called a faulty node.

Types of faulty nodes

Fail-Stop: A node halts and remains halted permanently. Other nodes can detect that the node has failed.

Byzantine Faults: A node does not behave according to its specific protocol or algorithm. Usually happens when a malicious actor or a software bug compromises the node.

Omission Fault: A node fails to respond to incoming requests.

Crash Fault: A node halts silently, with no more messages sent or received.

Types of faulty nodes: Fail-Stop, Byzantine, Omission, Crash

Failure of Rotating Leader Protocol

If we relax the last assumption and consider a system of $n$ nodes with $f$ faulty nodes, our previous protocol of rotating leaders will fail. Byzantine nodes can ignore your protocol and do whatever they want, so at a time step $t$ where a Byzantine node is the leader, it may send trickier values to other nodes.

5. Byzantine Broadcast Problem

Cross-Checking

Our simple rotating leaders' protocol achieves consistency and liveness when $f=0$ but not when $f=1$ . The idea: stick with the rotating leader approach, but instead of letting nodes blindly accept the new value received from the leader, add a cross-checking functionality.

Byzantine Broadcast + Rotating Leaders = Fault-tolerant System

Goals of Byzantine Broadcast

Termination: Each honest node $i$ eventually halts with some output $v_i$ .
Agreement: All honest nodes choose the same value $v_i$ (whether or not the sender is honest).
Validity: If the sender is honest, all honest nodes $v_i$ equal $v^*$ .

Implementation

We already know how to solve the Byzantine Broadcast problem in the case of $f=0$ . The sender sends a private value $v^*$ and eventually all nodes output whatever they heard from the sender.

This doesn't work in $f=1$ case because a Byzantine sender can send different values to different nodes leading to inconsistency. To solve this we add cross-checking: when a sender sends a private input $v^*$ , all honest non-senders compare their values to check if they received the same value. The tricky part is that there may be a Byzantine non-sender who lies during the cross-checking phase.

Using the PKI assumption, we can implement the cross-checking functionality with digital signatures.

Cross-checking with PKI: sender signs, non-senders echo and verify

6. Testing the Protocol

The $f=1$ case

Let's test our protocol with $n \geq 4$ nodes, one of them is faulty ( $f=1$ ):

Agreement: We only care about the case when the sender is Byzantine. Because $f=1$ , every other non-sender is honest. In the first time step, the Byzantine sender sends different values. In the second time step, every honest non-sender echoes the value received. In the third time step, all honest non-senders have the same information and carry out the same majority computation.
Validity: We only care about the case when the sender is honest. The sender sends $v^$ with its digital signature. Each honest non-sender receives one vote for $v^$ $v^{*}$ signed by the honest sender, then receives at least one more vote echoed by another honest non-sender. Using majority vote, all honest non-senders output $v^*$ .

The $f=2$ case (the protocol fails)

A conspiracy and coordination between two byzantine nodes will break the protocol. We need more than one round of cross-checking. This is exactly what happens in Dolev-Strong Protocol.

The f=2 case: two Byzantine nodes breaking the protocol

7. The Dolev-Strong Protocol

The Dolev-Strong protocol gives a solution to state machine replication. It works under the assumption of the Synchronous, Permissioned model with PKI.

Validity Proof

Assume the sender is honest. At time step $0$ , the sender sends a signed copy $v^$ to all non-senders. Because the signature can't be forged, all nodes receive the same output $v^$ $v^{*}$ before the beginning of the next step.

Agreement Proof

Assume the sender is byzantine. We need to prove that if a non-sender $i$ gets convinced of a value $v$ by a message $m$ , all other honest non-senders get convinced of that value $v$ before the end of the protocol.

$t = f+1$ : If $i$ becomes convinced of a new value $v$ , there is no time for $i$ to notify other nodes. The only hope is that other nodes already added $v$ independently.
$t \leq f$ : The protocol has not yet ended, so if a non-sender $i$ gets convinced of a new value $v$ by a message $m$ , it sends this value and adds its signature. Every other honest non-sender $j$ receives this message prior to time step $t+1$ .

Agreement proof: message propagation across time steps

This article uses extensively lectures by Tim Roughgarden on "Foundation of Blockchain" and other research papers.