Byzantine Broadcast in Dolev-Strong Protocol

In this article, we study a classic result from 1983 by D.Dolev and H.Strong on reaching an agreement in a distributed system between a set of nodes NN in the presence of faulty (Byzantine) nodes FF.

Motivation: Understanding the Byzantine General Problem and Dolev-Strong Protocol is the gateway to getting a feel for how consensus protocol works, including more complicated models used in the blockchain space and distributed systems. So in the spirit of building up your muscles, understanding this protocol will make life easier for you.

1. The Byzantine General Problem

The Problem

How to make absolutely sure that multiple entities, which are separated by distance, are in absolute full agreement before an action is taken?

In other words, how can individual parties find a way to guarantee full consensus?

Example

Imagine you're a General in a Byzantine army and you're planning to attack an enemy city.

You have the city surrounded by several other generals, each of them separated several miles from the other and you must decide as a group whether to attack or retreat. Some generals may prefer to attack while others prefer to retreat. The important thing is that all generals agree on a common decision.

A coordinated attack will lead to successful victory. An uncoordinated attack will lead to defeat.

Byzantine generals: coordinated attack or retreat

Approach

You have decided to attack the city but you have no means of communication to tell other generals your decision. How do you make sure then, that other generals reach a consensus and all attack together at the same time?

You could send messengers on horseback but you need also to have a reply from each of your generals confirming that they have received your message, which means they have to send messenger again to you on horseback.

Messengers relaying decisions between generals

Questions

  • What if other generals are Traitors, and they send back to you a message that they're willing to attack while their intention is to Retreat?
  • How do other generals know that the messages that they received from you are genuine and haven't been intercepted or altered by enemies? (Solved by digital signatures)
  • What if a messenger is captured by the enemy and an imposter messenger with a fake reply is sent back to you?
  • What if one of your messengers is killed by other armies before or after delivering the message?
This problem has remained unsolvable for years, and this is the core of consensus used in Bitcoin and distributed systems in general.

2. Goals And Assumptions

Goals

  • Consistency (Agreement): All nodes agree on the same value vv^*.
  • Liveness (Validity): Every submitted value to at least one node is eventually added to all nodes' history.

Assumptions

  • Permissioned Settings: We assume we have a priori known set of nodes {1,2,...,n}\{1, 2, ..., n\} before the start of the protocol.
  • Public Key Infrastructure: Each node ii has a distinct public-private key pair (pkipk_i / skisk_i). pkipk_i is known to all nodes at the start of the protocol.
  • Synchronous: Every message sent by one node at time tt arrives to recipient by time step t+1t+1. All nodes share a global clock.
  • All honest nodes: We will assume at the beginning that we don't have Byzantine or faulty nodes (we'll relax this assumption later).

Plan

Solve the problem under these assumptions and then relax the last assumption of having all honest nodes.

3. Solving Via Round-Robin Leaders

A Lazy Protocol

If every value submitted by a node always arrives at exactly the same time at every node, then the protocol is valid. But if a node only submits a transaction to a subset of the nodes, or if the network delays because values arrive at different nodes in different time steps, then consistency will generally be violated.

Coordination via Rotating Leaders

The protocol above highlights the need for nodes to coordinate. A solution is to use a rotating leader for each step tt. At time step t=t1t = t_1, Node 1 sends value v1v_1 to both nodes {2,4}\{2, 4\}. At time step t=t+1t = t+1, Node 2 will be the new sender and sends value v1v_1 to the nodes it's connected to.

We keep repeatedly iterating through the nodes in a round-robin order. In the last time step, all the nodes will have the same value v1v_1, with both consistency and validity achieved.

Consistency and validity: nodes agreeing on value

Under the 4 previous assumptions (Permissioned, PKI, Synchronous, and All honest nodes) Rotating leader seems to be a good protocol to achieve consistency and validity.

4. Faulty/Byzantine Nodes

A node that is not honest is called a faulty node.

Types of faulty nodes

  • Fail-Stop: A node halts and remains halted permanently. Other nodes can detect that the node has failed.
  • Byzantine Faults: A node does not behave according to its specific protocol or algorithm. Usually happens when a malicious actor or a software bug compromises the node.
  • Omission Fault: A node fails to respond to incoming requests.
  • Crash Fault: A node halts silently, with no more messages sent or received.
  • Types of faulty nodes: Fail-Stop, Byzantine, Omission, Crash

    Failure of Rotating Leader Protocol

    If we relax the last assumption and consider a system of nn nodes with ff faulty nodes, our previous protocol of rotating leaders will fail. Byzantine nodes can ignore your protocol and do whatever they want, so at a time step tt where a Byzantine node is the leader, it may send trickier values to other nodes.

    5. Byzantine Broadcast Problem

    Cross-Checking

    Our simple rotating leaders' protocol achieves consistency and liveness when f=0f=0 but not when f=1f=1. The idea: stick with the rotating leader approach, but instead of letting nodes blindly accept the new value received from the leader, add a cross-checking functionality.

    Byzantine Broadcast + Rotating Leaders = Fault-tolerant System

    Goals of Byzantine Broadcast

    • Termination: Each honest node ii eventually halts with some output viv_i.
    • Agreement: All honest nodes choose the same value viv_i (whether or not the sender is honest).
    • Validity: If the sender is honest, all honest nodes viv_i equal vv^*.

    Implementation

    We already know how to solve the Byzantine Broadcast problem in the case of f=0f=0. The sender sends a private value vv^* and eventually all nodes output whatever they heard from the sender.

    This doesn't work in f=1f=1 case because a Byzantine sender can send different values to different nodes leading to inconsistency. To solve this we add cross-checking: when a sender sends a private input vv^*, all honest non-senders compare their values to check if they received the same value. The tricky part is that there may be a Byzantine non-sender who lies during the cross-checking phase.

    Using the PKI assumption, we can implement the cross-checking functionality with digital signatures.

    Cross-checking with PKI: sender signs, non-senders echo and verify

    6. Testing the Protocol

    The f=1f=1 case

    Let's test our protocol with n4n \geq 4 nodes, one of them is faulty (f=1f=1):

    • Agreement: We only care about the case when the sender is Byzantine. Because f=1f=1, every other non-sender is honest. In the first time step, the Byzantine sender sends different values. In the second time step, every honest non-sender echoes the value received. In the third time step, all honest non-senders have the same information and carry out the same majority computation.
    • Validity: We only care about the case when the sender is honest. The sender sends vv^ with its digital signature. Each honest non-sender receives one vote for vv^ signed by the honest sender, then receives at least one more vote echoed by another honest non-sender. Using majority vote, all honest non-senders output vv^*.

    The f=2f=2 case (the protocol fails)

    A conspiracy and coordination between two byzantine nodes will break the protocol. We need more than one round of cross-checking. This is exactly what happens in Dolev-Strong Protocol.

    The f=2 case: two Byzantine nodes breaking the protocol

    7. The Dolev-Strong Protocol

    The Dolev-Strong protocol gives a solution to state machine replication. It works under the assumption of the Synchronous, Permissioned model with PKI.

    Dolev-Strong protocol description

    Validity Proof

    Assume the sender is honest. At time step 00, the sender sends a signed copy vv^ to all non-senders. Because the signature can't be forged, all nodes receive the same output vv^ before the beginning of the next step.

    Agreement Proof

    Assume the sender is byzantine. We need to prove that if a non-sender ii gets convinced of a value vv by a message mm, all other honest non-senders get convinced of that value vv before the end of the protocol.

    • t=f+1t = f+1: If ii becomes convinced of a new value vv, there is no time for ii to notify other nodes. The only hope is that other nodes already added vv independently.
    • tft \leq f: The protocol has not yet ended, so if a non-sender ii gets convinced of a new value vv by a message mm, it sends this value and adds its signature. Every other honest non-sender jj receives this message prior to time step t+1t+1.
    Agreement proof: message propagation across time steps

    This article uses extensively lectures by Tim Roughgarden on "Foundation of Blockchain" and other research papers.

    ← All posts