IC internals: XNet protocol

  2022-07-01

Introduction

This article continues the previous post on state machine replication in the Internet Computer (IC), A swarm of replicated state machines. This time, we shall examine the protocol over which independent state machines (also known as subnets) communicate, the XNet protocol.

Subnets

A subnet is a collection of nodes participating in a single instance of the consensus protocol. All the nodes in a subnet have the same state and apply the same blocks.

It might be tempting to think that nodes of a subnet are physically collocated. However, the opposite is usually true: subnet nodes are often geographically distributed across independent data centers There are still good reasons to assign multiple nodes in a data center to the same subnet, such as increasing query call capacity and speeding up recovery in case of a replica restart. . The goal is to improve availability: a disaster in a single data center cannot take down the entire subnet. The registry canister maintains the assignment of nodes to physical machines, and the Network Nervous System governs all the changes to the registry.

Nodes from different subnets can live in the same data center. This setup might be counter-intuitive: nodes from different subnets might sometimes have better network connectivity than nodes in the same subnet.

Two three-node subnets located in three data centers. Solid lines represent peer-to-peer communication within a subnet; dotted lines — cross-subnet communication. Not all XNet connections are present in the picture.

XNet protocol is not limited to the same-datacenter communication; any node from one subnet can talk to any other node from another subnet. However, the replica prefers contacting close The replica uses network latency as the proximity measure. It randomly contacts nodes from other subnets, assigning higher weights to nodes with lower latency. peers to reduce the message delivery latency.

Message streams

Messages start their journey in the output queues of canisters hosted on the source subnet. The subnet sorts these messages based on the destination subnet and interleaves them into flat message streams (one stream per destination subnet). Each message in the stream gets a unique monotonically increasing index.

The flow of messages within a subnet. Canisters push messages into their output queues; the stream builder picks up these messages and constructs a flat outgoing message stream for each subnet.

The component merging the queues (aptly called stream builder) should satisfy a few constraints:

When a node from subnet X decides to get new messages from subnet Y, it picks a node from Y (the registry stores the entire network topology) and calls Y’s XNet endpoint with the index of the first unseen message and the number of messages to fetch.

Block payloads

The job of the consensus protocol is to aggregate messages from the outside world and pack them into a neat block. Consensus includes several types of messages into blocks, such as user ingress messages, Bitcoin transactions (for subnets with enabled Bitcoin integration), and inter-canister messages from other subnets. We call messages paired with the data required for their validation a payload. We call components that pull payloads from the network payload builders.

The consensus algorithm aggregates messages from the outside world into blocks.

XNet payload builder pulls messages from nodes assigned to other subnets using a simple HTTP protocol. XNet endpoint is a component that serves messages destined for other subnets over secure TLS connections, accepting connections only from other nodesThis measure does not imply privacy because malicious nodes can access the data; but ensures that network providers cannot read the messages.. XNet endpoint fetches the complete list of nodes, their subnet assignment, IP addresses, and public keys (required to establish a TLS connection) from the registry.

The data flow in the XNet protocol. Subnet Y produces a signed stream of messages for subnet X and exposes this stream via an HTTP endpoint. Subnet X pulls messages from one of the nodes on subnet Y.

Garbage collection

We now know how one subnet accumulates messages destined for another subnet. This knowledge begs another question: how do replicas remove messages that the destination subnet has already consumed? We need a feedback mechanism allowing the consumer subnet to tell the producer subnet that it does not need some stream prefix anymore. We call this mechanism signals.

Signals are a part of the XNet payload specifying the prefix of the reverse stream that the sending subnet can drop. When a node from subnet X fetches XNet payload from a node from subnet Y, in addition to actual messages, the Y node includes the stream header. The stream header describes the state of the X ↔ Y communication:

Signals solve the issue of collecting obsolete messages, but they introduce another problem: now we also need to garbage-collect signals! Luckily, we already have all the information we need: we keep signals only for messages that are still present in the reverse stream. Once we notice (by looking at the range of message indices) that the remote subnet dropped messages from its stream, we can remove the corresponding signals from our header.

From my experience, the interplay of message indices and signals is notoriously hard to grasp, so let us look at an example. The diagram below depicts two subnets, X and Y, caught in the middle of communication. Subnet X gets a prefix of Y’s stream and the header, allowing X to induct new messages and garbage-collect its messages and signals. X also publishes signals for the newly received messages and updates its indices accordingly so that Y could garbage-collect its messages and signals in the next round of communication.

Two subnets, X and Y, communicating over the XNet protocol. In previous communications, subnet Y consumed messages [0,10) from X’s stream and has recently received messages 10 and 11. Now, subnet X receives a prefix of Y’s stream and the matching header and removes messages 10 and 11 from its stream because Y will not need them anymore. X also includes signals for the newly received messages into the X → Y stream header and removes an obsolete signal for message Y1 it consumed before.

Signals and stream messages are like snakes eating each other’s tails on the Auryn: the sender drops messages when it sees signals, and the receiver drops signals when it sees stream bounds advancing.

Stream certification

Since no two nodes in the network can blindly trust each other, we need a mechanism to validate the contents of XNet payloads. More specifically, we want proof that the stream prefix in the payload is the same on all honest nodes in the subnet that produced the stream. In the previous article, we have seen how nodes use threshold signatures on state trees as authenticity proofs for call replies. The XNet protocol relies on the same trick: nodes include streams destined for other subnets into their state trees and use threshold signatures as proofs of authenticity for nodes from other subnets.

The encoding of subnet message streams in the state tree.

Conceptually, a node requesting messages from a subnet is not different from a client requesting a response. This similarity allows us to use the same Currently, there is a minor difference in how we represent certificates in the XNet protocol and the IC HTTP interface. The HTTP interface certificates combine the data and the hashes required to check the authenticity in a single data structure, the hash tree. The XNet protocol separates the raw message data and the hashes (called the witness) into separate data structures. The reason for that distinction is purely historical. We implemented the XNet protocol significantly earlier than the response authentication, so we could not benefit from Joachim Breitner’s brilliance. Joachim took the XNet authentication scheme as the starting point and simplified it for the IC Interface Specification. authentication mechanism in both cases.

The tree structure of the certificate comes in handy for buffering messages on the receiver: the receiver can maintain a slightly larger pool of messages than a single block can fit, asynchronously fetching new messages and appending them to the pool, merging the certificates appropriately. This optimization allows us to use the subnet’s narrow XNet channel more efficiently and improve fairness when we include messages from multiple subnets into a single block.

Limitations

The XNet protocol is a marvelous feat of engineering, but it has a few inherent limitations:

Code references

A few pointers to the code implementing designs from this article:

Credits

Finally, I shall give due credit to people who played an essential role in the protocol development.

My main contributions are stream certification and TLS integration.

Similar articles