IC internals: Internet Identity storage

  2022-10-12

Introduction

The Internet Identity canister innovated a passwordless approach to authentication on the Internet Computer (IC). I was lucky to be among the first members of the team that launched this service. Despite the time shortage, the team pioneered a few engineering solutions, some of which later caught up in other services:

This article will explain how the Internet Identity canister uses its stable memory.

Internet Identity data model

The Internet Identity service acts as a proxy between the browser’s authentication mechanism and the Internet Computer authentication system. A user registers in the service by creating an anchor (a short number) and associating authentication devices, such as Yubikey or Apple Touch ID, with that anchor. The Internet Identity canister stores these associations and presents a consistent identity to each DApp integrated with the authentication protocol.

The data model of the Internet Identity system. The system allocates a unique anchor (a short number) for each user. The user can attach one or more authentication devices to the anchor. The Internet Identity system allows users to log into DApps that support the authentication protocol. The DApps will see the same user identity (also known as the principal) consistently, regardless of which device the user used for authentication.

The Internet Identity service maintains three core data structures:

  1. A mapping from anchors to authentication devices and their attributes.
  2. A collection of frontend assets, such as the index page, JavaScript code, and images.
  3. A set of temporary certified delegations. Most delegations expire within a minute after the service issues them.

Conventional canister state management

The orthogonal persistence feature of the IC greatly simplifies program state management. Yet it does not solve the problem of code upgradesYou can find more detail in the Surviving upgrades section of the post on orthogonal persistence.. Most canister work around this issue by using stable memory as a temporary buffer during the code upgrade.

The conventional model of canister code upgrades. In the pre-upgrade hook, the canister marshals its data into stable memory storage. In the post-upgrade hook, the canister unmarshals the contents of stable memory to reconstruct the convenient data representation.

This upgrade model works well for canisters that hold little data, usually up to a few hundred megabytes (the exact limit depends on the encoding scheme). However, if the state grows too large for the persistence roundtrip to fit into the upgrade instruction limit, upgrading the canister might become complicated or impossible.

From the beginning, the Internet Identity team knew that the service must support millions of users and retain gigabytes of data. One way to achieve scalability is to spread the data across multiple canisters and ensure that each canister holds a reasonably-sized share of the state. However, the time constraint forced the team to keep the architecture simple and build the service as a monolithic backend canisterLuckily, there is an easy way to extend the chosen monolithic design to support data sharding.. That design has other advantages besides short time-to-market: atomic upgrades and the ease of deployment, testing, and monitoring.

Stable memory as primary storage

To reconcile the monolithic design with fearless upgrades, the team arranged the core data structures in the following way:

  1. The mapping from anchor to devices lives directly in stable memory. Each anchor gets a fixed chunk of memory that is large enough to hold ten authentication devices (on average).
  2. The build process embeds the frontend assets directly into Internet Identity’s WebAssembly binary, eliminating the need to carry those assets across upgrades.
  3. The canister discards all delegations on upgrades. This decision simplifies data management without affecting the user experience much.

The canister arranges the anchor mapping directly in stable memory, updating the relevant sections incrementally with each user interaction. A good analogy with traditional computing would be updating data in files incrementally instead of loading them fully into memory and writing them back. The main difference is that stable memory is a much safer mechanism than file manipulations: you do not need to worry about data loss, partial writes, power outages, and disk corruption.

This design eliminates the need for a pre-upgrade hook: stable memory is already up-to-date when the upgrade starts. The post-upgrade hook does little work besides reading and validating a few bytes of the storage metadata.

The Internet Identity storage model. The anchor data lives directly in stable memory. The pre-upgrade is unnecessary; the post-upgrade hook does little work.

The memory layout

The Internet Identity canister divides the stable memory space into non-overlapping sections. The first section is the header holding the canister configuration, such as the random salt for hashing and the assigned anchor range. The rest of the memory is an array of entries; each entry corresponds to the data of a single anchor.

The Internet Identity stable memory layout. The system reserves the first 512 bytes for static configuration and divides the rest into 2KiB blocks. Each such block holds data associated with a single anchor.

The size of the header section is 512 bytes; the layout reserves most of this space for future extensions. The following is the list of all header fields as of October 2022:

  1. Magic (3 bytes): a fixed string "IIC" indicating the Internet Identity stable memory layout.
  2. Version (1 byte): the version of the memory layout. If we need to change the layout significantly, the version will tell the canister how to interpret the data after the code upgrade.
  3. Entry count (4 bytes): the total number of anchors allocated so far.
  4. Min anchor (8 bytes): the value of the first anchor assigned to the canister. The canister allocates anchors sequentially, starting from this number.
  5. Max anchor (8 bytes): the value of the largest anchor assigned to the canister. The canister becomes full and stops allocating anchors when MinAnchor + EntryCount = MaxAnchor.
  6. Salt (32 bytes): salt for hashing. The canister initializes the salt upon the first request by issuing a raw_rand call.

Further sections are all 2 KiB in size and contain anchor data encoded as Candid. Entry at index N holds information associated with anchor MinAnchor + N, where MinAnchor comes from the header. The first two bytes of each entry determine the size of the encoded blob.

Example: lookup anchor devices

To get a feeling for how this layout works in practice, let us assume that the canister holds the range of anchors between MinAnchor = 10_000 and MaxAnchor = 1_000_000 and already allocated NumEntries = 20_000 anchors. Below are the steps the canister must perform to fetch the list of devices for anchor 12345:

  1. Look up the range of metadata from the header. The requested anchor 12345 is in the range of live anchors between 10_000 and 30_000 (MinAnchor and MinAnchor + NumEntries, correspondingly). Entry 2345 (12345 - MinAnchor) holds the data for anchor 12345.
  2. Read the first two bytes at offset EntryStart = 512 bytes + 2345 * 2KiB as a 16-bit integer (little-endian). The decoded value N determines the size of the blob we have to read next.
  3. Read N bytes starting at offset EntryStart + 2. Decode the bytes as a Candid structure containing the list of authentication devices.

Code pointers

Similar articles