Tutorial: stable-structures
✏ 2023-01-20 ✂ 2023-02-04- Introduction
- Design principles
- Abstractions
- Data structures
- Constructing stable structures
- Tying it all together
- Where to go next
Only purpose of any code is to transform data.
Introduction
Canisters hosted on the Internet Computer (IC) are mutable: canister controllers can upgrade the code to add new features or fix bugs without changing the canister’s identity.
Since the orthogonal persistence feature cannot handle upgrades, the IC allows canisters to use additional storage, called stable memory, to facilitate the data transfer between code versions.
The conventional approach to canister state persistence is to serialize the entire state to stable memory in the pre_upgrade
hook and decode it back in the post_upgrade
hook.
This approach is easy to implement and works well for relatively small datasets.
Unfortunately, it does not scale well and can render a canister non-upgradable, so I recommend using stable memory as the primary storage when possible.
The ic-stable-structures library aims to simplify managing data structures directly in stable memory. This article explains the philosophy behind the library This article describes the 0.5.0 version of the library. and how to use it effectively.
Design principles
The point is that you must decide, in advance, what the coding priorities and quality bars will be; otherwise, the team will have to waste time rewriting misconceived or substandard code.
Software designs reflect their creators’ values.
The following principles shaped the stable-structures
library design.
- Radical simplicity. Programming stable memory is significantly easier than working with conventional file systems. The IC solves many issues with which any good storage must deal: data integrity, partial writes, power outages, and atomicity of multiple writes. Even with all these issues sorted out, complicated designs would be hard to implement, debug, and maintain. Each data structure follows the most straightforward design that solves the problem at hand.
- Backward compatibility. Upgrading the library version must preserve the data. All data structures have a metadata section with the layout version. Newer implementations will respect old layouts and should not require data migration.
-
No
pre_upgrade
hooks. A bug in thepre_upgrade
hook can make your canister non-upgradable. The best way to avoid this issue is not to have apre_upgrade
hook. - Limited blast radius. If a single data structure has a bug, it should not corrupt the contents of other data structures.
- No reallocation. Moving large amounts of data is expensive and can lead to prohibitively high cycle consumption. All data structures must manage their memory without costly moves.
- Compatibility with multi-memory WebAssembly. The design should work when canisters have multiple stable memories since this feature is on the IC roadmap.
As a result of these goals, using the library requires planning and understanding your data shape.
For example, you might need to set an upper bound on a value size that you cannot adjust without data migration.
Or you might have to decide how many memory pages you want to allocate for canister configuration.
There are other libraries with similar goals whose authors chose a slightly different set of trade-offs, such as ic-stable-memory
.
Abstractions
In solving a problem with or without a computer it is necessary to choose an abstraction of reality, i.e., to define a set of data that is to represent the real situation. The choice must be guided by the problem to be solved.
Memory
The core abstraction of the library is the Memory
trait that models a WebAssembly memory instance
That design decision is not a coincidence.
Eventually, canisters will access multiple stable memories, and each data structure might reside in a dedicated memory instance.
.
pub trait Memory {
/// Equivalent to WebAssembly memory.size.
fn size(&self) -> u64;
/// Equivalent to WebAssembly memory.grow.
fn grow(&self, pages: u64) -> i64;
/// Copies bytes from this memory to the heap (in Wasm, memory 0).
fn read(&self, offset: u64, dst: &mut [u8]);
/// Writes bytes from the heap (in Wasm, memory 0) to this memory.
fn write(&self, offset: u64, src: &[u8]);
}
Some instances of the Memory
trait are:
-
The
Ic0StableMemory
type delegates calls to the IC System API. -
RefCell<Vec<u8>>
implements theMemory
interface for a byte array. This type is helpful for unit tests. -
The
DefaultMemoryImpl
type alias points toIc0StableMemory
when compiled to WebAssembly. Otherwise it expands to a memory backed by a byte array. This alias allows you to compile your canister to native code with minimal effort. -
RestrictedMemory
is a view of another memory restricted to a contiguous page range. You can use this type to split a large memory into non-intersecting regions if you know the size of the chunk in advance.RestrictedMemory
works best for allocating relatively small fixed-size memories.
The most powerful and convenient way to create memories is to use the MemoryManager
.
This utility interleaves up to 255 non-intersecting memories in a single address space, acting similarly to a virtual memory subsystem in modern operating systems.
The memory manager uses part of the parent memory to keep a dynamic translation table assigning page ranges to virtual memories.
Virtual memories have a non-contiguous representation, so a single write can translate to multiple system calls.
Storable types
The library does not impose any serialization format on you and does not provide a default.
Depending on your needs, you might prefer Candid, Protocol Buffers, CBOR, Borsh, DER, or something else.
The Storable
trait abstracts data structures over your choice of serialization format.
Some data structures, such as stable vectors, need to know how much memory they must allocate for each instance of your storable type.
Such types rely on an extension of this trait, providing this necessary metadata, BoundedStorable
.
BoundedStorable
types are analogous in their function to Sized
types, except that the compiler cannot deduce serialized sizes for you.
The library implements these traits for a few basic types, such as integers, fixed-size arrays, and tuples, allowing you to get away without serialization libraries if you store only primitives.
It also provides efficient variable-size bounded byte arrays, the Blob<N>
type, where N
is the maximal number of bytes this type can hold.
For example, you can persist an IC principal as a Blob<29>
.
Data structures
One has an intuitive feeling that data precede algorithms: you must have some objects before you can perform operations on them.
The heart of the stable-structures
library is a collection of data structures, each spanning one or more memories.
Stable structures do not compose The reason for this restriction is simplicity: most data structures are significantly easier to implement correctly and efficiently, assuming they can span an entire memory. . For example, you cannot construct a stable map containing stable vectors as values. Many conventional storage systems impose the same restriction. For example, SQL databases do not allow tables to hold other tables as values, and Redis does not allow the nesting of its data types.
Using composite keys or defining several data structures linked with foreign keys, you can work around the restriction.
Let us examine the available data structures in detail.
Stable cell
A Cell represents a single value stored in stable memory in serialized form. Cell’s contents in stable memory updates every time you change the underlying value.
The primary use case for cells is storing canister configuration:
- The ICRC-1 Ledger Archive canister persists its initialization arguments in a cell.
- The Internet Identity Archive canister holds its init arguments in a cell.
Stable vector
A Vec
is a growable mutable array similar to std::vec::Vec
.
A stable vector stores its items by value, so it must know how much space it needs to allocate for each item, hence the BoundedStorable
bound on the item type.
Stable vector takes advantage of the T::IS_FIXED_SIZE
attribute of the item type.
If the value size is not fixed, the actual value can occupy anything from zero to T::MAX_SIZE
bytes, so the vector allocates T::MAX_SIZE
and a few extra bytes per entry to store the actual entry size.
If all the values are the same size, the vector implementation uses T::MAX_SIZE
for the item slot, saving up to 4 bytes per entry.
This reduction is primarily helpful for vectors of primitives (e.g., StableVec<u64>
).
The stable structure powering the Internet Identity service is a stable vector in disguise, though a less generic one.
Stable log
Sometimes called write-ahead logs or commit logs or transaction logs, logs have been around almost as long as computers and are at the heart of many distributed data systems and real-time application architectures.
A Log is an append-only list of arbitrary-sized values, similar to streams in Redis. The log requires two memories: the index storing entry offsets and the data storing raw entry bytes. The number of instructions needed to access old and append new entries does not depend on the number of items in the log, only on the entry size.
The log is a versatile data structure that can be helpful in almost any application:
- The ICRC-1 Ledger Archive canister stores archived transactions in a stable log.
- The Internet Identity Archive canister holds events in a stable log.
- The Chain-Key Bitcoin Minter canister persists all state modifications in a stable log. Replaying events from the log is the minter’s primary upgrade mechanism.
Stable B-tree
Deletion of items from a B-tree is fairly straightforward in principle, but it is complicated in the details.
The BTreeMap
stable structure is an associative container that can hold any bounded storable type.
The map must know the sizes of the keys and values because it allocates nodes from a pool of fixed-size tree nodes
The tree allocator is a free-list allocator, the simplest allocator capable of freeing memory.
.
The interface of stable BTreeMap
will look familiar to any seasoned Rust programmer.
If you need a BTreeSet<K>
, you can use BTreeMap<K, ()>
instead.
The range
method is handy for selecting relevant data from a large data set.
Stable B-tree maps are ideal for large sets of relatively small objects (hundreds of bytes). For example, the bitcoin canister uses this data structure to store bitcoin block headers and UTXOs.
Constructing stable structures
Make sure that objects are initialized before they’re used.
We have to declare stable structures before we can use them.
Each data structure T
in the library declares at least three constructors:
T::new
allocates a new copy ofT
in the given memory, potentially overwriting previous memory content.T::load
recovers a previously constructedT
from the given memory.T::init
is a dwim constructor acting asT::new
if the given memory is empty and asT::load
otherwise.
In practice, most canisters need only T::init
.
The most common way to create a stable structure is by declaring a global variable. This approach reminds me of the way we organize tables in relational databases.
The main benefit of this approach is that the runtime will automatically initialize the stable structure the first time you access it.
Ensure that you access all such variables in the post_upgrade
hook.
Otherwise, you might only be able to catch a configuration error after the upgrade is complete.
However, you do not have to declare each stable structure in a separate global variable.
You can embed stable structures into regular structures if this helps you keep the code better organized.
For example, the Bitcoin canister stores UTXOs in multiple stable structures combined into a regular struct
.
Tying it all together
This section contains an example of declaring several data structures for a ledger canister compatible with the ICRC-1 specification. We will need three collections:
-
A
Cell
to store the ledger metadata. -
A
StableBTreeMap
to map accounts An account is a pair of a principal and a 32-byte subaccount. to their current balance. - A
StableLog
for transaction history.
The ledger metadata is relatively small; it should fit into two megabytes or sixteen WebAssembly memory pages. We will allocate the metadata cell in a restricted memory, partitioning the rest of the stable memory between the accounts map and the log using the memory manager.
Let us put these ideas to code. First, we must import a few type definitions from the library.
use ic_stable_structures::{
DefaultMemoryImpl as DefMem,
RestrictedMemory,
StableBTreeMap,
StableCell,
StableLog,
Storable,
};
use ic_stable_structures::memory_manager::{
MemoryId,
MemoryManager as MM,
VirtualMemory,
};
use ic_stable_structures::storable::Blob;
use std::borrow::Cow;
use std::cell::RefCell;
Next, let us define the types of data we will store in our stable structures.
I omit the field definitions because they are not essential for understanding the example.
We derive serialization boilerplate for the Metadata
and Tx
types using the serde framework.
type Account = (Blob<29>, [u8; 32]);
type Amount = u64;
#[derive(serde::Serialize, serde::Deserialize)]
struct Metadata { /* … */ }
#[derive(serde::Serialize, serde::Deserialize)]
enum Tx {
Mint { /* … */ },
Burn { /* … */ },
Transfer { /* … */ },
}
The next step is to decide on the data serialization format.
I use Concise Binary Object Representation in the example because this encoding served me well in production.
Instead of implementing the Storable
trait for Metadata
and Tx
, I define a generic wrapper type Cbor<T>
that I use for all types I want to encode as cbor and implement Storable
only for the wrapper.
I also implement std::ops::Deref
to improve the ergonomics of the wrapper type.
/// A helper type implementing Storable for all
/// serde-serializable types using the CBOR encoding.
#[derive(Default)]
struct Cbor<T>(pub T)
where T: serde::Serialize + serde::de::DeserializeOwned;
impl<T> std::ops::Deref for Cbor<T>
where T: serde::Serialize + serde::de::DeserializeOwned
{
type Target = T;
fn deref(&self) -> &Self::Target { &self.0 }
}
impl<T> Storable
for Cbor<T>
where T: serde::Serialize + serde::de::DeserializeOwned
{
fn to_bytes(&self) -> Cow<[u8]> {
let mut buf = vec![];
ciborium::ser::into_writer
(&self.0, &mut buf).unwrap();
Cow::Owned(buf)
}
fn from_bytes(bytes: Cow<[u8]>) -> Self {
Self(ciborium::de::from_reader
(bytes.as_ref()).unwrap())
}
}
The final and most important part is defining stable structures as global canister variables.
Note the use of RestrictedMemory
to split the canister memory into two non-intersecting regions and MemoryManager
(abbreviated as MM
) to interleave multiple data structures in the second region.
// NOTE: ensure that all memory ids are unique and
// do not change across upgrades!
const BALANCES_MEM_ID: MemoryId = MemoryId::new(0);
const LOG_INDX_MEM_ID: MemoryId = MemoryId::new(1);
const LOG_DATA_MEM_ID: MemoryId = MemoryId::new(2);
// NOTE: we allocate the first 16 pages (about 2 MiB) of the
// canister memory for the metadata.
const METADATA_PAGES: u64 = 16;
type RM = RestrictedMemory<DefMem>;
type VM = VirtualMemory<RM>;
thread_local! {
static METADATA: RefCell<StableCell
<Cbor<Option<Metadata>>, RM>> =
RefCell::new(StableCell::init(
RM::new(DefMem::default(), 0..METADATA_PAGES),
Cbor::default(),
).expect("failed to initialize the metadata cell")
);
static MEMORY_MANAGER: RefCell<MM<RM>> = RefCell::new(
MM::init(RM::new(DefMem::default(), METADATA_PAGES..u64::MAX))
);
static BALANCES: RefCell<StableBTreeMap
<Account, Amount, VM>> =
MEMORY_MANAGER.with(|mm| {
RefCell::new(StableBTreeMap::init(mm.borrow().get(BALANCES_MEM_ID)))
});
static TX_LOG: RefCell<StableLog
<Cbor<Tx>, VM, VM>> =
MEMORY_MANAGER.with(|mm| {
RefCell::new(StableLog::init(
mm.borrow().get(LOG_INDX_MEM_ID),
mm.borrow().get(LOG_DATA_MEM_ID),
).expect("failed to initialize the tx log"))
});
}
We are all set to start working on the ledger! I left implementing the rest of the specification as an exercise for an attentive reader.
Where to go next
- Take a look at the library usage examples.
- Check out real-world usage examples in production-quality canisters Note that these examples can use a slightly outdated library version. , such as II archive and ckBTC minter, and Bitcoin.
- Read the official documentation.
Similar articles
- Effective Rust canisters
- When Rust hurts
- Designing error types in Rust
- IC internals: XNet protocol
- Candid for engineers