Extending HTTPS outcalls

  2024-04-29

The more I learn about the Chainlink platform, the more parallels I see between Chainlink’s systems and the Internet Computer (IC) network I helped design and implement. Both projects aim to provide a solid platform for trust-minimized computation, but they take different paths toward that goal.

One of the limitations of blockchains is their self-contained nature. They authenticate the data they store and the transaction history, but can’t prove any facts about the external world. This problem is commonly called the oracle problem. Oracles are services that bring external data, such as price feeds and weather conditions, into a blockchain.

The Chainlink network and the IC solve the oracle problem by providing Byzantine fault-tolerant protocols. Chainlink relies on the Off-Chain Reporting protocol (OCR), while the IC provides the HTTPS outcalls feature. OCR is more general, while HTTPS outcalls are readily available to all developers and are easier to use.

This article explores how to bridge the gap between the two protocols. We will start with an overview of the HTTPS outcalls feature. Then, we will design an extension to support cases where HTTP responses are not deterministic. Finally, we will see how to use this extension to implement a robust price feed canister.

HTTPS outcalls in a nutshell

Smart contracts on the IC network can initiate HTTPS requests to external services.

First, the canister sends a message to the management canister that includes the HTTPS request payload and the transform callback function. The management canister places this request in a dedicated queue in the node’s replicated state.

A background process called the adapter, which runs independently of the replicated state machine, periodically inspects the request queue and executes the queued requests. Each replica runs its own instance of the adapter process.

If the calling canister specified a transform callback, the adapter invokes the callback on the canister as a query. The callback accepts the raw HTTP response and returns its canonicalized version. A common use of transform callbacks is stripping the response headers, since they can contain information unique to each response, such as timestamps, that can make it impossible to reach consensus.
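To make the canonicalization step concrete, here is a minimal standalone sketch. The `Response` type and the `strip_headers` function are hypothetical simplifications, not the ic_cdk types; the sketch only shows that two responses differing in a `Date` header become byte-for-byte identical once the headers are dropped.

```rust
/// A simplified, hypothetical stand-in for an HTTP response (not the ic_cdk type).
#[derive(Clone, Debug, PartialEq)]
struct Response {
    status: u16,
    headers: Vec<(String, String)>,
    body: Vec<u8>,
}

/// Canonicalizes a response by dropping all headers, which may carry
/// per-request values such as `Date`.
fn strip_headers(mut response: Response) -> Response {
    response.headers.clear();
    response
}

fn main() {
    let body = br#"{"price":12.5}"#.to_vec();
    let a = Response {
        status: 200,
        headers: vec![("Date".into(), "Mon, 29 Apr 2024 10:00:00 GMT".into())],
        body: body.clone(),
    };
    let b = Response {
        status: 200,
        headers: vec![("Date".into(), "Mon, 29 Apr 2024 10:00:01 GMT".into())],
        body,
    };
    // The raw responses differ only in their headers...
    assert_ne!(a, b);
    // ...but their canonical versions are equal, so replicas can agree on them.
    assert_eq!(strip_headers(a), strip_headers(b));
    println!("canonical responses agree");
}
```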

The adapter passes the transformed response to the consensus algorithm, and the nodes exchange their observation shares.

If enough replicas agree on the response, the system includes the response in the block. The state machine delivers the response to the management canister, which forwards it to the originator canister.

Extending HTTPS outcalls

It turns out that HTTPS outcalls implement a special case of OCR’s report generation protocol in which the participants are IC nodes. The OCR protocol defines three stages:

  1. In the query stage, the participants receive a task to observe an external data source. This stage is implicit in HTTPS outcalls: instead of a protocol leader initiating the query, a canister triggers it through the system interface.
  2. In the observation stage, each node observes the data source, signs its observation, and sends it over the network. The IC implements this stage through the adapter process discussed in the previous section and the consensus algorithm. The adapter executes an HTTPS request and passes the response through the calling canister’s transform function. The transformation result is the observation.
  3. In the report stage, the network aggregates participant observations into the final report. This stage is hard-coded in the IC consensus protocol: if 2f + 1 nodes observed the same HTTP response, its value becomes the report.
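The report rule of the third stage can be sketched as a simple vote count. This is a hypothetical illustration in plain Rust, not the actual IC consensus implementation; `agreed_response` and the byte-vector observations are assumptions made for the example.

```rust
use std::collections::HashMap;

/// Returns the observation shared by at least 2f + 1 nodes, if any.
/// Hypothetical sketch of the hard-coded report rule, not real IC code.
fn agreed_response(observations: &[Vec<u8>], f: usize) -> Option<Vec<u8>> {
    // Count how many nodes reported each distinct (transformed) response.
    let mut counts: HashMap<&[u8], usize> = HashMap::new();
    for obs in observations {
        *counts.entry(obs.as_slice()).or_insert(0) += 1;
    }
    // Pick a value that reached the 2f + 1 threshold.
    counts
        .into_iter()
        .find(|&(_, count)| count >= 2 * f + 1)
        .map(|(obs, _)| obs.to_vec())
}

fn main() {
    // Four nodes tolerate f = 1 faulty node; 3 matching observations are needed.
    let ok = vec![b"42".to_vec(), b"42".to_vec(), b"42".to_vec(), b"garbage".to_vec()];
    assert_eq!(agreed_response(&ok, 1), Some(b"42".to_vec()));

    // Two pairs of differing observations: no value reaches the threshold.
    let split = vec![b"42".to_vec(), b"42".to_vec(), b"43".to_vec(), b"43".to_vec()];
    assert_eq!(agreed_response(&split, 1), None);
}
```

Among 3f + 1 nodes, two different values cannot both collect 2f + 1 votes (that would take 4f + 2 nodes), so at most one value can pass the threshold. The second case shows why non-deterministic responses break this rule: honest nodes that legitimately disagree never produce a report.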

Multi-HTTP outcalls

To make HTTPS outcalls as general as the full report generation protocol, we must make the report stage customizable. The IC consensus algorithm must allow the canister to observe all response versions and distill them into a report. The most straightforward way to achieve this goal is to include all response versions in the block and deliver this batch to the canister.

This design requires adding a new endpoint to the management canister interface. Let’s call this endpoint multi_http_request. It accepts the same request as the existing http_request endpoint but returns multiple responses.

A hypothetical extension to the IC management interface allowing the canister to inspect HTTP responses from multiple nodes.
service ic : {
    // …
    multi_http_request : (http_request_args) -> (vec http_request_result);
};

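This interface poses a new challenge: how can the canister know which responses it can trust? The usual approach for numeric observations is to sort them and pick the median. Since there are at most f Byzantine nodes and the response vector contains at least 2f + 1 elements, at least f + 1 responses are less than or equal to the median, and at least f + 1 are greater than or equal to it. Each of these groups contains at least one honest response, so the median always lies between two honest observations: Byzantine nodes can shift it only within the range of honest values.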
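A small sketch of the median rule, with made-up prices for a subnet where f = 1 (four nodes, one of them Byzantine). The `median` helper is hypothetical; it returns the upper median for even-length inputs and orders floats with `f64::total_cmp`, since `f64` does not implement `Ord`.

```rust
/// Returns the median of `values` (the upper median for even lengths).
/// Panics on empty input.
fn median(values: &[f64]) -> f64 {
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    sorted[sorted.len() / 2]
}

fn main() {
    // f = 1: three honest observations near 12.5 plus one absurd Byzantine value.
    let honest = [12.49, 12.50, 12.51];
    let mut all = honest.to_vec();
    all.push(1_000_000.0);
    let m = median(&all);
    // The median stays within the range of honest observations.
    assert!(m >= 12.49 && m <= 12.51);
    println!("median = {m}");
}
```

Moving the outlier to the other extreme (for example, a price of -1,000,000) still leaves the median among the honest values.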

If there are more than 2f + 1 responses, the aggregation function can make a better choice if it knows the value of f. Thus, the system API might provide a function to obtain the maximum number of faulty nodes in a subnet:

ic0.max_faulty_nodes : () -> (i32);
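One way an aggregation function could use f is to discard the f lowest and f highest observations and aggregate the rest. The sketch below assumes f is passed in as a parameter (in a canister it would come from the hypothetical `ic0.max_faulty_nodes`); `trim_faulty` and `trimmed_mean` are illustrative names, and the trimmed mean is only one possible aggregate.

```rust
/// Given at least 2f + 1 observations with at most f of them faulty,
/// returns the sorted observations without the f lowest and f highest ones.
/// Every remaining value lies between the smallest and largest honest values.
fn trim_faulty(values: &[f64], f: usize) -> Vec<f64> {
    assert!(values.len() >= 2 * f + 1, "need at least 2f + 1 observations");
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    sorted[f..sorted.len() - f].to_vec()
}

/// One possible aggregate: the mean of the trimmed observations.
fn trimmed_mean(values: &[f64], f: usize) -> f64 {
    let kept = trim_faulty(values, f);
    kept.iter().sum::<f64>() / kept.len() as f64
}

fn main() {
    // f = 2 on a seven-node subnet: 0.0 and 500.0 come from Byzantine nodes.
    let observations = [0.0, 10.0, 11.0, 12.0, 13.0, 14.0, 500.0];
    assert_eq!(trim_faulty(&observations, 2), vec![11.0, 12.0, 13.0]);
    assert_eq!(trimmed_mean(&observations, 2), 12.0);
}
```

Among the f + 1 smallest observations at least one is honest, so the lowest kept value is no smaller than the smallest honest value; the symmetric argument bounds the highest kept value. Trimming does not guarantee that the faulty values are the ones removed, only that whatever remains is bounded by honest observations.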

This design significantly restricts the HTTP response size. Since the vector of responses might contain 34 entries on large subnets, and all the responses must fit within both the block size limit and the two-megabyte response size limit, each response must not exceed about 58 kilobytes. Luckily, that’s enough for many essential use cases, such as observing a frequently updating price feed or the latest Ethereum block number.
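The 58-kilobyte figure follows from dividing the response size limit among all responses. A quick check, assuming that two megabytes means 2,000,000 bytes and ignoring any other block contents; `per_response_limit` is a hypothetical helper:

```rust
/// Upper bound on each response's size when `n` responses share `limit_bytes`.
fn per_response_limit(limit_bytes: u64, n: u64) -> u64 {
    limit_bytes / n
}

fn main() {
    // 2 MB limit (assumed to be 2,000,000 bytes) shared by 34 responses.
    // 34 = 3f + 1 with f = 11.
    let limit = per_response_limit(2_000_000, 34);
    assert_eq!(limit, 58_823); // roughly 58 KB per response
    println!("{limit} bytes per response");
}
```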

Faulty design: aggregation callbacks

My first attempt at extending HTTPS outcalls relied on allowing the canister to specify an additional callback to aggregate multiple observations. Manu Drijvers pointed out a fatal flaw in this design, and I think it’s helpful to outline it here because it highlights differences and parallels between the IC’s and OCR’s approaches to consensus.

The faulty protocol extension would kick in after the consensus algorithm distributes transformed observations through the peer-to-peer network. Instead of checking whether there are 2f + 1 equal observations, the consensus would invoke the aggregation callback on the canister to obtain a report.

The nodes would then distribute their report shares through the peer-to-peer network.

If there are enough equal report shares to form a consensus, the system sends the report to the canister.

This design would allow the system to save block space because the block would need to contain only the aggregated report, not all the individual responses.

Unfortunately, this approach doesn’t work. The problem is that we cannot guarantee that different nodes will see the same subset of responses. Each healthy node in the network of 3f + 1 nodes will see responses from some other nodes (at least 2f + 1), but the exact subset might differ for each node. Different observation subsets will lead to unequal aggregated reports, and the system might fail to reach consensus.

The OCR protocol solves this issue by electing a leader node that picks the subset of observations and distributes it to the followers. Thus, all honest nodes derive the same report from these observations.
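A toy sketch of both the failure and the OCR fix, using made-up integer prices on a four-node subnet (f = 1). The `median` helper and the particular subsets each node sees are assumptions chosen for illustration.

```rust
/// Median of integer price observations (upper median for even lengths).
fn median(values: &[u64]) -> u64 {
    let mut sorted = values.to_vec();
    sorted.sort_unstable();
    sorted[sorted.len() / 2]
}

fn main() {
    // Four nodes (f = 1) observed slightly different prices, in cents.
    let observations = [1250u64, 1251, 1252, 1253];

    // Without coordination, each node aggregates the 2f + 1 = 3 observations
    // that happened to reach it first; the subsets differ between nodes.
    let node_a_sees = [observations[0], observations[1], observations[2]];
    let node_b_sees = [observations[1], observations[2], observations[3]];
    // The aggregation callback yields different reports, so shares never match.
    assert_ne!(median(&node_a_sees), median(&node_b_sees));

    // OCR-style: a leader fixes one subset and sends it to every follower,
    // so every honest node runs the same deterministic aggregation on it.
    let leader_subset = node_a_sees;
    let report_a = median(&leader_subset);
    let report_b = median(&leader_subset);
    assert_eq!(report_a, report_b);
}
```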

There is no leader in the IC consensus protocol; the block maker rank governs node priority in each round. IC nodes must agree on the observation subset through block proposals, so including all observations in the block is inevitable. However, that requirement doesn’t make the IC consensus protocol less efficient: we can view the OCR leader as the sole rank-zero block maker that sends the block with observations to all participants.

Use-case: price feeds

One of the most popular use cases for oracles is delivering price feeds to power DeFi applications. Unsurprisingly, the exchange rate canister was one of the first users of the HTTPS outcalls feature. This section walks through an implementation of a simplistic price feed canister using the OCR-inspired extension of the HTTPS outcalls feature discussed in the previous section.

The canister queries a hypothetical price feed API and returns the observed price and timestamp. Treat the code as pseudo-code: it has never been tested or compiled.

First, we import the necessary API to make HTTPS requests. The max_faulty_nodes and multi_http_request imports refer to APIs that do not exist yet.

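// Hypothetical: the system API does not expose max_faulty_nodes yet.
use ic0::max_faulty_nodes;
use ic_cdk::api::management_canister::http_request::{
    // Hypothetical: the multi_http_request endpoint does not exist yet.
    multi_http_request,
    CanisterHttpRequestArgument, HttpHeader, HttpMethod, HttpResponse, TransformArgs,
    TransformContext,
};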

Next, we define the data structures specifying the format of the API response and the price report the canister produces. Since block space is precious, the ExamplePriceResponse structure restricts the response contents to the fields we need to construct the report.

/// The format of response we get from the example price feed JSON API.
#[derive(serde::Serialize, serde::Deserialize, Debug)]
struct ExamplePriceResponse {
    price: f64,
    timestamp_seconds: u64,
}

#[derive(candid::CandidType, candid::Deserialize, Debug)]
struct PriceReport {
    price: f64,
    timestamp_seconds: u64,
}

We then define the transformation function for the API response. The function removes the response headers and replaces the response body with its restricted canonical version.

#[ic_cdk::query]
fn transform(args: TransformArgs) -> HttpResponse {
    let mut response = args.response;
    response.headers.clear();
    let parsed_body: ExamplePriceResponse =
        serde_json::from_slice(&response.body).expect("failed to parse response body");
    response.body = serde_json::to_vec(&parsed_body).unwrap();
    response
}

It’s time to send a multi-HTTP request to the example price feed API.

#[ic_cdk::update]
async fn observe_icp_price() -> PriceReport {
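    let request = CanisterHttpRequestArgument {
        url: "https://api.example-exchange.com/price-feed?pair=ICP-USD".to_string(),
        method: HttpMethod::GET,
        headers: vec![],
        // None falls back to the system-wide response size limit.
        max_response_bytes: None,
        transform: Some(TransformContext::from_name("transform".to_string(), vec![])),
        body: None,
    };
    // Hypothetical endpoint: returns one transformed response per replying node.
    let http_responses = multi_http_request(request).await.expect("http call failed");
    // max_faulty_nodes returns an i32; convert it to compare with vector lengths.
    let f = max_faulty_nodes() as usize;
    assert!(http_responses.len() >= 2 * f + 1, "not enough responses for consensus");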

In the next snippet, we parse the HTTP responses into a vector of ExamplePriceResponse objects. Note that we cannot assume that all responses are parseable since malicious nodes can intentionally reply with garbage.

    let mut price_responses: Vec<ExamplePriceResponse> = vec![];
    let mut faulty_responses = 0;
    for http_response in http_responses {
        match serde_json::from_slice(&http_response.body) {
            Ok(price_response) => {
                price_responses.push(price_response);
            }
            Err(e) => {
                faulty_responses += 1;
                ic_cdk::print(format!("Failed to parse HTTP response body: {:?}", e));
            }
        }
    }
    if faulty_responses > f {
        ic_cdk::trap("too many faulty responses");
    }

Finally, we select the median price and timestamp independently. We cannot assume the entire response is trustworthy only because one of its fields lies in the middle.

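    let n = price_responses.len();
    // f64 is not Ord, so order prices with total_cmp instead of a key function.
    let median_price = price_responses
        .select_nth_unstable_by(n / 2, |a, b| a.price.total_cmp(&b.price))
        .1.price;
    let median_ts = price_responses
        .select_nth_unstable_by_key(n / 2, |r| r.timestamp_seconds)
        .1.timestamp_seconds;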

    PriceReport {
        price: median_price,
        timestamp_seconds: median_ts,
    }
} // end of observe_icp_price
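To see why the code above takes the median of each field independently, consider a standalone sketch with three observations (f = 1) in which the Byzantine node reports a plausible price but a bogus timestamp. The `Obs` type and both aggregation helpers are hypothetical stand-ins for the canister code.

```rust
/// One parsed observation; mirrors the ExamplePriceResponse fields.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Obs {
    price: f64,
    timestamp_seconds: u64,
}

/// Flawed aggregation: returns the whole observation whose price is the median.
fn whole_response_median(obs: &[Obs]) -> Obs {
    let mut sorted = obs.to_vec();
    sorted.sort_by(|a, b| a.price.total_cmp(&b.price));
    sorted[sorted.len() / 2]
}

/// Per-field aggregation: median price and median timestamp, chosen independently.
fn per_field_median(obs: &[Obs]) -> Obs {
    let mut prices: Vec<f64> = obs.iter().map(|o| o.price).collect();
    prices.sort_by(|a, b| a.total_cmp(b));
    let mut timestamps: Vec<u64> = obs.iter().map(|o| o.timestamp_seconds).collect();
    timestamps.sort_unstable();
    Obs {
        price: prices[prices.len() / 2],
        timestamp_seconds: timestamps[timestamps.len() / 2],
    }
}

fn main() {
    // The middle-priced observation comes from a Byzantine node with a bogus timestamp.
    let obs = [
        Obs { price: 12.49, timestamp_seconds: 1_714_384_800 },
        Obs { price: 12.50, timestamp_seconds: 0 },
        Obs { price: 12.51, timestamp_seconds: 1_714_384_801 },
    ];
    // Taking the whole median-price response also takes its bogus timestamp.
    assert_eq!(whole_response_median(&obs).timestamp_seconds, 0);
    // Independent medians keep each field between honest values.
    let report = per_field_median(&obs);
    assert_eq!(report.price, 12.50);
    assert_eq!(report.timestamp_seconds, 1_714_384_800);
}
```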

Conclusion

The HTTPS outcalls feature allows anyone to deploy an oracle service on the IC network with minimal effort. Unfortunately, the current implementation is limited to use cases with deterministic HTTP responses. This article explored how to lift this limitation by taking inspiration from the OCR protocol and delivering all versions of the HTTP response to the requesting canister.
