ONNX introduction

  2025-02-17

The machine learning ecosystem is a zoo of competing frameworks. Each framework offers its own model representation format, sometimes more than one:

  1. PyTorch represents models as Python code plus pickled weights that you can download from the Hugging Face model repository. This format is flexible, but it is a disaster from a security point of view because unpickling allows arbitrary code execution.
  2. TensorFlow supports multiple formats, including SavedModel and the newer Keras model format, a zip file containing the model components and metadata. TensorFlow formats are underspecified, implementation-defined, and generally insecure.

onnx is an open format for representing machine learning models; it aims to unify the ecosystem. It doesn’t support advanced use cases, such as model training checkpoints, but it is simple, secure (no arbitrary code execution), and easy to build on.

This article is the introduction to the onnx file format that I wish I had when I embarked on my ml model transformation journey at Gensyn.

The anatomy of an ONNX model

The primary way to create an onnx file is to export it from a more popular model format, such as PyTorch (using the torch.onnx module) or TensorFlow (using third-party packages, such as tf2onnx).
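For example, here is a minimal sketch of exporting a small PyTorch module with torch.onnx; the module and file name are illustrative, not part of this article's example.

import torch

model = torch.nn.Sequential(
    torch.nn.Linear(3, 4), torch.nn.ReLU(), torch.nn.Linear(4, 2)
)
torch.onnx.export(
    model,
    (torch.randn(1, 3),),  # example input used to trace the graph
    "perceptron.onnx",
    input_names=["X"],
    output_names=["Out"],
    dynamic_axes={"X": {0: "N"}, "Out": {0: "N"}},  # keep the batch dimension symbolic
)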

Conceptually, an onnx file describes a model containing a computation graph. The model acts as a program; it establishes the context through imports and helper function definitions. The primary graph is the main function; it defines a pure function that maps model inputs to outputs.

This description is abstract, so let’s get practical and inspect the textual representation of a tiny perceptron model.

The textual representation of a tiny perceptron model.
<
    ir_version: 7,
    opset_import: ["" : 21]
>

G (float[N, 3] X) => (float[N, 2] Out)
<
    float[3, 4] W1 = {
        0.01, 0.02, 0.03, 0.04,
        0.05, 0.06, 0.07, 0.08,
        0.09, 0.10, 0.11, 0.12
    },
    float[4, 2] W2 = {
        0.11, 0.12,
        0.13, 0.14,
        0.15, 0.16,
        0.17, 0.18
    },
    float[4] B1 = { 0.001, 0.002, 0.003, 0.004 },
    float[2] B2 = { 0.01, 0.02 }
>
{
    Y1 = Gemm(X, W1, B1)
    Y2 = Relu(Y1)
    Z = Gemm(Y2, W2, B2)
    Out = Sigmoid(Z)
}

An onnx file is a model encoded as a Protocol Buffers message. It is instructive to inspect this low-level representation of our example model.
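You can produce such a dump yourself: the onnx Python package loads a model as a protobuf message, and printing that message yields the Protocol Buffers text format (the file name here is an assumption).

import onnx

model = onnx.load("model.onnx")  # any onnx file on disk
print(model)  # prints the Protocol Buffers text format shown below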

A Protocol Buffers representation of a tiny onnx model. Ellipses in initializers do not belong to the message; they indicate omitted data entries.
ir_version: 7
opset_import {
  domain: ""
  version: 21
}
graph {
  name: "G"
  input {
    name: "X"
    type {
      tensor_type {
        elem_type: 1
        shape {
          dim { dim_param: "N" }
          dim { dim_value: 3 }
        }
      }
    }
  }
  output {
    name: "Out"
    type {
      tensor_type {
        elem_type: 1
        shape {
          dim { dim_param: "N" }
          dim { dim_value: 2 }
        }
      }
    }
  }
  initializer {
    name: "W1"
    dims: 3
    dims: 4
    data_type: 1
    float_data: 0.01
    …
  }
  initializer {
    name: "W2"
    dims: 4
    dims: 2
    data_type: 1
    float_data: 0.11
    …
  }
  initializer {
    name: "B1"
    dims: 4
    data_type: 1
    float_data: 0.001
    …
  }
  initializer {
    name: "B2"
    dims: 2
    data_type: 1
    float_data: 0.01
    …
  }
  node {
    input: "X"
    input: "W1"
    input: "B1"
    output: "Y1"
    op_type: "Gemm"
    domain: ""
  }
  node {
    input: "Y1"
    output: "Y2"
    op_type: "Relu"
    domain: ""
  }
  node {
    input: "Y2"
    input: "W2"
    input: "B2"
    output: "Z"
    op_type: "Gemm"
    domain: ""
  }
  node {
    input: "Z"
    output: "Out"
    op_type: "Sigmoid"
    domain: ""
  }
  value_info {
    name: "Y1"
    type {
      tensor_type {
        elem_type: 1
        shape {
          dim { dim_param: "N" }
          dim { dim_value: 4 }
        }
      }
    }
  }
  value_info {
    name: "Y2"
    type {
      tensor_type {
        elem_type: 1
        shape {
          dim { dim_param: "N" }
          dim { dim_value: 4 }
        }
      }
    }
  }
  value_info {
    name: "Z"
    type {
      tensor_type {
        elem_type: 1
        shape {
          dim { dim_param: "N" }
          dim { dim_value: 2 }
        }
      }
    }
  }
}

In this low-level representation, the graph has the following components:

  1. Inputs and outputs declare the graph's interface.
  2. Initializers hold constant tensors, such as the model weights.
  3. Nodes invoke operators; each node lists its inputs, outputs, operator type, and domain.
  4. value_info entries record the inferred types of internal values.

Graph inputs, outputs, and initializers must have explicit types (in the example, these are float tensors of various shapes); internal values might not have corresponding value_info entries.
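A quick way to poke at these components from Python, assuming a model.onnx file on disk:

import onnx

model = onnx.load("model.onnx")
graph = model.graph
print([i.name for i in graph.input])        # graph inputs
print([t.name for t in graph.initializer])  # constant tensors (the weights)
print([n.op_type for n in graph.node])      # operators, in topological order
print([v.name for v in graph.value_info])   # typed internal values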

External data

Since Protocol Buffers restrict the message size to two gigabytes, onnx alone cannot encode large models; for example, Llama 3.1 8B needs at least 16 gigabytes to encode its model weights using the bf16 type. To address this problem, onnx allows storing any tensor, such as an initializer or an operator attribute, in external files.

An external data reference specifies the location of the tensor file relative to the model file, along with the offset and the length within that file. Both fields are optional: a missing offset means the data starts at the beginning of the file, and a missing length means it extends to the end. The tensor data is assumed to be in the flat row-major, little-endian format.

Let’s make our tiny perceptron model larger and move its weights into a separate file.

A perceptron model that stores its weights in an external file. The location path is relative to the model file.
<
    ir_version: 7,
    opset_import: ["" : 21]
>

G (float[N, 64] X) => (float[N, 10] Out)
<
    float[64, 1024] W1 = [
        "location": "weights.bin", "offset": "0", "length": "262144"
    ],
    float[1024, 10] W2 = [
        "location": "weights.bin", "offset": "262144", "length": "40960"
    ],
    float[1024] B1 = [
        "location": "weights.bin", "offset": "303104", "length": "4096"
    ],
    float[10] B2 = [
        "location": "weights.bin", "offset": "307200", "length": "40"
    ]
>
{
    Y1 = Gemm(X, W1, B1)
    Y2 = Relu(Y1)
    Z = Gemm(Y2, W2, B2)
    Out = Sigmoid(Z)
}
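To make the layout concrete, here is a sketch of loading the W2 tensor from the example above by hand; the offset and length come straight from its external data reference.

import numpy as np

with open("weights.bin", "rb") as f:
    f.seek(262144)       # the "offset" of W2 within the file
    raw = f.read(40960)  # the "length": 1024 * 10 floats * 4 bytes

# Flat row-major, little-endian float32 data.
W2 = np.frombuffer(raw, dtype="<f4").reshape(1024, 10)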

The textual syntax for external tensors desugars into a protobuf message with the data_location field set to EXTERNAL, and repeated external_data fields indicating the data location.

An example of an initializer message that refers to external data.
initializer {
  name: "W2"
  dims: 1024
  dims: 10
  data_type: 1
  external_data {
    key: "location"
    value: "weights.bin"
  }
  external_data {
    key: "offset"
    value: "262144"
  }
  external_data {
    key: "length"
    value: "40960"
  }
  data_location: EXTERNAL
}

The external data feature explains why most onnx tools accept a path to the model file: they might need to access external tensors, and tensor locations are always relative to the model path.

The onnx.external_data_helper Python module provides helpful utilities for dealing with external data.
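For example, here is a sketch of moving all large tensors of an in-memory model into a single external file; the file names are illustrative.

import onnx
import onnx.external_data_helper

model = onnx.load("model.onnx")
onnx.external_data_helper.convert_model_to_external_data(
    model, all_tensors_to_one_file=True, location="weights.bin"
)
# Saving the model also writes weights.bin next to the output file.
onnx.save_model(model, "model-external.onnx")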

Custom operators

onnx allows the model to define custom operators, also called functions. The specification calls operator namespaces domains and groups operators into operator sets (opsets). An opset is a versioned snapshot of operators from the same domain.

We know enough terminology now to understand the opset_import line at the top of our onnx programs. It pins the versions of the operator sets the graph uses, and thus the exact operator semantics; the empty domain "" refers to the standard onnx operator set.

The textual syntax for custom operators is almost identical to that of a graph definition. Custom operator definitions must appear after the primary model graph and have an attribute section defining their domain and dependencies.

The following example demonstrates a model that defines a custom operator doubling its input.

An onnx model that doubles its input using a custom operator.
<
    ir_version: 7,
    opset_import: ["" : 21, "com.example" : 1]
>

G (float[N] X) => (float[N] Out)
{
    Out = com.example.Double(X)
}

<
    domain: "com.example",
    opset_import: ["": 21]
>
Double (float[N] X) => (float[N] Out) {
    Out = Add(X, X)
}

Nodes might have attributes (values or subgraphs) that modify their behavior. Custom operators can also define attributes.

The following example introduces a custom Root operator that computes the n-th root of its argument.

An onnx program that defines a custom Root operator with a single attribute, the root index.
<
    ir_version: 7,
    opset_import: ["" : 21, "com.example" : 1]
>

G (float[N] X) => (float[N] Out) {
    Out = com.example.Root<nth = 2>(X)
}

<
    domain: "com.example",
    opset_import: ["": 21]
>
Root <nth: int = 2> (float[N] X) => (float[N] Out) {
    One = Constant<value_float = 1.0>()
    Nth = Constant<value_int = @nth>()
    NthFloat = Cast<to = 1>(Nth)
    E = Div(One, NthFloat)
    Out = Pow(X, E)
}
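As a sanity check, with nth = 2 the Root operator should behave like an element-wise square root. A sketch using the parse_onnx and run_onnx helpers defined in the appendix at the end of this article (root_model_text stands for the textual model above):

import numpy as np

model = parse_onnx(root_model_text)  # the Root program above, as a Python string
result = run_onnx(
    model, inputs={"X": np.array([4.0, 9.0], dtype=np.float32)}, outputs=["Out"]
)
# Expected: values close to [2.0, 3.0].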

Subgraphs

onnx supports branching and looping using nested graphs as operator attributes. For example, the If operator accepts a single formal input—the condition—and two required attributes specifying the computation that must happen in then and else branches. Nested graphs can reference values from the outer scope.

There is no better way to demonstrate control flow than to solve FizzBuzz in onnx.

An onnx program solving FizzBuzz with the Loop and If operators. The to attribute of Cast holds a data type code: 9 denotes bool, 8 denotes string.
<
    ir_version: 7,
    opset_import: ["" : 21]
>

G (int64 Limit) => (string[N] Out) {
    Zero = Constant<value_int = 0>()
    One = Constant<value_int = 1>()
    Three = Constant<value_int = 3>()
    Five = Constant<value_int = 5>()
    Fifteen = Constant<value_int = 15>()
    Cond = Cast<to = 9>(One)

    Out = Loop (Limit, Cond) <body = Body (int64 I, bool C) => (bool OutC, string Item) {
        X = Add(I, One)
        OutC = Identity(C)

        M15 = Mod(X, Fifteen)
        Z15 = Equal(M15, Zero)
        Item = If (Z15) <
            then_branch = FizzBuzz () => (string R) {
                R = Constant<value_string = "fizzbuzz">()
            },
            else_branch = Other () => (string R) {
                M3 = Mod(X, Three)
                Z3 = Equal(M3, Zero)
                R = If (Z3) <
                    then_branch = Fizz () => (string R) {
                        R = Constant<value_string = "fizz">()
                    },
                    else_branch = Other () => (string R) {
                        M5 = Mod(X, Five)
                        Z5 = Equal(M5, Zero)
                        R = If (Z5) <
                            then_branch = Buzz () => (string R) {
                                R = Constant<value_string = "buzz">()
                            },
                            else_branch = Other () => (string R) {
                                R = Cast<to = 8>(X)
                            }
                        >
                    }
                >
            }
        >
    }>
}
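You can run this model with the helpers from the appendix below; a sketch (fizzbuzz_text stands for the program above):

import numpy as np

model = parse_onnx(fizzbuzz_text)  # the FizzBuzz program above, as a Python string
result = run_onnx(
    model, inputs={"Limit": np.array(15, dtype=np.int64)}, outputs=["Out"]
)
# Expected: 15 strings ending in "13", "14", "fizzbuzz".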

Appendix: running ONNX programs

You can use the following Python code snippet to parse and play with the textual format (the uv tool makes it easy: uv run --no-project script.py).

A snippet of Python code that parses onnx textual syntax and runs the model.
# /// script
# dependencies = [
#   "onnx~=1.17",
#   "onnxruntime~=1.18",
# ]
# ///
import tempfile

import numpy as np
import numpy.typing as npt
import onnx
import onnx.checker
import onnx.external_data_helper
import onnx.parser
import onnx.shape_inference
import onnxruntime as ort


def parse_onnx(text: str) -> onnx.ModelProto:
    model = onnx.parser.parse_model(text)
    onnx.checker.check_model(model)
    return onnx.shape_inference.infer_shapes(model, check_type=True)


def run_onnx(
    model: onnx.ModelProto, inputs: dict[str, npt.NDArray], outputs: list[str]
) -> dict[str, npt.NDArray]:
    onnx.external_data_helper.load_external_data_for_model(model, ".")
    with tempfile.NamedTemporaryFile() as model_file:
        onnx.save_model(model, model_file.name)
        runtime = ort.InferenceSession(model_file.name)
        # ort returns a list of arrays; pair it with the output names.
        return dict(zip(outputs, runtime.run(outputs, inputs)))


print(
    run_onnx(
        parse_onnx("""
    <ir_version: 7, opset_import: ["": 21]>
    Square (float[N] X) => (float[N] Out) {
        Out = Mul(X, X)
    }
    """),
        inputs={"X": np.array([1.0, 2.0, 3.0], dtype=np.float32)},
        outputs=["Out"],
    )
)
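Running the script prints the element-wise squares of the input: {'Out': array([1., 4., 9.], dtype=float32)}.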
