ONNX introduction
✏ 2025-02-17 ✂ 2025-02-17

- The anatomy of an ONNX model
- External data
- Custom operators
- Subgraphs
- Appendix: running ONNX programs
- Resources
The machine learning ecosystem is a zoo of competing frameworks. Each framework offers its custom model representation format, sometimes more than one:
- PyTorch represents models using Python code that you can download from the Hugging Face model repository. This format is flexible, but it is a disaster from a security point of view because it allows arbitrary code execution.
- TensorFlow supports multiple formats, including SavedModel. The latest is the Keras model format, a zip file containing the model components and metadata. TensorFlow formats are underspecified, implementation-defined, and generally insecure.
onnx is an open format for representing machine learning models; it aims to unify the ecosystem. It doesn’t support advanced use cases, such as model training checkpoints, but it is simple, secure (no arbitrary code execution), and easy to build on.
This article is an introduction to the onnx file format that I wish I had when I embarked on my ml model transformation journey at Gensyn.
The anatomy of an ONNX model
The primary way to create an onnx file is to export it from a more popular model format, such as PyTorch (using the torch.onnx module) or TensorFlow (using third-party packages, such as tf2onnx).
Conceptually, an onnx file describes a model containing a computation graph. The model acts as a program; it establishes the context through imports and helper function definitions. The primary graph is the main function; it defines a pure function that maps model inputs to outputs.
This description is abstract, so let’s get practical and inspect the textual representation of a tiny perceptron model.

<
ir_version: 7,
opset_import: ["" : 21]
>
G (float[N, 3] X) => (float[N, 2] Out)
<
float[3, 4] W1 = {
0.01, 0.02, 0.03, 0.04,
0.05, 0.06, 0.07, 0.08,
0.09, 0.10, 0.11, 0.12
},
float[4, 2] W2 = {
0.11, 0.12,
0.13, 0.14,
0.15, 0.16,
0.17, 0.18
},
float[4] B1 = { 0.001, 0.002, 0.003, 0.004 },
float[2] B2 = { 0.01, 0.02 }
>
{
Y1 = Gemm(X, W1, B1)
Y2 = Relu(Y1)
Z = Gemm(Y2, W2, B2)
Out = Sigmoid(Z)
}
- Lines 1–4 specify model attributes: the onnx format version and the versions of operator sets (opsets) this model uses. Refer to the Custom operators section for an explanation of the opset concept.
- Line 6 defines the primary model graph named G. The graph takes one input named X (an N × 3 matrix of floats, where N is deduced from the input shape) and produces one output named Out (an N × 2 matrix of floats).
- Lines 7–21 define the graph initializers corresponding to the model weights. When you train an onnx model, you optimize the initializer values.
- Lines 22–27 define the graph body, where each line (except for curly brackets) is an operator application. Each operator is a pure function mapping zero or more inputs to one or more outputs.
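To make the graph semantics concrete, here is a plain-Python transcription of what G computes for a single input row (a sketch using the example weights; the real onnx Gemm operator is more general, with transposition and scaling attributes):

```python
import math

# Weights from the example model.
W1 = [[0.01, 0.02, 0.03, 0.04],
      [0.05, 0.06, 0.07, 0.08],
      [0.09, 0.10, 0.11, 0.12]]
W2 = [[0.11, 0.12],
      [0.13, 0.14],
      [0.15, 0.16],
      [0.17, 0.18]]
B1 = [0.001, 0.002, 0.003, 0.004]
B2 = [0.01, 0.02]

def gemm(x, w, b):
    # y = x @ w + b for a single input row x.
    return [sum(xi * wij for xi, wij in zip(x, col)) + bj
            for col, bj in zip(zip(*w), b)]

def g(x):
    y1 = gemm(x, W1, B1)
    y2 = [max(v, 0.0) for v in y1]                  # Relu
    z = gemm(y2, W2, B2)
    return [1.0 / (1.0 + math.exp(-v)) for v in z]  # Sigmoid

out = g([0.0, 0.0, 0.0])
```

For the all-zeros input, the first Gemm reduces to the bias B1, so the output is the sigmoid of a small positive vector, slightly above 0.5 in both components.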
An onnx file is a model encoded as a Protocol Buffers message.
It’s insightful to inspect this low-level representation of our example model.

ir_version: 7
opset_import {
domain: ""
version: 21
}
graph {
name: "G"
input {
name: "X"
type {
tensor_type {
elem_type: 1
shape {
dim { dim_param: "N" }
dim { dim_value: 3 }
}
}
}
}
output {
name: "Out"
type {
tensor_type {
elem_type: 1
shape {
dim { dim_param: "N" }
dim { dim_value: 2 }
}
}
}
}
initializer {
name: "W1"
dims: 3
dims: 4
data_type: 1
float_data: 0.01
…
}
initializer {
name: "W2"
dims: 4
dims: 2
data_type: 1
float_data: 0.11
…
}
initializer {
name: "B1"
dims: 4
data_type: 1
float_data: 0.001
…
}
initializer {
name: "B2"
dims: 2
data_type: 1
float_data: 0.01
…
}
node {
input: "X"
input: "W1"
input: "B1"
output: "Y1"
op_type: "Gemm"
domain: ""
}
node {
input: "Y1"
output: "Y2"
op_type: "Relu"
domain: ""
}
node {
input: "Y2"
input: "W2"
input: "B2"
output: "Z"
op_type: "Gemm"
domain: ""
}
node {
input: "Z"
output: "Out"
op_type: "Sigmoid"
domain: ""
}
value_info {
name: "Y1"
type {
tensor_type {
elem_type: 1
shape {
dim { dim_param: "N" }
dim { dim_value: 4 }
}
}
}
}
value_info {
name: "Y2"
type {
tensor_type {
elem_type: 1
shape {
dim { dim_param: "N" }
dim { dim_value: 4 }
}
}
}
}
value_info {
name: "Z"
type {
tensor_type {
elem_type: 1
shape {
dim { dim_param: "N" }
dim { dim_value: 2 }
}
}
}
}
}
In this low-level representation, the graph has the following components:
- A list of inputs that the caller must provide to compute the outputs.
- A list of outputs that the graph computes.
- A list of initializers specifying the model weights.
- A list of nodes sorted topologically: a node can refer only to the graph inputs, initializers, and values that preceding nodes produce. Nodes refer to their inputs and outputs by name. Nodes can also have names, but all the nodes in our examples are unnamed.
- A list of value_info entries providing types for intermediate values. Graph inputs, outputs, and initializers must have explicit types (in the example, these are float tensors of various shapes); internal values might not have corresponding value_info entries.
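The topological-order requirement is easy to check mechanically. The sketch below (plain Python; the node tuples mirror the example graph) verifies that every node consumes only names that are already defined:

```python
# Each node lists its inputs and outputs by name, as in the protobuf dump.
nodes = [
    (["X", "W1", "B1"], ["Y1"]),  # Gemm
    (["Y1"], ["Y2"]),             # Relu
    (["Y2", "W2", "B2"], ["Z"]),  # Gemm
    (["Z"], ["Out"]),             # Sigmoid
]

def is_topologically_sorted(graph_inputs, initializers, nodes):
    # A name is "defined" if it is a graph input, an initializer,
    # or an output of an earlier node.
    defined = set(graph_inputs) | set(initializers)
    for inputs, outputs in nodes:
        if not all(name in defined for name in inputs):
            return False
        defined.update(outputs)
    return True

ok = is_topologically_sorted(["X"], ["W1", "W2", "B1", "B2"], nodes)
```

Reversing the node list makes the check fail: the Sigmoid node would then refer to Z before any node produces it.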
External data
Since Protocol Buffers restrict the file size to two gigabytes, onnx alone cannot encode large models (for example, Llama 3.1 8B needs at least 16 gigabytes to encode its model weights using the bf16 type). To address this problem, onnx allows storing any tensor, such as an initializer or an operator attribute, in external files.
An external data reference specifies the location of the tensor file relative to the model file, as well as the offset and the length within that file (the offset and length fields are optional: a missing offset means the data starts at the beginning of the file, and a missing length means it extends to the end of the file). The tensor data is assumed to be in flat row-major, little-endian format.

Let’s make our tiny perceptron model larger and move its weights into a separate file. The location path is relative to the model file.
<
ir_version: 7,
opset_import: ["" : 21]
>
G (float[N, 64] X) => (float[N, 10] Out)
<
float[64, 1024] W1 = [
"location": "weights.bin", "offset": "0", "length": "262144"
],
float[1024, 10] W2 = [
"location": "weights.bin", "offset": "262144", "length": "40960"
],
float[1024] B1 = [
"location": "weights.bin", "offset": "303104", "length": "4096"
],
float[10] B2 = [
"location": "weights.bin", "offset": "307200", "length": "40"
]
>
{
Y1 = Gemm(X, W1, B1)
Y2 = Relu(Y1)
Z = Gemm(Y2, W2, B2)
Out = Sigmoid(Z)
}
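The offsets and lengths in this example are not arbitrary: each float32 tensor occupies four bytes per element, and consecutive tensors are packed back to back in weights.bin. A quick sanity check:

```python
FLOAT32 = 4  # bytes per element

# Tensor shapes from the example model, in file order.
tensors = [("W1", (64, 1024)), ("W2", (1024, 10)), ("B1", (1024,)), ("B2", (10,))]

offset = 0
layout = {}
for name, shape in tensors:
    length = FLOAT32
    for dim in shape:
        length *= dim
    layout[name] = (offset, length)
    offset += length  # The next tensor starts right after this one.
```

The computed layout reproduces the offset/length pairs in the textual model above.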
The textual syntax for external tensors desugars into a protobuf message with the data_location field set to EXTERNAL and repeated external_data fields indicating the data location.

initializer {
name: "W2"
dims: 1024
dims: 10
data_type: 1
external_data {
key: "location"
value: "weights.bin"
}
external_data {
key: "offset"
value: "262144"
}
external_data {
key: "length"
value: "40960"
}
data_location: EXTERNAL
}
The external data feature explains why most onnx tools accept a path to the model file: they might need to access external tensors, and tensor locations are always relative to the model path.
The onnx.external_data_helper Python module provides helpful utilities for dealing with external data.
Custom operators
onnx allows the model to define custom operators, also called functions. The specification calls operator namespaces domains and groups operators into operator sets (opsets). An opset is a versioned snapshot of operators from the same domain.
We know enough terminology now to understand the opset_import line at the top of our onnx programs. It pins the exact operator semantics within the model graph.
The textual syntax for custom operators is almost identical to that of a graph definition. Custom operator definitions must appear after the primary model graph and have an attribute section defining their domain and dependencies.
The following example demonstrates a model that defines a custom operator doubling its input.

<
ir_version: 7,
opset_import: ["" : 21, "com.example" : 1]
>
G (float[N] X) => (float[N] Out)
{
Out = com.example.Double(X)
}
<
domain: "com.example",
opset_import: ["": 21]
>
Double (float[N] X) => (float[N] Out) {
Out = Add(X, X)
}
- Line 3 imports two operator sets: the standard set and the custom set we define later in the program.
- Lines 11–17 define a custom operator that doubles its input. Note the difference between lines 1–4 that set the top-level model attributes and lines 11–14 that set attributes for the function that follows. The operator attributes specify its domain and the opsets required for the implementation.
Nodes might have attributes (values or subgraphs) that modify their behavior. Custom operators can also define attributes.
The following example introduces a custom Root operator with a single attribute, the root index; the operator computes the n-th root of its argument.
<
ir_version: 7,
opset_import: ["" : 21, "com.example" : 1]
>
G (float[N] X) => (float[N] Out) {
Out = com.example.Root<nth = 2>(X)
}
<
domain: "com.example",
opset_import: ["": 21]
>
Root <nth: int = 2> (float[N] X) => (float[N] Out) {
One = Constant<value_float = 1.0>()
Nth = Constant<value_int = @nth>()
NthFloat = Cast<to = 1>(Nth)
E = Div(One, NthFloat)
Out = Pow(X, E)
}
- Line 7 invokes our custom operator and explicitly specifies the nth attribute value.
- Line 14 defines the Root operator with an attribute section.
- Line 16 converts an attribute value into a constant graph node. The following line casts the integer value into a floating-point number.
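The operator body is a roundabout way of computing x ** (1 / nth); in plain Python:

```python
def root(x: float, nth: int = 2) -> float:
    # One = 1.0; Nth = nth; E = One / Nth; Out = x ** E
    return x ** (1.0 / nth)
```

For example, root(9.0) is 3.0 and root(8.0, nth=3) is approximately 2.0.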
Subgraphs
onnx supports branching and looping using nested graphs as operator attributes. For example, the If operator accepts a single formal input (the condition) and two required attributes specifying the computation that must happen in the then and else branches. Nested graphs can reference values from the outer scope.
There is no better way to demonstrate control flow than to solve FizzBuzz in onnx.
<
ir_version: 7,
opset_import: ["" : 21]
>
G (int64 Limit) => (string[N] Out) {
Zero = Constant<value_int = 0>()
One = Constant<value_int = 1>()
Three = Constant<value_int = 3>()
Five = Constant<value_int = 5>()
Fifteen = Constant<value_int = 15>()
Cond = Cast<to = 9>(One)
Out = Loop (Limit, Cond) <body = Body (int64 I, bool C) => (bool OutC, string Item) {
X = Add(I, One)
OutC = Identity(C)
M15 = Mod(X, Fifteen)
Z15 = Equal(M15, Zero)
Item = If (Z15) <
then_branch = FizzBuzz () => (string R) {
R = Constant<value_string = "fizzbuzz">()
},
else_branch = Other () => (string R) {
M3 = Mod(X, Three)
Z3 = Equal(M3, Zero)
R = If (Z3) <
then_branch = Fizz () => (string R) {
R = Constant<value_string = "fizz">()
},
else_branch = Other () => (string R) {
M5 = Mod(X, Five)
Z5 = Equal(M5, Zero)
R = If (Z5) <
then_branch = Buzz () => (string R) {
R = Constant<value_string = "buzz">()
},
else_branch = Other () => (string R) {
R = Cast<to = 8>(X)
}
>
}
>
}
>
}>
}
- The textual representation doesn’t support raw literals as operator arguments, so we explicitly declare all the constants we’ll use in lines 7–12. Constant nodes have no inputs and one output, the value they wrap. The magic constant 9 on line 12 is the boolean type id.
- onnx graphs express pure computation, so all operators must produce a value to be useful. Thus, onnx control structures resemble functional programming primitives, such as unfoldr. The Loop operator accepts multiple arguments: the maximum number of iterations, the exit condition (the loop won’t start if that value is false), and a sequence of internal loop variables (we don’t use any in this example). The body graph attribute specifies the variable transformation at each step. The body transforms the current iteration number, the stop condition, and the internal variables into the next exit condition (for early termination), the next values of internal variables, and an output value. The Loop operator accumulates all the output values into an output tensor. It returns the final values of intermediate variables and the accumulated outputs.
- Our loop body graph is a sequence of nested If operator calls. An If operator accepts a boolean condition value and two graph attributes: then_branch and else_branch. Our conditions check whether the input is divisible by 15, 3, and 5, falling back to converting the number to a string on line 39 (type id 8 corresponds to the string type). Note how nested graphs can freely access values from their lexical scope.
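As a cross-check on these semantics, here is a plain-Python transcription of the model: an unfoldr-style loop whose body mirrors the nested If nodes (the function names are mine, not part of the onnx format):

```python
def fizzbuzz_body(i: int) -> str:
    # Mirrors the nested If nodes: check divisibility by 15 first, then 3, then 5.
    x = i + 1
    if x % 15 == 0:
        return "fizzbuzz"
    if x % 3 == 0:
        return "fizz"
    if x % 5 == 0:
        return "buzz"
    return str(x)

def loop(limit: int, cond: bool, body) -> list[str]:
    # The Loop operator runs at most `limit` iterations, stops early when the
    # body returns a false condition, and accumulates the per-iteration outputs.
    out = []
    i = 0
    while i < limit and cond:
        cond, item = body(i)
        out.append(item)
        i += 1
    return out

items = loop(15, True, lambda i: (True, fizzbuzz_body(i)))
```

Running it for 15 iterations yields the familiar sequence ending in "fizzbuzz".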
Appendix: running ONNX programs
You can use the following Python code snippet to parse and play with the textual format (the uv tool makes it easy: uv run --no-project script.py).

# /// script
# dependencies = [
# "onnx~=1.17",
# "onnxruntime~=1.18",
# ]
# ///
import tempfile
import numpy as np
import numpy.typing as npt
import onnx
import onnx.external_data_helper
import onnx.parser
import onnxruntime as ort
def parse_onnx(text: str) -> onnx.ModelProto:
model = onnx.parser.parse_model(text)
onnx.checker.check_model(model)
return onnx.shape_inference.infer_shapes(model, check_type=True)
def run_onnx(
model: onnx.ModelProto, inputs: dict[str, npt.NDArray], outputs: list[str]
) -> list[npt.NDArray]:
onnx.external_data_helper.load_external_data_for_model(model, ".")
with tempfile.NamedTemporaryFile() as model_file:
onnx.save_model(model, model_file.name)
runtime = ort.InferenceSession(model_file.name)
return runtime.run(outputs, inputs)
print(
run_onnx(
parse_onnx("""
<ir_version: 7, opset_import: ["": 21]>
Square (float[N] X) => (float[N] Out) {
Out = Mul(X, X)
}
"""),
inputs={"X": np.array([1.0, 2.0, 3.0], dtype=np.float32)},
outputs=["Out"],
)
)