tracel-ai / burn

Burn is a new comprehensive dynamic Deep Learning Framework built using Rust with extreme flexibility, compute efficiency and portability as its primary goals.

Home Page: https://burn.dev

burn-import initializer tensor not added to scope

mosure opened this issue

Describe the bug
nodes with input tensors of category initializer fail to import because the initializer tensor names are not in scope

To Reproduce
import a minimal ONNX graph with an initializer tensor, e.g.

import torch.nn as nn
import torch.nn.functional as F


class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()

    def forward(self, x):
        x = F.interpolate(x, size=(2, 2), mode='bilinear', align_corners=True)
        return x


Expected behavior
the model generated by burn-import contains the tensor initializers, e.g. _Concat_output_0 is defined in the scope

Actual behavior
burn-import fails because _Concat_output_0 is not in scope, panicking at this line:

panic!("No variable with name {}", tensor.name);

Desktop (please complete the following information):

  • OS: [Windows]

Additional context
this occurs in resize, gather, mul_scalar, and sub_scalar node imports as well (with initializer tensors)

@skewballfox, do you think this is related to #1857?

was this encountered with the version of burn last published on crates.io, or on main?

also, I'm a bit confused. is /Concat_output_0 both a node output and an initializer?

this is encountered on main.

here is a more minimal reproducible example which clarifies the /Concat_output_0 initializer/output ambiguity:

#!/usr/bin/env python3

# used to generate model: onnx-tests/tests/resize/resize.onnx

import onnx
from onnx import helper, TensorProto, numpy_helper
import numpy as np

def main() -> None:
    input_tensor = helper.make_tensor_value_info("input_tensor", TensorProto.FLOAT, [1, 1, 4, 4])

    # Define the sizes tensor as an initializer
    sizes = np.array([1, 1, 2, 2], dtype=np.int64)
    sizes_initializer = numpy_helper.from_array(sizes, name="sizes")

    resize_node = helper.make_node(
        "Resize",
        name="resize_node",
        inputs=["input_tensor", "", "", "sizes"],
        outputs=["output"],
        mode="linear",
    )

    graph_def = helper.make_graph(
        nodes=[resize_node],
        name="ResizeGraph",
        inputs=[input_tensor],
        outputs=[
            helper.make_tensor_value_info("output", TensorProto.FLOAT, [1, 1, 2, 2])
        ],
        initializer=[sizes_initializer],
    )

    model_def = helper.make_model(graph_def, producer_name="resize")

    onnx.save(model_def, "resize.onnx")


if __name__ == "__main__":
    main()

the same scope import error happens with constant input tensors too, e.g.

#!/usr/bin/env python3

# used to generate model: onnx-tests/tests/resize/resize.onnx

import onnx
from onnx import helper, TensorProto, numpy_helper
import numpy as np

def main() -> None:
    input_tensor = helper.make_tensor_value_info("input_tensor", TensorProto.FLOAT, [1, 1, 4, 4])

    # Define the sizes tensor as a constant node
    sizes = np.array([1, 1, 2, 2], dtype=np.int64)
    sizes_const_node = helper.make_node(
        "Constant",
        name="sizes_const",
        inputs=[],
        outputs=["sizes"],
        value=numpy_helper.from_array(sizes)
    )

    resize_node = helper.make_node(
        "Resize",
        name="resize_node",
        inputs=["input_tensor", "", "", "sizes"],
        outputs=["output"],
        mode="linear",
    )

    graph_def = helper.make_graph(
        nodes=[sizes_const_node, resize_node],
        name="ResizeGraph",
        inputs=[input_tensor],
        outputs=[
            helper.make_tensor_value_info("output", TensorProto.FLOAT, [1, 1, 2, 2])
        ],
        initializer=[],
    )

    model_def = helper.make_model(graph_def, producer_name="resize")

    onnx.save(model_def, "resize.onnx")


if __name__ == "__main__":
    main()

@antimora I think this might not be related to lifting initializers or constants.
here's some information from the debug print of resize.onnx prior to any editing:

model_proto: ModelProto {
...
                    NodeProto {
                        input: [
                            "input_tensor",
                            "",
                            "",
                            "sizes",
                        ],
                        output: [
                            "output",
                        ],
                        name: "resize_node",
...

sizes is in the list of initializers.

the two fields with "" are roi and scales. the two solutions that come to mind are to create a dummy entry for empty strings, or to rework the API so that functions which might have empty strings as inputs take a default or None input arg.
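for the second option, here's a minimal sketch of what it could look like (using a simplified stand-in Argument type, not burn-import's actual API):

#[derive(Clone, Debug)]
struct Argument {
    name: String,
}

fn to_optional_input(raw_name: &str) -> Option<Argument> {
    // ONNX marks omitted optional inputs (e.g. Resize's roi and scales) with "".
    if raw_name.is_empty() {
        None
    } else {
        Some(Argument { name: raw_name.to_string() })
    }
}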

a minimal gather import with Initializer indices also has a scope panic. gather imports correctly with Constant indices.

I don't think gather has the same empty input names as resize: https://onnx.ai/onnx/operators/onnx__Gather.html

broken initializer gather (Python):
#!/usr/bin/env python3

# used to generate model: onnx-tests/tests/gather/gather.onnx

import onnx
from onnx import helper, TensorProto, numpy_helper
import numpy as np

def main() -> None:
    input_tensor = helper.make_tensor_value_info("input_tensor", TensorProto.FLOAT, [1, 1, 4, 4])

    # Define the indices tensor as an initializer
    indices = np.array([0, 2], dtype=np.int64)
    indices_initializer = helper.make_tensor(
        name="indices",
        data_type=TensorProto.INT64,
        dims=indices.shape,
        vals=indices,
    )

    gather_node = helper.make_node(
        "Gather",
        name="gather_node",
        inputs=["input_tensor", "indices"],
        outputs=["output"],
        axis=2,  # Assuming we want to gather along the height dimension (axis 2)
    )

    graph_def = helper.make_graph(
        nodes=[gather_node],
        name="GatherGraph",
        inputs=[input_tensor],
        outputs=[
            helper.make_tensor_value_info("output", TensorProto.FLOAT, [1, 1, 2, 4])
        ],
        initializer=[indices_initializer],
    )

    model_def = helper.make_model(graph_def, producer_name="gather")

    onnx.save(model_def, "gather.onnx")


if __name__ == "__main__":
    main()

working constant gather (Python):
#!/usr/bin/env python3

# used to generate model: onnx-tests/tests/gather/gather.onnx

import onnx
from onnx import helper, TensorProto, numpy_helper
import numpy as np

def main() -> None:
    input_tensor = helper.make_tensor_value_info("input_tensor", TensorProto.FLOAT, [1, 1, 4, 4])

    # Define the indices tensor as a constant node
    indices = np.array([0, 2], dtype=np.int64)
    indices_const_tensor = helper.make_tensor(
        name="indices_const_tensor",
        data_type=TensorProto.INT64,
        dims=indices.shape,
        vals=indices,
    )

    indices_const_node = helper.make_node(
        "Constant",
        name="indices_const_node",
        inputs=[],
        outputs=["indices"],
        value=indices_const_tensor,
    )

    gather_node = helper.make_node(
        "Gather",
        name="gather_node",
        inputs=["input_tensor", "indices"],
        outputs=["output"],
        axis=2,  # Assuming we want to gather along the height dimension (axis 2)
    )

    graph_def = helper.make_graph(
        nodes=[indices_const_node, gather_node],
        name="GatherGraph",
        inputs=[input_tensor],
        outputs=[
            helper.make_tensor_value_info("output", TensorProto.FLOAT, [1, 1, 2, 4])
        ],
        initializer=[],
    )

    model_def = helper.make_model(graph_def, producer_name="gather")

    onnx.save(model_def, "gather.onnx")


if __name__ == "__main__":
    main()

I'd be curious to know more about the solution for dummy entries of empty input names, what mechanism of burn_import is causing empty names to be ignored?

When we initialize nodes from the proto data, all input arguments (which are simply strings, the original names of the arguments) pass through this function. All that gets stored in the inputs is the original graph inputs and the node outputs, after each has been processed.

I was sort of wrong in my initial assumption: I assumed using an empty string for a HashMap key wouldn't work, but I just tried it on the Rust playground and it's allowed. I think we still need to check for empty-string names, or some placeholder indicating that a default value should be used.
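for reference, the playground check boils down to this:

use std::collections::HashMap;

fn main() {
    // An empty string is a perfectly valid HashMap key in Rust;
    // nothing in the map itself rejects "".
    let mut inputs: HashMap<String, i32> = HashMap::new();
    inputs.insert(String::new(), 1);
    assert_eq!(inputs.get(""), Some(&1));
}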

Honestly I'm kind of stumped. I'm going to recreate the gather.onnx locally tomorrow.

I thought the issue with the initializers might be that the original rename-input function renamed arguments to an empty string, so maybe there was some "if empty, inline tensor" check happening. Though when I tried that with size on resize.onnx, I ran into a different error: Tensor of Kind Int with dim shape None was passed with empty name.

whatever's happening, it happens only with initializers, not with other generated inputs, and not when those initializers are graph inputs.

the same scope import error happens with constant input tensors too, e.g.

...

#!/usr/bin/env python3

# used to generate model: onnx-tests/tests/resize/resize.onnx

import onnx
from onnx import helper, TensorProto, numpy_helper
import numpy as np

def main() -> None:
    input_tensor = helper.make_tensor_value_info("input_tensor", TensorProto.FLOAT, [1, 1, 4, 4])

    # Define the sizes tensor as a constant node
    sizes = np.array([1, 1, 2, 2], dtype=np.int64)
    sizes_const_node = helper.make_node(
        "Constant",
        name="sizes_const",
        inputs=[],
        outputs=["sizes"],
        value=numpy_helper.from_array(sizes)
    )
...

Looking at how that constant is defined, it will be lifted. So by the time scope is checked, it will only exist as an input to resize.

Do you guys know, off the top of your head, what is supposed to happen on the burn-graph side to input args with no parents? Are they inlined, or stored as variables?

@skewballfox, you must be referring to ONNX Constant. Netron does not show an individual Constant node if it's referenced by a single node. ONNX import supports constants (see code). The constant value is stored as an attribute of the model module struct. It's a Param without gradients.

This wouldn't be tied to a constant node because that constant is lifted, and the parent node is deleted.

I was more wondering about the input arguments to nodes that aren't graph inputs or node outputs: lifted constants (where the original node is deleted, like above), initializers, and generated arguments.

The reason I ask is that this seems to trigger when the argument only exists as an input, which is weird because scope is about determining ownership, but these args only exist in one place.

adding this line to consume seems to fix the initializer import:
self.inputs.extend(self.initializers.values().cloned());

fn consume(mut self) -> (Vec<Node>, Vec<Argument>, Vec<Argument>) {


OnnxGraph::inputs doesn't capture the initializers.
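for context, a hypothetical sketch of where that line would sit (field names are assumptions based on the snippets in this thread, not the exact burn-import source):

fn consume(mut self) -> (Vec<Node>, Vec<Argument>, Vec<Argument>) {
    // Surface initializer-backed arguments alongside the graph inputs so
    // that later scope registration can resolve them by name.
    self.inputs.extend(self.initializers.values().cloned());
    // ... existing node/input/output processing ...
    (self.nodes, self.inputs, self.outputs)
}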

this needs to contain the initializer tensor names when passed to register_input_output:

let input_names = self


edit: nvm, this results in the initializer being a parameter to forward:
pub fn forward(&self, input1: Tensor<B, 4>, indices: Tensor<B, 1, Int>) -> Tensor<B, 4>

the error seems near this pathway, e.g. node input initializers are not being managed properly by BurnGraph:

pub struct BurnGraph<PS: PrecisionSettings> {

if I convert the initializer to a constant and disable constant lifting, I run into an error where an Int constant tensor is added to the imported model; however, all of the Initializer::init* methods only support float tensors:

pub fn init<B: Backend, const D: usize, S: Into<Shape<D>>>(

let constant7: burn::module::Param<Tensor<B, 1, Int>> = burn::nn::Initializer::Zeros
            .init([1], device)
            .set_require_grad(false);
mismatched types
expected struct `Param<Tensor<B, burn::tensor::Int, _>>`
   found struct `Param<Tensor<_, burn::tensor::Float, _>>`

CC @nathanielsimard and @laggui. Should we make the initializer more generic for other tensors?
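as a standalone sketch of that direction (a toy, not Burn's actual Initializer API), zero-initialization could be parameterized over the element type instead of hard-coding Float:

trait Zero {
    fn zero() -> Self;
}

impl Zero for f32 {
    fn zero() -> Self { 0.0 }
}

impl Zero for i64 {
    fn zero() -> Self { 0 }
}

// A generic zeros would let an Int constant like `constant7` above be
// materialized without the Float/Int mismatch in the generated code.
fn zeros<E: Zero + Clone>(shape: &[usize]) -> Vec<E> {
    vec![E::zero(); shape.iter().product()]
}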

so I've been looking at the output of cargo -r -- graph_path ./out for both the broken graphs and the graphs that I know work.

  • Something interesting about the HashMap variables being checked: it is only added to during register_use.

  • register_future_use is called only on inputs, after register_use has been called on the graph inputs and node outputs (a toy model of this bookkeeping follows the Reshape dump below).

  • These functions are filtered to only look at the inputs that are tensors, probably because Rust allows implicit copying of basic numeric types.

  • This doesn't trigger on reshape, but that's because it looks like reshape doesn't store its rhs as a tensor; it's just a vector:

Reshape(
    ReshapeNode {
        input: TensorType {
            name: Ident(
                unsqueeze1_out1,
            ),
            dim: 5,
            kind: Float,
            shape: None,
        },
        output: TensorType {
            name: Ident(
                unsqueeze2_out1,
            ),
            dim: 6,
            kind: Float,
            shape: None,
        },
        shape: [
            1,
            1,
            3,
            4,
            5,
            1,
        ],
    },
)
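putting those observations together, here's a toy model of the scope bookkeeping (names borrowed from the bullets above and the panic in this issue; the real burn-import code differs):

use std::collections::HashMap;

struct Scope {
    variables: HashMap<String, usize>,
}

impl Scope {
    // Only definitions land here: graph inputs and node outputs.
    fn register_use(&mut self, name: &str) {
        *self.variables.entry(name.to_string()).or_insert(0) += 1;
    }

    // Node inputs land here. An initializer-backed tensor argument was
    // never registered as a definition, so this lookup panics exactly
    // like the error reported above.
    fn register_future_use(&mut self, name: &str) {
        if !self.variables.contains_key(name) {
            panic!("No variable with name {}", name);
        }
    }
}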

I suspect that this is going to panic any time we have an initializer or lifted constant that's typed as a tensor by burn-graph. So far, in all the situations where I know constants are lifted in an op covered by test cases, the lifted values weren't tensors by this point. If you guys can think of a test case that has a lifted tensor for an op that takes a tensor argument, please let me know.