pykeio / ort

Fast ML inference & training for Rust with ONNX Runtime

Home Page:https://ort.pyke.io/

Wrong and non-deterministic results with constant model

bedapisl opened this issue · comments

Hello,
I am getting wrong outputs when running a simple constant ONNX model. The model should always return 0.5, but the first few outputs seem to be corrupted: they are not 0.5, and they are even non-deterministic. Code for prediction:

use ndarray::{ArrayBase, CowRepr, Dim, IxDynImpl, Array};
use ort::{Environment, ExecutionProvider, SessionBuilder, Value};


fn main() {

    let environment = Environment::builder()
            .with_name("Detection")
            .with_execution_providers([ExecutionProvider::CPU(
                Default::default(),
            )])
            .build().expect("Failed to create ONNX Runtime environment")
            .into_arc();

    let session = SessionBuilder::new(&environment)
        .expect("Failed to create a session")
        .with_model_from_file("constant_model.onnx")
        .expect("Failed to load ONNX model");

    let input_data_prepared: ArrayBase<CowRepr<'_, f32>, Dim<IxDynImpl>> = 
        Array::zeros((1, 3, 192, 256)).into_dyn().into();

    let input_tensor = Value::from_array(session.allocator(), &input_data_prepared)
        .expect("Failed to convert ndarray to Tensor");

    let model_output = session
        .run(vec![input_tensor])
        .expect("ONNX inference failed")[0]
        .try_extract::<f32>()
        .expect("ONNX result extraction failed");

    for x in 0..6 {
        for y in 0..3 {
            println!("{:?}", model_output.view()[[0 as usize, x as usize, y as usize]]);
        }
    }
    println!("{:?}", model_output.view().shape());
}

Code for generating the constant_model.onnx:

import torch
import numpy as np

class ConstantModel(torch.nn.Module):
    def __init__(self):
        super(ConstantModel, self).__init__()

    def forward(self, x):
        data = np.zeros((1, 3, 6), dtype=np.float32)
        data += 0.5
        data = np.transpose(data, (0, 2, 1))
        return torch.tensor(data)


model = ConstantModel()

input_data = torch.rand((1, 3, 192, 256), dtype=torch.float32)

exported = torch.onnx.dynamo_export(model, input_data)

exported.save("constant_model.onnx")

Example output I am getting (the first few numbers are different in each run):

20032080000000.0
7e-45
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
0.5
[1, 6, 3]
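For anyone trying to reproduce this, a small check can flag the corrupted entries automatically instead of reading the printout by eye. The following is only a sketch that uses the standard library alone: `corrupted_indices` is a hypothetical helper name, the tolerance of `1e-6` is an arbitrary choice, and the values are copied from the example run above rather than taken from a live session.

```rust
/// Returns the indices of values that differ from `expected` by more than `tol`.
/// NaN values are always reported, because the comparison below is false for NaN.
/// (Hypothetical helper for this reproduction; not part of ort.)
fn corrupted_indices(values: &[f32], expected: f32, tol: f32) -> Vec<usize> {
    values
        .iter()
        .enumerate()
        .filter(|(_, v)| !((**v - expected).abs() <= tol))
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // The 18 values (6 x 3) printed in the example run above, in print order.
    let mut observed = vec![0.5_f32; 18];
    observed[0] = 20032080000000.0;
    observed[1] = 7e-45;

    let bad = corrupted_indices(&observed, 0.5, 1e-6);
    println!("corrupted indices: {:?}", bad);
}
```

In the real program, the flattened contents of `model_output` would take the place of `observed`.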

I was able to reproduce it on two machines, both running Debian 12: one has rustc 1.73.0, the other has rustc 1.71.1.

I am attaching an archive with all the files needed for reproduction:

onnx_test.zip

I found out that removing the execution provider definition, specifically this code:

.with_execution_providers([ExecutionProvider::CPU(
                Default::default(),
            )])

solves the problem.

I would still keep this open, because I think the code with the execution provider is valid and should work.

This issue was first discovered here: #108

Indeed, registering the CPU EP causes heap corruption. This is fixed in recent versions.
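The garbage values in the report fit that diagnosis. One way to see this is to look at the raw bit patterns of the printed numbers: `7e-45` parses to a subnormal `f32` whose bits are just the integer 5, which is what small non-float data looks like when read as a float. This is only an illustrative sketch using the standard library, and `bits_of` is a hypothetical helper name. It does not prove where the bytes came from.

```rust
/// Parses a decimal float literal as f32 and returns its raw IEEE 754 bit pattern.
/// (Hypothetical helper for illustration; not part of ort.)
fn bits_of(text: &str) -> u32 {
    let value: f32 = text.parse().expect("valid float literal");
    value.to_bits()
}

fn main() {
    // Two of the corrupted values from the report, plus the expected 0.5.
    for text in ["7e-45", "20032080000000.0", "0.5"] {
        println!("{:>18} -> bits 0x{:08x}", text, bits_of(text));
    }
}
```

The expected value 0.5 has the bit pattern `0x3f000000`, so an output buffer that was written correctly and left untouched would show only that pattern.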