iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.

Home Page: http://iree.dev/

[spirv] Incorrect int4 store results

hanhanW opened this issue · comments

We have sub-type emulation working on the CUDA and CPU backends. We cannot enable the test on the vulkan-spirv side because it generates incorrect results. See https://github.com/openxla/iree/actions/runs/7053847806/job/19202219597

To repro:

Compile the input below: iree-compile --output-format=vm-bytecode --iree-hal-target-backends=vulkan-spirv ~/repro.mlir -o /tmp/a.vmfb

Run the module: iree-check-module --device=vulkan --module=/tmp/a.vmfb --function=f32_to_i4_1d

func.func @f32_to_i4_1d() {
  %input = util.unfoldable_constant dense<[0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]> : tensor<8xf32>
  %init0 = tensor.empty() : tensor<8xi4>
  %res = linalg.generic {indexing_maps = [affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>], iterator_types = ["parallel"]}
    ins(%input : tensor<8xf32>) outs(%init0 : tensor<8xi4>) {
  ^bb0(%in: f32, %out: i4):
    %2 = arith.fptoui %in : f32 to i32
    %3 = arith.trunci %2 : i32 to i4
    linalg.yield %3 : i4
  } -> tensor<8xi4>

  // TODO(#14996): Remove the signed extension and directly check with i4 types.
  %blocker = util.optimization_barrier %res : tensor<8xi4>
  %init1 = tensor.empty() : tensor<8xi8>
  %exti8 = linalg.generic {indexing_maps = [affine_map<(d0) -> (d0)>, affine_map<(d0) -> (d0)>], iterator_types = ["parallel"]}
    ins(%blocker : tensor<8xi4>) outs(%init1 : tensor<8xi8>) {
  ^bb0(%in: i4, %out: i8):
    %2 = arith.extsi %in : i4 to i8
    linalg.yield %2 : i8
  } -> tensor<8xi8>

  check.expect_eq_const(%exti8, dense<[0, 1, 2, 3, 4, 5, 6, 7]> : tensor<8xi8>) : tensor<8xi8>
  return
}
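
For reference, here is a minimal Python sketch of what the test is expected to compute on the host. It is not IREE code. The function name `f32_to_i4_reference` is hypothetical, and the packing layout (two i4 values per byte, with the even-indexed element in the low nibble) is an assumption about how the emulated i4 store lays out data, not something confirmed in this issue.

```python
def f32_to_i4_reference(values):
    """Host-side model of the repro: fptoui -> trunci -> packed store -> extsi."""
    # arith.fptoui followed by arith.trunci to i4: keep only the low 4 bits.
    # (Inputs are assumed non-negative, as in the repro.)
    nibbles = [int(v) & 0xF for v in values]

    # Emulated i4 store: pack two 4-bit values into each byte.
    # ASSUMPTION: the even-indexed element goes in the low nibble.
    packed = bytes(
        nibbles[i] | (nibbles[i + 1] << 4) for i in range(0, len(nibbles), 2)
    )

    # Load back and sign-extend each nibble to i8 (arith.extsi).
    extended = []
    for byte in packed:
        for nib in (byte & 0xF, byte >> 4):
            extended.append(nib - 16 if nib & 0x8 else nib)
    return packed, extended


packed, result = f32_to_i4_reference([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
print(packed.hex(), result)
```

Under these assumptions, the sign-extended values should match the constant in check.expect_eq_const. Comparing the packed bytes against the buffer written by the SPIR-V kernel may help narrow down whether the store or the load side is wrong.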