iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.

Home Page: http://iree.dev/


a handle passed as operand #0 and consumed by this operation points to a payload entity more than once

jn-shen opened this issue

What happened?

When trying to compile the MLIR file exported from a mixed-precision Llama 2 model, I get the error below:

root@aiinfra-C9X299-PGF:/home/admin/iree-dist# ./bin/iree-compile --iree-hal-target-backends=cuda --iree-hal-cuda-llvm-target-arch=sm_80 --mlir-print-op-on-diagnostic=false ../SHARK-Turbine/models/Llama_2_7b_chat_hf_fp16.mlir -o llama.vmfb
failed to translate executables
../SHARK-Turbine/models/Llama_2_7b_chat_hf_fp16.mlir:1033:11: error: a handle passed as operand #0 and consumed by this operation points to a payload entity more than once
    %37 = torch.aten.mul.Tensor %36, %35 : !torch.vtensor<[4096],f16>, !torch.vtensor<[1,?,4096],f16> -> !torch.vtensor<[1,?,4096],f16>
          ^
../SHARK-Turbine/models/Llama_2_7b_chat_hf_fp16.mlir:360:13: note: called from
    %1:65 = call @initialize(%0) : (!torch.vtensor<[1,?],si64>) -> (!torch.vtensor<[1,1],si64>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>)
            ^
../SHARK-Turbine/models/Llama_2_7b_chat_hf_fp16.mlir:1015:11: note: repeated target op
    %27 = torch.prims.convert_element_type %5, %int6_30 : !torch.vtensor<[1,?,4096],f16>, !torch.int -> !torch.vtensor<[1,?,4096],f32>
          ^
../SHARK-Turbine/models/Llama_2_7b_chat_hf_fp16.mlir:1033:11: error: 'builtin.module' op failed to run transform dialect passes
    %37 = torch.aten.mul.Tensor %36, %35 : !torch.vtensor<[4096],f16>, !torch.vtensor<[1,?,4096],f16> -> !torch.vtensor<[1,?,4096],f16>
          ^
../SHARK-Turbine/models/Llama_2_7b_chat_hf_fp16.mlir:360:13: note: called from
    %1:65 = call @initialize(%0) : (!torch.vtensor<[1,?],si64>) -> (!torch.vtensor<[1,1],si64>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>)
            ^
../SHARK-Turbine/models/Llama_2_7b_chat_hf_fp16.mlir:1033:11: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"cuda", "cuda-nvptx-fb", {iree.gpu.target = #iree_gpu.target<arch = "sm_80", features = "+ptx76", wgp = <compute =  fp64|fp32|fp16|int64|int32|int16|int8, storage =  b64|b32|b16|b8, subgroup =  shuffle|arithmetic, dot =  dp4xi8toi32, mma = [], subgroup_size_choices = [32], max_workgroup_sizes = [1024, 1024, 1024], max_thread_count_per_workgroup = 1024, max_workgroup_memory_bytes = 166912>>}>
    %37 = torch.aten.mul.Tensor %36, %35 : !torch.vtensor<[4096],f16>, !torch.vtensor<[1,?,4096],f16> -> !torch.vtensor<[1,?,4096],f16>
          ^
../SHARK-Turbine/models/Llama_2_7b_chat_hf_fp16.mlir:360:13: note: called from
    %1:65 = call @initialize(%0) : (!torch.vtensor<[1,?],si64>) -> (!torch.vtensor<[1,1],si64>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>)

Steps to reproduce your issue

  1. Download the Llama 2 model and export it to MLIR by running the Python script models/turbine_models/custom_models/stateless_llama.py (a sketch of this step follows below the list).
  2. Run:
    iree-compile --iree-hal-target-backends=cuda --iree-hal-cuda-llvm-target-arch=sm_80 --mlir-print-op-on-diagnostic=false ../SHARK-Turbine/models/Llama_2_7b_chat_hf_fp16.mlir -o llama.vmfb
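
For reference, a minimal end-to-end reproduction sketch is below. The stateless_llama.py flags (--hf_model_name, --precision, --compile_to) and the Hugging Face model name are assumptions inferred from the script name and the fp16 output filename, not verified against the script's current CLI; the iree-compile invocation is the one shown above.

    # Step 1: export the model to MLIR (flags are assumed; check the script's --help).
    python models/turbine_models/custom_models/stateless_llama.py \
        --hf_model_name=meta-llama/Llama-2-7b-chat-hf \
        --precision=f16 \
        --compile_to=torch

    # Step 2: compile the exported MLIR for CUDA (same command as in the log above).
    iree-compile \
        --iree-hal-target-backends=cuda \
        --iree-hal-cuda-llvm-target-arch=sm_80 \
        --mlir-print-op-on-diagnostic=false \
        Llama_2_7b_chat_hf_fp16.mlir -o llama.vmfb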

What component(s) does this issue relate to?

MLIR, Compiler

Version information

63a2d14

Additional context

No response