a handle passed as operand #0 and consumed by this operation points to a payload entity more than once
jn-shen opened this issue
What happened?
When trying to compile an MLIR file exported from a mixed-precision Llama 2 model, I get the error below:
root@aiinfra-C9X299-PGF:/home/admin/iree-dist# ./bin/iree-compile --iree-hal-target-backends=cuda --iree-hal-cuda-llvm-target-arch=sm_80 --mlir-print-op-on-diagnostic=false ../SHARK-Turbine/models/Llama_2_7b_chat_hf_fp16.mlir -o llama.vmfb
failed to translate executables
../SHARK-Turbine/models/Llama_2_7b_chat_hf_fp16.mlir:1033:11: error: a handle passed as operand #0 and consumed by this operation points to a payload entity more than once
%37 = torch.aten.mul.Tensor %36, %35 : !torch.vtensor<[4096],f16>, !torch.vtensor<[1,?,4096],f16> -> !torch.vtensor<[1,?,4096],f16>
^
../SHARK-Turbine/models/Llama_2_7b_chat_hf_fp16.mlir:360:13: note: called from
%1:65 = call @initialize(%0) : (!torch.vtensor<[1,?],si64>) -> (!torch.vtensor<[1,1],si64>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, 
!torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>)
^
../SHARK-Turbine/models/Llama_2_7b_chat_hf_fp16.mlir:1015:11: note: repeated target op
%27 = torch.prims.convert_element_type %5, %int6_30 : !torch.vtensor<[1,?,4096],f16>, !torch.int -> !torch.vtensor<[1,?,4096],f32>
^
../SHARK-Turbine/models/Llama_2_7b_chat_hf_fp16.mlir:1033:11: error: 'builtin.module' op failed to run transform dialect passes
%37 = torch.aten.mul.Tensor %36, %35 : !torch.vtensor<[4096],f16>, !torch.vtensor<[1,?,4096],f16> -> !torch.vtensor<[1,?,4096],f16>
^
../SHARK-Turbine/models/Llama_2_7b_chat_hf_fp16.mlir:360:13: note: called from
%1:65 = call @initialize(%0) : (!torch.vtensor<[1,?],si64>) -> (!torch.vtensor<[1,1],si64>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, 
!torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>)
^
../SHARK-Turbine/models/Llama_2_7b_chat_hf_fp16.mlir:1033:11: error: failed to run translation of source executable to target executable for backend #hal.executable.target<"cuda", "cuda-nvptx-fb", {iree.gpu.target = #iree_gpu.target<arch = "sm_80", features = "+ptx76", wgp = <compute = fp64|fp32|fp16|int64|int32|int16|int8, storage = b64|b32|b16|b8, subgroup = shuffle|arithmetic, dot = dp4xi8toi32, mma = [], subgroup_size_choices = [32], max_workgroup_sizes = [1024, 1024, 1024], max_thread_count_per_workgroup = 1024, max_workgroup_memory_bytes = 166912>>}>
%37 = torch.aten.mul.Tensor %36, %35 : !torch.vtensor<[4096],f16>, !torch.vtensor<[1,?,4096],f16> -> !torch.vtensor<[1,?,4096],f16>
^
../SHARK-Turbine/models/Llama_2_7b_chat_hf_fp16.mlir:360:13: note: called from
%1:65 = call @initialize(%0) : (!torch.vtensor<[1,?],si64>) -> (!torch.vtensor<[1,1],si64>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, 
!torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>, !torch.vtensor<[1,?,32,128],f16>)
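For context, the first diagnostic in the log comes from the MLIR transform dialect's handle checks: an op that consumes a handle requires that handle to reference each payload op at most once, and a handle built without deduplication (e.g. via transform.merge_handles without the deduplicate attribute) can end up pointing at the same payload op twice. A minimal, hypothetical sketch of the failing pattern (not taken from the Llama 2 module; names and ops are illustrative):

```mlir
// Hypothetical sketch: %dup references the same linalg.generic payload op
// twice, so any transform op that *consumes* %dup triggers
// "a handle passed as operand #0 and consumed by this operation points
// to a payload entity more than once".
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%root: !transform.any_op) {
    %m = transform.structured.match ops{["linalg.generic"]} in %root
        : (!transform.any_op) -> !transform.any_op
    // Merging a handle with itself (no `deduplicate`) duplicates payload refs.
    %dup = transform.merge_handles %m, %m : !transform.any_op
    // A consuming transform (e.g. a tiling op) applied to %dup would fail
    // the uniqueness check; merging with `deduplicate` avoids it.
    transform.yield
  }
}
```

In this report the duplicated mapping presumably arises inside IREE's own transform-dialect passes while lowering the repeated target op noted at line 1015, rather than from a user-written transform script.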
Steps to reproduce your issue
- Download the Llama 2 model and export it as MLIR by running the Python script: models/turbine_models/custom_models/stateless_llama.py
- Run:
iree-compile --iree-hal-target-backends=cuda --iree-hal-cuda-llvm-target-arch=sm_80 --mlir-print-op-on-diagnostic=false ../SHARK-Turbine/models/Llama_2_7b_chat_hf_fp16.mlir -o llama.vmfb
What component(s) does this issue relate to?
MLIR, Compiler
Version information
Additional context
No response