[CUDA] cuMemPrefetchAsync Error
JiaweiRen2022 opened this issue
What happened?
I want to compile and run an example of matrix multiplication.
It compiles successfully on the old version (commit 5abc05fd23efb109a2bf0170f47fd73cd01e2dad), but running it fails. The error message appears to point to a problem while parsing the input argument:
iree/runtime/src/iree/hal/drivers/cuda/cuda_allocator.c:339: INTERNAL; CUDA driver error 'CUDA_ERROR_INVALID_VALUE' (1): invalid argument; parsing value '40x60x90xf32=1'
The error does not occur on a newer version (commit 65817331c95a48ac3f7d972667f4bd2c35b8f61e).
While debugging, I found that the error is raised by the cuMemPrefetchAsync call made during GPU memory initialization.
However, I could not find the specific change that fixed it.
Could someone help me locate that commit? Judging by the two commit IDs, the change landed sometime between April 17 and July 11.
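Since the old commit fails and the new one works, finding the fix is a binary search for the first commit on which the run succeeds. `git bisect` supports this direction with custom terms, e.g. `git bisect start --term-old=broken --term-new=fixed 6581733 5abc05f`, followed by `git bisect broken` or `git bisect fixed` after testing each checked-out build. The Python sketch below only illustrates the underlying search; `first_fixed`, the `is_fixed` callback, and the placeholder commits `c1`..`c4` are hypothetical, and it assumes a linear history in which the behavior flips exactly once.

```python
from typing import Callable, Sequence


def first_fixed(commits: Sequence[str], is_fixed: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_fixed() is True.

    Assumes commits[0] is known broken, commits[-1] is known fixed, and the
    behavior changes exactly once along this (hypothetical) linear history.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] broken, commits[hi] fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # In practice: check out commits[mid], rebuild, rerun iree-run-module.
        if is_fixed(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical history between the two reported commits.
history = ["5abc05f", "c1", "c2", "c3", "c4", "6581733"]
print(first_fixed(history, lambda c: c in {"c3", "c4", "6581733"}))  # c3
```

Each probe halves the remaining range, so only about log2(n) builds are needed for n commits in the window.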
Steps to reproduce your issue
Input Test Case:
module attributes {torch.debug_module_name = "matmul"} {
  func.func @forward(%arg0: tensor<40x60x90xf32>) -> tensor<40x60x80xf32> {
    %0 = "tosa.const"() {value = dense<2.0> : tensor<40x90x80xf32>} : () -> tensor<40x90x80xf32>
    %1 = "tosa.matmul"(%arg0, %0) : (tensor<40x60x90xf32>, tensor<40x90x80xf32>) -> tensor<40x60x80xf32>
    return %1 : tensor<40x60x80xf32>
  }
}
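As a sanity check on the expected output, the test case can be mirrored in NumPy. `tosa.matmul` multiplies batch by batch, `[N, H, C] x [N, C, W] -> [N, H, W]`, which matches `numpy.matmul` on 3-D arrays. With the input splatted to 1 and the constant equal to 2, every output element is a sum of C = 90 products 1 * 2, i.e. 180. This is a sketch assuming NumPy is available; `reference_forward` is a hypothetical name.

```python
import numpy as np


def reference_forward(arg0: np.ndarray) -> np.ndarray:
    """NumPy mirror of @forward: batched matmul against a constant 2.0 tensor."""
    # Corresponds to %0 = "tosa.const" with dense<2.0> : tensor<40x90x80xf32>
    const = np.full((40, 90, 80), 2.0, dtype=np.float32)
    # Batched matmul: [40, 60, 90] @ [40, 90, 80] -> [40, 60, 80]
    return np.matmul(arg0, const)


# Same input as --input="40x60x90xf32=1": every element set to 1.0
x = np.ones((40, 60, 90), dtype=np.float32)
y = reference_forward(x)
print(y.shape)  # (40, 60, 80)
```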
Old version commands:
./iree-compile test.mlir --iree-input-type=tosa --iree-hal-target-backends=cuda -o a.vmfb
./iree-run-module --module=a.vmfb --device=cuda --function=forward --input="40x60x90xf32=1"
Old version result:
iree/runtime/src/iree/hal/drivers/cuda/cuda_allocator.c:339: INTERNAL; CUDA driver error 'CUDA_ERROR_INVALID_VALUE' (1): invalid argument; parsing value '40x60x90xf32=1'
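The failing `--input` value, `40x60x90xf32=1`, describes a 40x60x90 tensor of 32-bit floats with every element set to 1, so the runtime needs a 40 * 60 * 90 * 4 = 864,000-byte device buffer for it; the error above is reported from `cuda_allocator.c` while this value is being handled. The sketch below decodes such a string to make the size explicit. `parse_splat_input`, `buffer_bytes`, and the `ELEMENT_BYTES` table are hypothetical helpers, not IREE's parser, and only cover the simple `<dims>x<type>=<scalar>` splat form with the listed element types.

```python
import math

# Element sizes in bytes for a few type suffixes (assumed subset, illustration only).
ELEMENT_BYTES = {"i8": 1, "i16": 2, "i32": 4, "i64": 8, "f16": 2, "f32": 4, "f64": 8}


def parse_splat_input(spec: str) -> tuple[tuple[int, ...], str, str]:
    """Split e.g. '40x60x90xf32=1' into (shape, element type, splat value)."""
    type_part, value = spec.split("=", 1)
    *dims, elem_type = type_part.split("x")
    if elem_type not in ELEMENT_BYTES:
        raise ValueError(f"unsupported element type: {elem_type}")
    return tuple(int(d) for d in dims), elem_type, value


def buffer_bytes(shape: tuple[int, ...], elem_type: str) -> int:
    """Bytes needed for a dense buffer of this shape and element type."""
    return math.prod(shape) * ELEMENT_BYTES[elem_type]


shape, elem_type, value = parse_splat_input("40x60x90xf32=1")
print(shape, elem_type, value)         # (40, 60, 90) f32 1
print(buffer_bytes(shape, elem_type))  # 864000
```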
New version commands:
./iree-compile test.mlir --iree-hal-target-backends=cuda -o a.vmfb
iree-run-module --module=a.vmfb --device=cuda --function=forward --input="40x60x90xf32=1"
New version result:
EXEC @forward
result[0]: hal.buffer_view
40x60x80xf32=[[180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180 180][180 180 180 180 180 18 .......
What component(s) does this issue relate to?
Runtime
Version information
old version commit id: 5abc05f
new version commit id: 6581733
Additional context
No response