Tencent / TPAT

TensorRT Plugin Autogen Tool


Does the CUDA kernel code generated from Ansor's search space use shared-memory optimization during auto-tuning?

wugoukanle opened this issue · comments

Is the CUDA kernel code from TVM only auto-tuned via for-loop tiling? What are the tuning parameters for CUDA kernel code in TVM Ansor?

Not just loop tiling. Ansor also tunes decisions such as unrolling, block tiling, vectorization, and the implementation of the kernel (for many kernels it decides which ones can be fused). However, Ansor still lacks the ability to tensorize, so it does not perform well on compute-intensive operators such as GEMM and convolution.
The above is only a rough description; if you want more detail, we recommend reading the source code:
Tune config in TPAT
TVM Source Code
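
As a rough illustration, here is a minimal sketch of how an Ansor tuning run is launched through TVM's `auto_scheduler` API. The matmul workload, trial count, and log-file name are illustrative assumptions, not the exact configuration TPAT uses; the search itself explores schedules combining the tiling, unrolling, vectorization, and fusion decisions mentioned above.

```python
import tvm
from tvm import te, auto_scheduler

# Illustrative workload: a plain matmul registered for auto-scheduling.
@auto_scheduler.register_workload
def matmul(N, M, K, dtype):
    A = te.placeholder((N, K), name="A", dtype=dtype)
    B = te.placeholder((K, M), name="B", dtype=dtype)
    k = te.reduce_axis((0, K), name="k")
    C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
    return [A, B, C]

target = tvm.target.Target("cuda")
task = auto_scheduler.SearchTask(
    func=matmul, args=(1024, 1024, 1024, "float32"), target=target
)

# Ansor searches a space of schedules (tiling, unrolling, vectorization, fusion)
# and measures candidates on the GPU; the trial count here is just an example.
log_file = "matmul_cuda.json"
task.tune(auto_scheduler.TuningOptions(
    num_measure_trials=200,
    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
    verbose=2,
))

# Apply the best schedule found and inspect the generated CUDA source.
sch, args = task.apply_best(log_file)
mod = tvm.build(sch, args, target)
print(mod.imported_modules[0].get_source())
```

Printing the generated CUDA source this way also lets you check whether the chosen schedule ended up using `__shared__` buffers, which relates to the shared-memory question in the issue title.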
