Tencent / TPAT

TensorRT Plugin Autogen Tool


Does the CUDA kernel code generated from Ansor's search space use shared-memory optimization during auto-tuning?

wugoukanle opened this issue · comments

Is the CUDA kernel code from TVM only auto-tuned via for-loop tiling? What are the tuning parameters for CUDA kernel code in TVM Ansor?

Not just loop tiling. Ansor also tunes decisions such as unrolling, block tiling, vectorization, and the implementation of the kernel (for many kernels it decides which ones can be fused). However, Ansor still lacks the ability to tensorize, so it does not perform well on compute-intensive operators such as GEMM and convolution.
The above is only a rough description; if you want more detail, we recommend reading the source code:
Tune config in TPAT
TVM Source Code
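
As a rough illustration, here is a minimal sketch of how an Ansor tuning run is launched through TVM's `auto_scheduler` API. The matmul workload, trial count, and log-file name are illustrative assumptions, not the exact configuration TPAT uses; the search itself explores schedules combining the tiling, unrolling, vectorization, and fusion decisions mentioned above.

```python
import tvm
from tvm import te, auto_scheduler

# Illustrative workload: a plain matmul registered for auto-scheduling.
@auto_scheduler.register_workload
def matmul(N, M, K, dtype):
    A = te.placeholder((N, K), name="A", dtype=dtype)
    B = te.placeholder((K, M), name="B", dtype=dtype)
    k = te.reduce_axis((0, K), name="k")
    C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
    return [A, B, C]

target = tvm.target.Target("cuda")
task = auto_scheduler.SearchTask(
    func=matmul, args=(1024, 1024, 1024, "float32"), target=target
)

# Ansor searches a space of schedules (tiling, unrolling, vectorization, fusion)
# and measures candidates on the GPU; the trial count here is just an example.
log_file = "matmul_cuda.json"
task.tune(auto_scheduler.TuningOptions(
    num_measure_trials=200,
    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
    verbose=2,
))

# Apply the best schedule found and inspect the generated CUDA source.
sch, args = task.apply_best(log_file)
mod = tvm.build(sch, args, target)
print(mod.imported_modules[0].get_source())
```

Printing the generated CUDA source this way also lets you check whether the chosen schedule ended up using `__shared__` buffers, which relates to the shared-memory question in the issue title.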
