NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

Unalignment case for conv

mengchihe opened this issue · comments

I found that I can set AlignmentA/B in GEMM to handle int4 problems whose shapes are not aligned.
How can I run conv in a similar case, for example when the channel count is 1 or 3?
Is there any configuration that sets the global load granularity for conv? Thanks.

CONV does not currently support this feature. However, it is fairly easy to add.

In GEMM, this line splits a 128-bit load into multiple smaller loads. You just need to do the same in the conv mainloop (https://github.com/NVIDIA/cutlass/blob/master/include/cutlass/conv/threadblock/implicit_gemm_multistage.h). The underlying iterators in https://github.com/NVIDIA/cutlass/blob/master/include/cutlass/conv/threadblock and several defaultxxx files need some plumbing, too.
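To make the idea of splitting a 128-bit load concrete, here is a minimal host-side sketch in plain C++. It is not CUTLASS code: `accesses_per_vector` and `copy_vector_128b` are hypothetical names made up for illustration, and the zero-fill of masked-off pieces is an assumption about how out-of-bounds accesses would be handled. It only shows the shape of the loop: one 16-byte vector is moved as several smaller accesses, each with its own validity predicate.

```cpp
#include <cstring>

// Hypothetical helper (not a CUTLASS API): how many accesses are needed to
// cover one vector when each access moves `alignment_elements` elements.
constexpr int accesses_per_vector(int vector_bits, int element_bits,
                                  int alignment_elements) {
  return vector_bits / (element_bits * alignment_elements);
}

// Example: int4 elements with an alignment of 8 elements means 32-bit
// accesses, so one 128-bit vector takes 4 of them.
static_assert(accesses_per_vector(128, 4, 8) == 4, "4 x 32-bit accesses");

// Hypothetical helper (not a CUTLASS API): copy one 128-bit (16-byte) vector
// as 16 / kAlignBytes smaller accesses. valid[i] is the predicate for access
// i; a masked-off access writes zeros instead of reading the source.
template <int kAlignBytes>
void copy_vector_128b(void* dst, const void* src, const bool* valid) {
  static_assert(kAlignBytes > 0 && 16 % kAlignBytes == 0,
                "access size must evenly divide 16 bytes");
  constexpr int kAccesses = 16 / kAlignBytes;

  unsigned char* d = static_cast<unsigned char*>(dst);
  const unsigned char* s = static_cast<const unsigned char*>(src);

  for (int i = 0; i < kAccesses; ++i) {
    if (valid[i]) {
      std::memcpy(d + i * kAlignBytes, s + i * kAlignBytes, kAlignBytes);
    } else {
      std::memset(d + i * kAlignBytes, 0, kAlignBytes);
    }
  }
}
```

With `kAlignBytes = 16` the loop runs once and behaves like the single full-width load; with `kAlignBytes = 4` it runs four times, and each piece can be predicated independently. That per-piece predicate is what would let a tensor with a small channel count be loaded without reading past its valid elements.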

We welcome the community to upstream this feature.

Okay, thanks. I will try to upload a patch.

Awesome. First figure out how GEMM works, and then do the same for CONV. It is not hard.