SHI-Labs / Neighborhood-Attention-Transformer

Neighborhood Attention Transformer, arXiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arXiv 2022

How to build two NAT CUDA extensions

XiaoyuShi97 opened this issue

Hi, I'd like to build two NAT CUDA extensions with different head dimensions, but I find that the second one always overwrites the first. How can I modify setup.py to distinguish them? Is there other code that needs to be changed? Thanks!

Hello and thank you for your interest.

There are actually two options. The better one is to make the dimension dynamic, which may require more edits: you'd get the per-head dim from the tensor shapes, as we do for heads, batch size, height, width, and the like, and then pass it to the kernels and modify the kernel arguments accordingly.
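To make that concrete, here is a minimal sketch of what the dynamic-dimension idea could look like on the Python side. It assumes the CPP/CU code has already been modified to accept the per-head dim as an argument; `natten_cuda` and `qkrpb_forward` are placeholder names, not the actual identifiers in this repository:

```python
import torch

# Hypothetical wrapper illustrating the dynamic-dim idea. `natten_cuda` stands
# in for the compiled extension and `qkrpb_forward` for its (modified)
# exported function; the real names and argument lists in this repo differ.
def nat_qkrpb_forward(query: torch.Tensor, key: torch.Tensor,
                      rpb: torch.Tensor, kernel_size: int) -> torch.Tensor:
    # query/key are (batch, heads, height, width, dim): the per-head dim is
    # read from the tensor shape instead of being fixed at compile time.
    batch, heads, height, width, dim = query.shape
    # Assumes the CPP/CU side has been edited to take `dim` as an argument
    # and forward it to the kernels.
    return natten_cuda.qkrpb_forward(query, key, rpb, kernel_size, dim)
```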

The option you're going for, keeping the two head dims as separate builds, requires two distinct versions of the kernels. For that, we'd recommend renaming the kernels and all the other methods, in both the CPP and CU files.
For instance, you'd keep the originals unchanged for DIM=32, make copies of the two CPP and two CU files, and append 64 to the file names (natten.....64.cpp/.cu). Then you'd have to duplicate everything else for them as well: the autograd functions in natten.py, the imports in that same file, and also the tests in gradcheck.py.
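The duplicated Python side could end up looking roughly like this; the class names, extension modules (`natten32` / `natten64`), and exported function names below are illustrative placeholders, not the exact ones in natten.py:

```python
from torch.autograd import Function

# Sketch of the duplication: one autograd Function per compiled variant.
# `natten32` / `natten64` stand in for the two separately built extensions
# (original files vs. the copies with 64 appended to their names).

class NATTENQKRPB32(Function):
    @staticmethod
    def forward(ctx, query, key, rpb, kernel_size):
        ctx.save_for_backward(query, key)
        ctx.kernel_size = kernel_size
        return natten32.qkrpb_forward(query, key, rpb, kernel_size)

    @staticmethod
    def backward(ctx, grad_attn):
        query, key = ctx.saved_tensors
        d_q, d_k, d_rpb = natten32.qkrpb_backward(
            grad_attn.contiguous(), query, key, ctx.kernel_size)
        return d_q, d_k, d_rpb, None


class NATTENQKRPB64(Function):
    # Identical wrapper, but dispatching to the DIM=64 build.
    @staticmethod
    def forward(ctx, query, key, rpb, kernel_size):
        ctx.save_for_backward(query, key)
        ctx.kernel_size = kernel_size
        return natten64.qkrpb_forward(query, key, rpb, kernel_size)

    @staticmethod
    def backward(ctx, grad_attn):
        query, key = ctx.saved_tensors
        d_q, d_k, d_rpb = natten64.qkrpb_backward(
            grad_attn.contiguous(), query, key, ctx.kernel_size)
        return d_q, d_k, d_rpb, None
```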

If you are using ninja, which is the default in this repository, setup.py is not called at all: when natten.py is imported, ninja compiles the extension (if necessary), not setup.py.
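In other words, with the default JIT path the two variants just need to be loaded as two separately named extensions, along these lines (the extension and source file names here are illustrative):

```python
from torch.utils.cpp_extension import load

# Ninja JIT-compiles each extension the first time it is loaded; giving the
# DIM=64 copy a distinct `name` and its own source files keeps it from
# overwriting the original build. File names below are placeholders.
natten32 = load(
    name="nattencuda",
    sources=["natten_cuda.cpp", "natten_cuda_kernel.cu"],
    verbose=True,
)

natten64 = load(
    name="nattencuda64",
    sources=["natten_cuda_64.cpp", "natten_cuda_kernel_64.cu"],
    verbose=True,
)
```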

I hope this answers your question. Please let me know if you need more help.

This perfectly answers my question. Thanks a lot!