SHI-Labs / Neighborhood-Attention-Transformer

Neighborhood Attention Transformer, arXiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arXiv 2022

Details of Training

achen46 opened this issue

Hi @alihassanijr, thanks for the great repository. For reproducing your results, how many nodes were used to train these models? I see that config files are provided for each model, but I wonder if any changes are needed when training on multiple nodes.

Hi and thank you for your interest.
We tried both single-node and multi-node settings and did not notice any significant difference between the two.
If you want to train on multiple nodes, you'd have to divide the per-GPU batch size so that the global batch size stays consistent.
All of the models we've released were trained with a global batch size of 1024, which works out to 128 samples per GPU on a single 8-GPU node, hence the 128 in the config files. If you increase the number of GPUs, you'd have to decrease the per-GPU batch size accordingly (e.g. 2 nodes with 16 GPUs total -> batch size 64 per GPU).
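For concreteness, here is a minimal sketch of that batch-size arithmetic. The global batch size of 1024 and the per-GPU value of 128 come from the reply above; the helper name `per_gpu_batch_size` and the mention of `torch.distributed.get_world_size()` are illustrative assumptions, not code from this repository.

```python
GLOBAL_BATCH_SIZE = 1024  # total batch size used for all released models (per the reply above)

def per_gpu_batch_size(world_size: int, global_batch: int = GLOBAL_BATCH_SIZE) -> int:
    """Per-GPU batch size that keeps the global batch size fixed.

    world_size is the total number of GPUs across all nodes; in a
    PyTorch DDP job this is torch.distributed.get_world_size().
    """
    assert global_batch % world_size == 0, "global batch size must divide evenly across GPUs"
    return global_batch // world_size

print(per_gpu_batch_size(8))   # 1 node  x 8 GPUs -> 128 (the value in the released configs)
print(per_gpu_batch_size(16))  # 2 nodes x 8 GPUs -> 64
```

In other words, the batch size in each config file is the per-GPU value, so it is the number to adjust when the total GPU count changes.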

I hope this clarifies things.

Thanks for the clarification. I will try to reproduce your numbers. Great work!