ShoufaChen / AdaptFormer

[NeurIPS 2022] Implementation of "AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition"

Home Page: https://arxiv.org/abs/2205.13535


How to solve the problem of the loss becoming NaN?

xiaoxiAries opened this issue · comments

Hi,

I reproduced this code on the SSv2 dataset. I followed the blr of 0.1 and used 2 GPUs with a batch size of 7 (effective total batch size 14), but the loss becomes NaN at epoch 14. How can I solve this problem? Thanks~

Hi,

Which configuration do you use? The full-tuning baseline or AdaptFormer?

Hi,
I used this configuration:

```shell
OMP_NUM_THREADS=1 python3 -m torch.distributed.launch \
    --nproc_per_node=2 \
    --use_env main_video.py \
    --finetune /path/to/pre_trained/mae.pyth \
    --output_dir /path/to/output \
    --batch_size 7 --epochs 90 --blr 0.1 --weight_decay 0.0 --dist_eval \
    --data_path /path/to/SSV2 --data_set SSV2 \
    --ffn_adapt
```

I am sorry, I didn't experiment with your specific configuration. Try reducing the learning rate and see if that helps.
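For example, a possible (unverified) adjustment is to lower the base learning rate, keeping everything else the same; the value 0.01 below is an illustrative guess, not a recommendation from the authors:

```shell
# Same launch command as above, with --blr lowered from 0.1 to 0.01
# (0.01 is an assumed starting point; tune further if the loss still diverges)
OMP_NUM_THREADS=1 python3 -m torch.distributed.launch \
    --nproc_per_node=2 \
    --use_env main_video.py \
    --finetune /path/to/pre_trained/mae.pyth \
    --output_dir /path/to/output \
    --batch_size 7 --epochs 90 --blr 0.01 --weight_decay 0.0 --dist_eval \
    --data_path /path/to/SSV2 --data_set SSV2 \
    --ffn_adapt
```

Note that the effective learning rate is typically scaled by the total batch size, so with a total batch size of 14 (much smaller than the default used in the paper), the learning rate derived from blr may need to shrink proportionally.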