huggingface / pytorch-image-models

The largest collection of PyTorch image encoders / backbones. Including train, eval, inference, export scripts, and pretrained weights -- ResNet, ResNeXt, EfficientNet, NFNet, Vision Transformer (ViT), MobileNet-V3/V2, RegNet, DPN, CSPNet, Swin Transformer, MaxViT, CoAtNet, ConvNeXt, and more

Home Page: https://huggingface.co/docs/timm

[BUG] The last n-batches in the log always show 0.00% accuracy

shunmian opened this issue · comments

Hey, first thanks for the fantastic code!

Describe the bug

The last n batches in the log always show 0.00% accuracy, regardless of how many epochs have been run.

                     Test: [8600/8722]  Time: 0.09  Loss:  0.21 (0.243)  Acc@1: 100.00 (95.60)  Acc@5: 100.00 (99.83)
                     Test: [8650/8722]  Time: 0.10  Loss:  2.81 (0.247)  Acc@1: 12.50 (95.42)  Acc@5: 100.00 (99.83)
                     Test: [8700/8722]  Time: 0.09  Loss:  0.11 (0.253)  Acc@1: 100.00 (95.26)  Acc@5: 100.00 (99.83)
Acc always 0.00 ->   Test: [8722/8722]  Time: 0.03  Loss:  5.06 (0.254)  Acc@1:  0.00 (95.24)  Acc@5: 50.00 (99.83) 
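For context on reading these lines: the first Acc@1/Acc@5 value on each line is for that single batch, and the parenthesized value is the running average. The final line's Acc@5 of 50.00 is consistent with a small remainder batch (e.g. only two samples left over after 8721 full batches), where one misclassified image alone drives the batch's Acc@1 to 0.00. A minimal sketch of top-k accuracy (analogous in spirit to timm's accuracy utility, though not its exact implementation) reproduces the last log line from a hypothetical two-sample batch:

```python
import torch

def topk_accuracy(output, target, topk=(1, 5)):
    # Percentage of samples whose true label appears in the top-k logits,
    # mirroring the Acc@1 / Acc@5 columns in the validation log.
    maxk = max(topk)
    _, pred = output.topk(maxk, dim=1)      # (batch, maxk) predicted classes
    correct = pred.eq(target.view(-1, 1))   # (batch, maxk) boolean hits
    return [correct[:, :k].any(dim=1).float().mean().item() * 100 for k in topk]

# A hypothetical 2-sample remainder batch: one sample wrong at top-1 but
# right at top-5, the other wrong at both.
logits = torch.full((2, 10), -10.0)
logits[0, 3] = 5.0    # sample 0: top-1 prediction is class 3 (wrong)
logits[0, 7] = 4.0    # ... but true class 7 still lands in the top-5
logits[1, 2] = 5.0    # sample 1: top-1 prediction is class 2 (wrong)
logits[1, 9] = -20.0  # ... and true class 9 is pushed out of the top-5
targets = torch.tensor([7, 9])
print(topk_accuracy(logits, targets))  # -> [0.0, 50.0]
```

With only two samples in the batch, the per-batch accuracy can only be 0, 50, or 100, matching the `Acc@1: 0.00 ... Acc@5: 50.00` in the last line.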

To Reproduce
Steps to reproduce the behavior:

./distributed_train.sh 1 "/home/pytorch-image-models/Datasets/1" --model timm/maxxvitv2_rmlp_base_rw_384.sw_in12k_ft_in1k --lr 0.0005 --warmup-epochs 0 --epochs 50 --weight-decay 1e-4 --sched cosine --scale 0.8 1 --aa rand-m1-n1-mstd0.01-mmax5 -b 24 -j 6 --amp --dist-bn reduce --num-classes 500 --pretrained --class-map  "/home/pytorch-image-models/Datasets1/class.txt" --input-size 3 384 384

Expected behavior
The last n batches should produce an Acc@1 higher than 0.00 after training.

Desktop (please complete the following information):

  • OS: Ubuntu 20.04
  • timm: 0.8.19dev0
  • PyTorch version: 1.12.1+cu113

@shunmian It's very unlikely there is a bug in the code/scripts. You should try shuffling your validation set and see what happens; I expect the samples the model can't predict will be spread out, and you won't see them mostly lumped into the last batch. There are clearly other trouble batches as well.

The fact that you are getting batches at either close to 100 or 0 suggests your dataset is imbalanced, which means accuracy is a poor metric. timm's train scripts are biased towards ImageNet-style pretraining, where the classes are fairly balanced. You'd be better off changing the scripts to use some sort of F-score.
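To illustrate why accuracy misleads on imbalanced data, here is a toy sketch using scikit-learn's `f1_score` (timm itself does not compute F1, so this would be added to the eval loop by hand; the 90/10 class split is made up for the example):

```python
import numpy as np
from sklearn.metrics import f1_score

# Imbalanced toy labels: 90% class 0, 10% class 1.
y_true = np.array([0] * 90 + [1] * 10)

# A degenerate "predict the majority class" model...
y_pred = np.zeros_like(y_true)

# ...scores 90% accuracy while completely ignoring the minority class.
acc = (y_true == y_pred).mean()
# Macro-F1 averages per-class F1, so the ignored class drags it down.
macro_f1 = f1_score(y_true, y_pred, average="macro", zero_division=0)
print(f"accuracy={acc:.2f}  macro-F1={macro_f1:.2f}")  # accuracy=0.90  macro-F1=0.47
```

Macro-averaged F1 weights every class equally, so a model that only ever predicts the dominant classes can no longer look deceptively good the way it does under plain top-1 accuracy.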