[URGENT] Eval results are much lower than what's reported
encounter1997 opened this issue
Hi, thanks for the excellent work!
I followed the instructions in the README to evaluate the models provided in your repo. However, the APs I got for yolos_ti.pth, yolos_s_200_pre.pth, yolos_s_300_pre.pth, yolos_s_dWr.pth, and yolos_base.pth are 28.7, 12.5, 12.7, 13.2, and 13.8, respectively. While yolos_ti.pth matches the performance in your paper and log, the other four models are significantly lower than expected.
Any idea why this would happen? Thanks in advance!
For example, when evaluating the base model, I ran
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path ../data/coco --batch_size 2 --backbone_name base --eval --eval_size 800 --init_pe_size 800 1344 --mid_pe_size 800 1344 --resume ../trained_weights/yolos/yolos_base.pth
and expected to obtain 42.0 AP, as shown in your paper and log. However, the result is only 13.8 AP.
The complete evaluation output is shown below.
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
| distributed init (rank 0): env://
| distributed init (rank 2): env://
| distributed init (rank 3): env://
| distributed init (rank 1): env://
| distributed init (rank 6): env://
| distributed init (rank 5): env://
| distributed init (rank 7): env://
| distributed init (rank 4): env://
Namespace(backbone_name='base', batch_size=2, bbox_loss_coef=5, clip_max_norm=0.1, coco_panoptic_path=None, coco_path='../data/coco', dataset_file='coco', decay_rate=0.1, det_token_num=100, device='cuda', dice_loss_coef=1, dist_backend='nccl', dist_url='env://', distributed=True, eos_coef=0.1, epochs=150, eval=True, eval_size=800, giou_loss_coef=2, gpu=0, init_pe_size=[800, 1344], lr=0.0001, lr_backbone=1e-05, lr_drop=100, mid_pe_size=[800, 1344], min_lr=1e-07, num_workers=2, output_dir='', pre_trained='', rank=0, remove_difficult=False, resume='../trained_weights/yolos/yolos_base.pth', sched='warmupcos', seed=42, set_cost_bbox=5, set_cost_class=1, set_cost_giou=2, start_epoch=0, use_checkpoint=False, warmup_epochs=0, warmup_lr=1e-06, weight_decay=0.0001, world_size=8)
Has mid pe
number of params: 127798368
loading annotations into memory...
Done (t=23.52s)
creating index...
index created!
800
loading annotations into memory...
Done (t=3.00s)
creating index...
index created!
Test: [ 0/313] eta: 0:39:39 class_error: 29.21 loss: 2.1542 (2.1542) loss_bbox: 0.4245 (0.4245) loss_ce: 0.7761 (0.7761) loss_giou: 0.9535 (0.9535) cardinality_error_unscaled: 5.3750 (5.3750) class_error_unscaled: 29.2100 (29.2100) loss_bbox_unscaled: 0.0849 (0.0849) loss_ce_unscaled: 0.7761 (0.7761) loss_giou_unscaled: 0.4768 (0.4768) time: 7.6030 data: 0.5298 max mem: 3963
Test: [256/313] eta: 0:00:26 class_error: 17.22 loss: 2.5668 (2.6435) loss_bbox: 0.5639 (0.5792) loss_ce: 0.8598 (0.8386) loss_giou: 1.1904 (1.2257) cardinality_error_unscaled: 3.8750 (4.2398) class_error_unscaled: 28.7817 (28.6160) loss_bbox_unscaled: 0.1128 (0.1158) loss_ce_unscaled: 0.8598 (0.8386) loss_giou_unscaled: 0.5952 (0.6129) time: 0.4406 data: 0.0137 max mem: 10417
Test: [312/313] eta: 0:00:00 class_error: 16.29 loss: 2.8745 (2.6626) loss_bbox: 0.5974 (0.5833) loss_ce: 0.8791 (0.8461) loss_giou: 1.3012 (1.2332) cardinality_error_unscaled: 3.8750 (4.2370) class_error_unscaled: 26.2946 (28.7748) loss_bbox_unscaled: 0.1195 (0.1167) loss_ce_unscaled: 0.8791 (0.8461) loss_giou_unscaled: 0.6506 (0.6166) time: 0.4251 data: 0.0134 max mem: 10417
Test: Total time: 0:02:25 (0.4663 s / it)
Averaged stats: class_error: 16.29 loss: 2.8745 (2.6626) loss_bbox: 0.5974 (0.5833) loss_ce: 0.8791 (0.8461) loss_giou: 1.3012 (1.2332) cardinality_error_unscaled: 3.8750 (4.2370) class_error_unscaled: 26.2946 (28.7748) loss_bbox_unscaled: 0.1195 (0.1167) loss_ce_unscaled: 0.8791 (0.8461) loss_giou_unscaled: 0.6506 (0.6166)
Accumulating evaluation results...
DONE (t=15.78s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.13810
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.26766
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.11832
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.05146
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.13066
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.23324
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.18115
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.29001
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.31740
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.12520
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.31154
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.49446
Hi~@encounter1997, thanks for your interest in YOLOS and thanks for pointing out this issue :)
The codebase of YOLOS is built upon DETR's codebase, so there is a "bug" inherited from DETR: you need to set num_GPU and batchsize_per_GPU during evaluation to the same values used during training. E.g., num_GPU = 8 & batchsize_per_GPU = 1 for YOLOS-Small & YOLOS-Base.
It seems that you set batchsize_per_GPU = 2 during evaluation, which results in the AP degradation.
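For context on why the batch size changes the numbers: DETR-style codebases pad every image in a batch up to the largest height and width in that batch, so with batchsize_per_GPU = 2 the smaller image of each pair is evaluated at a different padded resolution than it would be at batch size 1 (and, presumably, a different resolution than the positional embeddings were tuned for). Below is a minimal sketch of that padding behavior; pad_batch is a hypothetical helper that roughly mirrors what DETR's nested_tensor_from_tensor_list does, not code from this repo.

import torch

def pad_batch(images):
    # Pad a list of CHW tensors to the per-batch max H/W, DETR-style.
    # Returns the padded batch and a boolean mask marking padded pixels.
    max_h = max(img.shape[1] for img in images)
    max_w = max(img.shape[2] for img in images)
    batch = torch.zeros(len(images), images[0].shape[0], max_h, max_w)
    mask = torch.ones(len(images), max_h, max_w, dtype=torch.bool)
    for i, img in enumerate(images):
        c, h, w = img.shape
        batch[i, :c, :h, :w] = img
        mask[i, :h, :w] = False  # False marks real (non-padded) pixels
    return batch, mask

# With batch_size=1 each image keeps its own shape; with batch_size=2
# the smaller image is padded to the larger one's size, so the model
# sees different inputs than it did when evaluated at batch size 1.
a = torch.rand(3, 600, 800)
b = torch.rand(3, 800, 1344)
print(pad_batch([a])[0].shape)     # torch.Size([1, 3, 600, 800])
print(pad_batch([a, b])[0].shape)  # torch.Size([2, 3, 800, 1344])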
Try
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path /path/to/coco --batch_size 1 --backbone_name small --eval --eval_size 800 --init_pe_size 512 864 --mid_pe_size 512 864 --resume /path/to/YOLOS-Small
to reproduce the YOLOS-Small AP, which should be 36.1.
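The same fix should apply to your original YOLOS-Base run, i.e., the command you posted with --batch_size 1 instead of 2:
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py --coco_path ../data/coco --batch_size 1 --backbone_name base --eval --eval_size 800 --init_pe_size 800 1344 --mid_pe_size 800 1344 --resume ../trained_weights/yolos/yolos_base.pth
which should recover the 42.0 AP.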
Thanks for your timely reply! I followed your advice and the problem was solved~