orrzohar / PROB

[CVPR 2023] Official Pytorch code for PROB: Probabilistic Objectness for Open World Object Detection


Four 3090s cannot reproduce the authors' results, why is that?

Rzx520 opened this issue · comments

> As you can see, I got the same results as @orrzohar shows in the paper. I wonder how many cards you used with batch_size = 2. I think if you use a single card, the result may be worse than what I got (I used four cards with batch_size = 3) @Rzx520. By the way, what are your final results? Are they far from the authors' results?

I used four cards with batch_size = 3; the result is:

{"train_lr": 1.999999999999943e-05, "train_class_error": 15.52755644357749, "train_grad_norm": 119.24543388206256, "train_loss": 5.189852057201781, "train_loss_bbox": 0.2700958194790585, "train_loss_bbox_0": 0.29624945830832017, "train_loss_bbox_1": 0.27978440371434526, "train_loss_bbox_2": 0.275065722955665, "train_loss_bbox_3": 0.27241891570675625, "train_loss_bbox_4": 0.27063051075218725, "train_loss_ce": 0.18834440561282928, "train_loss_ce_0": 0.27234036786085974, "train_loss_ce_1": 0.23321395799885028, "train_loss_ce_2": 0.20806531186409408, "train_loss_ce_3": 0.19453731594314128, "train_loss_ce_4": 0.18820172232765492, "train_loss_giou": 0.3351372324140976, "train_loss_giou_0": 0.3679243937037491, "train_loss_giou_1": 0.3483400315024699, "train_loss_giou_2": 0.34171414935044225, "train_loss_giou_3": 0.3379105142249501, "train_loss_giou_4": 0.3368650070453053, "train_loss_obj_ll": 0.02471167313379382, "train_loss_obj_ll_0": 0.034151954339996814, "train_loss_obj_ll_1": 0.03029250531194649, "train_loss_obj_ll_2": 0.0288731191750343, "train_loss_obj_ll_3": 0.028083207809715446, "train_loss_obj_ll_4": 0.026900355121292352, "train_cardinality_error_unscaled": 0.44506890101437985, "train_cardinality_error_0_unscaled": 0.6769398279525907, "train_cardinality_error_1_unscaled": 0.5726976196583499, "train_cardinality_error_2_unscaled": 0.4929900999093851, "train_cardinality_error_3_unscaled": 0.46150593285633223, "train_cardinality_error_4_unscaled": 0.45256225438417086, "train_class_error_unscaled": 15.52755644357749, "train_loss_bbox_unscaled": 0.054019163965779084, "train_loss_bbox_0_unscaled": 0.059249891647616536, "train_loss_bbox_1_unscaled": 0.055956880831476395, "train_loss_bbox_2_unscaled": 0.055013144572493046, "train_loss_bbox_3_unscaled": 0.054483783067331704, "train_loss_bbox_4_unscaled": 0.05412610215448962, "train_loss_ce_unscaled": 0.09417220280641464, "train_loss_ce_0_unscaled": 0.13617018393042987, "train_loss_ce_1_unscaled": 0.11660697899942514, "train_loss_ce_2_unscaled": 0.10403265593204704, "train_loss_ce_3_unscaled": 0.09726865797157064, "train_loss_ce_4_unscaled": 0.09410086116382746, "train_loss_giou_unscaled": 0.1675686162070488, "train_loss_giou_0_unscaled": 0.18396219685187454, "train_loss_giou_1_unscaled": 0.17417001575123495, "train_loss_giou_2_unscaled": 0.17085707467522113, "train_loss_giou_3_unscaled": 0.16895525711247505, "train_loss_giou_4_unscaled": 0.16843250352265265, "train_loss_obj_ll_unscaled": 30.889592197686543, "train_loss_obj_ll_0_unscaled": 42.68994404527915, "train_loss_obj_ll_1_unscaled": 37.86563257517548, "train_loss_obj_ll_2_unscaled": 36.09139981038161, "train_loss_obj_ll_3_unscaled": 35.10401065181873, "train_loss_obj_ll_4_unscaled": 33.62544476769816, "test_metrics": {"WI": 0.05356004827184098, "AOSA": 5220.0, "CK_AP50": 58.3890380859375, "CK_P50": 25.75118307055908, "CK_R50": 71.51227713815234, "K_AP50": 58.3890380859375, "K_P50": 25.75118307055908, "K_R50": 71.51227713815234, "U_AP50": 2.7862398624420166, "U_P50": 0.409358215516747, "U_R50": 16.530874785591767}, "test_coco_eval_bbox": [14.451444625854492, 14.451444625854492, 77.8148193359375, 57.15019607543945, 66.93928527832031, 49.282108306884766, 27.985671997070312, 70.54130554199219, 55.28901290893555, 82.7206039428711, 26.307403564453125, 65.15182495117188, 21.9127197265625, 77.91541290283203, 73.61457061767578, 67.8846206665039, 49.1287841796875, 36.78118896484375, 69.1879653930664, 53.060150146484375, 79.1402359008789, 59.972835540771484, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.7862398624420166], "epoch": 40, "n_parameters": 39742295}

The authors' results are:
U-R: 19.4, K-AP: 59.5
Why can't the authors' performance be reproduced?
@Hatins @orrzohar

Originally posted by @Rzx520 in #26 (comment)
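
As an aside on reading these log dumps: the headline metrics can be pulled out of an entry like the one above with a few lines of Python. A minimal sketch; the log path is a placeholder, not the repository's actual layout:

import json

# hypothetical path -- point it at the training log containing JSON entries like the one above
with open("exps/prob_t1/log.txt") as f:
    last = json.loads(f.readlines()[-1])   # one JSON object per line (one per epoch)

m = last["test_metrics"]
print("K_AP50 =", m["K_AP50"], "U_R50 =", m["U_R50"], "WI =", m["WI"])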

Hi @Rzx520,
When you change optimization hyperparameters, the results will inevitably change. That is true for PROB and nearly all deep learning models.

Luckily, PROB is relatively robust and requires minimal hyperparameter tuning to match our performance, at least on all the systems I have encountered. Specifically, with Titan RTX 3090 our results were already reproduced (see Issue #26). On a 3090x4 system, lr_drop needed to be increased to 40 to match our reported results. If you have a different number of GPUs, there may be a better lr_drop value for your system.
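
For context, in Deformable DETR-style training code lr_drop is usually just the step size of a StepLR schedule, so changing it moves the epoch at which the learning rate is cut by 10x. A minimal sketch with a stand-in model, not PROB's actual training loop:

import torch

model = torch.nn.Linear(10, 2)   # stand-in for the detector
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4, weight_decay=1e-4)
lr_drop = 40                     # the value discussed in this thread
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=lr_drop)

for epoch in range(51):
    # ... one training epoch ...
    lr_scheduler.step()          # lr is multiplied by 0.1 every lr_drop epochs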

I am happy to help with this process, but to do so, I need to see your training curves.

Best,
Orr

The result above was obtained after adjusting lr_drop to 40, so I am quite confused.

Did you use the same number of GPUs as in #26?
If not, then if you share your training curves I could try and help you with hyperparameter optimization.

Yes, I also used 4 GPUs. Thank you very much. Since I turned off Wandb, I have to retrain to obtain the training curves. This may take a while, as the server is being used.

[training curve screenshots]

Above are the results of training with the following parameter settings. @orrzohar

################ Deformable DETR ################
parser.add_argument('--lr', default=2e-4, type=float)
parser.add_argument('--lr_backbone_names', default=["backbone.0"], type=str, nargs='+')
parser.add_argument('--lr_backbone', default=2e-5, type=float)
parser.add_argument('--lr_linear_proj_names', default=['reference_points', 'sampling_offsets'], type=str, nargs='+')
parser.add_argument('--lr_linear_proj_mult', default=0.1, type=float)
#parser.add_argument('--batch_size', default=5, type=int)
#parser.add_argument('--batch_size', default=3, type=int)
parser.add_argument('--batch_size', default=2, type=int)
parser.add_argument('--weight_decay', default=1e-4, type=float)
parser.add_argument('--epochs', default=51, type=int)
#parser.add_argument('--lr_drop', default=35, type=int)
parser.add_argument('--lr_drop', default=40, type=int)

parser.add_argument('--lr_drop_epochs', default=None, type=int, nargs='+')
parser.add_argument('--clip_max_norm', default=0.1, type=float,
                    help='gradient clipping max norm')
parser.add_argument('--sgd', action='store_true')

Hi @Rzx520,
You are overtraining the model; you should reduce lr_drop to about 150k iterations (lr_drop = 30).
I am concerned that you are using the same system as in #26 but getting different optimization results; I wonder how the two systems differ.
Best,
Orr
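
As a rough back-of-the-envelope for how an iteration budget like "150k iterations" maps onto an lr_drop epoch; the dataset size below is a placeholder, not the actual T1 split size:

# Rough arithmetic only; num_train_images is hypothetical -- use your own T1 split size.
num_train_images = 60_000
batch_size_per_gpu = 3
num_gpus = 4

effective_batch = batch_size_per_gpu * num_gpus        # 12 images per optimizer step
iters_per_epoch = num_train_images // effective_batch  # 5000 steps/epoch in this example
epochs_for_150k = 150_000 / iters_per_epoch            # 30 epochs in this example

print(f"{iters_per_epoch} iters/epoch -> drop the lr after ~{epochs_for_150k:.0f} epochs")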

I am trying lr_drop = 30; I will report back when the training results are available. I also wonder how the two systems differ, so I asked some questions in #26 (comment).

Hi @Rzx520,
I see; I do not know Hatins, so I have no way of facilitating communication.
I am very surprised that you both used 4x3090s but are getting different results.

[training curve screenshots]

Above are the results of training with lr_drop = 30 and the parameter settings below. @orrzohar

################ Deformable DETR ################
parser.add_argument('--lr', default=2e-4, type=float)
parser.add_argument('--lr_backbone_names', default=["backbone.0"], type=str, nargs='+')
parser.add_argument('--lr_backbone', default=2e-5, type=float)
parser.add_argument('--lr_linear_proj_names', default=['reference_points', 'sampling_offsets'], type=str, nargs='+')
parser.add_argument('--lr_linear_proj_mult', default=0.1, type=float)
#parser.add_argument('--batch_size', default=5, type=int)
#parser.add_argument('--batch_size', default=3, type=int)
parser.add_argument('--batch_size', default=2, type=int)
parser.add_argument('--weight_decay', default=1e-4, type=float)
parser.add_argument('--epochs', default=51, type=int)
#parser.add_argument('--lr_drop', default=35, type=int)
parser.add_argument('--lr_drop', default=40, type=int)

parser.add_argument('--lr_drop_epochs', default=None, type=int, nargs='+')
parser.add_argument('--clip_max_norm', default=0.1, type=float,
                    help='gradient clipping max norm')
parser.add_argument('--sgd', action='store_true')

Hi @Rzx520,
I noticed that you used batch_size=2, not batch_size=3 like Hatins did in #26.
Why is that the case? That could be a reason for the U_R50 discrepancy.
A broad note: a general trend I see is that the smaller the batch size, the less training that can be done without hurting U_R50.

I also noticed that Hatins reported similarly poorer results when using batch_size=2.
Best,
Orr
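
As a general aside, not something prescribed in this thread: one common rule of thumb when the effective batch size changes is to scale the base learning rate linearly. A sketch with an assumed reference setup:

# Linear LR scaling heuristic -- NOT the PROB authors' recommendation, just a common starting point.
reference_lr = 2e-4         # default --lr in the argparse snippets above
reference_batch = 4 * 5     # assumed reference: 4 GPUs x batch_size 5 (hypothetical)
your_batch = 4 * 2          # 4 GPUs x batch_size 2, as used in this run

scaled_lr = reference_lr * your_batch / reference_batch
print(f"linearly scaled lr: {scaled_lr:.1e}")   # 8.0e-05 in this example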


What I tried at the beginning was batch_size = 3, and those results are shown above. I set batch_size to 2 following the parameter settings of OW-DETR. @orrzohar

I have some gains now: when I set lr to 1e-4 with lr_drop = 35 and batch_size = 3, the results improve, but K_AP only reached 58.3, not 59.4. Can you provide some suggestions?

################ Deformable DETR ################
parser.add_argument('--lr', default=1e-4, type=float)
parser.add_argument('--lr_backbone_names', default=["backbone.0"], type=str, nargs='+')
parser.add_argument('--lr_backbone', default=2e-5, type=float)
parser.add_argument('--lr_linear_proj_names', default=['reference_points', 'sampling_offsets'], type=str, nargs='+')
parser.add_argument('--lr_linear_proj_mult', default=0.1, type=float)
#parser.add_argument('--batch_size', default=5, type=int)
parser.add_argument('--batch_size', default=3, type=int)
#parser.add_argument('--batch_size', default=2, type=int)
parser.add_argument('--weight_decay', default=1e-4, type=float)
parser.add_argument('--epochs', default=51, type=int)
#parser.add_argument('--lr_drop', default=30, type=int)
parser.add_argument('--lr_drop', default=35, type=int)
#parser.add_argument('--lr_drop', default=40, type=int)

parser.add_argument('--lr_drop_epochs', default=None, type=int, nargs='+')
parser.add_argument('--clip_max_norm', default=0.1, type=float,
                    help='gradient clipping max norm')
parser.add_argument('--sgd', action='store_true')

[training curve screenshots]

Hi @Rzx520,
Are you still using 4 x Titan RTX?
Generally, to get a higher K_AP50 you need to train for longer, but the longer you train, the more U_R goes down. The trick is to hit the right balance between the two.
Looking at your chart, I think you can reduce lr_drop to 30, as the last 5 epochs are saturated before the lr_drop. This will give you 5 additional epochs at the lower learning rate and will hopefully improve the results.

To clarify, to run this experiment you DO NOT need to restart from scratch -- your model should have saved the checkpoint for epoch 30 and then you only need to train for the last 10 epochs after the lr_drop. Just make sure the lr is indeed lowered.
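
A minimal sketch of that resume-and-verify step; the checkpoint path and keys are assumptions about a Deformable DETR-style checkpoint, not necessarily PROB's exact format:

import torch

# stand-ins; in practice these are the detector, its optimizer, and its lr scheduler
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30)

ckpt = torch.load("exps/checkpoint0030.pth", map_location="cpu")  # hypothetical path
model.load_state_dict(ckpt["model"])              # assumed checkpoint keys
optimizer.load_state_dict(ckpt["optimizer"])
lr_scheduler.load_state_dict(ckpt["lr_scheduler"])

# verify the drop actually happened before training the last ~10 epochs
for group in optimizer.param_groups:
    print("lr =", group["lr"])                    # expect the base lr x 0.1 after epoch 30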

Best,
Orr

I am trying lr_drop = 30; I will present the results here. @orrzohar

Hi @Rzx520,
OK great, thank you.
Would you mind confirming what system you are using, for future reproducibility on similar systems?
Best,
Orr

lr_drop = 30 with parser.add_argument('--eval_every', default=1, type=int): @orrzohar this result is not as good as lr_drop = 35.

[training curve screenshots]

Linux ubuntu 5.15.0-86-generic #96~20.04.1-Ubuntu

Hi @Rzx520,
OK I am trying to compile everything we have seen thus far:

--lr 2e-4, --lr_drop 40, --epochs 51 --batch_size 2 -> AP50=58.4, U_R=16.5
--lr 1e-4, --lr_drop 35, --epochs 51 --batch_size 3 -> AP50=58.4, U_R=19.4
--lr 1e-4, --lr_drop 30, --epochs 51 --batch_size 3 -> worse than the above

Is that correct? Also, since lr_drop 35->30 had an adverse effect, have you tried:
--lr 1e-4, --lr_drop 40, --epochs 51 --batch_size 3

Best,
Orr

Yes, I did.
--lr 1e-4, --lr_drop 30, --epochs 41 --batch_size 3
[training curve screenshot]

AP50=58.1 U_R=19.5

@orrzohar One note: the number of epochs for the results above is not the default value, but 41.

Hi @Rzx520,

Are the results above for:
--lr 1e-4, --lr_drop 40, --epochs 51 --batch_size 3?

And of course the hyperparameters changed -- you changed the batch size because it did not fit on your GPUs, and that changes other hyperparameters.
Best,
Orr

Hi @Rzx520,
I am closing this for now. If you can confirm what configuration you used to get the best results, I will add this to the README.
Best,
Orr