facebookresearch / CutLER

Code release for "Cut and Learn for Unsupervised Object Detection and Instance Segmentation" and "VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation"

Clarification request about implementation details

khatanas opened this issue · comments

Hello,

I have a couple of questions about sections 3.4 and 3.5:
About section 3.4:

"To de-duplicate the predictions and the ground truth from round t, we filter out ground-truth masks with an IoU > 0:5 with the predicted masks."

  1. Is this performed automatically if all annotation files are placed under DETECTRON2_DATASETS/imagenet/annotations/?
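
For context, my understanding of the de-duplication described in section 3.4 is something like the sketch below (a minimal illustration assuming binary numpy masks; the function names are mine, not taken from the repo):

```python
import numpy as np

def mask_iou(mask_a, mask_b):
    """IoU between two binary masks of the same H x W shape."""
    intersection = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return intersection / union if union > 0 else 0.0

def deduplicate_ground_truth(gt_masks, pred_masks, iou_thresh=0.5):
    """Filter out ground-truth masks that have IoU > iou_thresh with any predicted mask."""
    return [
        gt for gt in gt_masks
        if all(mask_iou(gt, pred) <= iou_thresh for pred in pred_masks)
    ]
```

Is this roughly what happens under the hood once the annotation files are in place?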

About section 3.5, I am a bit confused about which learning rate you used for which training stage. In the Detector paragraph, it is written:

"We train the detector on ImageNet with initial masks and bounding boxes for 160K iterations with a batch size of 16."

  1. What is the learning rate here?

A little further it is written:

"We then optimize the detector for 160K iterations using SGD with a learning rate of 0.005, which is decreased by 5 after 80K iterations, and a batch size of 16"

Assuming that these two sentences refer to the training stages where DropLoss is used:

  1. When I check the cascade_mask_rcnn_R_50_FPN.yaml file, the BASE_LR parameter is set to 0.01 and GAMMA is set to 0.02 (i.e., the learning rate is decreased by a factor of 50), which does not match anything in the paper. Is this intentional, or is it a typo?
  2. Did you train only once before moving to the self-training stages (i.e., do the two sentences quoted above refer to the same training stage)?
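
For reference, taking the paper's numbers at face value, I would have expected the DropLoss-stage solver settings to correspond to something like the following (a hypothetical Detectron2 config override I wrote for illustration, not taken from the released yaml):

```python
from detectron2.config import get_cfg

# Hypothetical solver settings for the DropLoss training stage,
# reconstructed from the numbers quoted above (not from the released config).
cfg = get_cfg()
cfg.SOLVER.IMS_PER_BATCH = 16   # "a batch size of 16"
cfg.SOLVER.BASE_LR = 0.005      # "a learning rate of 0.005"
cfg.SOLVER.GAMMA = 0.2          # "decreased by 5" -> LR multiplied by 1/5
cfg.SOLVER.STEPS = (80000,)     # decay point "after 80K iterations"
cfg.SOLVER.MAX_ITER = 160000    # "160K iterations"
```

whereas the released cascade_mask_rcnn_R_50_FPN.yaml has BASE_LR = 0.01 and GAMMA = 0.02, hence my confusion.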

Then, in the Self-training paragraph:

"We optimize the detector using SGD with a learning rate of 0.01 for 80K iterations."

  1. When I check the cascade_mask_rcnn_R_50_FPN_self_train.yaml file, the BASE_LR parameter is set to 0.005, which is the learning rate specified for the training with DropLoss. Is this intentional, or is it a typo?
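
Similarly, the self-training numbers from the paper would translate to something like this (again, just a hypothetical override for illustration, not the released config):

```python
from detectron2.config import get_cfg

# Hypothetical self-training solver settings implied by the paper
# ("a learning rate of 0.01 for 80K iterations"); the released
# cascade_mask_rcnn_R_50_FPN_self_train.yaml sets BASE_LR = 0.005 instead.
cfg = get_cfg()
cfg.SOLVER.BASE_LR = 0.01
cfg.SOLVER.MAX_ITER = 80000
```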

Would it be possible to clarify these points?
Please let me know.