[Testing Transfer Learning] I cannot reproduce results on Novel classes only

Question

egmaminta opened this issue a year ago · comments

Good day! First, I'd like to say great work on this!

As I was trying to reproduce the results found here, I'd like to focus on COCO (Novel, 31.4) and LVIS (Novel, 22.0).

Shown below is the bash script I'm using to test your fine-tuned open-vocabulary detector on COCO.

python3 ./tools/train_net.py \
--eval-only  \
--num-gpus 4 \
--config-file ./configs/COCO-InstanceSegmentation/CLIP_fast_rcnn_R_50_C4_ovd_testt.yaml \
MODEL.WEIGHTS ./pretrained_ckpt/regionclip/regionclip_finetuned-coco_rn50.pth \
MODEL.CLIP.OFFLINE_RPN_CONFIG ./configs/COCO-InstanceSegmentation/mask_rcnn_R_50_C4_1x_ovd_FSD.yaml \
MODEL.CLIP.BB_RPN_WEIGHTS ./pretrained_ckpt/rpn/rpn_coco_48.pth \
MODEL.CLIP.TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/coco_48_base_cls_emb.pth \
MODEL.CLIP.OPENSET_TEST_TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/coco_17_target_cls_emb.pth \
MODEL.ROI_HEADS.SOFT_NMS_ENABLED True

After doing the inference, I get really, really low scores like ~0.0019 AP. May I respectfully ask if I missed anything?

Hoping for your kind response. Thank you.

All the best!

egmaminta · Answer 1 · Tue Aug 22 2023 13:29:48 GMT+0800 (China Standard Time)

Here's a screenshot of my results, by the way...

egmaminta · Answer 2 · Tue Aug 22 2023 18:04:10 GMT+0800 (China Standard Time)

I tried running RN50 (Generalized) from test_transfer_learning.sh. Here are the results:

Still really, really far from the results expected. Hoping you could share some guidance for this? Thank you.

Mingzhou He · Answer 3 · Thu Aug 31 2023 15:52:17 GMT+0800 (China Standard Time)

same problem

Mingzhou He · Answer 4 · Thu Aug 31 2023 16:25:25 GMT+0800 (China Standard Time)

The following warning is printed while the weights are loading:

WARNING [08/31 16:16:06 fvcore.common.checkpoint]: Some model parameters or buffers are not found in the checkpoint:
backbone.attnpool.c_proj.{bias, weight}
backbone.attnpool.k_proj.{bias, weight}
backbone.attnpool.positional_embedding
backbone.attnpool.q_proj.{bias, weight}
backbone.attnpool.v_proj.{bias, weight}
backbone.bn1.{bias, weight}
backbone.bn2.{bias, weight}
backbone.bn3.{bias, weight}
backbone.conv1.weight
...
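
Since the warning above is truncated, one way to see the complete picture is to diff the parameter names yourself. This is only a minimal sketch: it assumes you have already collected the two sets of names (for example from `model.state_dict().keys()` and from the keys of the loaded checkpoint), and the helper name `diff_keys` is hypothetical, not part of fvcore or Detectron2.

```python
def diff_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) parameter names as sorted lists.

    missing:    names the model expects but the checkpoint lacks.
    unexpected: names in the checkpoint that the model does not use.
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Illustrative names only; real lists come from the model and the checkpoint.
missing, unexpected = diff_keys(
    ["backbone.conv1.weight", "backbone.bn1.weight"],
    ["backbone.bn1.weight", "roi_heads.box_predictor.cls_score.weight"],
)
print("missing:", missing)
print("unexpected:", unexpected)
```

A long `missing` list is not necessarily an error by itself, so it is best read together with the config that decides which weights come from which file.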

Mingzhou He · Answer 5 · Thu Aug 31 2023 17:55:41 GMT+0800 (China Standard Time)

This is caused by the PyTorch version. With PyTorch 2.0, the metrics came out as zero. I didn't look closely at what went wrong, but after going back to PyTorch 1.9 I got normal metrics.

Mingzhou He · Answer 6 · Thu Aug 31 2023 17:55:55 GMT+0800 (China Standard Time)

@egmaminta

egmaminta · Answer 7 · Fri Sep 01 2023 02:26:22 GMT+0800 (China Standard Time)

This is caused by the PyTorch version. With PyTorch 2.0, the metrics came out as zero. I didn't look closely at what went wrong, but after going back to PyTorch 1.9 I got normal metrics.

Did you downgrade PyTorch from 2.0 to 1.9? Wouldn't that conflict with Detectron2 (since the latest version requires CUDA 11.8)?

Mingzhou He · Answer 8 · Fri Sep 01 2023 14:40:00 GMT+0800 (China Standard Time)

The CUDA version follows the PyTorch version.

egmaminta · Answer 9 · Fri Sep 01 2023 15:05:27 GMT+0800 (China Standard Time)

The CUDA version follows the PyTorch version.

Would it be possible to show the specific steps you followed? I tried downgrading PyTorch, but then I ran into a mismatch with Detectron2. I would appreciate any help!

Mingzhou He · Answer 10 · Fri Sep 01 2023 15:08:28 GMT+0800 (China Standard Time)

As the author mentions in install.md, after reinstalling PyTorch you need to remove the build directory under the project and reinstall Detectron2.
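
The cleanup step can also be scripted. The following is a rough sketch rather than the project's own tooling: it has roughly the effect of `rm -rf build/ **/*.so` with recursive globbing, and the function name `clean_build_artifacts` is hypothetical.

```python
import shutil
from pathlib import Path


def clean_build_artifacts(project_root):
    """Delete build/ and every compiled *.so file under project_root.

    Returns the removed paths, relative to project_root, as a sorted list.
    """
    root = Path(project_root)
    removed = []

    build_dir = root / "build"
    if build_dir.is_dir():
        shutil.rmtree(build_dir)
        removed.append("build")

    # Materialize the matches first so files are not deleted mid-iteration.
    for so_file in list(root.rglob("*.so")):
        if so_file.is_file():
            so_file.unlink()
            removed.append(so_file.relative_to(root).as_posix())

    return sorted(removed)


# Usage (destructive, so point it at the project checkout deliberately):
# print(clean_build_artifacts("/path/to/project"))
```

After the cleanup, reinstall the package the way install.md describes so the extensions are compiled against the PyTorch that is now installed.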

egmaminta · Answer 11 · Fri Sep 01 2023 15:29:49 GMT+0800 (China Standard Time)

As the author mentions in install.md, after reinstalling PyTorch you need to remove the build directory under the project and reinstall Detectron2.

I did rebuild Detectron2 after running rm -rf build/ **/*.so. However, I get this error:

RuntimeError:
    The detected CUDA version (11.8) mismatches the version that was used to compile
    PyTorch (11.1). Please make sure to use the same CUDA versions.
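
A quick way to check for this mismatch before rebuilding is to compare the two version strings. The sketch below is an illustration only: the function name `cuda_versions_match` is hypothetical, and it assumes you supply the values yourself, for instance the toolkit version printed by `nvcc --version` and the string in `torch.version.cuda`.

```python
def cuda_versions_match(toolkit_version, torch_cuda_version):
    """Compare two CUDA version strings on their major.minor parts."""

    def major_minor(version):
        # Pad with "0" so that a bare "11" is treated as "11.0".
        parts = (version.strip().split(".") + ["0"])[:2]
        return int(parts[0]), int(parts[1])

    return major_minor(toolkit_version) == major_minor(torch_cuda_version)


# The pair from the error message above.
print(cuda_versions_match("11.8", "11.1"))
```

If the check fails, either install a PyTorch build compiled for the local toolkit or make the matching toolkit the one the build picks up.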

Apologies for any inconvenience. I appreciate any help!

Mingzhou He · Answer 12 · Fri Sep 01 2023 15:35:58 GMT+0800 (China Standard Time)

The CUDA version is wrong. I suggest creating a conda env and installing PyTorch through pip, e.g.

egmaminta · Answer 13 · Fri Sep 01 2023 15:39:24 GMT+0800 (China Standard Time)

The CUDA version is wrong. I suggest creating a conda env and installing PyTorch through pip, e.g.

OK! Will do ^^. Attempting to rebuild the whole project again.

Kent Vu · Answer 14 · Wed Nov 01 2023 23:02:08 GMT+0800 (China Standard Time)

@egmaminta @Hiram1026 Sorry for bothering you, but I downgraded PyTorch to 1.9 and rebuilt as instructed, and I still have the problem. Could you share your environment?

zqf30 · Answer 15 · Sat Feb 24 2024 11:07:19 GMT+0800 (China Standard Time)

@Hiram1026 Thank you very much! I ran the COCO transfer learning example again, and it works for me :)

I also want to let you all know that my previous environment used PyTorch 1.13, which is not suitable either. When I ran RegionCLIP in that environment, I hit the same problem as @egmaminta.
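
To make the version reports in this thread easy to check against a local install, here is a small sketch. The table reflects only what commenters reported here and is not an official compatibility list; the names `REPORTED` and `thread_report` are hypothetical, and the version strings in the loop are illustrative. In practice the input would be `torch.__version__`.

```python
# Outcomes reported in this thread only; not an official compatibility list.
REPORTED = {
    (1, 9): "normal metrics reported",
    (1, 13): "near-zero metrics reported",
    (2, 0): "near-zero metrics reported",
}


def thread_report(torch_version):
    """Map a version string such as '1.9.0+cu111' to what the thread reported."""
    # Drop a local build suffix like '+cu111', then keep major.minor.
    major, minor = torch_version.split("+")[0].split(".")[:2]
    return REPORTED.get((int(major), int(minor)), "not reported in this thread")


for version in ("1.9.0+cu111", "1.13.1", "2.0.1"):
    print(version, "->", thread_report(version))
```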

The following are my training curves now.

DDPYZ · Answer 16 · Wed Aug 28 2024 15:19:51 GMT+0800 (China Standard Time)

@egmaminta @Hiram1026 Sorry for bothering you, but I downgraded PyTorch to 1.9 and rebuilt as instructed, and I still have the problem. Could you share your environment?

Have you solved it? I just did: create a new conda environment, install torch as mentioned before, run rm -rf build/ **/*.so, and rebuild.

Kent Vu · Answer 17 · Wed Aug 28 2024 17:20:14 GMT+0800 (China Standard Time)

@egmaminta @Hiram1026 Sorry for bothering you, but I downgraded PyTorch to 1.9 and rebuilt as instructed, and I still have the problem. Could you share your environment?

Have you solved it? I just did: create a new conda environment, install torch as mentioned before, run rm -rf build/ **/*.so, and rebuild.

Thank you for your reply, but I no longer work on it.