LuoweiZhou / VLP

Vision-Language Pre-training for Image Captioning and Question Answering

Unable to reproduce image features for COCO and CC

darkmatter08 opened this issue

Hi Luowei --

I'm unable to reproduce the image features that you've published here for COCO and CC.
I've trained and evaluated the model on the VQA2 task (VQA2 uses COCO images), using your provided features as well as my own extracted features. There is still an outstanding gap in performance: while you report 67.4, I can only achieve 64.3. This is a significant 3-point gap. Have others encountered a similar problem, and if so, how did they resolve it?

I've extracted my own features using the script you shared with me privately (slightly modified to resolve dependency issues). Using the housebw/detectron image and your provided detectron checkpoint .pkl and config .yaml, I generate different features than yours. Comparing image-by-image, I get different values in the tensors/matrices. I also get different aggregate statistics (min, max, mean, variance) for the features, image-by-image. The situation is the same for CC. I've also confirmed it is not a precision issue (float16 vs float32).
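For anyone who wants to run the same kind of check, here is a minimal sketch of an image-by-image comparison. It assumes the two feature sets are already loaded as NumPy-compatible arrays; the function names `summarize` and `compare_features` and the tolerance `atol=1e-2` are illustrative choices, not part of the VLP code or the script mentioned above.

```python
import numpy as np


def summarize(features):
    """Aggregate statistics (min, max, mean, variance) for one image's features."""
    arr = np.asarray(features, dtype=np.float32)
    return {
        "min": float(arr.min()),
        "max": float(arr.max()),
        "mean": float(arr.mean()),
        "var": float(arr.var()),
    }


def compare_features(published, extracted, atol=1e-2):
    """Compare two feature arrays for the same image.

    Both inputs are cast to float32 first, so a float16-vs-float32 storage
    difference alone should stay within the (assumed) tolerance `atol`.
    """
    a = np.asarray(published, dtype=np.float32)
    b = np.asarray(extracted, dtype=np.float32)
    if a.shape != b.shape:
        return {"same_shape": False, "close": False, "max_abs_diff": None}
    return {
        "same_shape": True,
        "close": bool(np.allclose(a, b, atol=atol)),
        "max_abs_diff": float(np.abs(a - b).max()),
    }
```

If `compare_features` reports `close: False` with a large `max_abs_diff` even after the cast to float32, the mismatch is unlikely to be a precision artifact and more likely comes from the inputs to extraction (checkpoint, config, or images).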

As it stands, I cannot replicate your results despite my best efforts to follow all your provided documentation, using the same environment, code, data dependencies, and source data.

I am attempting to use your SOTA model on a new dataset/task. Not being able to replicate your results is an impediment...

Thanks,
Shawn

Sorry for the delay. I have been traveling and will look into this issue early next week. Please stay tuned.

@darkmatter08 If you have figured it out, could you share your experience in reproducing the feature files (dos and don'ts)? I will sanitize the feature extraction code soon.

Closing as problem is resolved.

To anyone attempting to reproduce: please verify the md5sums of all files you download! @LuoweiZhou, I strongly encourage you to publish md5sums for every file you link for download, to improve reproducibility.

Here are mine:

$ md5sum e2e_faster_rcnn_X-101-64x4d-FPN_2x.*
535a2f0f7a73948c7400ce864d4b8efa  e2e_faster_rcnn_X-101-64x4d-FPN_2x.pkl
cc3d540cee79506e1d5a22bec3aef5bf  e2e_faster_rcnn_X-101-64x4d-FPN_2x.yaml
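If `md5sum` is not available on your platform, the same check can be sketched in Python with the standard library. The two expected digests are the ones listed above; the helper names `md5sum` and `verify` and the `directory` argument are hypothetical, chosen only for this example.

```python
import hashlib
import os

# Digests copied from the md5sum output above.
EXPECTED = {
    "e2e_faster_rcnn_X-101-64x4d-FPN_2x.pkl": "535a2f0f7a73948c7400ce864d4b8efa",
    "e2e_faster_rcnn_X-101-64x4d-FPN_2x.yaml": "cc3d540cee79506e1d5a22bec3aef5bf",
}


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(expected, directory="."):
    """Map each file name to True if it exists and its MD5 matches, else False."""
    results = {}
    for name, want in expected.items():
        path = os.path.join(directory, name)
        results[name] = os.path.isfile(path) and md5sum(path) == want
    return results
```

Calling `verify(EXPECTED, directory)` on the folder holding the downloaded checkpoint and config returns a dictionary such as `{"….pkl": True, "….yaml": False}`, which makes it easy to spot a file that came from a different source.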

@darkmatter08 Thanks for sharing! Whoever has downloaded these two files, please verify the checksums.

The checkpoints used in our other repo, GVD, come from a different training run (hence the files differ) but have the same names.

@LuoweiZhou Best practice would be for you to share the md5sums of each of your files here. Can you compute the md5sums of your known-good files and verify they are the same as mine? I can confirm I'm able to reproduce the features now.

The ones I have are the ones in the VLP repo.

The feature extraction code is now available here: https://github.com/LuoweiZhou/detectron-vlp