evaluation on the VIP and JHMDB datasets

AndyTang15 opened this issue 4 years ago

Hi Allan,

Happy new year! And many thanks for releasing the code of this great work!

I used the codebase and the pretrained model provided in the repo to evaluate on the VIP and JHMDB datasets. The results are:
VIP: 37.12(mIOU), JHMDB: 57.62(PCK@0.1) and 79.59(PCK@0.2).

They are noticeably lower than the results in your paper:
VIP: 38.6(mIOU), JHMDB: 59.3(PCK@0.1) and 84.9(PCK@0.2).

Could you please help me check whether I evaluated them correctly?
For VIP, I used the command:
python test.py --filelist eval/VIP_vallist.txt --model-type scratch --resume ../pretrained.pth --save-path vip_results --topk 10 --videoLen 4 --radius 12 --temperature 0.05 --cropSize 560

For JHMDB, I used the command:
python test.py --filelist eval/jhmdb_vallist.txt --model-type scratch --resume ../pretrained.pth --save-path jhmdb_results --topk 10 --videoLen 7 --radius 12 --temperature 0.05 --cropSize 320

The hyperparameters above were selected based on your paper, except for the temperature (I also tried 0.07 but found that 0.05 works better).

BTW, there are two bugs in the JHMDB evaluation:

https://github.com/ajabri/videowalk/blob/master/code/data/jhmdb.py#L231
"sio" is used here but never imported in this Python file.

https://github.com/ajabri/videowalk/blob/master/code/test.py#L161
This should be "test_utils" rather than "utils".

A. Jabri · Answer 1 · Fri Jan 22 2021 03:39:25 GMT+0800 (China Standard Time)

Hi @AndyTang15,

Thanks for your interest, and I apologize for the late reply.

I haven't re-run the JHMDB and VIP evaluations since refactoring and retraining models for the code release, so thanks for bringing this to my attention, and I will take a closer look!

One detail that will improve the JHMDB result is that the radius should be decreased commensurately, since the input is about 4x smaller (320x320 vs. 900x480). So you might consider a radius of 5 instead of 12. I apologize for the confusion (and for the typo in the appendix).

python test.py --filelist eval/jhmdb_vallist.txt --model-type scratch \
--resume ../pretrained.pth --save-path jhmdb_results \
--topk 10 --videoLen 7 --radius 5 --temperature 0.05 --cropSize 320
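
As a rough sanity check on the numbers above, here is a minimal sketch of how the "about 4x smaller" figure and a rescaled radius could be derived. It assumes the radius should shrink with the linear size of the input, that is, with the square root of the area ratio. That scaling rule and the `scaled_radius` helper are illustrative assumptions, not something taken from the repository.

```python
import math


def scaled_radius(base_radius, base_size, new_size):
    """Scale a radius by the linear (square root of area) ratio of two input sizes.

    Hypothetical helper for illustration only; it is not part of the videowalk code.
    """
    base_area = base_size[0] * base_size[1]
    new_area = new_size[0] * new_size[1]
    return base_radius * math.sqrt(new_area / base_area)


# Input sizes quoted in the reply: 900x480 for the larger input, 320x320 for JHMDB.
area_ratio = (900 * 480) / (320 * 320)
radius = scaled_radius(12, (900, 480), (320, 320))

print(f"area ratio: {area_ratio:.2f}")
print(f"scaled radius: {radius:.2f}")
```

Under this assumption the area ratio comes out a little above 4 and the rescaled radius falls between 5 and 6, which is in the neighborhood of the suggested value of 5.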

AndyTang15 · Answer 2 · Sat Jan 23 2021 00:25:13 GMT+0800 (China Standard Time)

@ajabri Hi Allan,
Many thanks for your reply and help. I've tried radius=5 following your command, as well as radius=3. The results on JHMDB are:

radius=5: PCK@0.1 58.64, PCK@0.2 80.54
radius=3: PCK@0.1 58.84, PCK@0.2 80.23

Performance improved in both cases but is still lower than the results reported in your paper (before refactoring), especially for PCK@0.2. Would it be possible for you to help me with this again? Many thanks!

Daniel McKee · Answer 3 · Thu Mar 04 2021 14:11:02 GMT+0800 (China Standard Time)

Hi @ajabri and @AndyTang15,
Just wanted to check whether you were able to reconcile the performance with the results in the paper. I ran into the same issue with JHMDB performance and was not able to reproduce the results with various radius settings.

Thanks!

A. Jabri · Answer 4 · Wed Mar 10 2021 07:47:56 GMT+0800 (China Standard Time)

Hi @dmckee5,

I have not yet reconciled this issue (the lower PCK@0.2 with this repository). If you are reporting or comparing to our results, at this point, please go ahead and report the result you've reproduced. I am hoping to get to this soon.

Xiao Pan · Answer 5 · Wed Oct 20 2021 17:01:57 GMT+0800 (China Standard Time)

Hi @ajabri @AndyTang15,
May I ask where you downloaded the VIP dataset? The official link in the original paper has expired. Is there a copy on a cloud drive?

Xiao Pan · Answer 6 · Sun Oct 24 2021 20:09:57 GMT+0800 (China Standard Time)

How did you get VIP_vallist.txt and jhmdb_vallist.txt?

Luming Tang · Answer 7 · Wed May 10 2023 13:46:26 GMT+0800 (China Standard Time)

Hi @AndyTang15, I used the same commands as you, but my results look much worse than yours. Did you make any other modifications to the code? Also, which filelist are you using? The filelist I used is from the original UVC repo and contains 268 lines. Any help would be appreciated. Thanks!

rvandeghen · Answer 8 · Thu Feb 29 2024 16:41:06 GMT+0800 (China Standard Time)

Hi,
Where can I find your VIP_vallist.txt? Also, did you use the VIP_Fine from this repo?
Thanks