facebookresearch / LaViLa

Code release for "Learning Video Representations from Large Language Models"

Reproducing zero-shot eval results on EK100-MIR

melongua opened this issue

Hi,
I downloaded the pretrained checkpoint c89337 and used eval_zeroshot.py to evaluate it on EK100-MIR in a zero-shot manner.

I prepared the dataset following the instructions and ran this command:
python eval_zeroshot.py --dataset ek100_mir --root datasets/EK100/video_ht256px/ --clip-length 4 --resume $PATH

The results I got are:
mAP: V->T: 0.334 T->V: 0.251 AVG: 0.292
nDCG: V->T: 0.331 T->V: 0.300 AVG: 0.315

If I increase clip_len from 4 to 16, as described in the paper, the results are:
mAP: V->T: 0.341 T->V: 0.264 AVG: 0.303
nDCG: V->T: 0.335 T->V: 0.305 AVG: 0.320

Both seem to be much lower than the numbers reported in the paper:
mAP: 36.1, nDCG: 34.6
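
To put both sets of numbers on the same scale, here is a small Python sketch that averages the V->T and T->V scores and reports the gap to the paper in percentage points. It assumes that eval_zeroshot.py prints scores in [0, 1], that the paper's 36.1 / 34.6 are the averaged mAP / nDCG on a 0-100 scale, and that AVG is the plain mean of the two directions; the names avg, gaps, PAPER and RUNS are hypothetical.

```python
# Hypothetical helper for comparing the reproduced scores with the paper.
# Assumptions: local scores are on a 0-1 scale, paper scores are percentages,
# and the paper's numbers are the mean of the V->T and T->V directions.

def avg(v2t: float, t2v: float) -> float:
    """Mean of the two retrieval directions (assumed to match the AVG column)."""
    return (v2t + t2v) / 2

# Paper numbers quoted in this issue (assumed to be averages, in percent).
PAPER = {"mAP": 36.1, "nDCG": 34.6}

# (V->T, T->V) scores reported above for each clip length.
RUNS = {
    "clip-length 4": {"mAP": (0.334, 0.251), "nDCG": (0.331, 0.300)},
    "clip-length 16": {"mAP": (0.341, 0.264), "nDCG": (0.335, 0.305)},
}

def gaps(run: dict) -> dict:
    """Paper score minus reproduced average, in percentage points."""
    return {m: round(PAPER[m] - 100 * avg(*run[m]), 2) for m in PAPER}

if __name__ == "__main__":
    for name, run in RUNS.items():
        print(name, gaps(run))
```

Recomputing from the rounded per-direction scores can differ from the printed AVG in the last digit (0.2925 vs 0.292 for mAP at clip length 4). Under these assumptions the gap at clip length 16 is about 5.85 mAP points and 2.6 nDCG points.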

May I ask what might cause this performance gap? Thanks in advance.

Hi @melongua,

Can you provide more details about (1) the EK100 data you are using and (2) any other customized metadata, e.g., the relevancy matrix? I believe these might affect the final performance. We've uploaded the ones we used in this doc.
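
As a rough illustration of why the relevancy matrix matters: nDCG on EK100-MIR uses graded relevancies, so identical similarity scores can give different nDCG under different relevancy matrices. The NumPy sketch below is a generic nDCG with a log2 position discount; it is an assumption for illustration, not the official EPIC-KITCHENS-100 evaluation code, whose details (discount, tie handling) may differ. The function ndcg and the toy matrices are hypothetical.

```python
import numpy as np

def ndcg(similarity: np.ndarray, relevancy: np.ndarray) -> float:
    """Generic mean nDCG over queries (rows) with graded relevancies in [0, 1].

    similarity[i, j]: score between query i and gallery item j.
    relevancy[i, j]: relevance of item j to query i (1.0 = exact match).
    """
    n_items = similarity.shape[1]
    # Log2 position discount 1 / log2(rank + 1) for ranks 1..n_items.
    discount = 1.0 / np.log2(np.arange(2, n_items + 2))
    # Rank gallery items by descending similarity for each query.
    order = np.argsort(-similarity, axis=1)
    ranked_rel = np.take_along_axis(relevancy, order, axis=1)
    dcg = (ranked_rel * discount).sum(axis=1)
    # Ideal DCG: relevancies sorted in descending order.
    ideal_rel = -np.sort(-relevancy, axis=1)
    idcg = (ideal_rel * discount).sum(axis=1)
    # Skip queries with no relevant items to avoid dividing by zero.
    valid = idcg > 0
    return float((dcg[valid] / idcg[valid]).mean())

# Toy example: the same similarities scored against two relevancy matrices.
sim = np.array([[0.9, 0.5, 0.1],
                [0.2, 0.8, 0.4]])
rel_a = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
rel_b = np.array([[0.5, 1.0, 0.0],
                  [0.0, 1.0, 0.5]])

if __name__ == "__main__":
    print(ndcg(sim, rel_a), ndcg(sim, rel_b))
```

With rel_a every query's top-ranked item is its only relevant one, so nDCG is 1.0; with rel_b the first query ranks a partially relevant item above the fully relevant one, and the mean drops to about 0.93 even though the similarities are unchanged.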

Best,
Yue