BruceW91 / CVSE

The official source code for the paper Consensus-Aware Visual-Semantic Embedding for Image-Text Matching (ECCV 2020)


Ablation Study (Beta=1) Results not replicable

Nishant3815 opened this issue · comments

Hi,

Thanks a ton for making this work open source. I have been following this work and trying to replicate its results for one of my experiments and to validate your consensus hypothesis. In Table 3, with Beta=1, R@1 for t2i falls from 59.9 to 54.8, and for i2t it falls from 74.8 to 72.2.

I tried running this experiment on both the Flickr and MS-COCO datasets without any changes to the code, and the results I get don't match the paper. Moreover, they suggest the consensus module has no effect, since the performance is more or less the same with and without it.

Below are the results:

  • For the COCO dataset:
  • Results at Beta=1:
    [results screenshot]
  • Results at Beta=0.9:
    [results screenshot]
  • Results on the COCO 1k test set:
    [results screenshot]
  • For the Flickr dataset:
    [results screenshot]

During your runs, did you observe a similar issue? Currently my results contradict the paper's core hypothesis, so it would be really helpful if you could guide me through this.
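For reference, the R@1 numbers discussed above can be computed from a similarity matrix roughly as below. This is only a minimal sketch that assumes one ground-truth caption per image, placed on the diagonal; the actual COCO/Flickr protocol uses 5 captions per image, so the real evaluation code differs.

```python
import numpy as np

def recall_at_1(sim):
    """R@1 for image-to-text retrieval: the fraction of rows whose
    highest-scoring column is the matching index (here, the diagonal)."""
    preds = sim.argmax(axis=1)
    return float((preds == np.arange(sim.shape[0])).mean())

# Toy 3x3 image-by-caption similarity matrix.
sim = np.array([[0.9, 0.2, 0.1],
                [0.3, 0.1, 0.6],
                [0.2, 0.7, 0.4]])

print(recall_at_1(sim))    # i2t R@1; only row 0 is retrieved correctly, so 1/3
print(recall_at_1(sim.T))  # t2i R@1 is the same metric on the transpose
```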

Hello, I can't download the data from the link provided by the authors. Were you able to download it successfully?



In my experiments, I transferred the code to another machine with a somewhat different running environment, and the results were always stable under different versions of numpy and pytorch. You may need to tune the hyper-parameter Beta, and check the construction of the variable tmp in the function evaluation.label_complete(), in your environment.
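For anyone tuning Beta: its role in the ablation appears to be weighting the instance-level branch against the consensus branch, so that Beta=1 disables the consensus module entirely. The sketch below shows that interpretation as a simple convex combination of two similarity matrices; the function and variable names are hypothetical and not taken from the CVSE code.

```python
import numpy as np

def fused_similarity(sim_instance, sim_consensus, beta):
    """Convex combination of instance-level and consensus-level
    similarity matrices; with beta=1 the consensus branch contributes
    nothing, which is what the Table 3 ablation isolates."""
    return beta * sim_instance + (1.0 - beta) * sim_consensus

s_inst = np.array([[0.9, 0.1],
                   [0.2, 0.8]])
s_cons = np.array([[0.5, 0.5],
                   [0.5, 0.5]])

# Sanity check: at beta=1 the fused scores equal the instance-level scores,
# so any difference in retrieval results must come from the consensus branch.
assert np.allclose(fused_similarity(s_inst, s_cons, 1.0), s_inst)
```

If your Beta=1 and Beta=0.9 runs produce near-identical retrieval numbers under this kind of fusion, it would suggest the consensus similarities are either degenerate (near-uniform) or not being loaded correctly, which is worth checking before concluding the module has no effect.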