cdancette / detect-shortcuts

Repo for ICCV 2021 paper: Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering

Home Page: https://cdancette.fr/projects/vqa-ce/

About the randomness of the algorithm

tingxueronghua opened this issue

Thanks for your effort and advice before! I generated the results using the published code, and I found that my hard.json has a similarity of 86.9% with the provided hard.json, and my counterexamples.json a similarity of 97.3% with the provided one.
I understand that the algorithm may have a certain degree of randomness, but I want to make sure: have you checked this? Is this degree of randomness normal, or does it mean that I ran something incorrectly?

Could this be caused by the fact that I use a numpy.array rather than a torch.tensor for "transanctions_matrix" in the function "match_rules"? I ran the code twice and got much more similar results.
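For reference, a minimal sketch of how I measured that similarity, assuming each .json file is a flat list of question ids (the actual format of the published files may differ, and the paths are just placeholders):

```python
import json

def overlap(path_a, path_b):
    # Fraction of ids in file A that also appear in file B.
    # Assumes each file is a JSON list of question ids (hypothetical format).
    with open(path_a) as f:
        ids_a = set(json.load(f))
    with open(path_b) as f:
        ids_b = set(json.load(f))
    return len(ids_a & ids_b) / len(ids_a)

# e.g. overlap("logs/vqa2/hard.json", "provided/hard.json")
```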

Hi, could you show me the command you used to run the program?

Thanks for your reply!
I did not change the default parameters, and this is my command:
python vqa.py --gminer_path <path_to_gminer> --save_dir logs/vqa2

I used a numpy.array rather than a torch.tensor for "transanctions_matrix" in the function "match_rules", and I used CUDA 10.1 for compilation.
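To be concrete, my numpy version of the matching step looks roughly like the sketch below; the shapes and the matching criterion are my own reading of the code, so the actual implementation in the repo may differ:

```python
import numpy as np

def match_rules(transactions_matrix, rules_matrix):
    # transactions_matrix: (n_transactions, n_items) boolean item-presence matrix
    # rules_matrix: (n_rules, n_items) boolean mask of the items in each rule's antecedent
    # A rule matches a transaction when every item of the rule is present in the transaction.
    hits = transactions_matrix.astype(np.int64) @ rules_matrix.astype(np.int64).T
    rule_sizes = rules_matrix.sum(axis=1)
    return hits == rule_sizes  # (n_transactions, n_rules) boolean match matrix
```

Since this is an exact integer computation, it should be deterministic on its own, so if the runs still differ I would guess the variation comes from an earlier stage, such as the mined rules themselves.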

Yes, I compiled it using CUDA 10.1.

Hi, thanks for your advice before; I have another question about the rules. Could I simply use the extracted rules to split the train dataset into "counterexamples", "hard", and "easy"? I did so, roughly following the logic sketched below, but found that there seem to be more samples in "counterexamples" than in "easy" on the train dataset. Is this normal?
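For context, the split logic I applied looks roughly like this; the exact criterion for each subset is my own reading of the paper, so please correct me if it is off:

```python
def assign_split(matching_rules, ground_truth_answers):
    # matching_rules: list of (antecedent, predicted_answer) pairs for the rules
    # whose antecedent holds on this example (hypothetical structure).
    # ground_truth_answers: set of acceptable answers for the example.
    if not matching_rules:
        return "hard"  # no rule applies to this example
    if any(pred in ground_truth_answers for _, pred in matching_rules):
        return "easy"  # at least one matching rule predicts a correct answer
    return "counterexamples"  # rules apply, but none of them predicts a correct answer
```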

With the default parameters from the repo, there should not be more samples in counterexamples than in easy. If you use a higher min support, this can happen.
How many examples do you have in counterexamples vs. easy?

Thanks for your reply! I did not change the parameters, and I used my own extracted rules. In the train dataset I got 249872 examples in "counterexamples" and 188078 in "easy"... I also find this quite strange, but the splits produced with my rules are quite similar to the splits in your public .json files. That is why I think most of my extracted rules are correct, and that the splits should not be too different from yours...

Anyway, I think it will be hard to track down the problem in my code... All of these questions could be settled once the rules and the corresponding answers are published. Looking forward to them!