cdancette / detect-shortcuts

Repo for ICCV 2021 paper: Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering

Home Page: https://cdancette.fr/projects/vqa-ce/

About the randomness of the algorithm

tingxueronghua opened this issue

Thanks for your effort and advice before! I generated the results using the published code, and I found that my hard.json has a similarity of 86.9% with the provided hard.json, and my counterexamples.json a similarity of 97.3% with the provided one.
I understand that the algorithm may have a certain degree of randomness, but I want to make sure: have you checked this? Is this degree of randomness normal, or does it mean that I ran something incorrectly?

Could this be caused by the fact that I use a numpy.array rather than a torch.tensor for "transanctions_matrix" in the function "match_rules"? I ran the code twice and got much more similar results.
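For reference, a minimal sketch of how I measured that similarity, assuming each .json file is a flat list of question ids (the actual format of the published files may differ, and the paths are just placeholders):

```python
import json

def overlap(path_a, path_b):
    # Fraction of ids in file A that also appear in file B.
    # Assumes each file is a JSON list of question ids (hypothetical format).
    with open(path_a) as f:
        ids_a = set(json.load(f))
    with open(path_b) as f:
        ids_b = set(json.load(f))
    return len(ids_a & ids_b) / len(ids_a)

# e.g. overlap("logs/vqa2/hard.json", "provided/hard.json")
```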

Hi, could you show me the command you used to run the program?

Thanks for your reply!
I did not change the default parameters, and this is my command:
python vqa.py --gminer_path <path_to_gminer> --save_dir logs/vqa2

I used a numpy.array rather than a torch.tensor for "transanctions_matrix" in the function "match_rules", and I used CUDA 10.1 for compilation.
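To be concrete, my numpy version of the matching step looks roughly like the sketch below; the shapes and the matching criterion are my own reading of the code, so the actual implementation in the repo may differ:

```python
import numpy as np

def match_rules(transactions_matrix, rules_matrix):
    # transactions_matrix: (n_transactions, n_items) boolean item-presence matrix
    # rules_matrix: (n_rules, n_items) boolean mask of the items in each rule's antecedent
    # A rule matches a transaction when every item of the rule is present in the transaction.
    hits = transactions_matrix.astype(np.int64) @ rules_matrix.astype(np.int64).T
    rule_sizes = rules_matrix.sum(axis=1)
    return hits == rule_sizes  # (n_transactions, n_rules) boolean match matrix
```

Since this is an exact integer computation, it should be deterministic on its own, so if the runs still differ I would guess the variation comes from an earlier stage, such as the mined rules themselves.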

Yes, I compiled it using CUDA 10.1.

Hi, thanks for your advice before; I have another question about the rules. Could I simply use the extracted rules to split the train dataset into "counterexamples", "hard", and "easy"? I did so, roughly following the logic sketched below, but found that there seem to be more samples in "counterexamples" than in "easy" on the train dataset. Is this normal?
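For context, the split logic I applied looks roughly like this; the exact criterion for each subset is my own reading of the paper, so please correct me if it is off:

```python
def assign_split(matching_rules, ground_truth_answers):
    # matching_rules: list of (antecedent, predicted_answer) pairs for the rules
    # whose antecedent holds on this example (hypothetical structure).
    # ground_truth_answers: set of acceptable answers for the example.
    if not matching_rules:
        return "hard"  # no rule applies to this example
    if any(pred in ground_truth_answers for _, pred in matching_rules):
        return "easy"  # at least one matching rule predicts a correct answer
    return "counterexamples"  # rules apply, but none of them predicts a correct answer
```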

With the default parameters from the repo, there should not be more samples in counterexamples than in easy. If you use a higher min support, this can happen.
How many examples do you have in counterexamples vs. easy?

Thanks for your reply! I did not change the parameters, and I used my own extracted rules. In the train dataset I got 249872 examples in "counterexamples" and 188078 in "easy"... I also find this quite strange, but the splits produced with my rules are quite similar to the splits in your public .json files. That is why I think most of my extracted rules are correct, and that the splits should not be too different from yours...

Anyway, I think it will be hard to track down the problem in my code... All of these questions could be settled once the rules and the corresponding answers are published. Looking forward to them!