Problem about intrinsic reward at pre-training stage

Question

dong845 opened this issue 2 years ago · comments

Hi,

I think I have run into a problem: my intrinsic reward is about 0.0014 after training for 4e7 frames. I followed the instructions on GitHub without changing any parameters. The environments I use are MiniGrid-KeyCorridorS3R3-v0, MiniGrid-MultiRoom-N4-S5-v0, and MiniGrid-UnlockPickup-v0, which the paper mentions as the pre-training environments for many-to-many transfer. I don't know whether there is something I missed. I hope you can help. Thanks a lot.

Simone Parisi · Answer 1 · Tue Oct 11 2022 20:56:09 GMT+0800 (China Standard Time)

Hi, can you be a bit more specific?
What do you mean by "results of intrinsic reward"? Success rate averaged across all 10 environments after intrinsic-only pre-training? What results are you trying to replicate? (Can you reference a plot in the paper?)
What commands are you running?

Leo-Lyu · Answer 2 · Wed Oct 12 2022 15:14:28 GMT+0800 (China Standard Time)

Hi, thank you for the reply. The training of Cbet can be split into two parts, and what I mean is the first part (pre-training), which trains without extrinsic reward. The "mean_total_reward" I get is about 0.0014, which equals "mean_intrinsic_rewards". Plot 21 in the appendix shows that the intrinsic reward for multi-env with random reset at the pre-training stage can be between 0.1 and 1, which is much higher than mine. So I am confused about the result I get and want to know whether I missed something: I read the code, and its parameters are the same as those mentioned in the paper, so I should not need to change anything. Thanks.