Results are not as good as those shown in the paper
Jarvis-K opened this issue
I ran maddpg in simple_speaker_listener several times, but none of the runs reached the average reward of about -20 reported in the paper. Is there anything I should modify to get a better or more stable result?
Looks like you're not the only one having trouble reproducing some results: #12
I am getting rewards of about -60. Is that normal when running the code without any alterations?
Also, with scenario=simple_speaker_listener, this code does not converge to the result reported in Fig. 4. Does anyone know what the problem is?