Reproducibility: VDN Results
gsavarela opened this issue
Hi,
Thanks for your contribution to the communities of reinforcement learning and robotics.
Unfortunately, I am having trouble reproducing the VDN results in Table II of the article for the Arctic Transport, Material Transport, and Predator Capture Prey tasks. Oddly enough, the Warehouse result matches. Could you confirm that your method of aggregating runs follows Papoudakis et al. (2021)?
"Maximum returns: For each algorithm, we identify the evaluation timestep during training in which the algorithm achieves the highest average evaluation returns across five random seeds. We report the average returns and the 95% confidence interval across five seeds from this evaluation timestep."
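A minimal sketch of that aggregation, to make the question concrete. It assumes the evaluation returns are available as an array of shape (n_seeds, n_eval_steps) and uses a normal-approximation interval (z = 1.96). The function name `max_returns` and the array layout are hypothetical and are not taken from the repository.

```python
import numpy as np


def max_returns(eval_returns, z=1.96):
    """Aggregate runs with the 'maximum returns' rule.

    eval_returns: array-like of shape (n_seeds, n_eval_steps) (assumed layout).
    Returns (best_step, mean_return, ci_half_width).
    """
    returns = np.asarray(eval_returns, dtype=float)
    n_seeds = returns.shape[0]

    # Average over seeds at every evaluation timestep.
    mean_per_step = returns.mean(axis=0)

    # Evaluation timestep with the highest seed-averaged return.
    best_step = int(np.argmax(mean_per_step))

    # Per-seed returns at that single timestep.
    at_best = returns[:, best_step]

    # Normal-approximation 95% interval: z * sample std / sqrt(n_seeds).
    std_err = at_best.std(ddof=1) / np.sqrt(n_seeds)
    return best_step, float(at_best.mean()), float(z * std_err)
```

Note that the interval is computed only from the seeds at the selected timestep, not from the whole training curve.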
Moreover, the configuration file for each experiment is consistent with #13. The undiscounted returns I obtained for each task, with their 95% (normal) confidence intervals, are as follows:
Arctic Transport: -28.315 +/- 0.89
Material Transport: 21.895 +/- 0.74
Predator Capture Prey: 125.094 +/- 2.45
Warehouse: 28.572 +/- 0.44
The values reported in the paper are:
Arctic Transport: -6.98 +/- 1.75
Material Transport: 5.15 +/- 1.3
Predator Capture Prey: 33.25 +/- 0.46
Warehouse: 28.7 +/- 1.49
I have also attached the plots obtained for each task; the learning pattern for the algorithm is consistent with the published curves (Figure 3).
What am I missing? Should I normalize by the number of agents? Since EPyMARL targets cooperative MARL, are the per-agent reward signals perhaps being summed into a joint reward? Could you please clarify?
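One quick check on the normalization question is the ratio between the reproduced and published mean returns, using only the numbers listed above. This is a rough sketch; the dictionary names are illustrative.

```python
# Mean returns from this reproduction and from Table II, as listed above.
reproduced = {
    "Arctic Transport": -28.315,
    "Material Transport": 21.895,
    "Predator Capture Prey": 125.094,
    "Warehouse": 28.572,
}
published = {
    "Arctic Transport": -6.98,
    "Material Transport": 5.15,
    "Predator Capture Prey": 33.25,
    "Warehouse": 28.7,
}

# Ratio of reproduced to published mean for each task.
ratios = {task: reproduced[task] / published[task] for task in reproduced}

for task, ratio in ratios.items():
    print(f"{task}: {ratio:.2f}")
```

The first three ratios come out close to 4, while the Warehouse ratio is close to 1. Whether that factor corresponds to the number of agents in those tasks is something I cannot confirm from the results alone.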
Regards,
Guilherme Varela