openai / maddpg

Code for the MADDPG algorithm from the paper "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments"

Home Page: https://arxiv.org/pdf/1706.02275.pdf

Cannot reproduce experiment results

arbaazkhan2 opened this issue · comments

Is this code vastly different from the code used to generate results for the paper?
I cannot reproduce any of the results for the simple_spread, simple_reference, or simple_tag experiments, even after running for over 2 million iterations. The policy doesn't even look like it's improving. Any tips on this? Has anyone else gotten it to work?

Further, I don't see the policy ensemble or other-agent policy estimation parts (Sections 4.2 and 4.3 in the paper) in the code. Am I missing something?

> Further, I don't see the policy ensemble or other-agent policy estimation parts (Sections 4.2 and 4.3 in the paper) in the code. Am I missing something?

This was answered in #8; that code isn't in this repo.

Hi! There was a bug in the code that prevented rewards from being shared in collaborative environments. This should be fixed now! Note that the results will differ from the paper, since we refactored the code after publication, but the models should still train.
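For readers unfamiliar with the bug: in collaborative scenarios every agent is supposed to optimize the same team reward, whereas the buggy version left each agent with only its individual reward. A minimal sketch of the intended behavior (illustrative only; `share_rewards` is a hypothetical helper, not the repo's actual code):

```python
import numpy as np

def share_rewards(rewards, cooperative):
    """Return the per-agent reward list used for training.

    In a cooperative environment, every agent receives the same
    team reward (here, the sum of individual rewards); otherwise
    each agent keeps its own reward.
    """
    if cooperative:
        team_reward = float(np.sum(rewards))
        return [team_reward] * len(rewards)
    return list(rewards)
```

For example, `share_rewards([1.0, 2.0, 3.0], cooperative=True)` gives every agent the team reward `6.0`, while with `cooperative=False` each agent keeps its own reward.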

For the ensemble policies / estimating other agents' policies, that code was created by Yi Wu. Please contact him if you'd like it to be open-sourced.
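For context, Section 4.2 of the paper fits an approximate policy to each other agent by maximizing the log probability of that agent's observed actions plus an entropy regularizer. A rough numpy sketch of one such update for a discrete softmax policy (my own illustration of the idea, not Yi Wu's code; `approx_policy_update` and its hyperparameters are made up for this example):

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def approx_policy_update(logits, observed_action, lr=0.5, ent_coef=1e-3):
    """One gradient ascent step on log mu_hat(a|o) + ent_coef * H(mu_hat),
    fitting an approximate policy to another agent's observed action
    (cf. Section 4.2 of the MADDPG paper)."""
    p = softmax(logits)
    onehot = np.zeros_like(logits)
    onehot[observed_action] = 1.0
    grad_loglik = onehot - p                  # d log p(a) / d logits
    entropy = -(p * np.log(p)).sum()
    grad_entropy = -p * (np.log(p) + entropy)  # d H / d logits
    return logits + lr * (grad_loglik + ent_coef * grad_entropy)
```

Repeatedly observing the same action pushes the approximate policy's probability mass toward it, while the entropy term keeps the estimate from collapsing too quickly.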

commented

For policy ensemble and approximation, I have put the code online for easy access:
https://www.dropbox.com/s/jlc6dtxo580lpl2/maddpg_ensemble_and_approx_code.zip?dl=0