openai / maddpg

Code for the MADDPG algorithm from the paper "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments"

Home Page: https://arxiv.org/pdf/1706.02275.pdf


Episode in cooperative navigation env

kargarisaac opened this issue · comments

Hi,
Thank you for releasing the code. I have some questions about the 'done' condition in the cooperative navigation environment. I don't see any done function for the env; the only terminal condition I see is the maximum number of time steps per episode.
1- Is that the only condition under which the env is done and the world needs to be reset?
2- What happens once the agents cover the landmarks? Do they keep covering them until the max time step is reached?
3- What max step count was used for the results reported in Table 2 of the paper for the cooperative navigation env? Are the number of touches and the mean distance to landmarks computed over that number of time steps?
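To make the pattern I'm describing concrete, here is a minimal sketch of an episode loop where the environment itself never signals done and the trainer simply truncates at a fixed step budget. The env class, its API, and the step budget of 25 are all hypothetical illustrations, not taken from the repo:

```python
class CoopNavEnvStub:
    """Hypothetical stand-in for the cooperative navigation env."""

    def reset(self):
        self.t = 0
        return [0.0]  # placeholder observation

    def step(self, action):
        self.t += 1
        reward = -1.0   # e.g. negative distance to landmarks
        done = False    # the env itself never terminates the episode
        return [0.0], reward, done, {}


def run_episode(env, max_episode_len=25):
    """Reset happens only when the step budget is exhausted."""
    env.reset()
    steps = 0
    for _ in range(max_episode_len):
        _, _, done, _ = env.step(action=None)
        steps += 1
        if done:        # never True here; truncation is the only exit
            break
    return steps


print(run_episode(CoopNavEnvStub()))  # -> 25
```

If this matches what the code actually does, then episode length is purely a truncation hyperparameter rather than a task-defined terminal state, which is what my questions above are trying to confirm.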

Thank you in advance