thomashirtz / gym-hybrid

Collection of OpenAI Gym environments with parametrized action spaces.
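For reference, a round of interaction with one of these environments looks roughly like the following. This is a minimal sketch assuming the `Moving-v0` environment id and the (discrete action, continuous parameters) tuple layout described in the README; it may not match the current code exactly.

```python
import gym
import gym_hybrid  # importing the package registers the environments

# "Moving-v0" and the hybrid action layout are taken from the README;
# treat them as assumptions if the repository has changed since.
env = gym.make('Moving-v0')
state = env.reset()

done = False
while not done:
    # A hybrid action is a tuple: a discrete action id plus its
    # continuous parameters.
    action = env.action_space.sample()
    state, reward, done, info = env.step(action)

env.close()
```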


Algorithm results about PDQN/HPPO in gym-hybrid

PaParaZz1 opened this issue

commented

Hi, this is a nice project for hybrid action spaces, and I see you mention PDQN/HPPO in the README.md. Do you have any experimental results for these algorithms in this environment? If not, we would like to invite you to implement the related algorithms and benchmarks together with us in our repo DI-engine; we will offer the corresponding support. Would you be willing to build a hybrid action space RL benchmark? Other comments are also welcome.

commented

We implemented PADDPG in your gym-hybrid env at this link

commented

Thank you very much for your feedback!
Unfortunately, these days I am very busy and cannot take care of it.
I did implement P-QLearning in my q-learning-algorithms repository in the past, but I do not remember whether it converged or what score it reached.

Note: algorithms now use architectures that need to know which parameters are related to which action (e.g. MP-DQN). I think it may be better to change the way the action space is handled. I am not completely sure yet what the best way to do it is. Even though it would definitely future-proof the repository, it would also break any agent that used this env...
gym-platform uses one tuple of spaces per action-parameter pair; I have not tested how inconvenient it is to have an empty tuple (e.g. for the break action). A rough sketch of that layout follows.
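Here is a minimal sketch of what such a restructured action space could look like, following gym-platform's tuple-per-action convention. The action names, parameter ranges, and the empty tuple for the parameterless break action are illustrative assumptions, not the repository's current definition.

```python
import numpy as np
from gym import spaces

# Hypothetical gym-platform-style action space: one parameter space per
# discrete action, so architectures such as MP-DQN can tell which
# parameters belong to which action. Names and ranges are illustrative.
ACCELERATE, TURN, BREAK = 0, 1, 2

action_space = spaces.Tuple((
    spaces.Discrete(3),  # which discrete action to take
    spaces.Tuple((
        spaces.Box(low=0.0, high=1.0, shape=(1,), dtype=np.float32),       # acceleration
        spaces.Box(low=-np.pi, high=np.pi, shape=(1,), dtype=np.float32),  # rotation
        spaces.Tuple(()),  # break takes no parameters: an empty tuple
    )),
))

# An agent emits (action_id, (accelerate_params, turn_params, break_params));
# the environment only reads the parameter slot matching action_id.
action_id, all_params = action_space.sample()
```

Whether samplers and wrappers cope well with the empty slot is exactly the open question from the note above.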