for discrete env

Question

ccplxx opened this issue 6 years ago

I just read the DIAYN paper and can't understand how to train DIAYN in an environment with discrete actions, since SAC is meant for continuous action spaces. Yet some experiments in the paper are based on mountain car and inverted pendulum. Thank you.

Tuomas Haarnoja · Answer 1 · Fri Nov 09 2018 22:41:38 GMT+0800 (China Standard Time)

I'm not too familiar with the DIAYN implementation, maybe @ben-eysenbach can help.

ccplxx · Answer 2 · Sat Nov 10 2018 10:05:46 GMT+0800 (China Standard Time)

Thank you, haarnoja. Can SAC be used in environments with discrete actions? If so, how?

Tuomas Haarnoja · Answer 3 · Mon Nov 12 2018 22:44:49 GMT+0800 (China Standard Time)

Yeah, you can use SAC with discrete actions too, but this implementation does not support them. You would need to replace the policy with a softmax distribution, \pi(\cdot | s) \propto \exp Q(s, \cdot), which you can compute exactly for a finite action space.
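
For illustration, here is a minimal sketch of such a softmax policy in plain Python. It is not part of this repository: the function names, the example Q-values, and the `temperature` parameter are all assumptions (with `temperature=1.0` it reduces to the \exp Q(s, \cdot) form above), and in practice the Q-values would come from a learned Q-function.

```python
import math
import random


def softmax_policy(q_values, temperature=1.0):
    """Return action probabilities proportional to exp(Q(s, a) / temperature).

    `temperature` is an assumed extra knob; 1.0 gives exp(Q(s, a)) exactly.
    """
    # Subtract the max before exponentiating for numerical stability;
    # this does not change the normalized probabilities.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def policy_entropy(probs):
    """Exact entropy of a discrete policy, summed over all actions."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sample_action(probs, rng=random):
    """Draw an action index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical Q-values for one state with three discrete actions.
q_values = [1.0, 2.0, 0.5]
probs = softmax_policy(q_values)
action = sample_action(probs)
```

Because the action set is finite, the normalizer and the policy entropy are plain sums over actions, so no sampling-based approximation is needed to evaluate them.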