for discrete env
ccplxx opened this issue
ccplxx commented
I just read the DIAYN paper, and I can't understand how to train DIAYN in an environment with discrete actions, because SAC is designed for continuous action spaces. But some of the experiments in the paper are based on Mountain Car and Inverted Pendulum. Thank you.
Tuomas Haarnoja commented
I'm not too familiar with the DIAYN implementation; maybe @ben-eysenbach can help.
ccplxx commented
Thank you, haarnoja. Can SAC be used in environments with discrete actions? If so, how?
Tuomas Haarnoja commented
Yeah, you can use SAC with discrete actions too, but this implementation does not support them. You would need to replace the policy with the softmax distribution $\pi(\cdot \mid s) \propto \exp Q(s, \cdot)$, which you can compute exactly for a finite action space.
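A minimal Python sketch of that softmax policy, assuming the Q-values for every action in a state are already available as a list of floats. The temperature `alpha` (SAC's entropy weight) is an assumption added here; with `alpha=1.0` it reduces exactly to $\pi(\cdot \mid s) \propto \exp Q(s, \cdot)$. The function names are hypothetical and are not part of this repository.

```python
import math
import random
from typing import List, Sequence


def softmax_policy(q_values: Sequence[float], alpha: float = 1.0) -> List[float]:
    """pi(a|s) proportional to exp(Q(s, a) / alpha) over a finite action set.

    alpha is an assumed temperature; alpha = 1.0 matches exp Q(s, .) exactly.
    """
    logits = [q / alpha for q in q_values]
    m = max(logits)  # subtract the max before exp for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_value(q_values: Sequence[float], alpha: float = 1.0) -> float:
    """Exact soft state value V(s) = alpha * log sum_a exp(Q(s, a) / alpha)."""
    logits = [q / alpha for q in q_values]
    m = max(logits)
    return alpha * (m + math.log(sum(math.exp(z - m) for z in logits)))


def expected_soft_value(q_values: Sequence[float], alpha: float = 1.0) -> float:
    """E_{a~pi}[Q(s, a) - alpha * log pi(a|s)], summed exactly over all actions.

    For the softmax policy this equals soft_value(q_values, alpha).
    """
    probs = softmax_policy(q_values, alpha)
    return sum(p * (q - alpha * math.log(p)) for p, q in zip(probs, q_values))


def sample_action(q_values: Sequence[float], alpha: float = 1.0) -> int:
    """Draw an action index from the softmax policy."""
    probs = softmax_policy(q_values, alpha)
    return random.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]  # hypothetical Q-values for three discrete actions
    print(softmax_policy(q))
    print(soft_value(q, alpha=0.5), expected_soft_value(q, alpha=0.5))
    print(sample_action(q))
```

Because the action set is finite, expectations under the policy (such as the soft value used in the critic target) can be summed exactly over all actions rather than estimated from sampled actions, which is what `expected_soft_value` illustrates.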