avisingh599 / reward-learning-rl

[RSS 2019] End-to-End Robotic Reinforcement Learning without Reward Engineering

Home Page:https://sites.google.com/view/reward-learning-rl/

VICE vs. SACClassifier

jgkim2020 opened this issue · comments

This is not an issue about the code implementation per se, but rather a question about the difference between the two algorithms.

It seems that the VICE class implementation follows the equation from the original VICE paper as well as the RSS paper: it trains the "logit" f(s) via the softmax discriminator D(s, a) with a cross-entropy loss.
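
To make this concrete, here is roughly the objective I am referring to (my own NumPy paraphrase, not the repo's code; all names are placeholders):

```python
import numpy as np

def vice_discriminator_loss(f_pos, log_pi_pos, f_neg, log_pi_neg):
    """Cross-entropy loss for the softmax discriminator
    D(s, a) = exp(f(s)) / (exp(f(s)) + pi(a|s)).

    f_*: classifier logits f(s); log_pi_*: log pi(a|s) from the current policy.
    Positives are goal examples, negatives are policy samples.
    """
    def log_d(f, log_pi):
        # log D = f - logsumexp([f, log_pi]); log(1 - D) = log_pi - logsumexp([f, log_pi])
        m = np.maximum(f, log_pi)
        log_z = m + np.log(np.exp(f - m) + np.exp(log_pi - m))
        return f - log_z, log_pi - log_z

    log_d_pos, _ = log_d(f_pos, log_pi_pos)
    _, log_one_minus_d_neg = log_d(f_neg, log_pi_neg)
    # Maximize log D on positives and log(1 - D) on negatives.
    return -(np.mean(log_d_pos) + np.mean(log_one_minus_d_neg))
```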

However, the SACClassifier class implementation does not use log_pi(a|s); instead it trains the "logit" via a sigmoid discriminator D(s) with a cross-entropy loss. Since SACClassifier uses negative samples (drawn from the replay buffer) when training the "logit" (or, equivalently, the event probability), it doesn't seem to be the "Naive Classifier" case mentioned in the RSS paper.
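
As I understand it, the SACClassifier-style objective would look roughly like the following (again an illustrative paraphrase, not the repo's code):

```python
import numpy as np

def sigmoid_xent(logits, labels):
    """Numerically stable binary cross-entropy on logits."""
    return np.mean(np.maximum(logits, 0) - logits * labels
                   + np.log1p(np.exp(-np.abs(logits))))

def naive_classifier_loss(f_goal, f_replay):
    """Train f(s) so that sigmoid(f(s)) approximates p(success | s).

    Positives: user-provided goal/success states.
    Negatives: states sampled from the replay buffer.
    """
    logits = np.concatenate([f_goal, f_replay])
    labels = np.concatenate([np.ones_like(f_goal), np.zeros_like(f_replay)])
    return sigmoid_xent(logits, labels)
```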

What is the reasoning/theory behind SACClassifier? Any references (relevant paper, etc.) would be much appreciated :)

Never mind, I realized that SACClassifier only trains the classifier during the first epoch (self._epoch == 0), so it is indeed the "Naive Classifier" case from the paper.
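
(In other words, the distinction boils down to a guard like the one below; `self._epoch` is the attribute referenced above, while `_train_classifier` is just a placeholder name for the classifier update:)

```python
# Naive Classifier: fit the classifier once, on data from the first epoch,
# then keep the learned reward fixed for the rest of training.
if self._epoch == 0:
    self._train_classifier(batch)  # hypothetical helper, for illustration only
```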

Glad you figured it out! The reason SACClassifier was implemented in this non-intuitive way is that it made it extremely simple to implement VICE and VICE-RAQ on top of it.