VAT implementation is wrong
AskAukNuTutor opened this issue
Thank you for your code.
Following the original VAT paper, consistency_func in hparams.py should be reverse_kl for VAT, although it is set to forward_kl in your code.
The adversarial noise r in VAT is obtained by maximizing D_KL(p(y|x)||p(y|x+r)). However, the consistency loss D_KL(p(y|x+r)||p(y|x)) is used when consistency_func=forward_kl. I think this matters because KL divergence is asymmetric.
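To make the asymmetry concrete, here is a minimal NumPy sketch that evaluates both directions of the KL divergence on one pair of class distributions. The helper name kl_divergence and the two probability vectors are made up for illustration and are not taken from the repository; the mapping of each direction to a consistency_func value follows the description in this issue only.

```python
import numpy as np


def kl_divergence(p, q, eps=1e-12):
    """Return D_KL(p || q) for two discrete distributions given as 1-D arrays."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    # eps guards against log(0); it is an illustrative choice, not a tuned value.
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))


# Hypothetical class probabilities, chosen only for illustration.
p_clean = np.array([0.90, 0.05, 0.05])      # stands in for p(y|x)
p_perturbed = np.array([0.50, 0.30, 0.20])  # stands in for p(y|x+r)

# Direction maximized when searching for the adversarial noise r.
kl_clean_vs_perturbed = kl_divergence(p_clean, p_perturbed)

# Direction described in this issue as the loss used with forward_kl.
kl_perturbed_vs_clean = kl_divergence(p_perturbed, p_clean)

print("D_KL(p(y|x) || p(y|x+r)) =", kl_clean_vs_perturbed)
print("D_KL(p(y|x+r) || p(y|x)) =", kl_perturbed_vs_clean)
```

The two printed values differ for this pair of distributions, so the direction chosen for the consistency loss changes the training signal even when the noise r is the same.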
We tried both, and forward_kl worked better.
IMO, consistency_func should not be treated as a hyper-parameter to be tuned, because it is part of the VAT model. If you do consider consistency_func to be a hyper-parameter, it should be noted in Table 4 of your NIPS'18 paper.
For instance, I compared VAT+EntMin under the following two settings in the CIFAR10-4000 scenario:
Setting-A: consistency_func=forward_kl and max_cons_multiplier=0.3 (original parameters)
Setting-B: consistency_func=reverse_kl and max_cons_multiplier=1.0 (modified parameters)
As a result, I observed that VAT+EntMin with Setting-B outperformed Setting-A by about 2 percentage points in test error rate (11.7% vs 13.7%). Of course, this is the result of a single run, so I do not claim that Setting-B outperforms Setting-A in general.