VAT implementation is wrong

AskAukNuTutor opened this issue 5 years ago

Thank you for your code.

According to the original VAT paper, consistency_func in hparams.py should be reverse_kl for VAT, but it is set to forward_kl in your code.

In VAT, the adversarial noise r is obtained by maximizing D_KL(p(y|x)||p(y|x+r)). With consistency_func=forward_kl, however, the consistency loss is D_KL(p(y|x+r)||p(y|x)). I think this matters because KL divergence is asymmetric.
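To illustrate the asymmetry, here is a minimal sketch in plain Python. The two distributions are made-up numbers standing in for p(y|x) and p(y|x+r), and kl_divergence is a hypothetical helper written for this example, not a function from the repository.

```python
import math


def kl_divergence(p, q):
    """Return D_KL(p || q) for two discrete distributions (lists of probabilities)."""
    # Terms with p_i == 0 contribute nothing to the sum.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical class posteriors, chosen only for illustration.
p_clean = [0.9, 0.05, 0.05]  # stands in for p(y|x)
p_adv = [0.5, 0.3, 0.2]      # stands in for p(y|x+r)

# Direction used to find the adversarial noise r in VAT, as described above.
kl_clean_adv = kl_divergence(p_clean, p_adv)  # D_KL(p(y|x) || p(y|x+r))

# Direction described above for consistency_func=forward_kl.
kl_adv_clean = kl_divergence(p_adv, p_clean)  # D_KL(p(y|x+r) || p(y|x))

print("D_KL(p(y|x) || p(y|x+r)) =", kl_clean_adv)
print("D_KL(p(y|x+r) || p(y|x)) =", kl_adv_clean)
```

The two printed values differ, so the two directions penalize the same pair of predictions differently.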

Colin Raffel · Answer 1 · Tue Jun 04 2019 01:37:33 GMT+0800 (China Standard Time)

We tried both and forward_kl worked better.

AskAukNuTutor · Answer 2 · Tue Jun 04 2019 09:41:40 GMT+0800 (China Standard Time)

IMO, consistency_func should not be treated as a tunable hyper-parameter, since it is part of the VAT model. If you do treat consistency_func as a hyper-parameter, that should be noted in Table 4 of your NIPS'18 paper.

For instance, I compared VAT+EntMin under the following two settings in the CIFAR10-4000 scenario:
Setting-A: consistency_func=forward_kl and max_cons_multiplier=0.3 (original parameters)
Setting-B: consistency_func=reverse_kl and max_cons_multiplier=1.0 (modified parameters)

As a result, VAT+EntMin with Setting-B outperformed Setting-A by about 2 percentage points in test error rate (11.7% vs 13.7%). Of course, this is the result of a single run, so I do not claim that Setting-B outperforms Setting-A in general.