huanglianghua/pay-attention-pytorch

PyTorch implementation of the ICLR 2018 paper Learning to Pay Attention.

Thanks for the baseline code pytorch-cifar.

VGG-ATT-PC:

Tot: 100/100 | Loss: 0.182 | Acc: 95.260% (9526/10000)
VGG-ATT-DP:

Tot 100/100 | Loss: 0.196 | Acc: 94.900% (9490/10000)
VGG-ATT (same CNN structure as VGG-ATT-PC and VGG-ATT-DP but no attention):

Tot: 100/100 | Loss: 0.178 | Acc: 95.460% (9546/10000)
VGG: 93.xxx%

(Seems main improvements are caused by the higher resolution of early layers, not the attentions? Sad :(... Need more experiments on larger datasets like cifar-100 or ImageNet)

About

PyTorch implementation of the ICLR 2018 paper Learning to Pay Attention.

MIT License

Languages

Language:Python 100.0%