glample / tagger

Named Entity Recognition Tool

SGD vs. Adam

pvcastro opened this issue.

Hi there @glample

Do you have any theories as to why, in your implementation, SGD performs better than the Adam optimizer (or any other optimizer, for that matter)? Do you think it's related to batch processing not being implemented?

Thanks!

Hi,

In my experience (and I know many people have made similar observations), SGD is what works best with a batch size of 1. A batch size of 1 also tends to work best in general, but people use bigger batch sizes (like 32 or 128) to speed up training. With bigger batch sizes, Adam usually gives better results than SGD. This also depends somewhat on the task, but for NER I have always observed SGD to be significantly the best.
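To make the two regimes in the reply concrete, here is a minimal sketch contrasting plain SGD with one update per example (batch size 1) and Adam with one update per mini-batch. It is not the tagger's code: the toy linear-regression task, the function names (`make_data`, `train_sgd`, `train_adam`), and all hyperparameters are hypothetical choices for illustration. Adam follows the standard update with bias-corrected moment estimates.

```python
import math
import random

def make_data(n=200, seed=0):
    """Hypothetical toy task: y = 3x + 1 plus small Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 3.0 * x + 1.0 + rng.gauss(0.0, 0.1)) for x in xs]

def grad(w, b, batch):
    """Mean gradient of 0.5 * (w*x + b - y)^2 over a batch."""
    gw = gb = 0.0
    for x, y in batch:
        err = w * x + b - y
        gw += err * x
        gb += err
    return gw / len(batch), gb / len(batch)

def train_sgd(data, lr=0.05, epochs=5, seed=0):
    """Plain SGD with batch size 1: one parameter update per training example."""
    rng = random.Random(seed)
    data = list(data)  # copy so the caller's list is not shuffled
    w = b = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for example in data:
            gw, gb = grad(w, b, [example])
            w -= lr * gw
            b -= lr * gb
    return w, b

def train_adam(data, lr=0.05, batch_size=32, epochs=100, seed=0,
               beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam with one update per mini-batch of `batch_size` examples."""
    rng = random.Random(seed)
    data = list(data)
    params, m, v = [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for i in range(0, len(data), batch_size):
            t += 1
            g = grad(params[0], params[1], data[i:i + batch_size])
            for k in range(2):
                # Exponential moving averages of the gradient and its square.
                m[k] = beta1 * m[k] + (1 - beta1) * g[k]
                v[k] = beta2 * v[k] + (1 - beta2) * g[k] ** 2
                # Bias correction for the zero-initialized averages.
                m_hat = m[k] / (1 - beta1 ** t)
                v_hat = v[k] / (1 - beta2 ** t)
                params[k] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return params[0], params[1]

if __name__ == "__main__":
    data = make_data()
    print("SGD, batch size 1:", train_sgd(data))
    print("Adam, batch size 32:", train_adam(data))
```

Note that with 200 examples, SGD makes 200 updates per epoch while Adam with `batch_size=32` makes only 7, which is why the sketch gives Adam more epochs. On a convex toy problem like this both should land near `w = 3`, `b = 1`; the sketch only shows the mechanics and does not test the empirical observation about NER.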

Ok, thanks!
I'm presenting a paper based on your LSTM-CRF architecture at a conference on Portuguese NLP in September ("Portuguese Named Entity Recognition using LSTM-CRF", http://www.inf.ufrgs.br/propor-2018/accepted-papers/), so I'm preparing for it. If you have any tips, they would be most welcome. Thanks!