OverLordGoldDragon / keras-adamw

Keras/TF implementation of AdamW, SGDW, NadamW, Warm Restarts, and Learning Rate multipliers

Comparison Against Adam

JinLi711 opened this issue

Is it possible for you to benchmark your implementation of AdamW against TensorFlow's implementation of Adam on multiple datasets? It would help users decide whether AdamW is the right choice. I would be interested in the difference in the time each epoch or step takes.

Already done - see the build logs from the tests, in particular test_control() (example below). Testing isn't exhaustive, but on both sparse and dense tensors, for a tiny model, the AdamW and TF implementations are about equally fast - and in local tests, the same held for a medium model.

Edit: agreed, more benchmarks could be useful; I may add them in the future.
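For anyone who wants to run their own per-epoch comparison in the meantime, here is a minimal timing sketch. It is not the repository's test_control() and does not reproduce its results; the helper names `time_epochs` and `compare` are hypothetical, and the harness only measures wall-clock time for whatever callable it is given.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_epochs(run_epoch: Callable[[], None],
                n_epochs: int = 5,
                warmup: int = 1) -> List[float]:
    """Return wall-clock seconds for each of `n_epochs` calls to `run_epoch`.

    `warmup` untimed calls run first, so one-off costs (graph tracing,
    memory allocation) do not distort the first measurement.
    """
    for _ in range(warmup):
        run_epoch()
    durations = []
    for _ in range(n_epochs):
        start = time.perf_counter()
        run_epoch()
        durations.append(time.perf_counter() - start)
    return durations


def compare(candidates: Dict[str, Callable[[], None]],
            n_epochs: int = 5) -> Dict[str, float]:
    """Map each candidate name to its median epoch time in seconds."""
    return {name: statistics.median(time_epochs(fn, n_epochs))
            for name, fn in candidates.items()}


if __name__ == "__main__":
    # Stand-in workload; replace with a real training epoch.
    def dummy_epoch() -> None:
        sum(i * i for i in range(100_000))

    print(compare({"dummy": dummy_epoch}))
```

To compare optimizers, each callable could wrap something like `model.fit(x, y, epochs=1, verbose=0)` on a model compiled with the optimizer under test, using the same data, batch size, and initial weights for every candidate. The median is used rather than the mean so that a single slow epoch has less influence on the result.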