Comparison Against Adam
JinLi711 opened this issue
Could you benchmark your AdamW implementation against TensorFlow's Adam on multiple datasets? That would help users decide whether AdamW is the right choice. I'm particularly interested in how the time per epoch (or per training step) differs.
Already done: see the build logs from the tests, in particular test_control(). Testing isn't exhaustive, but on both sparse and dense tensors, for a tiny model, the AdamW and TF implementations are about equally fast; in local tests, the same held for a medium model.
Edit: agreed, more benchmarks would be useful; I may add them in the future.
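For anyone who wants to reason about the per-epoch timing question without a framework, here is a hypothetical, minimal sketch. It is not this repository's test_control() and not TensorFlow's optimizer code. It assumes the textbook decoupled-weight-decay form of AdamW, with the decay term scaled by the learning rate, so plain Adam is just the `weight_decay=0` case. Under that assumption the per-parameter work differs by one multiply-add, which is consistent with the two optimizers timing about the same. The names `adamw_step`, `adam_step`, and `time_epochs` are illustrative only, and a fixed random gradient stands in for a real backward pass.

```python
import math
import random
import time


def adamw_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-7, weight_decay=1e-2):
    """One AdamW update, in place, on flat lists of floats.

    Assumes the textbook decoupled form: the decay term weight_decay * param
    is added outside the adaptive m_hat / sqrt(v_hat) ratio and scaled by lr.
    """
    state["t"] += 1
    t = state["t"]
    m, v = state["m"], state["v"]
    for i, g in enumerate(grads):
        m[i] = beta1 * m[i] + (1 - beta1) * g          # first-moment estimate
        v[i] = beta2 * v[i] + (1 - beta2) * g * g      # second-moment estimate
        m_hat = m[i] / (1 - beta1 ** t)                # bias corrections
        v_hat = v[i] / (1 - beta2 ** t)
        adaptive = m_hat / (math.sqrt(v_hat) + eps)
        # The only extra per-parameter work versus Adam is this decay term.
        params[i] -= lr * (adaptive + weight_decay * params[i])


def adam_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-7):
    """Plain Adam: the same update with the decoupled decay switched off."""
    adamw_step(params, grads, state, lr, beta1, beta2, eps, weight_decay=0.0)


def time_epochs(step_fn, n_params=2000, steps_per_epoch=10, n_epochs=3, seed=0):
    """Return wall-clock seconds per epoch for one optimizer step function."""
    rng = random.Random(seed)
    params = [rng.gauss(0.0, 1.0) for _ in range(n_params)]
    # A fixed random gradient stands in for a real backward pass (assumption),
    # so only the optimizer update itself is timed.
    grads = [rng.gauss(0.0, 1.0) for _ in range(n_params)]
    state = {"t": 0, "m": [0.0] * n_params, "v": [0.0] * n_params}
    epoch_times = []
    for _ in range(n_epochs):
        start = time.perf_counter()
        for _ in range(steps_per_epoch):
            step_fn(params, grads, state)
        epoch_times.append(time.perf_counter() - start)
    return epoch_times


if __name__ == "__main__":
    for name, fn in [("Adam", adam_step), ("AdamW", adamw_step)]:
        times = time_epochs(fn)
        print(f"{name}: {sum(times) / len(times):.4f} s/epoch (mean of {len(times)})")
```

This only isolates the update step. In a real model, per-epoch time is usually dominated by the forward and backward passes, so timing full TensorFlow training epochs on the datasets of interest would be the more meaningful benchmark.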