BatchNormsync with Adam Optimizer
tarun005 opened this issue
Is the bnsync code written specifically for the SGD optimizer? The loss does not converge if I train the model with the Adam optimizer.
@tarun005 Have you tested with the SGD optimizer? Does training converge with it?
Yes, the model converges with SGD, but the same model does not converge if I replace SGD with Adam.
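For reference, the swap in question would look roughly like this (a minimal sketch, assuming a standard PyTorch training setup; the model and learning rates are placeholders, not taken from this repo):

```python
import torch

# Hypothetical model and learning rates, for illustration only.
model = torch.nn.Linear(10, 2)

# Converges, per the report above:
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

# Reportedly does not converge when used with the custom BNsync:
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
```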
@tarun005 Although I would expect BN to be irrelevant to the optimization method, when I used the syncbn code by just adding the lib folder to $PATH, I hit a segmentation fault. How are you using it?
I agree that BN shouldn't depend on the optimization method, but I have read that Adam needs consistent global statistics at every iteration, so the BNsync implementation given here could be the issue.
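One way to test this hypothesis (a sketch, assuming PyTorch >= 1.1 with DistributedDataParallel; this is not part of this repo's BNsync code) is to swap the custom layers for the built-in torch.nn.SyncBatchNorm, which synchronizes statistics across all processes, and check whether Adam then converges:

```python
import torch

# Hypothetical model; replace with the actual network under test.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1),
    torch.nn.BatchNorm2d(16),
    torch.nn.ReLU(),
)

# Replace every BatchNorm layer with SyncBatchNorm, which computes
# mean/variance across the whole process group instead of per GPU.
# (Running the converted model requires an initialized distributed setup.)
model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```

If Adam converges with the built-in SyncBatchNorm but not with the custom BNsync, that would point to this repo's implementation rather than to Adam itself.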