scaling up the loss before calculating the gradient
hanit92 opened this issue
Hanit Hakim commented
Hi,
First of all, great paper and great code, thank you for sharing them :)
I was wondering: why do you scale up the loss (multiplying by 1024.) before the backward() call, and then divide the gradients by the same factor again before the weight update?
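For reference, here is a minimal sketch of the pattern I'm asking about, in a generic PyTorch training loop (the names and structure here are mine, not the repo's actual code):

```python
import torch

# Fixed loss scale, as in the question (assumption: 1024. is a constant here).
SCALE = 1024.0

def train_step(model, optimizer, loss_fn, x, y):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    # Scale the loss up before backward() ...
    (loss * SCALE).backward()
    # ... then divide the gradients back down before the weight update,
    # so the effective step size is unchanged.
    for p in model.parameters():
        if p.grad is not None:
            p.grad.div_(SCALE)
    optimizer.step()
    return loss.item()

# Toy usage:
model = torch.nn.Linear(8, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(16, 8), torch.randn(16, 1)
train_step(model, opt, torch.nn.functional.mse_loss, x, y)
```

This looks like the fixed loss-scaling trick used in mixed-precision (fp16) training to keep small gradients from underflowing, but I wanted to confirm whether that's the motivation here.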