iter_loss
wuzuowuyou opened this issue · comments
wuzuowuyou commented
First, thank you for your meticulous work!
iter_loss = 0
for logit in logits:
# Resize labels for {100%, 75%, 50%, Max} logits
_, _, H, W = logit.shape
labels_ = resize_labels(labels, size=(H, W))
iter_loss += criterion(logit, labels_.to(device))
# Propagate backward (just compute gradients)
iter_loss /= CONFIG.SOLVER.ITER_SIZE
iter_loss.backward()
Why is the loss divided by `CONFIG.SOLVER.ITER_SIZE` instead of by the number of logits, i.e. `iter_loss /= len(logits)`?
Kazuto Nakashima commented
The line is not meant to average over the multiple logits; it makes the accumulated gradients invariant to the number of accumulation iterations `ITER_SIZE`. The block accumulates 1/N-scaled gradients N times and then updates the parameters with gradients of total magnitude N/N = 1. This is equivalent to computing the raw gradients once on the full batch and updating the parameters immediately. It is a common trick to save memory.
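This equivalence can be checked with a minimal NumPy sketch (not the repository's code; the toy linear model and the name `ITER_SIZE` are illustrative): accumulating 1/N-scaled mini-batch gradients N times yields the same gradient as one full-batch pass.

```python
import numpy as np

# Toy linear regression: loss = mean((x @ w - y) ** 2).
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))
y = rng.normal(size=8)
w = rng.normal(size=3)

def grad(xb, yb, w):
    # Gradient of the mean squared error over a (mini-)batch.
    return 2.0 * xb.T @ (xb @ w - yb) / len(yb)

# Full-batch gradient computed in one shot.
g_full = grad(x, y, w)

# Gradient accumulation: ITER_SIZE equal mini-batches,
# each loss (hence gradient) scaled by 1 / ITER_SIZE.
ITER_SIZE = 4
g_accum = np.zeros_like(w)
for xb, yb in zip(np.split(x, ITER_SIZE), np.split(y, ITER_SIZE)):
    g_accum += grad(xb, yb, w) / ITER_SIZE

print(np.allclose(g_full, g_accum))  # True
```

Note this only holds because each mini-batch loss is already a mean over its own samples; dividing by `ITER_SIZE` then restores the full-batch average without ever holding the full batch in memory.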