iter_loss
wuzuowuyou opened this issue · comments
wuzuowuyou commented
First, thank you for your meticulous work!
iter_loss = 0
for logit in logits:
# Resize labels for {100%, 75%, 50%, Max} logits
_, _, H, W = logit.shape
labels_ = resize_labels(labels, size=(H, W))
iter_loss += criterion(logit, labels_.to(device))
# Propagate backward (just compute gradients)
iter_loss /= CONFIG.SOLVER.ITER_SIZE
iter_loss.backward()
Why is the loss divided by `CONFIG.SOLVER.ITER_SIZE` instead of by the number of logits, i.e. `iter_loss /= len(logits)`?
Kazuto Nakashima commented
The line is not meant to average over the multiple logits; it makes the accumulated gradients invariant to the number of accumulation iterations `ITER_SIZE`. The block accumulates 1/N-scaled gradients N times and then updates the parameters with gradients of total magnitude N/N = 1. This is equivalent to computing the raw gradients once on the full batch and updating the parameters immediately. It is a common trick to save memory.
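This equivalence can be checked with a minimal NumPy sketch (not the repository's code; the toy linear model and the name `ITER_SIZE` are illustrative): accumulating 1/N-scaled mini-batch gradients N times yields the same gradient as one full-batch pass.

```python
import numpy as np

# Toy linear regression: loss = mean((x @ w - y) ** 2).
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))
y = rng.normal(size=8)
w = rng.normal(size=3)

def grad(xb, yb, w):
    # Gradient of the mean squared error over a (mini-)batch.
    return 2.0 * xb.T @ (xb @ w - yb) / len(yb)

# Full-batch gradient computed in one shot.
g_full = grad(x, y, w)

# Gradient accumulation: ITER_SIZE equal mini-batches,
# each loss (hence gradient) scaled by 1 / ITER_SIZE.
ITER_SIZE = 4
g_accum = np.zeros_like(w)
for xb, yb in zip(np.split(x, ITER_SIZE), np.split(y, ITER_SIZE)):
    g_accum += grad(xb, yb, w) / ITER_SIZE

print(np.allclose(g_full, g_accum))  # True
```

Note this only holds because each mini-batch loss is already a mean over its own samples; dividing by `ITER_SIZE` then restores the full-batch average without ever holding the full batch in memory.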