huawei-noah / AdderNet

Code for paper "AdderNet: Do We Really Need Multiplications in Deep Learning?"

Why apply SGD on input feature X

geralt-write-code opened this issue

In Section 3.2, the paper applies stochastic gradient descent to the input feature X. Can input features be optimized? I can't understand the purpose of this.

X itself is not optimized, since it is not a trainable parameter. Its derivative still has to be computed, because the chain rule needs the gradient with respect to a layer's input X in order to compute the gradients of the filters in the preceding layers.
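
To make the chain-rule point concrete, here is a minimal sketch with two stacked scalar adder units. It is only an illustration, not the repository's implementation: the helper names (`adder`, `adder_grads`, `two_layer_grads`) are made up for this example, and it uses the plain sign-based derivative of the L1 distance rather than any modified gradient from the paper. The gradient of the first filter `w1` can only be formed by multiplying through the derivative of the second unit with respect to its input `h`.

```python
def adder(x, w):
    """Scalar adder unit: negative L1 distance between input and filter."""
    return -abs(x - w)


def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def adder_grads(x, w):
    """Return (dy/dx, dy/dw) for y = -|x - w| (plain sign derivative)."""
    s = sign(x - w)
    return -s, s


def two_layer_grads(x, w1, w2):
    """Gradients of y = adder(adder(x, w1), w2) w.r.t. both filters."""
    h = adder(x, w1)                    # output of layer 1 = input feature of layer 2
    dy_dh, dy_dw2 = adder_grads(h, w2)  # dy_dh is the gradient w.r.t. the input feature
    _, dh_dw1 = adder_grads(x, w1)
    dy_dw1 = dy_dh * dh_dw1             # chain rule: needs dy_dh even though h is never updated
    return dy_dw1, dy_dw2


x, w1, w2 = 0.5, 2.0, -3.0
g_w1, g_w2 = two_layer_grads(x, w1, w2)
print(g_w1, g_w2)
```

Only `w1` and `w2` would be updated by SGD here. The intermediate feature `h` is never changed, but without `dy_dh` there would be no way to obtain the gradient for `w1`.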