Why apply SGD on input feature X
geralt-write-code commented
In Section 3.2, the paper applies stochastic gradient descent to the input feature X. Can input features be optimized? I can't understand the purpose of this.
HantingChen commented
X itself is not optimized, but its derivative still has to be computed: by the chain rule, the gradient with respect to a layer's input X is needed to compute the gradients of the filters in the preceding layers.
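To make the chain-rule point concrete, here is a minimal sketch with a toy two-layer network. It assumes scalar "filters" and ordinary multiplication layers with a squared-error loss, so it is not the layer or gradient formula from the paper. All names (`forward`, `backward`, `w1`, `w2`, `x0`, `x1`) are hypothetical and chosen only for illustration.

```python
# Toy two-layer network with scalar "filters" w1 and w2 (hypothetical names).
# Plain multiplications are used for simplicity; this is not the paper's layer.

def forward(x0, w1, w2):
    x1 = w1 * x0   # output of layer 1, which is the input feature X of layer 2
    y = w2 * x1    # output of layer 2
    return x1, y

def loss(y, target):
    return 0.5 * (y - target) ** 2

def backward(x0, w1, w2, target):
    x1, y = forward(x0, w1, w2)
    dL_dy = y - target
    dL_dw2 = dL_dy * x1   # gradient of the layer 2 filter
    dL_dx1 = dL_dy * w2   # gradient w.r.t. layer 2's input X; X is never updated with it
    dL_dw1 = dL_dx1 * x0  # chain rule: the layer 1 filter gradient needs dL_dx1
    return dL_dw1, dL_dw2, dL_dx1

def numeric_dw1(x0, w1, w2, target, eps=1e-6):
    # Central finite difference on w1, used only to check the analytic gradient.
    _, y_plus = forward(x0, w1 + eps, w2)
    _, y_minus = forward(x0, w1 - eps, w2)
    return (loss(y_plus, target) - loss(y_minus, target)) / (2 * eps)

if __name__ == "__main__":
    dL_dw1, dL_dw2, dL_dx1 = backward(2.0, 0.5, 3.0, 1.0)
    print("dL/dX  (input of layer 2):", dL_dx1)
    print("dL/dw1 (analytic):", dL_dw1)
    print("dL/dw1 (numeric): ", numeric_dw1(2.0, 0.5, 3.0, 1.0))
```

Only `w1` and `w2` would receive an SGD update here. `dL_dx1` is an intermediate quantity that exists solely so the gradient can flow back to `w1`.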