Why apply SGD on input feature X

Question

geralt-write-code opened this issue 4 years ago

In Section 3.2, the paper applies stochastic gradient descent to the input feature X. Can input features be optimized? I can't understand the purpose of this.

HantingChen · Answer 1 · Fri Oct 09 2020 10:19:40 GMT+0800 (China Standard Time)

Although X itself cannot be optimized, its derivative still has to be calculated: by the chain rule, the gradient with respect to a layer's input X is needed to calculate the gradients of the filters in the preceding layers.
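
To make the chain-rule argument concrete, here is a minimal sketch. It is not the paper's method: it assumes two scalar "layers" that use ordinary multiplication and a squared-error loss, and all names (`forward`, `backward`, `w1`, `w2`, `x0`, `target`) are hypothetical. The output `h` of layer 1 plays the role of the input feature X of layer 2. Only `w1` and `w2` would be updated by SGD, yet the gradient of `w1` cannot be computed without first computing the gradient with respect to `h`.

```python
def forward(w1, w2, x0):
    h = w1 * x0   # output of layer 1 = input feature X of layer 2
    y = w2 * h    # output of layer 2
    return h, y


def loss(w1, w2, x0, target):
    _, y = forward(w1, w2, x0)
    return 0.5 * (y - target) ** 2


def backward(w1, w2, x0, target):
    h, y = forward(w1, w2, x0)
    dL_dy = y - target
    dL_dw2 = dL_dy * h    # gradient of the layer-2 filter
    dL_dh = dL_dy * w2    # gradient w.r.t. layer 2's input X (never updated)
    dL_dw1 = dL_dh * x0   # gradient of the layer-1 filter, which needs dL_dh
    return dL_dw1, dL_dw2, dL_dh


def numeric_grad_w1(w1, w2, x0, target, eps=1e-6):
    # Central finite difference, used only to check the analytic gradient.
    plus = loss(w1 + eps, w2, x0, target)
    minus = loss(w1 - eps, w2, x0, target)
    return (plus - minus) / (2 * eps)


# Arbitrary illustrative values.
w1, w2, x0, target = 0.5, -1.5, 2.0, 1.0
dL_dw1, dL_dw2, dL_dh = backward(w1, w2, x0, target)
numeric = numeric_grad_w1(w1, w2, x0, target)
print(dL_dw1, numeric)
```

The analytic `dL_dw1` should agree with the finite-difference estimate up to floating-point error. `dL_dh` is computed and passed backward, but no update step is ever applied to `h` itself.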