local subgradient = S*torch.sign(weight)
MrLinNing opened this issue
Shouldn't the L1 sparsity term be torch.abs(weight)? Can you explain this in more detail?
local subgradient = S*torch.sign(weight)
The (sub)gradient of the absolute value function (the L1 sparsity loss) is the sign function. Here we compute the subgradient directly and add it to the weight gradient, without explicitly defining the loss.
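As a hedged sketch of that update in NumPy (the original is Torch/Lua; the variable names and values here are illustrative, not from the repo):

```python
import numpy as np

def l1_subgradient(weight, S):
    """Subgradient of the L1 penalty S * sum(|w|): the sign function scaled by S.

    np.sign(0) = 0 is one valid subgradient choice at the non-differentiable point.
    """
    return S * np.sign(weight)

# Hypothetical scale weights and sparsity coefficient (illustrative values)
weight = np.array([0.5, -1.2, 0.0, 3.0])
S = 1e-4

# Stand-in for the gradient backpropagated from the data loss;
# the sparsity subgradient is simply added on top of it.
grad_from_data_loss = np.zeros_like(weight)
total_grad = grad_from_data_loss + l1_subgradient(weight, S)
```

In the actual code this addition happens once per update step, so the L1 penalty never needs to appear as an explicit loss term.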
thank you! @liuzhuang13
Why did you use the subgradient? Did you try directly defining the loss?
Because the absolute value function is not differentiable at x=0, this is a subgradient rather than a gradient. In practice, however, the weight x never becomes exactly 0, so it is equivalent to the gradient.
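That equivalence away from zero can be checked numerically: for nonzero weights, a finite-difference gradient of the explicit L1 loss matches S * sign(w). A small NumPy sketch (names and values are illustrative):

```python
import numpy as np

def l1_loss(w, S):
    """Explicit L1 sparsity loss: S * sum(|w_i|)."""
    return S * np.sum(np.abs(w))

def numeric_grad(w, S, eps=1e-6):
    """Central finite differences of the explicit L1 loss."""
    g = np.zeros_like(w)
    for i in range(w.size):
        wp, wm = w.copy(), w.copy()
        wp[i] += eps
        wm[i] -= eps
        g[i] = (l1_loss(wp, S) - l1_loss(wm, S)) / (2 * eps)
    return g

w = np.array([0.7, -2.1, 1e-3])  # all nonzero, as the weights are in practice
S = 0.5
# Away from 0, the finite-difference gradient agrees with the subgradient.
assert np.allclose(numeric_grad(w, S), S * np.sign(w), atol=1e-5)
```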
Unlike PyTorch, Torch has no automatic differentiation, so I found this to be the most convenient way to do what we wanted, and we just used it.