kl regularization returns nan for gradient.
pumplerod opened this issue · comments
Perhaps I am missing something obvious, but I found kl regularization to work a bit better for my data, so I switched from l2 to kl and began to get nan values in the gradient.
Here is the test code I used to verify:

- Python 3.8.10
- torch==1.13.0
- torchsort==0.1.9

Use of l2 works as expected.
import torch
import torchsort
# This works...
X = torch.tensor([[0.1, 0.3, 0.5, 0.03, 0.2, 0.15, 0.65, 0.7, 0.9]], requires_grad=True)
Y = torchsort.soft_rank(X.view(1, -1),regularization='l2', regularization_strength=1e-4).view(-1)
Y = (Y - Y.min()) / (Y.max() - Y.min())
X_Loss = -torch.log(1 - torch.abs(Y - X.flatten()) + 1e-10)
torch.autograd.grad(X_Loss.mean(), X)
#--yields-> (tensor([[-0.1140, -0.1389, -0.1270, 0.1145, -0.1347, -0.1235, -0.1235, -0.1347,-0.1235]]),)
Use of kl returns nan:
X = torch.tensor([[0.1, 0.3, 0.5, 0.03, 0.2, 0.15, 0.65, 0.7, 0.9]], requires_grad=True)
Y = torchsort.soft_rank(X.view(1, -1),regularization='kl', regularization_strength=1e-4).view(-1)
Y = (Y - Y.min()) / (Y.max() - Y.min())
X_Loss = -torch.log(1 - torch.abs(Y - X.flatten()) + 1e-10)
torch.autograd.grad(X_Loss.mean(), X)
#--yields-> (tensor([[nan, nan, nan, nan, nan, nan, nan, nan, nan]]),)
Hi @pumplerod, thank you for your patience. I was able to successfully reproduce the nan gradients with your code above. It does seem like increasing the regularization strength fixes the issue, but I will look into this more closely, as ideally the gradients should always be defined, even if they are just 0.
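In the meantime, a defensive pattern in plain PyTorch (independent of torchsort; the `safe_grad` helper name here is my own, for illustration) is to sanitize the gradient before it reaches an optimizer step. The sketch below triggers a nan gradient deliberately with a 0 * sqrt(x) construction, since autograd's chain rule multiplies the infinite derivative of sqrt at 0 by an upstream gradient of 0:

```python
import torch

def safe_grad(loss, inp):
    # Hypothetical helper: compute the gradient and replace any nan
    # entries with zeros so a later optimizer step is not poisoned.
    g, = torch.autograd.grad(loss, inp)
    return torch.nan_to_num(g, nan=0.0)

# Minimal nan-gradient reproducer: d/dx sqrt(x) at x=0 is inf, and the
# chain rule multiplies it by the upstream gradient 0, giving 0 * inf = nan.
x = torch.tensor([0.0, 4.0], requires_grad=True)
loss = (0 * torch.sqrt(x)).sum()

g = safe_grad(loss, x)
print(g)  # the nan at index 0 has been replaced with 0.0
```

Note this only masks the symptom; zeroing the gradient silently freezes the affected parameters, so it is a stopgap until the kl backward pass itself is fixed.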