teddykoker / torchsort

Fast, differentiable sorting and ranking in PyTorch

Home Page: https://pypi.org/project/torchsort/

kl regularization returns nan for gradient.

pumplerod opened this issue

Perhaps I am missing something obvious, but I found kl regularization to work a bit better for my data, so I switched from l2 to kl and began to get nan values in the gradient.

Here is the test code I used to verify...
Python 3.8.10
torch==1.13.0
torchsort==0.1.9

Use of l2 works as expected.

import torch
import torchsort

# This works...
X = torch.tensor([[0.1, 0.3, 0.5, 0.03, 0.2, 0.15, 0.65, 0.7, 0.9]], requires_grad=True)
Y = torchsort.soft_rank(X.view(1, -1), regularization='l2', regularization_strength=1e-4).view(-1)
Y = (Y - Y.min()) / (Y.max() - Y.min())
X_Loss = -torch.log(1 - torch.abs(Y - X.flatten()) + 1e-10)
torch.autograd.grad(X_Loss.mean(), X)
#--yields-> (tensor([[-0.1140, -0.1389, -0.1270,  0.1145, -0.1347, -0.1235, -0.1235, -0.1347,-0.1235]]),)

Use of kl returns nan.

X = torch.tensor([[0.1, 0.3, 0.5, 0.03, 0.2, 0.15, 0.65, 0.7, 0.9]], requires_grad=True)
Y = torchsort.soft_rank(X.view(1, -1), regularization='kl', regularization_strength=1e-4).view(-1)
Y = (Y - Y.min()) / (Y.max() - Y.min())
X_Loss = -torch.log(1 - torch.abs(Y - X.flatten()) + 1e-10)
torch.autograd.grad(X_Loss.mean(), X)
#--yields-> (tensor([[nan, nan, nan, nan, nan, nan, nan, nan, nan]]),)
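
One way to narrow this down is to take the gradient of the raw soft_rank output directly, before the min-max normalization and the log loss. If the nan already shows up there, it is coming from the kl backward itself rather than from the downstream ops. A quick check (sketch only, not from the original report):

X = torch.tensor([[0.1, 0.3, 0.5, 0.03, 0.2, 0.15, 0.65, 0.7, 0.9]], requires_grad=True)
Y = torchsort.soft_rank(X, regularization='kl', regularization_strength=1e-4)
grad, = torch.autograd.grad(Y.sum(), X)
# If this prints True, the nan is produced inside soft_rank's kl backward.
print(torch.isnan(grad).any())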

Hi @pumplerod, thank you for your patience. I was able to successfully reproduce the nan gradients with your code above. It does seem like increasing the regularization strength fixes the issue, but I will look into this more closely, as ideally the gradients should always be defined, even if they are just 0.
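
For reference, a minimal version of that workaround: keep kl but use a larger regularization_strength. The value below (1e-1) is only illustrative and would need to be tuned for your data.

import torch
import torchsort

X = torch.tensor([[0.1, 0.3, 0.5, 0.03, 0.2, 0.15, 0.65, 0.7, 0.9]], requires_grad=True)
# Larger regularization_strength than 1e-4; per the note above, increasing it
# seems to avoid the nan gradients (exact value is a guess, tune as needed).
Y = torchsort.soft_rank(X.view(1, -1), regularization='kl', regularization_strength=1e-1).view(-1)
Y = (Y - Y.min()) / (Y.max() - Y.min())
X_Loss = -torch.log(1 - torch.abs(Y - X.flatten()) + 1e-10)
print(torch.autograd.grad(X_Loss.mean(), X))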