lasso-net / lassonet

Feature selection in neural networks

Using a different loss function than MSE

Gituhin opened this issue

I am using LassoNet for my thesis, but I want to use a quantile loss function instead of mean squared error. Going through the code, I found the variable self.criterion, which is set to MSE loss by default (interfaces.py, class LassoNetRegressor, line 566). After instantiating the class, I manually set self.criterion to my quantile loss and trained the LassoNet.

However, the loss doesn't converge and stays high even after several epochs; the training assertion then fails and it exits with an error at line 316 of interfaces.py.
Can someone suggest a solution?

Can you print the full error message? In particular, give the full traceback so that we can tell whether the failure happens in the dense model or not.

How did you implement quantile loss?

Does a simple MLP work correctly?

The quantile loss is defined as follows:

import torch

def qloss(preds: torch.Tensor, target: torch.Tensor, quantile: float) -> torch.Tensor:
    # Pinball (quantile) loss, summed over the batch.
    error = target - preds.squeeze(1)  # make preds shape (n,)
    return torch.max((quantile - 1) * error, quantile * error).sum()
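One detail not shown above: the default MSE criterion is invoked with just (predictions, targets), so the quantile presumably has to be bound before qloss can be assigned as the criterion. A minimal sketch using functools.partial (the 0.9 is only an example value):

from functools import partial

# Bind the quantile so the resulting callable takes only (preds, target),
# like the default MSE criterion; 0.9 is an example value, not a recommendation.
qloss_q90 = partial(qloss, quantile=0.9)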

A plain MLP architecture (roughly replicating the residual/skip-connection structure of LassoNet):

import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super(MLP, self).__init__()
        self.layer1 = nn.Sequential(nn.Linear(input_dim, hidden_dim),
                                    nn.ReLU())
        self.layer2 = nn.Linear(hidden_dim, 1)
        self.skip = nn.Linear(input_dim, 1)  # linear skip connection, as in LassoNet

    def forward(self, x):
        x_skip = self.skip(x)
        x = self.layer1(x)
        x = self.layer2(x)
        return x + x_skip
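For completeness, a minimal training-loop sketch for this MLP with the qloss above; the optimizer, learning rate, batch size, and logging format here are assumptions, not necessarily the exact setup that produced the log below.

import torch
from torch.utils.data import DataLoader, TensorDataset

def train_mlp(model, X, y, quantile=0.9, epochs=600, batch_size=64, lr=1e-3):
    # Mini-batch training with the summed quantile loss defined above.
    loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(1, epochs + 1):
        total = 0.0
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = qloss(model(xb), yb, quantile)
            loss.backward()
            optimizer.step()
            total += loss.item()
        if epoch % 30 == 0:
            # Rough logging in the spirit of the log below (not the exact same quantities).
            print(f"Epoch: {epoch}, Avg loss: {total / len(loader)}, Last Loss: {loss.item()}")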

Training with the above architecture gives satisfactory results, with a final quantile loss of 3.65, which I also cross-checked against other models.

Training log for the plain MLP:

Epoch: 30, Avg loss: 35.2953067712634, Last Loss: 0.24375311014586343
Epoch: 60, Avg loss: 13.823530698644062, Last Loss: 0.11340944496327296
Epoch: 90, Avg loss: 8.871674853133099, Last Loss: 0.057913775270518206
Epoch: 120, Avg loss: 7.761587535173886, Last Loss: 0.03557623652153863
Epoch: 150, Avg loss: 6.519703884676147, Last Loss: 0.022024002944764035
Epoch: 180, Avg loss: 6.020249006876607, Last Loss: 0.02365970772291837
Epoch: 210, Avg loss: 6.4498613277430765, Last Loss: 0.02749163222099957
Epoch: 240, Avg loss: 6.112927409654815, Last Loss: 0.02440094661499677
Epoch: 270, Avg loss: 5.759016223177362, Last Loss: 0.03134409200932202
Epoch: 300, Avg loss: 6.001484612011204, Last Loss: 0.038777831313864565
Epoch: 330, Avg loss: 6.062069096857044, Last Loss: 0.02552318584706006
Epoch: 360, Avg loss: 4.3918836065898805, Last Loss: 0.03013314467455563
Epoch: 390, Avg loss: 4.768448385739695, Last Loss: 0.016655400668915646
Epoch: 420, Avg loss: 4.958467233045215, Last Loss: 0.01705446866684808
Epoch: 450, Avg loss: 5.734827264042265, Last Loss: 0.017732498323258294
Epoch: 480, Avg loss: 6.105442322863577, Last Loss: 0.04295464646843259
Epoch: 510, Avg loss: 5.3933428761242315, Last Loss: 0.03153319370533643
Epoch: 540, Avg loss: 5.652945175400031, Last Loss: 0.015559812007204864
Epoch: 570, Avg loss: 4.970648525136502, Last Loss: 0.029464684543864108
Epoch: 600, Avg loss: 3.6572357326837297, Last Loss: 0.0197497935273808


I am using LassoNet in the following way:

import torch
from sklearn.preprocessing import StandardScaler
from lassonet import LassoNetRegressor

model = LassoNetRegressor(hidden_dims = (86,), val_size = 0, verbose = 2)
model.criterion = qloss
sc = StandardScaler()
X_train = torch.tensor(sc.fit_transform(X.T), dtype=torch.float32)
# X_train shape = (3953, 15), y_train.shape = (3953,)
y_train = torch.Tensor(y)

hist_obj = model.path(X_train, y_train)

After running for a few epochs and over a few values of lambda, it gave this error:

epoch: 13
loss: 2921739.5
epoch: 14
loss: 8718601.0
epoch: 15
loss: 11001912.0
epoch: 16
loss: 8222154.0
epoch: 0
loss: 8565492.0
epoch: 1
loss: 19698790.0
epoch: 2
loss: 8724698.0
Loss: 188280816.0
l2_regularization: 1392959.75
l2_regularization_skip: 46.82383728027344
Loss is 188280816.0
Did you normalize input?
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
 in ()
      4 y_train = torch.Tensor(y)
      5 
----> 6 hist_obj = model.path(X_train, y_train)

5 frames
/usr/local/lib/python3.9/dist-packages/lassonet/interfaces.py in path(self, X, y, X_val, y_val, lambda_seq, lambda_max, return_state_dicts, callback)
    470             if self.model.selected_count() == 0:
    471                 break
--> 472             last = self._train(
    473                 X_train,
    474                 y_train,

/usr/local/lib/python3.9/dist-packages/lassonet/interfaces.py in _train(self, X_train, y_train, X_val, y_val, batch_size, epochs, lambda_, optimizer, return_state_dict, patience)
    329                     return ans
    330 
--> 331                 optimizer.step(closure)
    332                 model.prox(lambda_=lambda_ * optimizer.param_groups[0]["lr"], M=self.M)
    333             print("epoch:", epoch)

/usr/local/lib/python3.9/dist-packages/torch/optim/optimizer.py in wrapper(*args, **kwargs)
    138                 profile_name = "Optimizer.step#{}.step".format(obj.__class__.__name__)
    139                 with torch.autograd.profiler.record_function(profile_name):
--> 140                     out = func(*args, **kwargs)
    141                     obj._optimizer_step_code()
    142                     return out

/usr/local/lib/python3.9/dist-packages/torch/optim/optimizer.py in _use_grad(self, *args, **kwargs)
     21         try:
     22             torch.set_grad_enabled(self.defaults['differentiable'])
---> 23             ret = func(self, *args, **kwargs)
     24         finally:
     25             torch.set_grad_enabled(prev_grad)

/usr/local/lib/python3.9/dist-packages/torch/optim/sgd.py in step(self, closure)
    128         if closure is not None:
    129             with torch.enable_grad():
--> 130                 loss = closure()
    131 
    132         for group in self.param_groups:

/usr/local/lib/python3.9/dist-packages/lassonet/interfaces.py in closure()
    324                             f"l2_regularization_skip: {model.l2_regularization_skip()}"
    325                         )
--> 326                         assert False
    327                     ans.backward()
    328                     loss += ans.item() * len(batch) / n_train

AssertionError:

Your traceback is incomplete. Can you set verbose = 2? You should have some message "Initialized dense model" along with information about the dense model.

Yeah, so here is the information.

epoch: 927
loss: 29672.576171875
epoch: 928
loss: 29670.18359375
Initialized dense model
929 epochs, val_objective 2.97e+04, val_loss 2.97e+04, regularization 1.42e+00, l2_regularization 2.15e-01
epoch: 0
loss: 29667.796875
epoch: 1
loss: 6788969.5
Loss: 57149707255808.0
l2_regularization: 78822872.0
l2_regularization_skip: 6728.37109375
Loss is 57149707255808.0
Did you normalize input?
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
in ()
      4 y_train = torch.Tensor(y)
      5 
----> 6 hist_obj = model.path(X_train, y_train)

5 frames
/usr/local/lib/python3.9/dist-packages/lassonet/interfaces.py in path(self, X, y, X_val, y_val, lambda_seq, lambda_max, return_state_dicts, callback)
    470             if self.model.selected_count() == 0:
    471                 break
--> 472             last = self._train(
    473                 X_train,
    474                 y_train,

/usr/local/lib/python3.9/dist-packages/lassonet/interfaces.py in _train(self, X_train, y_train, X_val, y_val, batch_size, epochs, lambda_, optimizer, return_state_dict, patience)
    329                     return ans
    330 
--> 331                 optimizer.step(closure)
    332                 model.prox(lambda_=lambda_ * optimizer.param_groups[0]["lr"], M=self.M)
    333             print("epoch:", epoch)

/usr/local/lib/python3.9/dist-packages/torch/optim/optimizer.py in wrapper(*args, **kwargs)
    138                 profile_name = "Optimizer.step#{}.step".format(obj.__class__.__name__)
    139                 with torch.autograd.profiler.record_function(profile_name):
--> 140                     out = func(*args, **kwargs)
    141                     obj._optimizer_step_code()
    142                     return out

/usr/local/lib/python3.9/dist-packages/torch/optim/optimizer.py in _use_grad(self, *args, **kwargs)
     21         try:
     22             torch.set_grad_enabled(self.defaults['differentiable'])
---> 23             ret = func(self, *args, **kwargs)
     24         finally:
     25             torch.set_grad_enabled(prev_grad)

/usr/local/lib/python3.9/dist-packages/torch/optim/sgd.py in step(self, closure)
    128         if closure is not None:
    129             with torch.enable_grad():
--> 130                 loss = closure()
    131 
    132         for group in self.param_groups:

/usr/local/lib/python3.9/dist-packages/lassonet/interfaces.py in closure()
    324                             f"l2_regularization_skip: {model.l2_regularization_skip()}"
    325                         )
--> 326                         assert False
    327                     ans.backward()
    328                     loss += ans.item() * len(batch) / n_train

AssertionError:

It looks like your objective is already bad in the dense model: "Initialized dense model ... val_loss 2.97e+04". Maybe you need to start from there to understand what is wrong. Compare your training with our _train function.

Okay, I will go through the paper once again to get better clarity on the internal procedure, and also review my loss function and other details. I will get back if there is anything more to discuss. Thank you for your help and time.

My point is precisely that you don't need to dive deep into the paper. LassoNet starts the optimization from a dense model. This dense model seems to be badly fitted. Try to figure out why!

Hi Louis,
I figured it out: the problem was most likely the batch size. Initially the whole dataset was passed as a single batch, but when I switched to mini-batches the loss began to stabilize and eventually converged. I also obtained the path information in the history object for the various lambdas. A sketch of the change is below.
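For anyone hitting the same problem, this is roughly what the change looks like, assuming LassoNetRegressor accepts a batch_size argument (the internal _train in the traceback above takes one); the value 64 is purely illustrative, not the one I used.

# Same setup as before, but training on mini-batches instead of the full dataset.
model = LassoNetRegressor(hidden_dims=(86,), val_size=0, verbose=2, batch_size=64)
model.criterion = qloss
hist_obj = model.path(X_train, y_train)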

Thank you for your help. I guess this issue can be closed.

Nice, I'm happy I could help!