lasso-net / lassonet

Feature selection in neural networks

Using a different loss function than MSE

Gituhin opened this issue

I am using LassoNet for my thesis, but I want to use a quantile loss function instead of mean squared error. Going through the code, I found the variable self.criterion, which is set to MSE loss by default (interfaces.py, class LassoNetRegressor, line 566). After instantiating the class, I manually set self.criterion to my quantile loss and trained the LassoNet.

However, the loss doesn't converge and stays high even after several epochs; the training assertion then fails and it exits with an error at line 316 of interfaces.py.
Can someone suggest a solution?

Can you print the full error message? In particular, give the full traceback so that we can tell whether the failure happens in the dense model or not.

How did you implement quantile loss?

Does a simple MLP work correctly?

The quantile loss is defined as follows:

import torch

def qloss(preds: torch.Tensor, target: torch.Tensor, quantile: float) -> torch.Tensor:
    # Pinball (quantile) loss, summed over the batch.
    error = target - preds.squeeze(1)  # make preds shape (n,)
    return torch.max((quantile - 1) * error, quantile * error).sum()
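One detail not shown above: the default MSE criterion is invoked with just (predictions, targets), so the quantile presumably has to be bound before qloss can be assigned as the criterion. A minimal sketch using functools.partial (the 0.9 is only an example value):

from functools import partial

# Bind the quantile so the resulting callable takes only (preds, target),
# like the default MSE criterion; 0.9 is an example value, not a recommendation.
qloss_q90 = partial(qloss, quantile=0.9)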

A plain MLP architecture (roughly replicating the residual/skip-connection structure of LassoNet):

import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super(MLP, self).__init__()
        self.layer1 = nn.Sequential(nn.Linear(input_dim, hidden_dim),
                                    nn.ReLU())
        self.layer2 = nn.Linear(hidden_dim, 1)
        self.skip = nn.Linear(input_dim, 1)  # linear skip connection, as in LassoNet

    def forward(self, x):
        x_skip = self.skip(x)
        x = self.layer1(x)
        x = self.layer2(x)
        return x + x_skip
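For completeness, a minimal training-loop sketch for this MLP with the qloss above; the optimizer, learning rate, batch size, and logging format here are assumptions, not necessarily the exact setup that produced the log below.

import torch
from torch.utils.data import DataLoader, TensorDataset

def train_mlp(model, X, y, quantile=0.9, epochs=600, batch_size=64, lr=1e-3):
    # Mini-batch training with the summed quantile loss defined above.
    loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(1, epochs + 1):
        total = 0.0
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = qloss(model(xb), yb, quantile)
            loss.backward()
            optimizer.step()
            total += loss.item()
        if epoch % 30 == 0:
            # Rough logging in the spirit of the log below (not the exact same quantities).
            print(f"Epoch: {epoch}, Avg loss: {total / len(loader)}, Last Loss: {loss.item()}")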

Training with the above architecture gives satisfactory results, with a final quantile loss of 3.65, which I also cross-checked against other models.

Training log for the plain MLP:

Epoch: 30, Avg loss: 35.2953067712634, Last Loss: 0.24375311014586343
Epoch: 60, Avg loss: 13.823530698644062, Last Loss: 0.11340944496327296
Epoch: 90, Avg loss: 8.871674853133099, Last Loss: 0.057913775270518206
Epoch: 120, Avg loss: 7.761587535173886, Last Loss: 0.03557623652153863
Epoch: 150, Avg loss: 6.519703884676147, Last Loss: 0.022024002944764035
Epoch: 180, Avg loss: 6.020249006876607, Last Loss: 0.02365970772291837
Epoch: 210, Avg loss: 6.4498613277430765, Last Loss: 0.02749163222099957
Epoch: 240, Avg loss: 6.112927409654815, Last Loss: 0.02440094661499677
Epoch: 270, Avg loss: 5.759016223177362, Last Loss: 0.03134409200932202
Epoch: 300, Avg loss: 6.001484612011204, Last Loss: 0.038777831313864565
Epoch: 330, Avg loss: 6.062069096857044, Last Loss: 0.02552318584706006
Epoch: 360, Avg loss: 4.3918836065898805, Last Loss: 0.03013314467455563
Epoch: 390, Avg loss: 4.768448385739695, Last Loss: 0.016655400668915646
Epoch: 420, Avg loss: 4.958467233045215, Last Loss: 0.01705446866684808
Epoch: 450, Avg loss: 5.734827264042265, Last Loss: 0.017732498323258294
Epoch: 480, Avg loss: 6.105442322863577, Last Loss: 0.04295464646843259
Epoch: 510, Avg loss: 5.3933428761242315, Last Loss: 0.03153319370533643
Epoch: 540, Avg loss: 5.652945175400031, Last Loss: 0.015559812007204864
Epoch: 570, Avg loss: 4.970648525136502, Last Loss: 0.029464684543864108
Epoch: 600, Avg loss: 3.6572357326837297, Last Loss: 0.0197497935273808


I am using LassoNet in the following way:

import torch
from sklearn.preprocessing import StandardScaler
from lassonet import LassoNetRegressor

model = LassoNetRegressor(hidden_dims = (86,), val_size = 0, verbose = 2)
model.criterion = qloss
sc = StandardScaler()
X_train = torch.tensor(sc.fit_transform(X.T), dtype=torch.float32)
# X_train shape = (3953, 15), y_train.shape = (3953,)
y_train = torch.Tensor(y)

hist_obj = model.path(X_train, y_train)

After running for a few epochs and over a few values of lambda, it gave this error:

epoch: 13
loss: 2921739.5
epoch: 14
loss: 8718601.0
epoch: 15
loss: 11001912.0
epoch: 16
loss: 8222154.0
epoch: 0
loss: 8565492.0
epoch: 1
loss: 19698790.0
epoch: 2
loss: 8724698.0
Loss: 188280816.0
l2_regularization: 1392959.75
l2_regularization_skip: 46.82383728027344
Loss is 188280816.0
Did you normalize input?
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
 in ()
      4 y_train = torch.Tensor(y)
      5 
----> 6 hist_obj = model.path(X_train, y_train)

5 frames
/usr/local/lib/python3.9/dist-packages/lassonet/interfaces.py in path(self, X, y, X_val, y_val, lambda_seq, lambda_max, return_state_dicts, callback)
    470             if self.model.selected_count() == 0:
    471                 break
--> 472             last = self._train(
    473                 X_train,
    474                 y_train,

/usr/local/lib/python3.9/dist-packages/lassonet/interfaces.py in _train(self, X_train, y_train, X_val, y_val, batch_size, epochs, lambda_, optimizer, return_state_dict, patience)
    329                     return ans
    330 
--> 331                 optimizer.step(closure)
    332                 model.prox(lambda_=lambda_ * optimizer.param_groups[0]["lr"], M=self.M)
    333             print("epoch:", epoch)

/usr/local/lib/python3.9/dist-packages/torch/optim/optimizer.py in wrapper(*args, **kwargs)
    138                 profile_name = "Optimizer.step#{}.step".format(obj.__class__.__name__)
    139                 with torch.autograd.profiler.record_function(profile_name):
--> 140                     out = func(*args, **kwargs)
    141                     obj._optimizer_step_code()
    142                     return out

/usr/local/lib/python3.9/dist-packages/torch/optim/optimizer.py in _use_grad(self, *args, **kwargs)
     21         try:
     22             torch.set_grad_enabled(self.defaults['differentiable'])
---> 23             ret = func(self, *args, **kwargs)
     24         finally:
     25             torch.set_grad_enabled(prev_grad)

/usr/local/lib/python3.9/dist-packages/torch/optim/sgd.py in step(self, closure)
    128         if closure is not None:
    129             with torch.enable_grad():
--> 130                 loss = closure()
    131 
    132         for group in self.param_groups:

/usr/local/lib/python3.9/dist-packages/lassonet/interfaces.py in closure()
    324                             f"l2_regularization_skip: {model.l2_regularization_skip()}"
    325                         )
--> 326                         assert False
    327                     ans.backward()
    328                     loss += ans.item() * len(batch) / n_train

AssertionError:

Your traceback is incomplete. Can you set verbose = 2? You should have some message "Initialized dense model" along with information about the dense model.

Yeah, so here is the information.

epoch: 927
loss: 29672.576171875
epoch: 928
loss: 29670.18359375
Initialized dense model
929 epochs, val_objective 2.97e+04, val_loss 2.97e+04, regularization 1.42e+00, l2_regularization 2.15e-01
epoch: 0
loss: 29667.796875
epoch: 1
loss: 6788969.5
Loss: 57149707255808.0
l2_regularization: 78822872.0
l2_regularization_skip: 6728.37109375
Loss is 57149707255808.0
Did you normalize input?
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
in ()
      4 y_train = torch.Tensor(y)
      5 
----> 6 hist_obj = model.path(X_train, y_train)

5 frames
/usr/local/lib/python3.9/dist-packages/lassonet/interfaces.py in path(self, X, y, X_val, y_val, lambda_seq, lambda_max, return_state_dicts, callback)
    470             if self.model.selected_count() == 0:
    471                 break
--> 472             last = self._train(
    473                 X_train,
    474                 y_train,

/usr/local/lib/python3.9/dist-packages/lassonet/interfaces.py in _train(self, X_train, y_train, X_val, y_val, batch_size, epochs, lambda_, optimizer, return_state_dict, patience)
    329                     return ans
    330 
--> 331                 optimizer.step(closure)
    332                 model.prox(lambda_=lambda_ * optimizer.param_groups[0]["lr"], M=self.M)
    333             print("epoch:", epoch)

/usr/local/lib/python3.9/dist-packages/torch/optim/optimizer.py in wrapper(*args, **kwargs)
    138                 profile_name = "Optimizer.step#{}.step".format(obj.__class__.__name__)
    139                 with torch.autograd.profiler.record_function(profile_name):
--> 140                     out = func(*args, **kwargs)
    141                     obj._optimizer_step_code()
    142                     return out

/usr/local/lib/python3.9/dist-packages/torch/optim/optimizer.py in _use_grad(self, *args, **kwargs)
     21         try:
     22             torch.set_grad_enabled(self.defaults['differentiable'])
---> 23             ret = func(self, *args, **kwargs)
     24         finally:
     25             torch.set_grad_enabled(prev_grad)

/usr/local/lib/python3.9/dist-packages/torch/optim/sgd.py in step(self, closure)
    128         if closure is not None:
    129             with torch.enable_grad():
--> 130                 loss = closure()
    131 
    132         for group in self.param_groups:

/usr/local/lib/python3.9/dist-packages/lassonet/interfaces.py in closure()
    324                             f"l2_regularization_skip: {model.l2_regularization_skip()}"
    325                         )
--> 326                         assert False
    327                     ans.backward()
    328                     loss += ans.item() * len(batch) / n_train

AssertionError:

It looks like your objective is already bad in the dense model: "Initialized dense model ... val_loss 2.97e+04". Maybe you need to start from there to understand what is wrong. Compare your training with our _train function.

Okay, I will go through the paper once again to get better clarity on the internal procedure, and also review my loss function and other details. I will get back if there is anything more to discuss. Thank you for your help and time.

My point is precisely that you don't need to dive deep into the paper. LassoNet starts the optimization from a dense model. This dense model seems to be badly fitted. Try to figure out why!

Hi Louis,
I figured it out: the problem was most likely the batch size. Initially the whole dataset was passed as a single batch, but when I switched to mini-batches the loss began to stabilize and eventually converged. I also obtained the path information in the history object for the various lambdas. A sketch of the change is below.
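For anyone hitting the same problem, this is roughly what the change looks like, assuming LassoNetRegressor accepts a batch_size argument (the internal _train in the traceback above takes one); the value 64 is purely illustrative, not the one I used.

# Same setup as before, but training on mini-batches instead of the full dataset.
model = LassoNetRegressor(hidden_dims=(86,), val_size=0, verbose=2, batch_size=64)
model.criterion = qloss
hist_obj = model.path(X_train, y_train)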

Thank you for your help. I guess this issue can be closed.

Nice, I'm happy I could help!