Using a different loss function than MSE
Gituhin opened this issue
I am using LassoNet for my thesis, but I want to use a quantile loss function instead of Mean Squared Error. Going through the code, I found a variable self.criterion that is set to MSE loss by default (interfaces.py, class LassoNetRegressor, line 566). After instantiating the class, I manually set self.criterion to my quantile loss and trained the LassoNet.
However, the loss does not converge and remains high even after several epochs. Eventually the assertion in the training step fails and the run exits with an error at line 316 of interfaces.py.
Can someone suggest a solution?
Can you print the full error message? In particular give the full traceback so that we can know whether it's a dense model or not.
How did you implement quantile loss?
Does a simple MLP work correctly?
The quantile loss is defined as follows:
import torch

def qloss(preds: torch.Tensor, target: torch.Tensor, quantile: float) -> torch.Tensor:
    error = target - preds.squeeze(1)  # to make preds shape = (n, )
    return torch.max((quantile - 1) * error, quantile * error).sum()
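As a hedged aside: the qloss above takes three arguments, while a loss used as a drop-in for MSE is normally called with two, predictions and targets. The sketch below assumes that calling convention (the thread does not show how self.criterion is invoked) and binds the quantile with functools.partial. The names pinball_loss and median_loss are hypothetical, and the mean reduction is a swapped-in choice, not the .sum() used in the original.

```python
from functools import partial

import torch


def pinball_loss(preds: torch.Tensor, target: torch.Tensor, quantile: float) -> torch.Tensor:
    """Mean pinball (quantile) loss; preds may be (n, 1) or (n,), target is (n,)."""
    error = target - preds.reshape(-1)  # flatten so both sides have shape (n,)
    # max((q - 1) * e, q * e) equals q * e for e >= 0 and (q - 1) * e for e < 0
    return torch.max((quantile - 1) * error, quantile * error).mean()


# Bind the quantile so the resulting callable takes only (preds, target).
median_loss = partial(pinball_loss, quantile=0.5)

preds = torch.tensor([[1.0], [2.0], [4.0]])
target = torch.tensor([2.0, 2.0, 2.0])
# Per-sample losses are [0.5, 0.0, 1.0], so the mean is 0.5.
print(median_loss(preds, target).item())
```

With a mean reduction the loss value does not grow with the number of samples in the batch, which makes runs with different batch sizes easier to compare.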
Plain MLP architecture (which roughly replicates the residual, skip-connection concept of LassoNet):
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        # one hidden layer with ReLU, followed by a scalar output layer
        self.layer1 = nn.Sequential(nn.Linear(input_dim, hidden_dim),
                                    nn.ReLU())
        self.layer2 = nn.Linear(hidden_dim, 1)
        # linear skip connection from the inputs straight to the output
        self.skip = nn.Linear(input_dim, 1)

    def forward(self, x):
        x_skip = self.skip(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = x + x_skip
        return x
Training with the above architecture gives satisfactory results, with a final quantile loss of 3.65, which I also cross-checked against other models.
Training Log for normal MLP
Epoch: 30, Avg loss: 35.2953067712634, Last Loss: 0.24375311014586343
Epoch: 60, Avg loss: 13.823530698644062, Last Loss: 0.11340944496327296
Epoch: 90, Avg loss: 8.871674853133099, Last Loss: 0.057913775270518206
Epoch: 120, Avg loss: 7.761587535173886, Last Loss: 0.03557623652153863
Epoch: 150, Avg loss: 6.519703884676147, Last Loss: 0.022024002944764035
Epoch: 180, Avg loss: 6.020249006876607, Last Loss: 0.02365970772291837
Epoch: 210, Avg loss: 6.4498613277430765, Last Loss: 0.02749163222099957
Epoch: 240, Avg loss: 6.112927409654815, Last Loss: 0.02440094661499677
Epoch: 270, Avg loss: 5.759016223177362, Last Loss: 0.03134409200932202
Epoch: 300, Avg loss: 6.001484612011204, Last Loss: 0.038777831313864565
Epoch: 330, Avg loss: 6.062069096857044, Last Loss: 0.02552318584706006
Epoch: 360, Avg loss: 4.3918836065898805, Last Loss: 0.03013314467455563
Epoch: 390, Avg loss: 4.768448385739695, Last Loss: 0.016655400668915646
Epoch: 420, Avg loss: 4.958467233045215, Last Loss: 0.01705446866684808
Epoch: 450, Avg loss: 5.734827264042265, Last Loss: 0.017732498323258294
Epoch: 480, Avg loss: 6.105442322863577, Last Loss: 0.04295464646843259
Epoch: 510, Avg loss: 5.3933428761242315, Last Loss: 0.03153319370533643
Epoch: 540, Avg loss: 5.652945175400031, Last Loss: 0.015559812007204864
Epoch: 570, Avg loss: 4.970648525136502, Last Loss: 0.029464684543864108
Epoch: 600, Avg loss: 3.6572357326837297, Last Loss: 0.0197497935273808
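The training loop that produced the log above is not shown in the thread. As a rough sketch of what such a baseline could look like, the block below trains an MLP of the same shape with a mean-reduced pinball loss on mini-batches. Everything beyond the architecture is an assumption made for illustration: the synthetic data, the Adam optimizer, the learning rate, the batch size of 64, and the helper names pinball_loss and train.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


class MLP(nn.Module):
    """Same shape as the MLP above: one hidden layer plus a linear skip."""

    def __init__(self, input_dim, hidden_dim):
        super().__init__()
        self.layer1 = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.layer2 = nn.Linear(hidden_dim, 1)
        self.skip = nn.Linear(input_dim, 1)

    def forward(self, x):
        return self.layer2(self.layer1(x)) + self.skip(x)


def pinball_loss(preds, target, quantile):
    # Mean reduction (an assumption; the original qloss uses .sum()).
    error = target - preds.squeeze(1)
    return torch.max((quantile - 1) * error, quantile * error).mean()


def train(model, X, y, quantile=0.5, epochs=30, batch_size=64, lr=1e-2):
    """Mini-batch training; returns the sample-weighted average loss per epoch."""
    loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    history = []
    for _ in range(epochs):
        total = 0.0
        for xb, yb in loader:
            optimizer.zero_grad()
            loss = pinball_loss(model(xb), yb, quantile)
            loss.backward()
            optimizer.step()
            total += loss.item() * len(xb)
        history.append(total / len(X))
    return history


# Synthetic stand-in data (hypothetical): 512 samples, 15 features.
torch.manual_seed(0)
X = torch.randn(512, 15)
y = X @ torch.randn(15) + 0.1 * torch.randn(512)
history = train(MLP(15, 86), X, y)
print(f"first epoch: {history[0]:.3f}, last epoch: {history[-1]:.3f}")
```

A baseline like this gives a reference loss value on the same data, which is useful when checking whether the dense model that LassoNet starts from is fitted comparably well.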
I am using LassoNet in the following way:
import torch
from sklearn.preprocessing import StandardScaler
from lassonet import LassoNetRegressor
model = LassoNetRegressor(hidden_dims = (86,), val_size = 0, verbose = 2)
model.criterion = qloss
sc = StandardScaler()
X_train = torch.tensor(sc.fit_transform(X.T), dtype=torch.float32)
# X_train shape = (3953, 15), y_train.shape = (3953,)
y_train = torch.Tensor(y)
hist_obj = model.path(X_train, y_train)
After running for a few epochs, across different lambdas, it gave this error:
epoch: 13 loss: 2921739.5
epoch: 14 loss: 8718601.0
epoch: 15 loss: 11001912.0
epoch: 16 loss: 8222154.0
epoch: 0 loss: 8565492.0
epoch: 1 loss: 19698790.0
epoch: 2 loss: 8724698.0
Loss: 188280816.0
l2_regularization: 1392959.75
l2_regularization_skip: 46.82383728027344
Loss is 188280816.0
Did you normalize input?
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
in ()
      4 y_train = torch.Tensor(y)
      5
----> 6 hist_obj = model.path(X_train, y_train)

5 frames
/usr/local/lib/python3.9/dist-packages/lassonet/interfaces.py in path(self, X, y, X_val, y_val, lambda_seq, lambda_max, return_state_dicts, callback)
    470 if self.model.selected_count() == 0:
    471 break
--> 472 last = self._train(
    473 X_train,
    474 y_train,

/usr/local/lib/python3.9/dist-packages/lassonet/interfaces.py in _train(self, X_train, y_train, X_val, y_val, batch_size, epochs, lambda_, optimizer, return_state_dict, patience)
    329 return ans
    330
--> 331 optimizer.step(closure)
    332 model.prox(lambda_=lambda_ * optimizer.param_groups[0]["lr"], M=self.M)
    333 print("epoch:", epoch)

/usr/local/lib/python3.9/dist-packages/torch/optim/optimizer.py in wrapper(*args, **kwargs)
    138 profile_name = "Optimizer.step#{}.step".format(obj.__class__.__name__)
    139 with torch.autograd.profiler.record_function(profile_name):
--> 140 out = func(*args, **kwargs)
    141 obj._optimizer_step_code()
    142 return out

/usr/local/lib/python3.9/dist-packages/torch/optim/optimizer.py in _use_grad(self, *args, **kwargs)
     21 try:
     22 torch.set_grad_enabled(self.defaults['differentiable'])
---> 23 ret = func(self, *args, **kwargs)
     24 finally:
     25 torch.set_grad_enabled(prev_grad)

/usr/local/lib/python3.9/dist-packages/torch/optim/sgd.py in step(self, closure)
    128 if closure is not None:
    129 with torch.enable_grad():
--> 130 loss = closure()
    131
    132 for group in self.param_groups:

/usr/local/lib/python3.9/dist-packages/lassonet/interfaces.py in closure()
    324 f"l2_regularization_skip: {model.l2_regularization_skip()}"
    325 )
--> 326 assert False
    327 ans.backward()
    328 loss += ans.item() * len(batch) / n_train

AssertionError:
Your traceback is incomplete. Can you set verbose = 2? You should have some message "Initialized dense model" along with information about the dense model.
Yes, here is the information:
epoch: 927 loss: 29672.576171875
epoch: 928 loss: 29670.18359375
Initialized dense model
929 epochs, val_objective 2.97e+04, val_loss 2.97e+04, regularization 1.42e+00, l2_regularization 2.15e-01
epoch: 0 loss: 29667.796875
epoch: 1 loss: 6788969.5
Loss: 57149707255808.0
l2_regularization: 78822872.0
l2_regularization_skip: 6728.37109375
Loss is 57149707255808.0
Did you normalize input?
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
in ()
      4 y_train = torch.Tensor(y)
      5
----> 6 hist_obj = model.path(X_train, y_train)

5 frames
/usr/local/lib/python3.9/dist-packages/lassonet/interfaces.py in path(self, X, y, X_val, y_val, lambda_seq, lambda_max, return_state_dicts, callback)
    470 if self.model.selected_count() == 0:
    471 break
--> 472 last = self._train(
    473 X_train,
    474 y_train,

/usr/local/lib/python3.9/dist-packages/lassonet/interfaces.py in _train(self, X_train, y_train, X_val, y_val, batch_size, epochs, lambda_, optimizer, return_state_dict, patience)
    329 return ans
    330
--> 331 optimizer.step(closure)
    332 model.prox(lambda_=lambda_ * optimizer.param_groups[0]["lr"], M=self.M)
    333 print("epoch:", epoch)

/usr/local/lib/python3.9/dist-packages/torch/optim/optimizer.py in wrapper(*args, **kwargs)
    138 profile_name = "Optimizer.step#{}.step".format(obj.__class__.__name__)
    139 with torch.autograd.profiler.record_function(profile_name):
--> 140 out = func(*args, **kwargs)
    141 obj._optimizer_step_code()
    142 return out

/usr/local/lib/python3.9/dist-packages/torch/optim/optimizer.py in _use_grad(self, *args, **kwargs)
     21 try:
     22 torch.set_grad_enabled(self.defaults['differentiable'])
---> 23 ret = func(self, *args, **kwargs)
     24 finally:
     25 torch.set_grad_enabled(prev_grad)

/usr/local/lib/python3.9/dist-packages/torch/optim/sgd.py in step(self, closure)
    128 if closure is not None:
    129 with torch.enable_grad():
--> 130 loss = closure()
    131
    132 for group in self.param_groups:

/usr/local/lib/python3.9/dist-packages/lassonet/interfaces.py in closure()
    324 f"l2_regularization_skip: {model.l2_regularization_skip()}"
    325 )
--> 326 assert False
    327 ans.backward()
    328 loss += ans.item() * len(batch) / n_train

AssertionError:
It looks like your objective is already not good in the dense model (Initialized dense model: val_loss 2.97e+04). Maybe you need to start from there to understand what is wrong? Compare your training with our _train function.
Okay, I will go through the paper once again to get a clearer picture of the internal procedure, and I will also re-check my loss function and the rest of my setup. I will get back to you if there is anything more to discuss. Thank you for your help and time.
My point is precisely that you don't need to dive deep into the paper. LassoNet starts the optimization from a dense model. This dense model seems to be badly fitted. Try to figure out why!
Hi Louis,
I figured it out: the problem was probably the batch size. Initially I passed the whole dataset as a single batch, but when I switched to mini-batches the loss began to stabilize and eventually converged. I also obtained the path information in the history object for the various lambdas.
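The thread does not pin down why mini-batches helped. One possible contributor, offered here only as an assumption, is that the qloss above uses .sum(), so the gradient magnitude grows with the number of samples in a batch. The sketch below compares the bias gradient of a zero-initialized linear model for a batch of 32 and for the full 3953 samples mentioned earlier. The data is synthetic and the helper names pinball and bias_grad are hypothetical.

```python
import torch


def pinball(preds, target, quantile, reduction):
    error = target - preds
    per_sample = torch.max((quantile - 1) * error, quantile * error)
    return per_sample.sum() if reduction == "sum" else per_sample.mean()


def bias_grad(n, reduction, quantile=0.9):
    """Absolute gradient of the loss w.r.t. the bias of a zero-initialized linear model."""
    X = torch.randn(n, 15)
    y = 5.0 + torch.rand(n)  # every target lies above the initial prediction of 0
    w = torch.zeros(15, requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    loss = pinball(X @ w + b, y, quantile, reduction)
    loss.backward()
    return b.grad.abs().item()


torch.manual_seed(0)
for n in (32, 3953):
    # With "sum" the gradient is quantile * n; with "mean" it stays at quantile.
    print(n, bias_grad(n, "sum"), bias_grad(n, "mean"))
```

Under a fixed learning rate, a gradient that is thousands of times larger means correspondingly larger steps, which is consistent with the exploding loss in the logs, although the thread itself does not confirm this as the cause.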
Thank you for your help. I guess this issue can be closed.
Nice, I'm happy if I could help you!