Large dataset error

Question

ZeonlungPun opened this issue 6 months ago

My dataset has 30000 features, and fitting raises an error:
Loss is 511581280.0
Did you normalize input?
Choosing lambda with cross-validation: 0%| | 0/5 [01:12<?, ?it/s]
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3553, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 3, in
path = model.fit( x, y)
File "/opt/conda/lib/python3.10/site-packages/lassonet/interfaces.py", line 744, in fit
self.path(X, y, return_state_dicts=False)
File "/opt/conda/lib/python3.10/site-packages/lassonet/interfaces.py", line 679, in path
path = super().path(
File "/opt/conda/lib/python3.10/site-packages/lassonet/interfaces.py", line 472, in path
last = self._train(
File "/opt/conda/lib/python3.10/site-packages/lassonet/interfaces.py", line 331, in _train
optimizer.step(closure)
File "/opt/conda/lib/python3.10/site-packages/torch/optim/optimizer.py", line 373, in wrapper
out = func(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/optim/optimizer.py", line 76, in _use_grad
ret = func(self, *args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/optim/sgd.py", line 66, in step
loss = closure()
File "/opt/conda/lib/python3.10/site-packages/lassonet/interfaces.py", line 326, in closure
assert False
AssertionError

However, when the number of features is 1000, the error does not occur.

Louis Abraham · Answer 1 · Wed Jan 24 2024 17:01:40 GMT+0800 (China Standard Time)

This is because of a wrong condition I used in a previous version of lassonet. I used to check loss == loss + 1 to detect infinite numbers instead of using torch.isfinite(loss).
Could you test with the latest version I just uploaded to PyPI?
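
To illustrate why the loss == loss + 1 test misfires on large but finite losses, here is a minimal sketch using only the Python standard library. It assumes the loss is stored as a 32-bit float (the PyTorch default dtype); the helpers to_float32 and old_check_flags_infinite are hypothetical names for this example, not lassonet functions.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def old_check_flags_infinite(loss: float) -> bool:
    """The fragile test: true when adding 1 leaves the float32 value unchanged."""
    return to_float32(loss + 1.0) == loss


# The loss value printed in the report above, rounded to float32.
loss = to_float32(511581280.0)

# float32 has a 24-bit significand, so from 2**24 (about 1.7e7) upward
# adding 1 is absorbed by rounding and the old test misfires.
print(old_check_flags_infinite(loss))                # True (false alarm)
print(math.isfinite(loss))                           # True (loss is finite)
print(old_check_flags_infinite(to_float32(1000.0)))  # False (small loss)
```

A real infinity also satisfies loss == loss + 1, which is why the test looked reasonable, but only an explicit finiteness check such as torch.isfinite separates the two cases.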

That being said, your loss still looks very large. Did you actually normalize inputs?
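
One way to verify the normalization is to standardize each feature with statistics from the training split and then inspect the result. The sketch below uses NumPy on synthetic data; the standardize helper and the synthetic arrays are assumptions made for illustration and are not part of lassonet.

```python
import numpy as np


def standardize(X_train, X_test, eps=1e-12):
    """Scale both splits with statistics computed on the training split only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std = np.where(std < eps, 1.0, std)  # leave constant columns unscaled
    return (X_train - mean) / std, (X_test - mean) / std


rng = np.random.default_rng(0)
# Synthetic stand-in for real data: features on very different scales.
scales = np.array([1.0, 1e3, 1e-3, 50.0])
X_train = rng.normal(size=(200, 4)) * scales + 10.0
X_test = rng.normal(size=(50, 4)) * scales + 10.0

X_train_std, X_test_std = standardize(X_train, X_test)

# After standardization every training column should have mean ~0 and std ~1.
print(np.abs(X_train_std.mean(axis=0)).max())
print(X_train_std.std(axis=0))
```

If the printed statistics of the real X_train are far from 0 and 1, the inputs are not normalized in the sense the warning message refers to.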

ZeonLungPun · Answer 2 · Wed Jan 24 2024 17:49:28 GMT+0800 (China Standard Time)

Of course I have normalized the inputs. I use this code:

from lassonet import LassoNetRegressorCV
model = LassoNetRegressorCV() # LassoNetRegressorCV
path = model.fit(X_train, y_train)
print("Best model scored", model.score(X_test, y_test))
print("Lambda =", model.best_lambda_)

However, my input's shape is (20000, 30000).

Louis Abraham · Answer 3 · Wed Jan 24 2024 18:03:18 GMT+0800 (China Standard Time)

The number of samples is irrelevant as the MSE has reduction="mean".
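
As a small illustration of this point, the sketch below compares a mean-reduced and a sum-reduced squared error in plain Python. The function names mse_mean and mse_sum are hypothetical and only mimic the effect of the reduction argument of torch.nn.MSELoss.

```python
def mse_mean(pred, target):
    """Squared error averaged over samples (like reduction="mean")."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mse_sum(pred, target):
    """Squared error summed over samples (like reduction="sum")."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


pred = [0.5, -1.0, 2.0, 0.0]
target = [1.0, -1.5, 1.0, 0.5]

# Same per-sample errors, but 1000 times as many samples.
big_pred = pred * 1000
big_target = target * 1000

print(mse_mean(pred, target), mse_mean(big_pred, big_target))  # 0.4375 0.4375
print(mse_sum(pred, target), mse_sum(big_pred, big_target))    # 1.75 1750.0
```

With mean reduction the loss stays on the same scale however many rows the dataset has, so a large sample count alone would not explain a loss in the hundreds of millions.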

Did you test with the latest version?

ZeonLungPun · Answer 4 · Thu Jan 25 2024 08:44:01 GMT+0800 (China Standard Time)

Yes, I have tried the latest version. At the beginning the loss is normal, but when a new fit begins, the loss explodes:
……
epoch: 850
loss: 0.017978345975279808
epoch: 851
loss: 0.017944464460015297
epoch: 852
loss: 0.0179106704890728
epoch: 853
loss: 0.017876965925097466
epoch: 854
loss: 0.017843332141637802
epoch: 855
loss: 0.017809787765145302
epoch: 0
loss: 0.017776312306523323
epoch: 1
loss: 5.919191360473633
epoch: 2
loss: 245.20724487304688
epoch: 3
loss: 37423.44140625
epoch: 4
loss: 10632257.0
Loss is 3204740096.0
Did you normalize input?
Loss: 3204740096.0
l2_regularization: 0.3105020225048065
l2_regularization_skip: 575.5364379882812

Louis Abraham · Answer 5 · Thu Jan 25 2024 17:01:48 GMT+0800 (China Standard Time)

I think you are using an older version because the epoch: and loss: lines were removed from the previous version on PyPI. I just added some additional logging for the automatically selected value of lambda_start. Could you test again with:

pip install git+https://github.com/lasso-net/lassonet

and pass verbose=2 as a parameter?

ZeonLungPun · Answer 6 · Fri Jan 26 2024 11:05:53 GMT+0800 (China Standard Time)

I have followed your tips, but the same error happened.

Louis Abraham · Answer 7 · Sun Jan 28 2024 06:38:23 GMT+0800 (China Standard Time)

Could you try to manually set lambda_start to some larger value, like 100?

ZeonLungPun · Answer 8 · Sat Feb 03 2024 15:26:19 GMT+0800 (China Standard Time)

The same error happened. I think it may be related to the huge shape of the dataset: I have tested that when the shape is (2000, 3000), everything works normally.

Louis Abraham · Answer 9 · Wed Feb 14 2024 18:15:40 GMT+0800 (China Standard Time)

Can you post the logging output?

ElrondL · Answer 10 · Sun Apr 28 2024 20:10:10 GMT+0800 (China Standard Time)

Hey @louisabraham what else was changed in 0.0.15? After 0.0.15 LassoNetRegressor keeps returning 'None' for the lassoregressor model's state_dict, even though using the exact same settings 0.0.14 returns the model well. What were the updates between 14 and 15 in addition to the auto logging that could have caused this?

ZeonLungPun · Answer 11 · Sat May 04 2024 18:53:32 GMT+0800 (China Standard Time)

Loss is 15310032732160.0
Did you normalize input?
Traceback (most recent call last):
File "D:\anaconda\envs\newtorch\lib\site-packages\IPython\core\interactiveshell.py", line 3397, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "", line 1, in <cell line: 1>
runfile('D:/paper/npmcm2021d/read_select.py', wdir='D:/paper/npmcm2021d')
File "D:\pycharm\PyCharm Community Edition 2021.3.2\plugins\python-ce\helpers\pydev_pydev_bundle\pydev_umd.py", line 198, in runfile
pydev_imports.execfile(filename, global_vars, local_vars) # execute the script
File "D:\pycharm\PyCharm Community Edition 2021.3.2\plugins\python-ce\helpers\pydev_pydev_imps_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "D:/paper/npmcm2021d/read_select.py", line 30, in
path = model.path(X_train, y_train)
File "D:\anaconda\envs\newtorch\lib\site-packages\lassonet\interfaces.py", line 472, in path
last = self._train(
File "D:\anaconda\envs\newtorch\lib\site-packages\lassonet\interfaces.py", line 331, in _train
optimizer.step(closure)
File "D:\anaconda\envs\newtorch\lib\site-packages\torch\optim\optimizer.py", line 88, in wrapper
return func(*args, **kwargs)
File "D:\anaconda\envs\newtorch\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "D:\anaconda\envs\newtorch\lib\site-packages\torch\optim\sgd.py", line 120, in step
loss = closure()
File "D:\anaconda\envs\newtorch\lib\site-packages\lassonet\interfaces.py", line 326, in closure
assert False
AssertionError