ValueError: group should be (n_features,)
duemig opened this issue
It doesn't happen for me. Can you provide a full script to reproduce the error instead of a screenshot? Here is what I tried:
import numpy as np
from pyglmnet import GLM
group_ids = np.random.random(36)
X_train_trans = np.random.random((42603, 36))
y_train = np.random.random(42603)
glm = GLM(distr="gaussian", group=group_ids, alpha=0.05, reg_lambda=0.2, max_iter=1000)
glm.fit(X=X_train_trans, y=y_train)
Can you modify my script to show me how to make it fail? It works for me whether I use np.float32 or np.float64.
Yes, GridSearchCV used to work, but I am not quite sure whether it works with the latest version of sklearn.
import numpy as np
from pyglmnet import GLM
group_ids = np.float32(np.random.random(36))
X_train_trans = np.random.random((42603, 36))
y_train = np.random.random(42603)
glm = GLM(distr="gaussian", group=np.float32(group_ids), alpha=0.05, reg_lambda=0.2, max_iter=1000)
glm.fit(X=np.float32(X_train_trans), y=np.float32(y_train))
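Since the error message says that group should be (n_features,), one quick way to rule out a shape mismatch is to compare the group array against the number of columns in X before constructing the model. A minimal sketch, assuming only NumPy; check_group_shape is a hypothetical helper written for illustration and is not part of pyglmnet:

```python
import numpy as np


def check_group_shape(group, X):
    """Hypothetical helper (not part of pyglmnet).

    Verifies that `group` is 1-D with exactly one entry per column of X
    and returns it as a NumPy array.
    """
    group = np.asarray(group)
    n_features = X.shape[1]
    if group.shape != (n_features,):
        raise ValueError(
            "group should be (n_features,): got %s, expected (%d,)"
            % (group.shape, n_features)
        )
    return group


# Same shapes as in the scripts above, with fewer rows to keep it fast.
X_train_trans = np.random.random((100, 36))
group_ids = np.random.random(36)

checked = check_group_shape(group_ids, X_train_trans)
```

If this check passes on the real data, the shape of group is probably not what triggers the error, and the full failing script is still needed to track it down.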
But with the sklearn GridSearchCV as well? So not GLMCV?
Can I use the package as a group lasso for penalizing the betas of a cubic spline representation?
You need to use the development version for this. Unfortunately, a release has been overdue for a long time. Can you try using the development version in the meantime?
But with the sklearn GridSearchCV as well? So not GLMCV?
You can use both, depending on your application.
Can I use the package as a group lasso for penalizing the betas of a cubic spline representation?
Sorry, I don't know exactly what you are trying to do. But yes, we do support group lasso.
Thank you for your answer.
I will try this tomorrow and let you know whether it works.
However, from the source code it seems that tscv (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html) is not supported.
This would be super helpful for time series prediction tasks where k-fold etc. fail.
It would be nice for GLMCV to accept a cv object from sklearn, but nothing stops you from using your own cv together with cross_val_score etc.
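To illustrate the cross_val_score route with a time series splitter, here is a minimal sketch. It uses sklearn's Ridge purely as a stand-in estimator so that the example runs without pyglmnet. Swapping in GLM(...) assumes that GLM can be cloned and scored through the scikit-learn estimator interface, which I have not verified for every pyglmnet version:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic data with the same number of features as the scripts above.
rng = np.random.RandomState(0)
X = rng.random_sample((200, 36))
y = rng.random_sample(200)

# Each split trains on an initial segment and tests on the block that follows,
# so no test sample ever precedes its training samples.
tscv = TimeSeriesSplit(n_splits=5)

# Stand-in estimator (assumption): replace with a pyglmnet GLM instance
# if it is compatible with sklearn's clone/score machinery.
estimator = Ridge(alpha=1.0)

# One score per split, using the estimator's default scorer (R^2 for Ridge).
scores = cross_val_score(estimator, X, y, cv=tscv)
print(scores)
```

The same tscv object can also be iterated directly with tscv.split(X) if you prefer to write the fit and score loop yourself.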
Just to be sure it's not a problem with the convergence criteria, can you set the max_iter lower and check the timings?
Seems like the slowness arises from the same root cause (group lasso). Duplicated by #267.