ValueError: group should be (n_features,)

Question

duemig opened this issue 6 years ago

I don't understand this error.

Mainak Jas · Answer 1 · Thu May 09 2019 02:23:07 GMT+0800 (China Standard Time)

It doesn't happen for me. Can you provide a full script to reproduce it instead of a screenshot? Here is what I tried:

import numpy as np
from pyglmnet import GLM

group_ids = np.random.random(36)
X_train_trans = np.random.random((42603, 36))
y_train = np.random.random(42603)

glm = GLM(distr="gaussian", group=group_ids, alpha=0.05, reg_lambda=0.2, max_iter=1000)
glm.fit(X=X_train_trans, y=y_train)

David Dümig · Answer 2 · Thu May 09 2019 03:15:40 GMT+0800 (China Standard Time)

I found it

now it works.

It is due to the datatype (np.float32 vs np.float64)

Could you fix that?
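If the dtype is indeed the trigger, as reported here, one possible workaround is to cast all inputs to np.float64 and confirm that the group vector has one entry per feature before fitting. The sketch below is only an illustration: `prepare_inputs` is a hypothetical helper, not part of pyglmnet, and the shape check simply mirrors the wording of the error message.

```python
import numpy as np


def prepare_inputs(X, y, group):
    """Hypothetical helper (not part of pyglmnet).

    Casts X, y and group to float64 and checks that group has
    exactly one entry per column of X.
    """
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    group = np.asarray(group, dtype=np.float64)

    n_features = X.shape[1]
    if group.shape != (n_features,):
        raise ValueError(
            f"group should be ({n_features},), got {group.shape}"
        )
    return X, y, group


# Small float32 inputs with the same number of features as in the thread.
rng = np.random.default_rng(0)
X32 = rng.random((100, 36)).astype(np.float32)
y32 = rng.random(100).astype(np.float32)
group32 = rng.random(36).astype(np.float32)

X, y, group = prepare_inputs(X32, y32, group32)
```

The returned arrays could then be passed to GLM(..., group=group) and glm.fit(X=X, y=y) as in the script above.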

Can I use sklearn's GridSearchCV to determine the parameters?

Thanks

Best,
David

Mainak Jas · Answer 3 · Thu May 09 2019 03:23:49 GMT+0800 (China Standard Time)

Can you modify my script to show me how to make it fail? It works for me whether I use np.float32 or np.float64.

Yes, GridSearchCV used to work, but I am not sure whether it works with the latest version of sklearn.
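For reference, the general GridSearchCV pattern looks roughly like the sketch below. It uses sklearn's Ridge as a stand-in estimator so that it runs without pyglmnet. Whether GLM can be dropped in the same way on a current sklearn is exactly what is uncertain here, so treat that substitution as an assumption to test, not a guarantee.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.random((200, 36))
y = rng.random(200)

# Ridge is a stand-in (assumption). With pyglmnet one would presumably
# pass a GLM instance and search over constructor arguments such as
# reg_lambda, which appears in the scripts in this thread.
param_grid = {"alpha": [0.01, 0.1, 1.0]}

search = GridSearchCV(Ridge(), param_grid, cv=3)
search.fit(X, y)

best_params = search.best_params_
print(best_params)
```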

David Dümig · Answer 4 · Thu May 09 2019 03:34:33 GMT+0800 (China Standard Time)

import numpy as np
from pyglmnet import GLM

group_ids = np.float32(np.random.random(36))
X_train_trans = np.random.random((42603, 36))
y_train = np.random.random(42603)

glm = GLM(distr="gaussian", group=np.float32(group_ids), alpha=0.05, reg_lambda=0.2, max_iter=1000)
glm.fit(X=np.float32(X_train_trans), y=np.float32(y_train))

But with the sklearn GridSearchCV as well? So not GLMCV?

Can I use the package as group lasso for penalizing betas of a cubic spline representation?

David Dümig · Answer 5 · Thu May 09 2019 03:49:23 GMT+0800 (China Standard Time)

Is there already an open issue for the following?

Or am I doing something wrong?

When I install pyglmnet, I get version 1.0.0.

David Dümig · Answer 6 · Thu May 09 2019 04:01:17 GMT+0800 (China Standard Time)

Does not seem to work ;(

Mainak Jas · Answer 7 · Thu May 09 2019 04:11:58 GMT+0800 (China Standard Time)

You need to use the development version for this. Unfortunately, a release has been overdue for a long time. Can you try the development version in the meantime?

Mainak Jas · Answer 8 · Thu May 09 2019 04:14:35 GMT+0800 (China Standard Time)

> But with the sklearn GridSearchCV as well? So not GLMCV?

You can use both, depending on your application.

> Can I use the package as group lasso for penalizing betas of a cubic spline representation?

Sorry, I don't know exactly what you are trying to do. But yes, we do support group lasso.

David Dümig · Answer 9 · Thu May 09 2019 04:30:11 GMT+0800 (China Standard Time)

Thank you for your answer.

I will try this tomorrow and let you know whether it works.

However, from the source code it seems that TimeSeriesSplit (https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html) is not supported.

This would be super helpful for time series prediction tasks where k-fold etc. fail.

Mainak Jas · Answer 10 · Thu May 09 2019 09:24:04 GMT+0800 (China Standard Time)

It would be nice for GLMCV to accept a cv object from sklearn, but nothing stops you from using your own cv with cross_val_score etc.
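A minimal sketch of that approach, assuming an sklearn-compatible estimator: Ridge is used here as a stand-in so the example runs without pyglmnet, and passing a GLM instance instead is an untested assumption.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

rng = np.random.default_rng(0)
X = rng.random((120, 36))  # rows are assumed to be in time order
y = rng.random(120)

tscv = TimeSeriesSplit(n_splits=5)

# Each fold trains only on samples that come before its test samples.
for train_idx, test_idx in tscv.split(X):
    assert train_idx.max() < test_idx.min()

# Ridge is a stand-in (assumption); any sklearn-compatible estimator
# can be passed to cross_val_score together with cv=tscv.
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=tscv)
print(scores)
```

The same tscv object can also be passed as the cv argument of GridSearchCV.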

David Dümig · Answer 11 · Thu May 09 2019 15:56:58 GMT+0800 (China Standard Time)

Hey,

Is there a reason why it becomes so slow when I use the Github version?

GridSearchCV seems to work.

But it is super slow ;(

Any suggestions? For my purposes it's infeasible.

Mainak Jas · Answer 12 · Thu May 09 2019 20:35:08 GMT+0800 (China Standard Time)

Just to be sure it's not a problem with the convergence criteria, can you set the max_iter lower and check the timings?
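One way to run that check is to time the fit for a few max_iter values, as in the sketch below. `time_fits` is a hypothetical helper, and sklearn's ElasticNet is only a stand-in so the example runs without pyglmnet. With pyglmnet, the factory would presumably build a GLM with the given max_iter instead.

```python
import time
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import ElasticNet


def time_fits(make_estimator, X, y, max_iters):
    """Hypothetical helper: fit once per max_iter, return seconds per fit."""
    timings = {}
    for max_iter in max_iters:
        est = make_estimator(max_iter)
        start = time.perf_counter()
        with warnings.catch_warnings():
            # Low iteration caps are expected to stop before convergence.
            warnings.simplefilter("ignore", ConvergenceWarning)
            est.fit(X, y)
        timings[max_iter] = time.perf_counter() - start
    return timings


rng = np.random.default_rng(0)
X = rng.random((2000, 36))
y = rng.random(2000)

# ElasticNet is a stand-in estimator (assumption), not pyglmnet's GLM.
timings = time_fits(
    lambda m: ElasticNet(alpha=0.01, l1_ratio=0.05, max_iter=m),
    X, y, [10, 100, 1000],
)
for max_iter, seconds in timings.items():
    print(f"max_iter={max_iter}: {seconds:.4f} s")
```

If the time grows roughly in proportion to max_iter, the solver is probably hitting the iteration cap and not converging.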

Pavan Ramkumar · Answer 13 · Wed Oct 16 2019 02:57:16 GMT+0800 (China Standard Time)

It seems the slowness arises from the same root cause (group lasso). Duplicated by #267.