scikit-learn-contrib / skglm

Fast and modular sklearn replacement for generalized linear models

Home Page: http://contrib.scikit-learn.org/skglm


FEAT - Add GroupLasso and GroupLogisticRegression estimators

mathurinm opened this issue · comments

Hello, following up on the issue from Celer, it would be great to be able to run GroupLasso with non-negative weights. The standard Lasso already supports positive=True. Would it be possible to implement this for GroupLasso?

For my application I also need to run on sparse X input matrices. From what I have seen, some of the regressors can handle CSC matrices. Would that be possible for GroupLasso too?
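For context, here is the combination I'm after, shown with scikit-learn's plain Lasso (which already accepts sparse CSC input together with positive=True); the request is the same combination for GroupLasso:

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
# Random sparse design matrix in CSC format, random target
X = sparse.random(60, 20, density=0.3, format="csc", random_state=0)
y = rng.standard_normal(60)

# The plain Lasso already handles CSC input with positive=True:
# all fitted coefficients are constrained to be non-negative.
clf = Lasso(alpha=0.1, positive=True)
clf.fit(X, y)
```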

Hello,
Thanks a lot for your interest!

  • adding a group Lasso with positive weights should be "easy", we should be able to do it quickly (i.e. this week if we have time)
  • handling sparse matrices for the group Lasso is definitely doable, but might take a bit more time to check that the implementation is correct
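One way to see why the positive-weights part is "easy": the proximal operator of lam * ||.||_2 plus a non-negativity constraint has a standard closed form, namely block soft-thresholding applied to the positive part of the input. A NumPy sketch of that prox (an illustration of the math, not skglm's actual code):

```python
import numpy as np

def prox_pos_group(x, lam):
    """Prox of lam * ||.||_2 + indicator(. >= 0) at x.

    Closed form: clip x to its positive part, then apply
    block soft-thresholding (shrink the whole block by lam).
    """
    x_pos = np.maximum(x, 0.0)
    norm = np.linalg.norm(x_pos)
    if norm <= lam:
        # whole block is shrunk to zero
        return np.zeros_like(x)
    return (1.0 - lam / norm) * x_pos

# Negative coordinates are zeroed first, then the block is shrunk:
z = prox_pos_group(np.array([3.0, -1.0, 4.0]), 1.0)
```

Note that clipping must happen *before* the block soft-thresholding; applying it afterwards gives a different (incorrect) operator.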

Thank you, I am glad that it is doable. Is there anything I can do to help?

@tomaszkacprzak we can start by supporting sparse matrices for the GroupBCD solver. Do you think you could send a PR doing that?

You'd need to implement construct_grad_sparse and bcd_epoch_sparse in solvers/group_bcd.py. They should do the same thing as their existing dense versions (same names without the _sparse suffix), but for X stored as a CSC matrix, passed as the triplet X_data, X_indices, X_indptr.
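To illustrate the CSC triplet access pattern, here is a sketch of what such a gradient kernel could look like: for each feature j in the working set ws, the nonzeros of column j live in X_data[X_indptr[j]:X_indptr[j+1]], with their row indices in X_indices. The function name and signature below are hypothetical, not skglm's actual code:

```python
import numpy as np
from scipy import sparse

def construct_grad_sparse_sketch(X_data, X_indices, X_indptr, raw_grad, ws):
    """Compute X[:, ws].T @ raw_grad from the CSC triplet.

    Hypothetical sketch mirroring a dense `X[:, ws].T @ raw_grad`;
    each column's nonzeros are scanned via the indptr slices.
    """
    grad = np.zeros(len(ws))
    for idx, j in enumerate(ws):
        for p in range(X_indptr[j], X_indptr[j + 1]):
            grad[idx] += X_data[p] * raw_grad[X_indices[p]]
    return grad

# Check against the dense computation on a random CSC matrix
X = sparse.random(30, 8, density=0.4, format="csc", random_state=0)
r = np.arange(30, dtype=float)
ws = np.array([1, 3, 5])
g = construct_grad_sparse_sketch(X.data, X.indices, X.indptr, r, ws)
```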

See current implementations for CD (not BCD) in solvers/anderson_cd.py: sparse version https://github.com/scikit-learn-contrib/skglm/blob/main/skglm/solvers/anderson_cd.py#L313 vs dense version https://github.com/scikit-learn-contrib/skglm/blob/main/skglm/solvers/anderson_cd.py#L272

Then inside the solver we just have an if when doing the epoch of coordinate descent: https://github.com/scikit-learn-contrib/skglm/blob/main/skglm/solvers/anderson_cd.py#L147
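The dispatch itself is just a type check once per epoch, along these lines (hypothetical names; the epoch bodies here are trivial stand-ins that compute X @ w, not the real BCD update):

```python
import numpy as np
from scipy import sparse

def _epoch_sparse(X_data, X_indices, X_indptr, w, n_samples):
    # Stand-in for a bcd_epoch_sparse: computes X @ w from the
    # CSC triplet, scanning each column's nonzeros.
    out = np.zeros(n_samples)
    for j in range(len(X_indptr) - 1):
        for p in range(X_indptr[j], X_indptr[j + 1]):
            out[X_indices[p]] += X_data[p] * w[j]
    return out

def run_epoch(X, w):
    # Branch on the input type and pass the CSC triplet to the
    # sparse kernel; the dense kernel gets X directly.
    if sparse.issparse(X):
        return _epoch_sparse(X.data, X.indices, X.indptr, w, X.shape[0])
    return X @ w

X_dense = np.arange(12.0).reshape(4, 3)
X_csc = sparse.csc_matrix(X_dense)
w = np.array([1.0, -1.0, 2.0])
```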

Do not hesitate to start a very WIP PR, so that we can discuss over there!

In the meantime, I will open a separate PR for the positive weights

Do we want a GroupLasso estimator as well?

I think it's a popular enough model that the visibility benefit of implementing its estimator class outweighs the maintenance cost: I am +1 on the GroupLasso estimator

@tomaszkacprzak did you try our implementation of positive weights for the Group Lasso? And would you be interested in implementing sparse X support for groups?

@mathurinm I just saw it, going to give it a try. Then I'll start the PR for sparse X. Thanks!

@mathurinm I noticed that in test_group.py L79 the random seed is set so that with positive=True all coefficients come out as 0. When I change the random seed, some positive coefficients appear and the error is thrown.

Example for random seed 12323:

AssertionError: 
Not equal to tolerance rtol=1e-07, atol=0

Mismatched elements: 5 / 50 (10%)
Max absolute difference: 0.46942555
Max relative difference: 0.22505575
 x: array([0.      , 0.      , 0.      , 0.      , 0.      , 0.183918,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.295971, 0.      , 0.      ,...
 y: array([0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.241598, 0.      , 0.      ,...

Any suggestions what can cause this?

Thanks for the report, this seems to show that the positive=True feature introduced in commit 1dcbfff was not correct.
I will investigate this week.

Yes indeed @QB3, I opened PR #225

This should be fixed now @tomaszkacprzak, thanks @Badr-MOUFAD

Great, I just checked it out, works well. Thanks @Badr-MOUFAD. I'll start a PR for sparse X.

Fixed by #228