scikit-learn-contrib / skglm

Fast and modular sklearn replacement for generalized linear models

Home Page: http://contrib.scikit-learn.org/skglm


FEAT - Add GroupLasso and GroupLogisticRegression estimators

mathurinm opened this issue · comments

Hello, following up on the issue from Celer, it would be great to be able to run GroupLasso with non-negative weights. The standard Lasso already supports positive=True. Would it be possible to implement this for GroupLasso?

For my application I also need to run on sparse X input matrices. From what I have seen, some of the regressors can handle CSC matrices. Would that be possible for GroupLasso too?
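For context, here is the combination I'm after, shown with scikit-learn's plain Lasso (which already accepts sparse CSC input together with positive=True); the request is the same combination for GroupLasso:

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
# Random sparse design matrix in CSC format, random target
X = sparse.random(60, 20, density=0.3, format="csc", random_state=0)
y = rng.standard_normal(60)

# The plain Lasso already handles CSC input with positive=True:
# all fitted coefficients are constrained to be non-negative.
clf = Lasso(alpha=0.1, positive=True)
clf.fit(X, y)
```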

Hello,
Thanks a lot for your interest!

  • adding a group Lasso with positive weights should be "easy", we should be able to do it quickly (i.e. this week if we have time)
  • handling sparse matrices for the group Lasso is definitely doable, but might take a bit more time to check that the implementation is correct
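One way to see why the positive-weights part is "easy": the proximal operator of lam * ||.||_2 plus a non-negativity constraint has a standard closed form, namely block soft-thresholding applied to the positive part of the input. A NumPy sketch of that prox (an illustration of the math, not skglm's actual code):

```python
import numpy as np

def prox_pos_group(x, lam):
    """Prox of lam * ||.||_2 + indicator(. >= 0) at x.

    Closed form: clip x to its positive part, then apply
    block soft-thresholding (shrink the whole block by lam).
    """
    x_pos = np.maximum(x, 0.0)
    norm = np.linalg.norm(x_pos)
    if norm <= lam:
        # whole block is shrunk to zero
        return np.zeros_like(x)
    return (1.0 - lam / norm) * x_pos

# Negative coordinates are zeroed first, then the block is shrunk:
z = prox_pos_group(np.array([3.0, -1.0, 4.0]), 1.0)
```

Note that clipping must happen *before* the block soft-thresholding; applying it afterwards gives a different (incorrect) operator.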

Thank you, I am glad that it is doable. Is there anything I can do to help?

@tomaszkacprzak we can start by supporting sparse matrices for the GroupBCD solver. Do you think you could send a PR doing that?

You'd need to implement construct_grad_sparse and bcd_epoch_sparse in solvers/group_bcd.py. They should do the same thing as their existing dense versions (same names without the _sparse suffix), but for X stored as a CSC matrix, passed as the triplet X_data, X_indices, X_indptr.
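To illustrate the CSC triplet access pattern, here is a sketch of what such a gradient kernel could look like: for each feature j in the working set ws, the nonzeros of column j live in X_data[X_indptr[j]:X_indptr[j+1]], with their row indices in X_indices. The function name and signature below are hypothetical, not skglm's actual code:

```python
import numpy as np
from scipy import sparse

def construct_grad_sparse_sketch(X_data, X_indices, X_indptr, raw_grad, ws):
    """Compute X[:, ws].T @ raw_grad from the CSC triplet.

    Hypothetical sketch mirroring a dense `X[:, ws].T @ raw_grad`;
    each column's nonzeros are scanned via the indptr slices.
    """
    grad = np.zeros(len(ws))
    for idx, j in enumerate(ws):
        for p in range(X_indptr[j], X_indptr[j + 1]):
            grad[idx] += X_data[p] * raw_grad[X_indices[p]]
    return grad

# Check against the dense computation on a random CSC matrix
X = sparse.random(30, 8, density=0.4, format="csc", random_state=0)
r = np.arange(30, dtype=float)
ws = np.array([1, 3, 5])
g = construct_grad_sparse_sketch(X.data, X.indices, X.indptr, r, ws)
```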

See current implementations for CD (not BCD) in solvers/anderson_cd.py: sparse version https://github.com/scikit-learn-contrib/skglm/blob/main/skglm/solvers/anderson_cd.py#L313 vs dense version https://github.com/scikit-learn-contrib/skglm/blob/main/skglm/solvers/anderson_cd.py#L272

Then inside the solver we just have an if when doing the epoch of coordinate descent: https://github.com/scikit-learn-contrib/skglm/blob/main/skglm/solvers/anderson_cd.py#L147
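The dispatch itself is just a type check once per epoch, along these lines (hypothetical names; the epoch bodies here are trivial stand-ins that compute X @ w, not the real BCD update):

```python
import numpy as np
from scipy import sparse

def _epoch_sparse(X_data, X_indices, X_indptr, w, n_samples):
    # Stand-in for a bcd_epoch_sparse: computes X @ w from the
    # CSC triplet, scanning each column's nonzeros.
    out = np.zeros(n_samples)
    for j in range(len(X_indptr) - 1):
        for p in range(X_indptr[j], X_indptr[j + 1]):
            out[X_indices[p]] += X_data[p] * w[j]
    return out

def run_epoch(X, w):
    # Branch on the input type and pass the CSC triplet to the
    # sparse kernel; the dense kernel gets X directly.
    if sparse.issparse(X):
        return _epoch_sparse(X.data, X.indices, X.indptr, w, X.shape[0])
    return X @ w

X_dense = np.arange(12.0).reshape(4, 3)
X_csc = sparse.csc_matrix(X_dense)
w = np.array([1.0, -1.0, 2.0])
```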

Do not hesitate to start a very WIP PR, so that we can discuss over there!

In the meantime, I will open a separate PR for the positive weights

Do we want a GroupLasso estimator as well?

I think it's a popular enough model that the visibility benefit of implementing its estimator class outweighs the maintenance cost: I am +1 on the GroupLasso estimator

@tomaszkacprzak did you try our implementation of positive weights for the Group Lasso? And would you be interested in implementing sparse X support for groups?

@mathurinm I just saw it, going to give it a try. Then I'll start the PR for sparse X. Thanks!

@mathurinm I noticed that in test_group.py L79 the random seed is set so that with positive=True all coefficients come out as 0. When I change the random seed, some positive coefficients appear and the error is thrown.

Example for random seed 12323:

AssertionError: 
Not equal to tolerance rtol=1e-07, atol=0

Mismatched elements: 5 / 50 (10%)
Max absolute difference: 0.46942555
Max relative difference: 0.22505575
 x: array([0.      , 0.      , 0.      , 0.      , 0.      , 0.183918,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.295971, 0.      , 0.      ,...
 y: array([0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.      , 0.      , 0.      ,
       0.      , 0.      , 0.      , 0.241598, 0.      , 0.      ,...

Any suggestions what can cause this?

Thanks for the report, this seems to show that the positive=True feature introduced in commit 1dcbfff was not correct.
I will investigate this week.

Yes indeed @QB3, I opened PR #225

This should be fixed now @tomaszkacprzak, thanks @Badr-MOUFAD

Great, I just checked it out, works well. Thanks @Badr-MOUFAD. I'll start a PR for sparse X.

Fixed by #228