Shopify / bevel

Ordinal regression in Python

Regularization

rossdiener opened this issue · comments

It should be straightforward to implement L2 regularization for linear ordinal regression. Doing the same for L1, and/or writing tests for it, will be more challenging.

Paper on implementing regularized ordinal models in R: https://arxiv.org/pdf/1706.05003.pdf

commented

Hi,
I am very interested in a LASSO ordinal logistic regression implementation.
Is it on the agenda?
If not, I would be happy to contribute and help implement it!

Hi @RobeeF! Welcome to bevel. Right now I'm a bit busy and don't have any plans to implement regularization. However, I'd love to see this project gain some momentum and highly encourage you to contribute. If you do, I can commit to reviewing the pull requests and addressing any questions or issues you encounter.

Just a thought: rather than LASSO, it's probably easier to implement L2 regularization first. A lot of the code in bevel relies on explicit (hand-calculated) formulas for the derivative of the ordinal regression loss function. There happens to be a simple formula for the derivative of the L2 penalty, so that would be much easier to add to the existing code.
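Roughly, I'd picture something like the sketch below. `nll` and `nll_grad` are hypothetical stand-ins for the existing hand-coded negative log-likelihood and its gradient, and only the regression coefficients (not the cutpoints) get penalized:

```python
import numpy as np

def penalized_loss(params, X, y, n_coef, lam, nll):
    """Negative log-likelihood plus an L2 (ridge) penalty on the coefficients.

    `params` holds the regression coefficients first, then the cutpoints;
    only the first `n_coef` entries are penalized.
    """
    beta = params[:n_coef]
    return nll(params, X, y) + 0.5 * lam * np.dot(beta, beta)

def penalized_gradient(params, X, y, n_coef, lam, nll_grad):
    """Gradient of the penalized loss: the L2 term simply adds lam * beta."""
    grad = np.array(nll_grad(params, X, y), dtype=float)
    grad[:n_coef] += lam * params[:n_coef]
    return grad
```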

commented

Hi @rossdiener,
I get the point of starting with L2 regularization.
For computing gradients, have you tried relying on automatic differentiation tools rather than hand-calculated gradients?
I can see that you are using numdifftools to compute the Jacobian, but not to compute the gradient of the log-likelihood.
Is there a reason for this?

Hey @RobeeF - Seems like you're pretty familiar with the codebase. That's awesome.

The reason we didn't use automatic differentiation tools is that there is a general formula for the derivative of the log-likelihood for linear ordinal regression. We might as well use the formula, since it will always compute faster than a numerical tool and it isn't subject to numerical instability.
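For reference, the textbook form of that derivative for the ordered-logit (proportional-odds) model, where observation i falls in category j, σ is the logistic function, and σ'(z) = σ(z)(1 − σ(z)) is its density (bevel's internals may parameterize this slightly differently), is:

```latex
\ell_i(\beta, \alpha)
  = \log\!\big[\sigma(\alpha_j - x_i^\top \beta) - \sigma(\alpha_{j-1} - x_i^\top \beta)\big],
\qquad
\frac{\partial \ell_i}{\partial \beta}
  = \frac{\sigma'(\alpha_{j-1} - x_i^\top \beta) - \sigma'(\alpha_j - x_i^\top \beta)}
         {\sigma(\alpha_j - x_i^\top \beta) - \sigma(\alpha_{j-1} - x_i^\top \beta)}\, x_i
```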

The formula for the second derivative also exists, and in an ideal world we would calculate it by hand and implement it explicitly. However, it's a pain in the ass to do by hand because the formulas are messy and there are a lot of them (a whole Hessian matrix). So we numerically differentiate the explicit formula for the first derivative using numdifftools to get the second derivative.
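A minimal sketch of that pattern, with `gradient` standing in for the hand-coded first derivative (here a toy quadratic whose Hessian is known, just to show the numdifftools call):

```python
import numdifftools as nd
import numpy as np

def hessian_from_gradient(gradient, params):
    """Numerically differentiate an explicit gradient to obtain the Hessian.

    The Jacobian of the gradient is the Hessian of the underlying function.
    """
    return nd.Jacobian(gradient)(params)

# Toy example: gradient of f(p) = p0**2 + 2 * p1**2, whose Hessian is diag(2, 4).
grad = lambda p: np.array([2.0 * p[0], 4.0 * p[1]])
print(hessian_from_gradient(grad, np.array([1.0, 1.0])))  # approx [[2, 0], [0, 4]]
```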

commented

Hi @rossdiener,
In my experience, computations with autodiff tools can be faster than a handwritten gradient (even when the gradient is explicit).

Maybe I will give it a shot and send you a running-time comparison. It could also make it possible to develop new code faster!
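A rough sketch of the comparison I have in mind, using JAX as the autodiff tool and a toy quadratic loss as a stand-in for the real ordinal log-likelihood:

```python
import timeit

import jax
import jax.numpy as jnp

# Stand-in loss: a simple quadratic, so the handwritten gradient is easy to verify.
def loss(params):
    return jnp.sum(params ** 2)

autodiff_grad = jax.jit(jax.grad(loss))          # autodiff, compiled once
handwritten_grad = lambda params: 2.0 * params   # explicit formula

params = jnp.ones(100)
autodiff_grad(params).block_until_ready()        # warm up the JIT outside the timing loop

print("autodiff   :", timeit.timeit(lambda: autodiff_grad(params).block_until_ready(), number=1000))
print("handwritten:", timeit.timeit(lambda: handwritten_grad(params), number=1000))
```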

Have a nice day