Shopify / bevel

Ordinal regression in Python

Regularization

rossdiener opened this issue · comments

It should be straightforward to implement L2 regularization for linear ordinal regression. Doing the same for L1, and/or writing tests for it, will be more challenging.

Paper on implementing regularized ordinal models in R: https://arxiv.org/pdf/1706.05003.pdf

commented

Hi,
I am very interested in a LASSO ordinal logistic regression implementation.
Is it on the agenda?
If not, I would be happy to contribute and help implement it!

Hi @RobeeF! Welcome to bevel. Right now I'm a bit busy and don't have any plans to implement regularization. However, I'd love to see this project gain some momentum and highly encourage you to contribute. If you do, I can commit to reviewing the pull requests and addressing any questions or issues you encounter.

Just a thought: rather than LASSO, it's probably easier to implement L2 regularization first. A lot of the code in bevel relies on explicit (hand-calculated) formulas for the derivative of the ordinal regression loss function. There happens to be a simple formula for the derivative of the L2 penalty, so that would be much easier to add to the existing code.
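Roughly, I'd picture something like the sketch below. `nll` and `nll_grad` are hypothetical stand-ins for the existing hand-coded negative log-likelihood and its gradient, and only the regression coefficients (not the cutpoints) get penalized:

```python
import numpy as np

def penalized_loss(params, X, y, n_coef, lam, nll):
    """Negative log-likelihood plus an L2 (ridge) penalty on the coefficients.

    `params` holds the regression coefficients first, then the cutpoints;
    only the first `n_coef` entries are penalized.
    """
    beta = params[:n_coef]
    return nll(params, X, y) + 0.5 * lam * np.dot(beta, beta)

def penalized_gradient(params, X, y, n_coef, lam, nll_grad):
    """Gradient of the penalized loss: the L2 term simply adds lam * beta."""
    grad = np.array(nll_grad(params, X, y), dtype=float)
    grad[:n_coef] += lam * params[:n_coef]
    return grad
```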

commented

Hi @rossdiener,
I get the point of starting with L2 regularization.
For computing gradients, have you tried relying on automatic differentiation tools rather than hand-calculated gradients?
I can see that you are using numdifftools to compute the Jacobian, but not to compute the gradient of the log-likelihood.
Is there a reason for this?

Hey @RobeeF - Seems like you're pretty familiar with the codebase. That's awesome.

The reason we didn't use automatic differentiation tools is that there is a general formula for the derivative of the log-likelihood for linear ordinal regression. We might as well use the formula, since it will always compute faster than a numerical tool and it isn't subject to numerical instability.
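For reference, the textbook form of that derivative for the ordered-logit (proportional-odds) model, where observation i falls in category j, σ is the logistic function, and σ'(z) = σ(z)(1 − σ(z)) is its density (bevel's internals may parameterize this slightly differently), is:

```latex
\ell_i(\beta, \alpha)
  = \log\!\big[\sigma(\alpha_j - x_i^\top \beta) - \sigma(\alpha_{j-1} - x_i^\top \beta)\big],
\qquad
\frac{\partial \ell_i}{\partial \beta}
  = \frac{\sigma'(\alpha_{j-1} - x_i^\top \beta) - \sigma'(\alpha_j - x_i^\top \beta)}
         {\sigma(\alpha_j - x_i^\top \beta) - \sigma(\alpha_{j-1} - x_i^\top \beta)}\, x_i
```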

The formula for the second derivative also exists, and in an ideal world we would calculate it by hand and implement it explicitly. However, it's a pain in the ass to do by hand because the formulas are messy and there are a lot of them (a whole Hessian matrix). So we numerically differentiate the explicit formula for the first derivative using numdifftools to get the second derivative.
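A minimal sketch of that pattern, with `gradient` standing in for the hand-coded first derivative (here a toy quadratic whose Hessian is known, just to show the numdifftools call):

```python
import numdifftools as nd
import numpy as np

def hessian_from_gradient(gradient, params):
    """Numerically differentiate an explicit gradient to obtain the Hessian.

    The Jacobian of the gradient is the Hessian of the underlying function.
    """
    return nd.Jacobian(gradient)(params)

# Toy example: gradient of f(p) = p0**2 + 2 * p1**2, whose Hessian is diag(2, 4).
grad = lambda p: np.array([2.0 * p[0], 4.0 * p[1]])
print(hessian_from_gradient(grad, np.array([1.0, 1.0])))  # approx [[2, 0], [0, 4]]
```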

commented

Hi @rossdiener,
In my experience, computations with autodiff tools can be faster than a handwritten gradient (even when the gradient is explicit).

Maybe I will give it a shot and send you a running-time comparison. It could also make it possible to develop new code faster!
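A rough sketch of the comparison I have in mind, using JAX as the autodiff tool and a toy quadratic loss as a stand-in for the real ordinal log-likelihood:

```python
import timeit

import jax
import jax.numpy as jnp

# Stand-in loss: a simple quadratic, so the handwritten gradient is easy to verify.
def loss(params):
    return jnp.sum(params ** 2)

autodiff_grad = jax.jit(jax.grad(loss))          # autodiff, compiled once
handwritten_grad = lambda params: 2.0 * params   # explicit formula

params = jnp.ones(100)
autodiff_grad(params).block_until_ready()        # warm up the JIT outside the timing loop

print("autodiff   :", timeit.timeit(lambda: autodiff_grad(params).block_until_ready(), number=1000))
print("handwritten:", timeit.timeit(lambda: handwritten_grad(params), number=1000))
```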

Have a nice day