glm-tools / pyglmnet

Python implementation of elastic-net regularized generalized linear models

Home Page:http://glm-tools.github.io/pyglmnet/

Repository from Github https://github.comglm-tools/pyglmnetRepository from Github https://github.comglm-tools/pyglmnet

Poisson Regression Same Results in R as in Python?

gaussianZeroOne opened this issue · comments

commented

I am trying to implement the same code from R using glmnet and in Python. The issue is that I am using Poisson regression.

In R, I want to run the following regression. A vector y on 3 variables. It is a simple example but I want to keep it simple to easily compare results in Python:

library(glmnet)
y= c(2.631007e-13, 2.953114e-09, 1.537151e-03)

covar = diag(3)

mod <- glmnet(x=covar, y=y, family="poisson")

The above produces the following coefficients: [1] 0.000000 0.000000 7.557667. Only the 3rd variable has any weight from the penalization.

In Python, using this package:

from pyglmnet import GLM

import numpy as np

glm = GLM(alpha=1, distr='poisson', tol=1e-07)

print(glm.fit(np.eye(3),np.array([2.631007e-13, 2.953114e-09, 1.537151e-03])).beta_)

This gives me [0,0,0] for the 3 coefficients.

I assume I may need to use K-Fold validation, and also search across various lambda (penalization) parameters. However, I am unsure how exactly to do this and if this is actually a feature of pyglmnet?

Has anyone been able to figure out how to run glmnet with poisson in Python to produce exactly the same results?

yes it is! You should use GLMCV to search across various lambda. Can you make sure that these are matched between R and Python?

commented

Thank you @jasmainak.

I've tried the following in pyglmnet:

from pyglmnet import GLMCV

glm = GLMCV(alpha=1, distr='poisson', tol=1e-07, cv=2)

print(glm.fit(np.eye(3),np.array([2.631007e-13, 2.953114e-09, 1.537151e-03])).beta_)

However it produces [0, 0, 0]. I am unable to tweak the pyglmnet to produce the exact same code as in R's glmnet.

Could it be that the lambda sequences are different? I noticed in R's glmneet it produces the sequence of coefficients through each iteration. Is there a way to do this as well in pyglmnet to see if the differences are due to different starting points?

@gaussianZeroOne could you also provide the R code that you have tried? I am not that familiar with R but I think @titipata might be able to help.

@jasmainak, I edited the code above which should have the R code to reproduce.

And yes, it seems like there is something going on with the optimizer. I tried the same example and couldn't get the same result as in R. Maybe @pavanramkumar knows more in detail why it does not give the same result in this case.

cool thanks, I'll take a look tomorrow!