Poisson Regression Same Results in R as in Python?

Question

Poisson Regression Same Results in R as in Python?

gaussianZeroOne opened this issue 6 years ago · comments

I am trying to implement the same code from R using glmnet and in Python. The issue is that I am using Poisson regression.

In R, I want to run the following regression. A vector y on 3 variables. It is a simple example but I want to keep it simple to easily compare results in Python:

library(glmnet)
y= c(2.631007e-13, 2.953114e-09, 1.537151e-03)

covar = diag(3)

mod <- glmnet(x=covar, y=y, family="poisson")

The above produces the following coefficients: [1] 0.000000 0.000000 7.557667. Only the 3rd variable has any weight from the penalization.

In Python, using this package:

from pyglmnet import GLM

import numpy as np

glm = GLM(alpha=1, distr='poisson', tol=1e-07)

print(glm.fit(np.eye(3),np.array([2.631007e-13, 2.953114e-09, 1.537151e-03])).beta_)

This gives me [0,0,0] for the 3 coefficients.

I assume I may need to use K-Fold validation, and also search across various lambda (penalization) parameters. However, I am unsure how exactly to do this and if this is actually a feature of pyglmnet?

Has anyone been able to figure out how to run glmnet with poisson in Python to produce exactly the same results?

Mainak Jas · Answer 1 · Sat Mar 14 2020 05:16:18 GMT+0800 (China Standard Time)

yes it is! You should use GLMCV to search across various lambda. Can you make sure that these are matched between R and Python?

G · Answer 2 · Mon Mar 16 2020 21:51:56 GMT+0800 (China Standard Time)

Thank you @jasmainak.

I've tried the following in pyglmnet:

from pyglmnet import GLMCV

glm = GLMCV(alpha=1, distr='poisson', tol=1e-07, cv=2)

print(glm.fit(np.eye(3),np.array([2.631007e-13, 2.953114e-09, 1.537151e-03])).beta_)

However it produces [0, 0, 0]. I am unable to tweak the pyglmnet to produce the exact same code as in R's glmnet.

Could it be that the lambda sequences are different? I noticed in R's glmneet it produces the sequence of coefficients through each iteration. Is there a way to do this as well in pyglmnet to see if the differences are due to different starting points?

Mainak Jas · Answer 3 · Tue Mar 17 2020 12:27:29 GMT+0800 (China Standard Time)

@gaussianZeroOne could you also provide the R code that you have tried? I am not that familiar with R but I think @titipata might be able to help.

Titipat Achakulvisut · Answer 4 · Tue Mar 17 2020 13:34:56 GMT+0800 (China Standard Time)

@jasmainak, I edited the code above which should have the R code to reproduce.

Titipat Achakulvisut · Answer 5 · Tue Mar 17 2020 13:53:38 GMT+0800 (China Standard Time)

And yes, it seems like there is something going on with the optimizer. I tried the same example and couldn't get the same result as in R. Maybe @pavanramkumar knows more in detail why it does not give the same result in this case.

Mainak Jas · Answer 6 · Thu Mar 19 2020 04:13:22 GMT+0800 (China Standard Time)

cool thanks, I'll take a look tomorrow!