Extend GLM Families
mrocklin opened this issue
So far in https://github.com/dask/dask-glm/blob/master/dask_glm/families.py we have families for linear/normal and logistic that look like the following:
```python
# exp, log1p, dot, and sigmoid are imported at module level in families.py
class Logistic(object):

    @staticmethod
    def loglike(Xbeta, y):
        eXbeta = exp(Xbeta)
        return (log1p(eXbeta)).sum() - dot(y, Xbeta)

    @staticmethod
    def pointwise_loss(beta, X, y):
        '''Logistic loss, evaluated point-wise.'''
        beta, y = beta.ravel(), y.ravel()
        Xbeta = X.dot(beta)
        return Logistic.loglike(Xbeta, y)

    @staticmethod
    def pointwise_gradient(beta, X, y):
        '''Logistic gradient, evaluated point-wise.'''
        beta, y = beta.ravel(), y.ravel()
        Xbeta = X.dot(beta)
        return Logistic.gradient(Xbeta, X, y)

    @staticmethod
    def gradient(Xbeta, X, y):
        p = sigmoid(Xbeta)
        return dot(X.T, p - y)

    @staticmethod
    def hessian(Xbeta, X):
        p = sigmoid(Xbeta)
        return dot(p * (1 - p) * X.T, X)
```
- Do we want to extend these to other GLM families?
- If so then what other options are most useful?
- Is the current interface sufficient and ideal?
The only other GLM family I know of that is used "in the wild" is the Poisson family which should be very straightforward to include. (@mpancia do you know of any others?)
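For concreteness, a Poisson family mirroring the `Logistic` interface above might look like the following. This is only a sketch: it assumes a log link (so the mean is `exp(Xbeta)`) and uses `numpy` in place of the `dask.array` helpers the real module would use.

```python
import numpy as np
from numpy import exp, dot  # the real module would use dask.array equivalents

class Poisson(object):
    """Hypothetical Poisson family, assuming a log link.

    The negative log-likelihood, dropping the constant log(y!) term, is
    sum(exp(Xbeta)) - dot(y, Xbeta).
    """
    @staticmethod
    def loglike(Xbeta, y):
        return exp(Xbeta).sum() - dot(y, Xbeta)

    @staticmethod
    def gradient(Xbeta, X, y):
        mu = exp(Xbeta)
        return dot(X.T, mu - y)

    @staticmethod
    def hessian(Xbeta, X):
        # X.T @ diag(mu) @ X, written without materializing the diagonal
        mu = exp(Xbeta)
        return dot(mu * X.T, X)
```

Because it exposes the same `loglike` / `gradient` / `hessian` triple, it could drop straight into the existing solvers.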
@hussainsultan didn't like the use of classes filled with staticmethods just for convenient namespacing; he would prefer a `families` module with `Normal.py`, etc. I originally implemented this with classes in case we ever wanted to take advantage of caching to speed up function calls, but that's premature optimization on my part.
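The module-of-functions alternative would just be plain functions, one file per family. A minimal sketch of what a hypothetical `families/normal.py` could contain (names and least-squares formulation are assumptions, not settled API):

```python
# Hypothetical families/normal.py: plain functions instead of a class
# full of staticmethods.
import numpy as np

def loglike(Xbeta, y):
    """Least-squares loss for the normal family (up to constants)."""
    return ((y - Xbeta) ** 2).sum()

def gradient(Xbeta, X, y):
    """Gradient of the least-squares loss with respect to beta."""
    return 2 * X.T.dot(Xbeta - y)

def hessian(Xbeta, X):
    """Hessian of the least-squares loss; constant in beta."""
    return 2 * X.T.dot(X)
```

A solver could then take the module itself (`import families.normal as family`) and call `family.gradient` exactly as it does with the class today.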
@moody-marlin Poisson, for sure; I would also include some of the other basic families, even though they are slightly less common (e.g. binomial and/or with alternate link functions e.g. probit)
After looking at it for a little bit, I think mimicking the statsmodels API for their families makes sense: https://github.com/statsmodels/statsmodels/blob/master/statsmodels/genmod/families/family.py
They have (what amounts to) an abstract base class `Family` which specifies a `Link` and a `Variance`. The abstract properties/methods reflect the method they use for fitting (IRLS), as well as some more generic things (computation of residuals, a `fit` method, etc.).
I think something similar would make just as much sense here, except our specification of the ABC `Family` ought to incorporate the functions needed for fitting via ADMM / Newton's method / etc. (the log-likelihood, the pointwise gradient, the pointwise Hessian, etc.).
In addition, I think that the `Family` specification should be factored out of the algorithms in `algorithms.py`, as they were before -- there's no reason to peg them to a `Family`, since they're just generic implementations and should be reusable for other things.
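An ABC along those lines might look like the sketch below. The method names simply mirror the existing `Logistic` class; whether a `Link`/`Variance` split like statsmodels' is also wanted is left open.

```python
from abc import ABC, abstractmethod

class Family(ABC):
    """Hypothetical abstract base class for GLM families; the required
    methods mirror the current Logistic interface, not a settled API."""

    @staticmethod
    @abstractmethod
    def loglike(Xbeta, y):
        """(Negative) log-likelihood evaluated at the linear predictor."""

    @staticmethod
    @abstractmethod
    def gradient(Xbeta, X, y):
        """Gradient of the loss with respect to beta."""

    @staticmethod
    @abstractmethod
    def hessian(Xbeta, X):
        """Hessian of the loss with respect to beta."""
```

Any concrete family (Logistic, Poisson, ...) would subclass this, and a subclass missing one of the methods fails at instantiation time rather than deep inside a solver.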
@mpancia to be concrete, can you give an example of what the algorithms in `algorithms.py` should take as inputs?
@mrocklin Yeah, sure! In, e.g. Newton's method:
```python
def newton(X, y, max_steps=50, tol=1e-8, family=Logistic):
    '''Newton's method for logistic regression.'''
    gradient, hessian = family.gradient, family.hessian
    n, p = X.shape
    beta = np.zeros(p)  # always init to zeros?
    Xbeta = dot(X, beta)

    iter_count = 0
    converged = False

    while not converged:
        beta_old = beta

        # should this use map_blocks()?
        hess = hessian(Xbeta, X)
        grad = gradient(Xbeta, X, y)
        hess, grad = da.compute(hess, grad)

        # should this be dask or numpy?
        # currently uses Python 3 specific syntax
        step, _, _, _ = np.linalg.lstsq(hess, grad)
        beta = (beta_old - step)

        iter_count += 1

        # should change this criterion
        coef_change = np.absolute(beta_old - beta)
        converged = (
            (not np.any(coef_change > tol)) or (iter_count > max_steps))

        if not converged:
            Xbeta = dot(X, beta)  # numpy -> dask conversion of beta

    return beta
```
The only time that `family` is accessed is to retrieve `family.gradient` and `family.hessian`; it seems to me that `newton` ought to just take `gradient` and `hessian` arguments, so that I could use it on a more arbitrary function.
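Concretely, the suggestion amounts to something like the following. This is a sketch, not a drop-in replacement: it uses plain `numpy` (no `da.compute` / dask conversion) and assumes the callables have the same signatures as the existing `Logistic` methods.

```python
import numpy as np

def newton(X, y, gradient, hessian, max_steps=50, tol=1e-8):
    """Sketch of a family-agnostic Newton's method: the gradient and
    Hessian are passed as plain callables instead of a family object."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(max_steps):
        Xbeta = X.dot(beta)
        # solve H step = g in the least-squares sense
        step, *_ = np.linalg.lstsq(hessian(Xbeta, X),
                                   gradient(Xbeta, X, y), rcond=None)
        beta_new = beta - step
        if np.all(np.absolute(beta_new - beta) <= tol):
            return beta_new
        beta = beta_new
    return beta
```

Existing call sites would become `newton(X, y, Logistic.gradient, Logistic.hessian)`, and any pair of callables with matching signatures works equally well.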
I actually really like this idea, but I do think a refactor which preserves the runtimes will be more difficult than it initially seems; for example, @mpancia, I'd encourage you to try refactoring the `gradient_descent` code and track the runtimes for a given dataset. In that code, we explicitly exploit the linear-model-ness and keep `Xbeta` as an in-memory `numpy` array, instead of making a pointwise call on `beta` every time we want to update. If we wanted to refactor, this could involve some clever caching or something, possibly?
However, for newton / admm, this seems very do-able.
Moreover, if we could pull this off for all the algorithms, this would most likely allow for improved testing. We could run the algorithms on many different carefully crafted convex functions for which we know the optima (which isn't currently possible for regularized problems), and separately test the implementations of the GLM families / regularizers. This wouldn't be full-out integration testing, but it would still increase our understanding of the accuracy of our implementations.
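As an illustration of that testing idea, a harness could hand any optimizer a convex quadratic whose minimizer is known in closed form and assert it is recovered. The `optimize(grad, hess, p)` signature below is hypothetical, just to show the shape of such a test:

```python
import numpy as np

def check_optimizer(optimize):
    """Feed an optimizer a convex quadratic 0.5*(x-b)'A(x-b) with a
    known optimum b; `optimize(grad, hess, p)` is a hypothetical
    signature returning the minimizer given gradient/Hessian callables."""
    A = np.array([[3., 1.], [1., 2.]])   # symmetric positive definite
    b = np.array([1., -1.])              # known optimum
    grad = lambda x: A.dot(x - b)
    hess = lambda x: A
    assert np.allclose(optimize(grad, hess, p=2), b, atol=1e-6)

def toy_newton(grad, hess, p, steps=20):
    """Minimal Newton iteration, just to exercise the harness."""
    x = np.zeros(p)
    for _ in range(steps):
        x = x - np.linalg.solve(hess(x), grad(x))
    return x
```

Running `check_optimizer(toy_newton)` passes, and the same harness could be pointed at the real solvers once they accept gradient/Hessian callables; the family and regularizer implementations would then be unit-tested separately.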