Extend GLM Families
mrocklin opened this issue
So far in https://github.com/dask/dask-glm/blob/master/dask_glm/families.py we have families for linear/normal and logistic that look like the following:
```python
# exp, log1p, dot, and sigmoid are imported at module level in families.py
class Logistic(object):

    @staticmethod
    def loglike(Xbeta, y):
        eXbeta = exp(Xbeta)
        return (log1p(eXbeta)).sum() - dot(y, Xbeta)

    @staticmethod
    def pointwise_loss(beta, X, y):
        '''Logistic loss, evaluated point-wise.'''
        beta, y = beta.ravel(), y.ravel()
        Xbeta = X.dot(beta)
        return Logistic.loglike(Xbeta, y)

    @staticmethod
    def pointwise_gradient(beta, X, y):
        '''Logistic gradient, evaluated point-wise.'''
        beta, y = beta.ravel(), y.ravel()
        Xbeta = X.dot(beta)
        return Logistic.gradient(Xbeta, X, y)

    @staticmethod
    def gradient(Xbeta, X, y):
        p = sigmoid(Xbeta)
        return dot(X.T, p - y)

    @staticmethod
    def hessian(Xbeta, X):
        p = sigmoid(Xbeta)
        return dot(p * (1 - p) * X.T, X)
```
- Do we want to extend these to other GLM families?
- If so then what other options are most useful?
- Is the current interface sufficient and ideal?
The only other GLM family I know of that is used "in the wild" is the Poisson family which should be very straightforward to include. (@mpancia do you know of any others?)
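For concreteness, a Poisson family mirroring the `Logistic` interface above might look like the following. This is only a sketch: it assumes a log link (so the mean is `exp(Xbeta)`) and uses `numpy` in place of the `dask.array` helpers the real module would use.

```python
import numpy as np
from numpy import exp, dot  # the real module would use dask.array equivalents

class Poisson(object):
    """Hypothetical Poisson family, assuming a log link.

    The negative log-likelihood, dropping the constant log(y!) term, is
    sum(exp(Xbeta)) - dot(y, Xbeta).
    """
    @staticmethod
    def loglike(Xbeta, y):
        return exp(Xbeta).sum() - dot(y, Xbeta)

    @staticmethod
    def gradient(Xbeta, X, y):
        mu = exp(Xbeta)
        return dot(X.T, mu - y)

    @staticmethod
    def hessian(Xbeta, X):
        # X.T @ diag(mu) @ X, written without materializing the diagonal
        mu = exp(Xbeta)
        return dot(mu * X.T, X)
```

Because it exposes the same `loglike` / `gradient` / `hessian` triple, it could drop straight into the existing solvers.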
@hussainsultan didn't like the use of classes filled with staticmethods just for convenient namespacing; he would prefer a `families` module with `Normal.py`, etc. I originally implemented this with classes in case we ever wanted to take advantage of caching to speed up function calls, but that's premature optimization on my part.
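The module-of-functions alternative would just be plain functions, one file per family. A minimal sketch of what a hypothetical `families/normal.py` could contain (names and least-squares formulation are assumptions, not settled API):

```python
# Hypothetical families/normal.py: plain functions instead of a class
# full of staticmethods.
import numpy as np

def loglike(Xbeta, y):
    """Least-squares loss for the normal family (up to constants)."""
    return ((y - Xbeta) ** 2).sum()

def gradient(Xbeta, X, y):
    """Gradient of the least-squares loss with respect to beta."""
    return 2 * X.T.dot(Xbeta - y)

def hessian(Xbeta, X):
    """Hessian of the least-squares loss; constant in beta."""
    return 2 * X.T.dot(X)
```

A solver could then take the module itself (`import families.normal as family`) and call `family.gradient` exactly as it does with the class today.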
@moody-marlin Poisson, for sure; I would also include some of the other basic families, even though they are slightly less common (e.g. binomial and/or with alternate link functions e.g. probit)
After looking at it for a little bit, I think mimicking the statsmodels API for their families makes sense: https://github.com/statsmodels/statsmodels/blob/master/statsmodels/genmod/families/family.py
They have (what amounts to) an abstract base class `Family` which specifies a `Link` and a `Variance`. The abstract properties/methods reflect the method they use for fitting (IRLS), as well as some more generic things (computation of residuals, a `fit` method, etc.).
I think something similar would make just as much sense here, except our specification of the ABC `Family` ought to incorporate the functions needed for fitting via ADMM / Newton's method / etc. (the log-likelihood, the pointwise gradient, the pointwise Hessian, etc.).
In addition, I think that the `Family` specification should be factored out of the algorithms in `algorithms.py`, as they were before -- there's no reason to peg them to a `Family`, since they're just generic implementations and should be reusable for other things.
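An ABC along those lines might look like the sketch below. The method names simply mirror the existing `Logistic` class; whether a `Link`/`Variance` split like statsmodels' is also wanted is left open.

```python
from abc import ABC, abstractmethod

class Family(ABC):
    """Hypothetical abstract base class for GLM families; the required
    methods mirror the current Logistic interface, not a settled API."""

    @staticmethod
    @abstractmethod
    def loglike(Xbeta, y):
        """(Negative) log-likelihood evaluated at the linear predictor."""

    @staticmethod
    @abstractmethod
    def gradient(Xbeta, X, y):
        """Gradient of the loss with respect to beta."""

    @staticmethod
    @abstractmethod
    def hessian(Xbeta, X):
        """Hessian of the loss with respect to beta."""
```

Any concrete family (Logistic, Poisson, ...) would subclass this, and a subclass missing one of the methods fails at instantiation time rather than deep inside a solver.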
@mpancia to be concrete, can you give an example of what the algorithms in `algorithms.py` should take as inputs?
@mrocklin Yeah, sure! In, e.g. Newton's method:
```python
def newton(X, y, max_steps=50, tol=1e-8, family=Logistic):
    '''Newton's method for logistic regression.'''
    gradient, hessian = family.gradient, family.hessian
    n, p = X.shape
    beta = np.zeros(p)  # always init to zeros?
    Xbeta = dot(X, beta)

    iter_count = 0
    converged = False

    while not converged:
        beta_old = beta

        # should this use map_blocks()?
        hess = hessian(Xbeta, X)
        grad = gradient(Xbeta, X, y)
        hess, grad = da.compute(hess, grad)

        # should this be dask or numpy?
        # currently uses Python 3 specific syntax
        step, _, _, _ = np.linalg.lstsq(hess, grad)
        beta = (beta_old - step)

        iter_count += 1

        # should change this criterion
        coef_change = np.absolute(beta_old - beta)
        converged = (
            (not np.any(coef_change > tol)) or (iter_count > max_steps))

        if not converged:
            Xbeta = dot(X, beta)  # numpy -> dask conversion of beta

    return beta
```
The only time that `family` is accessed is to retrieve `family.gradient` and `family.hessian`; it seems to me that `newton` ought to just take `gradient` and `hessian` arguments, so that I could use it on a more arbitrary function.
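Concretely, the suggestion amounts to something like the following. This is a sketch, not a drop-in replacement: it uses plain `numpy` (no `da.compute` / dask conversion) and assumes the callables have the same signatures as the existing `Logistic` methods.

```python
import numpy as np

def newton(X, y, gradient, hessian, max_steps=50, tol=1e-8):
    """Sketch of a family-agnostic Newton's method: the gradient and
    Hessian are passed as plain callables instead of a family object."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(max_steps):
        Xbeta = X.dot(beta)
        # solve H step = g in the least-squares sense
        step, *_ = np.linalg.lstsq(hessian(Xbeta, X),
                                   gradient(Xbeta, X, y), rcond=None)
        beta_new = beta - step
        if np.all(np.absolute(beta_new - beta) <= tol):
            return beta_new
        beta = beta_new
    return beta
```

Existing call sites would become `newton(X, y, Logistic.gradient, Logistic.hessian)`, and any pair of callables with matching signatures works equally well.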
I actually really like this idea, but I do think a refactor which preserves the runtimes will be more difficult than it initially seems; for example, @mpancia, I'd encourage you to try refactoring the `gradient_descent` code and track the runtimes for a given dataset. In that code, we explicitly exploit the linear-model-ness and keep `Xbeta` as an in-memory `numpy` array, instead of making a pointwise call on `beta` every time we want to update. If we wanted to refactor, this could involve some clever caching or something, possibly?
However, for newton / admm, this seems very do-able.
Moreover, if we could pull this off for all the algorithms, this would most likely allow for improved testing. We could run the algorithms on many different carefully crafted convex functions for which we know the optima (which isn't currently possible for regularized problems), and separately test the implementations of the GLM families / regularizers. This wouldn't be full-out integration testing, but it would still increase our understanding of the accuracy of our implementations.
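As an illustration of that testing idea, a harness could hand any optimizer a convex quadratic whose minimizer is known in closed form and assert it is recovered. The `optimize(grad, hess, p)` signature below is hypothetical, just to show the shape of such a test:

```python
import numpy as np

def check_optimizer(optimize):
    """Feed an optimizer a convex quadratic 0.5*(x-b)'A(x-b) with a
    known optimum b; `optimize(grad, hess, p)` is a hypothetical
    signature returning the minimizer given gradient/Hessian callables."""
    A = np.array([[3., 1.], [1., 2.]])   # symmetric positive definite
    b = np.array([1., -1.])              # known optimum
    grad = lambda x: A.dot(x - b)
    hess = lambda x: A
    assert np.allclose(optimize(grad, hess, p=2), b, atol=1e-6)

def toy_newton(grad, hess, p, steps=20):
    """Minimal Newton iteration, just to exercise the harness."""
    x = np.zeros(p)
    for _ in range(steps):
        x = x - np.linalg.solve(hess(x), grad(x))
    return x
```

Running `check_optimizer(toy_newton)` passes, and the same harness could be pointed at the real solvers once they accept gradient/Hessian callables; the family and regularizer implementations would then be unit-tested separately.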