Logistic Accuracy tests

Question

cicdw opened this issue 7 years ago

We need to determine a robust testing framework for our logistic regression algorithms, starting with a review of the accuracy of the current ADMM code.

Unregularized Problems

For unregularized problems with an intercept, we can start with a high-level test: at the optimum, the gradient with respect to the intercept vanishes, so the fitted probabilities must sum to the observed labels:
y.sum() == sigmoid(X.dot(beta)).sum()
For unregularized problems without an intercept, we can create a handful of tiny datasets to test on, or use well-studied public datasets (e.g., iris).
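A minimal sketch of the intercept test, assuming NumPy and a stand-in solver: `fit_logistic_gd` is a hypothetical fixed-step gradient descent written only for illustration, not the ADMM code under review, and the data is synthetic.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_gd(X, y, n_iter=2000):
    """Hypothetical reference solver: gradient descent with step 1/L."""
    # The gradient of the summed logistic loss is Lipschitz with L = ||X||_2^2 / 4.
    step = 4.0 / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        beta -= step * X.T.dot(sigmoid(X.dot(beta)) - y)
    return beta

# Synthetic data (an assumption for this sketch); column 0 is the intercept.
rng = np.random.RandomState(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([0.5, 1.0, -1.0])
y = (rng.uniform(size=n) < sigmoid(X.dot(true_beta))).astype(float)

beta_hat = fit_logistic_gd(X, y)

# Intercept score equation: labels and fitted probabilities have the same sum.
assert np.isclose(y.sum(), sigmoid(X.dot(beta_hat)).sum())
```

Swapping `fit_logistic_gd` for the solver being tested turns this into a cheap sanity check that does not require knowing the true coefficients.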

Regularized Problems

As @hussainsultan already did, we can start with some high-level 'marginal' tests that check extreme values of the various input parameters (e.g., when the regularization parameter is 'large' for l1 problems, the coefficients should all be 0).
To test accuracy on regularized problems, we might need to handcraft a few tiny datasets.
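One way to make 'large' concrete, sketched under assumptions: for the objective sum-of-logistic-losses plus `lam * ||beta||_1` with no intercept, `beta = 0` is optimal exactly when `lam >= max|X.T.dot(y - 0.5)|`. The solver below (`l1_logistic_ista`) is a hypothetical plain proximal-gradient loop used only as a stand-in; the data is synthetic.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def l1_logistic_ista(X, y, lam, n_iter=500, seed=1):
    """Hypothetical stand-in solver: proximal gradient with fixed step 1/L."""
    step = 4.0 / np.linalg.norm(X, 2) ** 2
    # Start away from zero so the test is not trivially satisfied.
    beta = np.random.RandomState(seed).normal(size=X.shape[1])
    for _ in range(n_iter):
        grad = X.T.dot(sigmoid(X.dot(beta)) - y)
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 5))
y = (rng.uniform(size=100) < 0.5).astype(float)

# Smallest penalty for which the all-zero vector is optimal.
lam_max = np.abs(X.T.dot(y - 0.5)).max()

beta_large = l1_logistic_ista(X, y, lam=2.0 * lam_max)
beta_small = l1_logistic_ista(X, y, lam=0.1 * lam_max)

assert np.all(beta_large == 0.0)   # heavy penalty: every coefficient is zero
assert np.any(beta_small != 0.0)   # light penalty: zero is no longer optimal
```

Computing `lam_max` from the data avoids guessing what counts as a large regularization parameter for a given dataset.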

Chris White commented 7 years ago

See #12

Chris White · Answer 1 · Fri Feb 10 2017 01:57:01 GMT+0800 (China Standard Time)

For unregularized problems, we can easily test optimality on arbitrary problems by setting y = sigmoid(X.dot(beta)) (no thresholding) and checking that the estimated coefficients are close to beta. How do we reliably test regularized problems, though? I've considered dual certificates, and possibly zero (sub)gradients, but is there a better way?
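A sketch of why the untresholded-label trick works, assuming NumPy: with fractional labels `y = sigmoid(X.dot(beta))`, the gradient `X.T.dot(sigmoid(X.dot(b)) - y)` is exactly zero at `b = beta`, so `beta` is the minimizer. `fit_logistic_newton` is a hypothetical reference solver for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_newton(X, y, n_iter=25, tol=1e-12):
    """Hypothetical reference solver: undamped Newton steps from zero."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X.dot(beta))
        grad = X.T.dot(p - y)
        if np.linalg.norm(grad) < tol:
            break
        # Hessian of the logistic loss: X^T diag(p * (1 - p)) X
        hess = (X * (p * (1.0 - p))[:, None]).T.dot(X)
        beta = beta - np.linalg.solve(hess, grad)
    return beta

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
beta = np.array([1.0, -0.5, 0.25])
y = sigmoid(X.dot(beta))  # fractional labels, no thresholding

# The gradient vanishes at the generating coefficients...
assert np.allclose(X.T.dot(sigmoid(X.dot(beta)) - y), 0.0)

# ...so an accurate solver should recover them to high precision.
beta_hat = fit_logistic_newton(X, y)
assert np.allclose(beta_hat, beta, atol=1e-8)
```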

cc: @mcg1969

Chris White · Answer 2 · Thu Feb 23 2017 04:47:36 GMT+0800 (China Standard Time)

Some examples of the issues we face in implementing a robust testing framework can be found here: https://github.com/dask/dask-glm/blob/master/notebooks/AccuracyBook.ipynb

Michael C. Grant · Answer 3 · Thu Feb 23 2017 04:49:10 GMT+0800 (China Standard Time)

Yeah, some sort of dual or subgradient criterion is likely needed...
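A rough sketch of what a subgradient criterion could look like for the l1 case, assuming the objective is the summed logistic loss plus `lam * ||beta||_1` with no intercept. The function name `l1_subgradient_violation` and the `tol` threshold for deciding which coefficients count as zero are hypothetical choices for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def l1_subgradient_violation(X, y, beta, lam, tol=1e-8):
    """Largest violation of the l1 optimality conditions (0 means optimal).

    Nonzero coefficient j: grad_j + lam * sign(beta_j) must equal 0.
    Zero coefficient j:    |grad_j| must be at most lam.
    """
    grad = X.T.dot(sigmoid(X.dot(beta)) - y)
    active = np.abs(beta) > tol
    viol = np.where(
        active,
        np.abs(grad + lam * np.sign(beta)),
        np.maximum(np.abs(grad) - lam, 0.0),
    )
    return viol.max()

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 4))
y = (rng.uniform(size=50) < 0.5).astype(float)
zero = np.zeros(X.shape[1])

# beta = 0 is optimal exactly when lam >= max |X^T (y - 0.5)|.
lam_max = np.abs(X.T.dot(y - 0.5)).max()

assert l1_subgradient_violation(X, y, zero, lam_max) == 0.0
assert np.isclose(l1_subgradient_violation(X, y, zero, 0.5 * lam_max), 0.5 * lam_max)
```

A test would then assert that the violation at a solver's output falls below some tolerance, which checks optimality without needing a known true solution.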

Chris White · Answer 4 · Thu Mar 30 2017 09:16:57 GMT+0800 (China Standard Time)

https://gist.github.com/moody-marlin/e2de54ca17d615b263f80372031cb865 cc: @mpancia

Proximal gradient does worst because its line search is currently very crude.

Stoney Vintson · Answer 5 · Thu Apr 06 2017 07:58:49 GMT+0800 (China Standard Time)

I need to find support for this, but in Stephen P. Boyd's (Stanford EE Dept) NIPS workshop talk on ADMM on Jan 25th 2012, he mentioned that ADMM without regularization is fragile. Boyd said that regularization is what makes it very robust and guaranteed to converge. This is mentioned somewhere before the 18-minute mark. I will find a source for this. http://videolectures.net/nipsworkshops2011_boyd_multipliers/

I have been using this site as a source of information on ADMM:
http://web.stanford.edu/~boyd/papers/admm_distr_stats.html