Initialization Routines
cicdw opened this issue · comments
An all-too-often ignored side of optimization is initialization; there is a lot of research out there suggesting that for convex problems, and even more so for non-convex ones, a large amount of work can be saved by initializing algorithms at clever starting values.
Currently we are initializing all algorithms with the 0 vector. Once the API (#11) is sorted out, we should have multiple options for how to initialize, including (but not limited to):
- random Gaussian initializations
- running some other, faster algorithm at a very loose tolerance (e.g., initializing Newton with the output of gradient descent run at a very loose tolerance)
- outputs of previous runs (will be built into a `refit` method, to be raised in a future issue)
- more interesting but academically well-grounded ideas
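To make the first two options concrete, here's a rough sketch on a toy logistic regression problem. Everything in it is an assumption for illustration: the helper names (`gradient_descent`, `newton`), the synthetic data, and the tolerances (`1e-3` for the cheap gradient descent pass, `1e-8` for Newton) are made up here and are not the project's API.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient(beta, X, y):
    # Gradient of the mean logistic negative log-likelihood.
    return X.T @ (sigmoid(X @ beta) - y) / len(y)

def hessian(beta, X):
    # Hessian of the same objective: X^T diag(p * (1 - p)) X / n.
    p = sigmoid(X @ beta)
    return (X.T * (p * (1.0 - p))) @ X / X.shape[0]

def gradient_descent(X, y, beta0, step=1.0, tol=1e-3, max_iter=10_000):
    # Cheap first-order pass, stopped early at a loose tolerance.
    beta = beta0.copy()
    for _ in range(max_iter):
        g = gradient(beta, X, y)
        if np.linalg.norm(g) < tol:
            break
        beta = beta - step * g
    return beta

def newton(X, y, beta0, tol=1e-8, max_iter=50):
    # Plain Newton iterations (no line search); returns the solution
    # and the number of Newton steps taken.
    beta = beta0.copy()
    for n_iter in range(max_iter):
        g = gradient(beta, X, y)
        if np.linalg.norm(g) < tol:
            return beta, n_iter
        beta = beta - np.linalg.solve(hessian(beta, X), g)
    return beta, max_iter

# Synthetic, non-separable toy data (hypothetical).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (rng.random(500) < sigmoid(X @ np.array([0.5, -1.0, 0.25]))).astype(float)

init_zeros = np.zeros(3)                          # current default
init_gaussian = rng.normal(scale=0.1, size=3)     # random Gaussian start
init_warm = gradient_descent(X, y, init_zeros)    # output of a cheaper solver

beta_cold, iters_cold = newton(X, y, init_zeros)
beta_rand, iters_rand = newton(X, y, init_gaussian)
beta_warm, iters_warm = newton(X, y, init_warm)
print(iters_cold, iters_rand, iters_warm)
```

All three starts should land on the same minimizer since the problem is convex; the interesting number is how many Newton steps each one needs, weighed against the cost of the gradient descent pass.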
cc: @mpancia
As we discussed earlier, I think this is a really cool idea, and I'm glad to be part of the discussion.
As a novice to this (and for the purposes of furthering a discussion), do you know any good surveys of what the academically well-grounded things look like and/or some higher-level discussions of the benefits of Smart Initialization™?
No surveys that I know of unfortunately, but here's a list off the top of my head:
- I've heard people say you can use the close connection between LDA and Logistic Regression to initialize one with the other (I haven't thought too much about speed / efficiency trade-offs here)
- In the non-convex case, there's the "famous" k-means++ seeding scheme for k-means
- the starting guess for the approximation to the Hessian in BFGS can have significant consequences for the convergence of the algorithm
- you can exploit the close connection between W-OLS and Logistic Regression to infer things about variable addition / dropping, which is related to multiple refits (see the Logistic Regression chapter in Elements of Statistical Learning)
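As a rough illustration of the first idea in the list, here's a minimal sketch of using a two-class LDA fit as a starting point for logistic regression. It relies on the standard fact that, for Gaussian classes with a shared covariance, the log-odds are linear with slope Σ⁻¹(μ₁ − μ₀). The function names `lda_init` and `logistic_loss` and the synthetic data are assumptions made up for this example.

```python
import numpy as np

def lda_init(X, y):
    """Return (intercept, coef) from a two-class LDA fit with pooled covariance."""
    X0, X1 = X[y == 0], X[y == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class covariance estimate.
    centered = np.vstack([X0 - mu0, X1 - mu1])
    sigma = centered.T @ centered / (len(y) - 2)
    coef = np.linalg.solve(sigma, mu1 - mu0)
    # The intercept accounts for the class midpoint and the class priors.
    intercept = -0.5 * (mu1 + mu0) @ coef + np.log(len(X1) / len(X0))
    return intercept, coef

def logistic_loss(intercept, coef, X, y):
    z = intercept + X @ coef
    # Mean negative log-likelihood; logaddexp(0, z) is a stable log(1 + exp(z)).
    return np.mean(np.logaddexp(0.0, z) - y * z)

# Synthetic two-class Gaussian data with a shared covariance (hypothetical).
rng = np.random.default_rng(0)
n = 200
X = np.vstack([rng.normal(loc=0.0, size=(n, 2)),
               rng.normal(loc=1.5, size=(n, 2))])
y = np.concatenate([np.zeros(n), np.ones(n)])

b0, b = lda_init(X, y)
loss_zero = logistic_loss(0.0, np.zeros(2), X, y)   # loss at the 0 vector
loss_lda = logistic_loss(b0, b, X, y)               # loss at the LDA start
print(loss_zero, loss_lda)
```

The LDA fit costs one pass over the data plus a small linear solve, so the trade-off to measure is whether the iterations saved downstream are worth that pass.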
Ultimately, I think the biggest bang will come from smart initializations when refitting a model, but I'd like to include at least a little thought on initializations from scratch as well.
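For the refit case, something like the following sketch is what I have in mind. The `LogisticGD` class, its `refit` method, and the data are all hypothetical; the real `refit` API is still to be designed, and the toy solver here is plain gradient descent only to keep the example short.

```python
import numpy as np

class LogisticGD:
    """Toy logistic regression via gradient descent (hypothetical, not the project's API)."""

    def __init__(self, step=1.0, tol=1e-6, max_iter=100_000):
        self.step, self.tol, self.max_iter = step, tol, max_iter
        self.coef_ = None
        self.n_iter_ = 0

    def _solve(self, X, y, beta):
        n_steps = 0
        while n_steps < self.max_iter:
            p = 1.0 / (1.0 + np.exp(-(X @ beta)))
            g = X.T @ (p - y) / len(y)       # gradient of the mean log-loss
            if np.linalg.norm(g) < self.tol:
                break
            beta = beta - self.step * g
            n_steps += 1
        self.coef_, self.n_iter_ = beta, n_steps
        return self

    def fit(self, X, y):
        # Cold start from the 0 vector.
        return self._solve(X, y, np.zeros(X.shape[1]))

    def refit(self, X, y):
        # Warm start from the previous solution, if there is one.
        if self.coef_ is None:
            return self.fit(X, y)
        return self._solve(X, y, self.coef_.copy())

# Synthetic data (hypothetical): fit on 500 rows, then 50 more rows arrive.
rng = np.random.default_rng(0)
X = rng.normal(size=(550, 3))
prob = 1.0 / (1.0 + np.exp(-(X @ np.array([0.5, -1.0, 0.25]))))
y = (rng.random(550) < prob).astype(float)

warm = LogisticGD().fit(X[:500], y[:500]).refit(X, y)
cold = LogisticGD().fit(X, y)
print(warm.n_iter_, cold.n_iter_)
```

Both models end up at the same coefficients; the comparison of interest is the number of gradient steps the warm-started refit needs versus a cold fit on the full data.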