dask / dask-glm

Initialization Routines

cicdw opened this issue

An all-too-often ignored aspect of optimization is initialization: there is a lot of research suggesting that for convex (and even more so for non-convex) optimization problems, a large amount of work can be saved by initializing algorithms at clever starting values.

Currently we initialize all algorithms with the 0 vector. Once the API (#11) is sorted out, we should offer multiple initialization options (a rough sketch of what this could look like follows the list below), including but not limited to:

  • random Gaussian initializations
  • running some other, faster algorithm to a loose tolerance (e.g., initializing Newton's method with the output of a cheap gradient descent run)
  • outputs of previous runs (will be built into a refit method, to be raised in a future issue)
  • more interesting but academically well-grounded ideas
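
For concreteness, here is a minimal sketch (plain NumPy, not dask-glm's actual API) of what a pluggable initialization helper might look like; the `initial_guess` name, the `method` strings, and the `warm_start` argument are all hypothetical:

```python
import numpy as np

def initial_guess(n_features, method="zeros", seed=None, warm_start=None):
    """Return a starting vector for an iterative GLM solver.

    Hypothetical helper, not part of dask-glm; it only illustrates the
    options listed above.
    """
    if method == "zeros":
        # current behaviour: start every algorithm at the 0 vector
        return np.zeros(n_features)
    elif method == "random":
        # random Gaussian initialization
        rng = np.random.default_rng(seed)
        return rng.standard_normal(n_features)
    elif method == "warm":
        # reuse the output of a cheaper solver run, or of a previous fit
        if warm_start is None:
            raise ValueError("method='warm' requires a warm_start vector")
        return np.asarray(warm_start, dtype=float)
    raise ValueError(f"unknown init method: {method!r}")
```

A solver could then accept an `init=` keyword and call a helper like this once before its first iteration.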

cc: @mpancia

As we discussed earlier, I think this is a really cool idea, and I'm glad to be part of the discussion.

As a novice to this (and for the purposes of furthering the discussion), do you know of any good surveys of what the academically well-grounded approaches look like, and/or any higher-level discussions of the benefits of Smart Initialization™?

No surveys that I know of, unfortunately, but here's a list off the top of my head:

Ultimately, I think the biggest bang will come from smart initializations when refitting a model, but I'd like to include at least a little thought on initializations from scratch as well.
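
As a rough illustration of the refit idea: fit once from a cold start, then reuse the fitted coefficients as the starting point for a subsequent fit on updated data. The toy gradient-descent solver below stands in for a dask-glm algorithm, so all names here are hypothetical:

```python
import numpy as np

def logistic_gd(X, y, beta0, lr=0.1, max_iter=500, tol=1e-6):
    """Plain gradient descent for logistic regression, starting from beta0.

    Toy stand-in for a dask-glm solver, used only to show how a refit
    could reuse a previous solution as its initialization.
    """
    beta = beta0.copy()
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))       # predicted probabilities
        grad = X.T @ (p - y) / len(y)             # logistic loss gradient
        beta_new = beta - lr * grad
        if np.linalg.norm(beta_new - beta) < tol:
            return beta_new
        beta = beta_new
    return beta

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 5))
true_beta = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = (rng.random(1000) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

# cold start: the 0 vector (current behaviour)
beta_cold = logistic_gd(X, y, np.zeros(5))

# refit after new data arrives: warm start from the previous solution,
# which should need far fewer iterations to converge
X_new = np.vstack([X, rng.standard_normal((200, 5))])
y_new = np.concatenate([y, (rng.random(200) < 0.5).astype(float)])
beta_warm = logistic_gd(X_new, y_new, beta_cold)
```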