AutoML

Question

johnantonn opened this issue 3 years ago · comments

Ioannis Antoniadis commented 3 years ago

References:

https://link.springer.com/book/10.1007/978-3-030-05318-5

The problem of most interest to us is Combined Algorithm Selection and Hyperparameter optimization (CASH), rather than meta-learning or AutoML as a whole (the latter two are more general). There are several approaches to it:

Black box optimization techniques:
- Grid Search
- Random Search
- Bayesian Optimization
Multi-fidelity techniques:
- Bandits
..
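To make the first two black-box techniques concrete, here is a minimal sketch of grid search and random search. The `validation_loss` function, the hyperparameter names and their ranges are made up for illustration; in practice the function would train a model and return its validation loss.

```python
import itertools
import random


def validation_loss(learning_rate, depth):
    """Hypothetical stand-in for 'train a model, return its validation loss'."""
    return (learning_rate - 0.1) ** 2 + 0.01 * (depth - 6) ** 2


def grid_search(grid):
    """Evaluate every combination of the listed values; return (loss, config)."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        config = dict(zip(names, values))
        loss = validation_loss(**config)
        if best is None or loss < best[0]:
            best = (loss, config)
    return best


def random_search(bounds, n_trials, seed=0):
    """Sample configurations uniformly at random; return (loss, config)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        config = {
            "learning_rate": rng.uniform(*bounds["learning_rate"]),
            "depth": rng.randint(*bounds["depth"]),  # inclusive on both ends
        }
        loss = validation_loss(**config)
        if best is None or loss < best[0]:
            best = (loss, config)
    return best


# Both searches get the same budget of 9 evaluations.
grid_best = grid_search({"learning_rate": [0.01, 0.1, 1.0], "depth": [2, 6, 10]})
random_best = random_search(
    {"learning_rate": (0.0, 1.0), "depth": (2, 10)}, n_trials=9
)
```

Both functions treat the objective as a black box: they only see configurations going in and losses coming out. Bayesian optimization keeps that interface but uses the losses seen so far to decide which configuration to try next.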

Ioannis Antoniadis · Answer 1 · Wed Oct 06 2021 18:51:56 GMT+0800 (China Standard Time)

Lecture on AutoML by Andreas Mueller and relevant slides:

Lecture on AutoML by Frank Hutter and Joaquin Vanschoren:

https://www.youtube.com/watch?v=0eBR8a4MQ30&ab_channel=StevenVanVaerenbergh

Bayesian optimization:

Auto-Sklearn:

https://arxiv.org/pdf/2007.04074.pdf

Ioannis Antoniadis · Answer 2 · Fri Oct 08 2021 19:07:10 GMT+0800 (China Standard Time)

The problem:

CASH: Combined Algorithm Selection and Hyperparameter Optimization
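
For reference, CASH is usually stated as a single joint minimization over algorithms and their hyperparameters. The notation below follows the Auto-WEKA formulation: \mathcal{A} is the set of candidate algorithms, \Lambda^{(j)} is the hyperparameter space of algorithm A^{(j)}, and \mathcal{L} is the loss of A^{(j)} with hyperparameters \lambda when trained on the i-th training split and evaluated on the i-th validation split of a K-fold cross-validation.

```latex
A^{*}_{\lambda^{*}} \in \operatorname*{argmin}_{A^{(j)} \in \mathcal{A},\; \lambda \in \Lambda^{(j)}}
\frac{1}{K} \sum_{i=1}^{K}
\mathcal{L}\left(A^{(j)}_{\lambda},\; D_{\mathrm{train}}^{(i)},\; D_{\mathrm{valid}}^{(i)}\right)
```

The point of the joint formulation is that the choice of algorithm is treated as just another (categorical) hyperparameter, so a single optimizer can search over both.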

Two major classes of AutoML optimizers:

Simple optimizers: only take care of model/hyperparameter selection
Pipeline optimizers: may also include preprocessing components
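
A rough sketch of how the two classes differ in terms of search space. The model names, preprocessing steps and value lists are hypothetical and not taken from any of the tools discussed here; the only point is that a pipeline optimizer searches over the product of the preprocessing choices and the model choices.

```python
import math
import random

# Hypothetical space of a simple optimizer: a model plus its hyperparameters.
MODEL_SPACE = {
    "knn": {"n_neighbors": [1, 5, 15]},
    "svm": {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]},
}

# A pipeline optimizer additionally chooses preprocessing components.
PREPROCESSING_SPACE = {
    "scaler": ["none", "standard", "minmax"],
    "feature_selection": ["none", "variance_threshold"],
}


def sample_model(rng):
    """Draw one candidate the way a simple optimizer sees it."""
    name = rng.choice(sorted(MODEL_SPACE))
    params = {key: rng.choice(values) for key, values in MODEL_SPACE[name].items()}
    return {"model": name, "params": params}


def sample_pipeline(rng):
    """Draw one candidate the way a pipeline optimizer sees it."""
    candidate = {
        step: rng.choice(options) for step, options in PREPROCESSING_SPACE.items()
    }
    candidate.update(sample_model(rng))
    return candidate


def count_simple():
    """Number of distinct model/hyperparameter combinations."""
    return sum(
        math.prod(len(values) for values in params.values())
        for params in MODEL_SPACE.values()
    )


def count_pipeline():
    """Every preprocessing combination can be paired with every model candidate."""
    return count_simple() * math.prod(
        len(options) for options in PREPROCESSING_SPACE.values()
    )


simple_size = count_simple()
pipeline_size = count_pipeline()
example_pipeline = sample_pipeline(random.Random(0))
```

Even with only two small preprocessing choices, the pipeline space is several times larger than the model-only space, which is why pipeline optimizers lean on smarter search strategies than exhaustive enumeration.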

Research on implemented tools:

Auto-Weka:
- Java
- Pipeline optimizer
- Bayesian optimization (SMAC)
scikit-optimize:
- Python, on top of scikit-learn
- Simple optimizer
- Bayesian optimization (Gaussian process or tree-based surrogate models)
Hyperopt-sklearn:
- Python, on top of scikit-learn
- Pipeline optimizer
- Bayesian optimization (TPE)
Auto-sklearn:
- Python, on top of scikit-learn, improvement of the Auto-Weka methodology
- Pipeline optimizer
- Bayesian optimization (SMAC)
TPOT:
- Python, on top of scikit-learn
- Pipeline optimizer
- Genetic Programming (GP)
Hyperband:
- Python
- Pipeline optimizer
- Bandit-based
Optunity:
- Python
- Simple optimizer
- Includes several optimization algorithms:
  - Grid Search
  - Random Search
  - Particle Swarm Optimization
  - Nelder-Mead simplex
  - CMA-ES
  - TPE
  - Sobol sequences
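
For the bandit-based entry in the list above: Hyperband is built on successive halving, which it runs several times with different trade-offs between the number of configurations and the budget per configuration. Below is a minimal sketch of a single successive-halving run. The `partial_loss` function is a made-up stand-in for "train this configuration for `budget` units and report the validation loss", and the candidate configurations are plain numbers for simplicity.

```python
def partial_loss(config, budget):
    """Hypothetical loss after training `config` for `budget` units.

    The first term is the quality of the configuration itself, the second
    term shrinks as more budget is spent. In this toy objective the ranking
    does not depend on the budget; with real models it generally does.
    """
    return abs(config - 0.3) + 1.0 / budget


def successive_halving(configs, min_budget=1, eta=2):
    """Repeatedly keep the best 1/eta of the configs and give them eta times more budget."""
    survivors = list(configs)
    budget = min_budget
    history = []  # (budget, number of configurations evaluated) per round
    while len(survivors) > 1:
        history.append((budget, len(survivors)))
        ranked = sorted(survivors, key=lambda c: partial_loss(c, budget))
        survivors = ranked[: max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0], history


candidates = [i / 10 for i in range(8)]  # 0.0, 0.1, ..., 0.7
best, history = successive_halving(candidates)
```

Most of the candidates are only ever evaluated at the smallest budget, which is where the savings over plain random search come from.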

Note: A major disadvantage of all state-of-the-art AutoML optimizers (whether simple or pipeline) is that they only offer a pre-defined list of models and components to choose from.

Ioannis Antoniadis · Answer 3 · Fri Oct 15 2021 19:58:18 GMT+0800 (China Standard Time)

Next up for auto-sklearn experimentation:

Inspect the validation procedure of auto-sklearn and modify it to accommodate reduced validation sets (drop the unlabelled points, or else it will crash)
Incorporate additional anomaly detection (AD) models from PyOD
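
A minimal sketch of the first item, independent of auto-sklearn's own code: score only the labelled part of a validation set. The `-1` marker for unlabelled points and the 0/1 label encoding are assumptions made for this example, and `labelled_only` and `accuracy` are hypothetical helpers, not auto-sklearn functions.

```python
UNLABELLED = -1  # assumed marker for points without a ground-truth label


def labelled_only(y_true, y_pred, unlabelled=UNLABELLED):
    """Drop every (truth, prediction) pair whose ground truth is missing."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if t != unlabelled]
    if not pairs:
        raise ValueError("validation set contains no labelled points")
    true, pred = zip(*pairs)
    return list(true), list(pred)


def accuracy(y_true, y_pred):
    """Accuracy computed on the labelled points only."""
    true, pred = labelled_only(y_true, y_pred)
    return sum(t == p for t, p in zip(true, pred)) / len(true)


# Assumed encoding: 0 = normal, 1 = anomaly, -1 = unlabelled.
y_valid = [0, 1, -1, 0, -1, 1]
y_hat = [0, 1, 1, 1, 0, 1]
score = accuracy(y_valid, y_hat)
```

Raising an explicit error when no labelled points remain makes the failure visible at the filtering step instead of somewhere deeper in the metric computation.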