AutoML
johnantonn opened this issue · comments
References:
The discipline most relevant to us is Combined Algorithm Selection and Hyperparameter optimization (CASH), rather than meta-learning or AutoML in general (the last two are broader). There are several approaches to it:
- Black box optimization techniques:
  - Grid Search
  - Random Search
  - Bayesian Optimization
- Multi-fidelity techniques:
  - Bandits
  - ..
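
To make the two simplest black-box approaches concrete, here is a minimal sketch that compares grid search and random search under the same budget of 12 evaluations. The `objective` function, the hyperparameter names (`learning_rate`, `depth`) and their ranges are made up for illustration; in practice the objective would be a cross-validated loss of a real model.

```python
import itertools
import random


def objective(learning_rate, depth):
    """Toy validation loss (hypothetical); its minimum is at lr=0.1, depth=6."""
    return (learning_rate - 0.1) ** 2 + 0.01 * (depth - 6) ** 2


def grid_search(lr_values, depth_values):
    """Evaluate every combination on a fixed grid; return (loss, config)."""
    candidates = itertools.product(lr_values, depth_values)
    return min((objective(lr, d), (lr, d)) for lr, d in candidates)


def random_search(n_trials, seed=0):
    """Sample configurations at random; return the best (loss, config)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(-3, 0)  # log-uniform in [0.001, 1]
        depth = rng.randint(1, 12)     # both bounds inclusive
        trial = (objective(lr, depth), (lr, depth))
        if best is None or trial < best:
            best = trial
    return best


# Same budget for both: 4 x 3 grid points vs. 12 random trials.
grid_best = grid_search([0.001, 0.01, 0.1, 1.0], [2, 6, 10])
random_best = random_search(n_trials=12)
print("grid:  ", grid_best)
print("random:", random_best)
```

The grid only ever tries 4 distinct learning rates, while random search tries 12, which is the usual argument for random search when only a few hyperparameters matter. Both treat the objective as a black box, and neither uses past evaluations to choose the next point, which is what Bayesian optimization adds.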
Lecture on AutoML by Andreas Mueller and relevant slides:
- https://www.youtube.com/watch?v=tqtTHRwa8dE&ab_channel=AndreasMueller
- https://amueller.github.io/COMS4995-s19/slides/aml-13-parameter-tuning-automl/#1
Lecture on AutoML by Frank Hutter and Joaquin Vanschoren:
Bayesian optimization:
- https://www.youtube.com/watch?v=C5nqEHpdyoE&t=17s&ab_channel=UAI2018
- https://arxiv.org/abs/1807.02811
Auto-Sklearn:
The problem:
- CASH: Combined Algorithm Selection and Hyperparameter Optimization
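
For reference, the CASH problem is usually written as a single joint minimization, following the formulation in the Auto-WEKA paper (Thornton et al., 2013). The symbols below are not from these notes: $\mathcal{A} = \{A^{(1)}, \dots, A^{(R)}\}$ is the set of candidate algorithms, $\Lambda^{(j)}$ is the hyperparameter space of algorithm $A^{(j)}$, and $\mathcal{L}$ is the loss on the $i$-th of $k$ cross-validation folds.

$$
A^{*}_{\lambda^{*}} \in \operatorname*{argmin}_{A^{(j)} \in \mathcal{A},\ \lambda \in \Lambda^{(j)}} \ \frac{1}{k} \sum_{i=1}^{k} \mathcal{L}\left(A^{(j)}_{\lambda},\ D^{(i)}_{\text{train}},\ D^{(i)}_{\text{valid}}\right)
$$

The choice of algorithm is treated as one more (categorical) hyperparameter, and the hyperparameters of each algorithm are conditional on that choice.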
Two major classes of AutoML optimizers:
- Simple optimizers: only take care of model/hyperparameter selection
- Pipeline optimizers: may also include preprocessing components
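
The difference between the two classes comes down to the search space. The sketch below samples configurations from a small hierarchical space: a simple optimizer only chooses a model and its hyperparameters, while a pipeline optimizer also chooses a preprocessing step. All component names and value lists here are hypothetical placeholders, not the search space of any actual tool.

```python
import random

# Hypothetical search space: each algorithm owns its own hyperparameters,
# so a hyperparameter is only sampled when its algorithm is selected.
MODELS = {
    "random_forest": {"n_estimators": [50, 100, 200], "max_depth": [4, 8, 16]},
    "svm": {"C": [0.1, 1.0, 10.0], "kernel": ["linear", "rbf"]},
}

# Preprocessing choices that only a pipeline optimizer searches over.
PREPROCESSORS = {
    "none": {},
    "standard_scaler": {},
    "pca": {"n_components": [2, 5, 10]},
}


def sample_component(space, rng):
    """Pick one component, then sample only that component's hyperparameters."""
    name = rng.choice(sorted(space))
    params = {key: rng.choice(values) for key, values in space[name].items()}
    return {"name": name, "params": params}


def sample_simple(rng):
    """Simple optimizer: model and hyperparameter selection only."""
    return {"model": sample_component(MODELS, rng)}


def sample_pipeline(rng):
    """Pipeline optimizer: preprocessing step plus model."""
    return {
        "preprocessor": sample_component(PREPROCESSORS, rng),
        "model": sample_component(MODELS, rng),
    }


rng = random.Random(0)
print(sample_simple(rng))
print(sample_pipeline(rng))
```

Any of the search strategies listed earlier (random search, Bayesian optimization, and so on) can be run over either space; the pipeline space is simply larger and has more conditional structure.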
Research on implemented tools:
- Auto-Weka:
  - Java
  - Pipeline optimizer
  - Bayesian optimization (SMAC)
- scikit-optimize:
  - Python, on top of scikit-learn
  - Simple optimizer
  - Bayesian optimization (Gaussian process or tree-based surrogate models)
- Hyperopt-sklearn:
  - Python, on top of scikit-learn
  - Pipeline optimizer
  - Bayesian optimization (TPE)
- Auto-sklearn:
  - Python, on top of scikit-learn, improvement of the Auto-Weka methodology
  - Pipeline optimizer
  - Bayesian optimization (SMAC)
- TPOT:
  - Python, on top of scikit-learn
  - Pipeline optimizer
  - Genetic Programming (GP)
- Hyperband:
  - Python
  - Pipeline optimizer
  - Bandit-based
- Optunity:
  - Python
  - Simple optimizer
  - Includes several optimization algorithms:
    - Grid Search
    - Random Search
    - Particle Swarm Optimization
    - Nelder-Mead simplex
    - CMA-ES
    - TPE
    - Sobol sequences
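
As an illustration of the bandit-based entry in the list, the sketch below implements successive halving, the subroutine that Hyperband runs repeatedly with different trade-offs between the number of configurations and the budget per configuration. It is not Hyperband itself. The `evaluate` function is a made-up noisy loss in which a larger budget (for example more epochs or more training data) gives a more reliable estimate; the defaults `n_configs=27` and `eta=3` are arbitrary choices for the example.

```python
import math
import random


def evaluate(config, budget, rng):
    """Hypothetical noisy loss: the noise shrinks as the budget grows."""
    return (config - 0.3) ** 2 + rng.gauss(0, 0.05) / math.sqrt(budget)


def successive_halving(n_configs=27, min_budget=1, eta=3, seed=0):
    """Keep the best 1/eta of the configurations each round, at eta x budget."""
    rng = random.Random(seed)
    configs = [rng.random() for _ in range(n_configs)]  # one hyperparameter in [0, 1)
    budget = min_budget
    history = []  # (budget, number of configurations evaluated) per round
    while True:
        scored = sorted((evaluate(c, budget, rng), c) for c in configs)
        history.append((budget, len(configs)))
        if len(configs) == 1:
            return scored[0][1], history
        keep = max(1, len(configs) // eta)
        configs = [c for _, c in scored[:keep]]
        budget *= eta


best, history = successive_halving()
print("best config:", best)
print("rounds (budget, n_configs):", history)
```

Most of the budget is spent on the few configurations that survive the cheap early rounds, which is the multi-fidelity idea: poor configurations are discarded after a low-cost, low-fidelity evaluation.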
Note: A major disadvantage of all state-of-the-art AutoML optimizers (simple or pipeline) is that they only offer a pre-defined list of models and components to choose from.
Next up for auto-sklearn experimentation:
- Inspect the validation procedure of auto-sklearn and modify it to accommodate reduced validation sets (get rid of unlabelled points, or else it will crash)
- Incorporate additional anomaly detection (AD) models from PyOD