Sonata165 / Auto-HPO_MDPN


Introduction and Flow Chart

Automatic machine learning (AutoML) has gained wide attention and application in both industry and academia. Automatic hyper-parameter optimization (auto-HPO) is one of its most critical parts: the effectiveness of many machine learning algorithms is extremely sensitive to hyper-parameters, and without a good set of hyper-parameters a task cannot be solved well even with the optimal algorithm.

Our contributions are summarized as follows.

  • We consider the mapping from data to the optimal hyper-parameters and apply this mapping to select the optimal hyper-parameters for new tasks. Across different tasks of the same algorithm, the model has strong transferability, which greatly reduces time overhead; for the same reason, the model can handle ultra-high-dimensional hyper-parameter optimization.

  • With XGBoost as an example, we design the neural network structure for this mapping as well as its training approach, which can be applied to other machine learning tasks with slight modification (a minimal sketch of such a mapping network is given after this list).

  • Experimental results on real data demonstrate that the proposed approach significantly outperforms state-of-the-art algorithms in both accuracy and efficiency.
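
The following PyTorch sketch illustrates the idea of a network that maps a data-set representation to a hyper-parameter vector. The layer sizes, the sigmoid output scaling, and the MSE regression loss are illustrative assumptions, not the exact MDPN architecture.

```python
# Minimal sketch of a data-to-hyper-parameter mapping network (MDPN-style).
# Feature dimension, layer sizes, and loss are assumptions for illustration.
import torch
import torch.nn as nn

class MappingNet(nn.Module):
    def __init__(self, feat_dim=128, n_hparams=5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, n_hparams), nn.Sigmoid(),  # hyper-params normalized to [0, 1]
        )

    def forward(self, dataset_features):
        return self.net(dataset_features)

# Training pairs (data-set encoding, known-good hyper-parameters) would come
# from the data-processing parts of the flow chart; MSE regression is one option.
model = MappingNet()
loss_fn = nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
features = torch.randn(32, 128)   # placeholder batch of data-set encodings
target_hp = torch.rand(32, 5)     # placeholder normalized hyper-parameters
loss = loss_fn(model(features), target_hp)
opt.zero_grad()
loss.backward()
opt.step()
```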

Flow Chart

The components of our algorithm are shown in the figure below. Part 1 trains MDPN on processed data coming from Parts 2 and 3. In Part 4, MDPN is used to predict hyper-parameters, which are further optimized in Part 5.

[Figure: flow chart of the five-part MDPN pipeline]
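
In code, the five parts read roughly as the outline below; every function name is a placeholder for the corresponding step, not an API of this repository.

```python
# Hedged outline of the flow chart; all helper names below are hypothetical.
def auto_hpo(raw_datasets, new_task):
    # Parts 2-3: process raw data into (data-set features, good hyper-params) pairs
    pairs = collect_training_pairs(raw_datasets)
    # Part 1: fit the mapping network on those pairs
    mdpn = train_mdpn(pairs)
    # Part 4: one-shot hyper-parameter prediction for the new task
    hp_initial = mdpn.predict(encode(new_task))
    # Part 5: LOPT fine-tunes the prediction locally
    return local_optimize(hp_initial, new_task)
```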

Operation Methods

Experiment

Data set

Two image data sets, including SVHN, are used for the CNN experiments; another 98 data sets are used for XGBoost, as listed in the following table.

Baseline

In this part, we briefly introduce the baselines we choose and the basic settings of the experiments.

Bayesian optimization (BO) is a sequential design strategy for global optimization of black-box functions that does not require derivatives. BO is widely used in hyper-parameter optimization. Here we use BO to optimize XGBoost as the control group; a minimal sketch follows.
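
The sketch below tunes three XGBoost hyper-parameters with the `bayes-opt` package. The data set, bounds, and budget are assumptions for the example, not the settings used in the experiments.

```python
# BO control group sketch: maximize cross-validated accuracy of XGBoost.
from bayes_opt import BayesianOptimization
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

X, y = load_breast_cancer(return_X_y=True)  # stand-in data set

def xgb_cv(max_depth, learning_rate, n_estimators):
    model = XGBClassifier(
        max_depth=int(max_depth),            # BO proposes floats; cast to int
        learning_rate=learning_rate,
        n_estimators=int(n_estimators),
    )
    return cross_val_score(model, X, y, cv=3).mean()  # accuracy to maximize

optimizer = BayesianOptimization(
    f=xgb_cv,
    pbounds={"max_depth": (2, 10), "learning_rate": (0.01, 0.3),
             "n_estimators": (50, 500)},    # illustrative search bounds
    random_state=0,
)
optimizer.maximize(init_points=5, n_iter=20)
print(optimizer.max)  # best accuracy and hyper-parameters found
```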

Zeroth-order optimization, shortened to ZOOpt, is the process of minimizing an objective f(x) given oracle access to evaluations at adaptively chosen inputs x. Here we use ZOOpt as the control group when evaluating our model on CNN.
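
A minimal sketch with the `zoopt` package is shown below. The quadratic objective is a stand-in: a real run would train the CNN with the candidate hyper-parameters and return 1 - validation accuracy. The dimension ranges and budget are assumptions.

```python
# ZOOpt control group sketch: derivative-free minimization over 2 hyper-parameters.
from zoopt import Dimension, Objective, Parameter, Opt

def eval_hparams(solution):
    lr, depth = solution.get_x()  # candidate hyper-parameters
    # Placeholder objective; substitute "train CNN, return 1 - val accuracy".
    return (lr - 0.05) ** 2 + (depth - 6) ** 2

dim = Dimension(2, [[1e-4, 0.5], [1, 10]], [True, True])  # 2 continuous dims
solution = Opt.min(Objective(eval_hparams, dim), Parameter(budget=100))
print(solution.get_x(), solution.get_value())  # best point and objective value
```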

The metrics recorded in our experiments are the time overhead and the accuracy on each test data set. Since we evaluate our model on hundreds of data sets, global statistics are needed for the analysis: we extract the median, the maximum, and the two quartiles of the accuracy over all test sets.
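
For example, these global statistics can be computed with NumPy as below; the random array is only a stand-in for the recorded per-data-set accuracies.

```python
import numpy as np

accuracies = np.random.rand(98)  # stand-in for the accuracy on each test data set
q1, median, q3 = np.percentile(accuracies, [25, 50, 75])  # quartiles and median
print(f"median={median:.3f}  max={accuracies.max():.3f}  Q1={q1:.3f}  Q3={q3:.3f}")
```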

Environment

Result

We design three groups of experiments: an experiment group, a blank control group (BCG), and a control group. In the experiment group, we optimize with MDPN and with MDPN + LOPT. In the control group, BO and ZOOpt are used. In the blank control group, we optimize hyper-parameters with an MDPN or MDPN + LOPT model without pre-training.

As for the partitioning of data sets: we partition the raw data sets into D_train and D_test at a ratio of 9:1, and D_train is used to train MDPN before the experiments. We then further partition each data set in D_test by 9:1 into d_train and d_test. We first use d_train to train XGBoost or CNN, and then use d_test to test the performance of XGBoost or CNN with the hyper-parameters given by MDPN and the baselines (see the sketch below).
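
A sketch of this nested 9:1/9:1 partitioning with scikit-learn, where `all_datasets` and the `D_*`/`d_*` names are placeholders following the text above:

```python
import numpy as np
from sklearn.model_selection import train_test_split

all_datasets = [np.random.rand(100, 8) for _ in range(10)]  # placeholder pool of data sets

# Outer 9:1 split over the collection: D_train pre-trains MDPN.
D_train, D_test = train_test_split(all_datasets, test_size=0.1, random_state=0)

# Inner 9:1 split inside each held-out data set.
for dataset in D_test:
    d_train, d_test = train_test_split(dataset, test_size=0.1, random_state=0)
    # here: fit XGBoost/CNN on d_train with the predicted hyper-parameters,
    # then measure accuracy on d_test
```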
