Modify budget estimator to use AutoSklearn classifier for fit
johnantonn opened this issue
The `total_budget` parameter should be estimated using the `budget_estimator` functionality, or at least the binding should be in place in case it is required in the future.
There are some assumptions required for this to work, such as:
- Based on the individual search space sizes and the total search space size, a target percentage of the total search space (e.g. 2% or 5%) needs to be chosen as the basis for the total budget estimation. This choice is arbitrary.
- Given that percentage, the budget estimator should yield the mean + 3 standard deviations of the estimated budget for that number of configurations, and that value is the `total_budget` search parameter to be used.
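A minimal sketch of the mean + 3 standard deviations rule, not the actual `budget_estimator` code. The function name `estimate_total_budget`, its parameters, and the sample timings are hypothetical. It also assumes that per-configuration fit times are independent, so that the total time for `n_configs` fits has mean `n * mean` and standard deviation `sqrt(n) * std`.

```python
import math
import statistics


def estimate_total_budget(per_config_seconds, n_configs, n_std=3.0):
    """Upper estimate (in seconds) of the time needed to fit n_configs configurations.

    per_config_seconds: measured fit times of a few sampled configurations.
    Assumption: fit times are independent, so the variance of the total scales with n_configs.
    """
    if len(per_config_seconds) < 2:
        raise ValueError("need at least two timing samples to estimate a standard deviation")
    mean_t = statistics.mean(per_config_seconds)
    std_t = statistics.stdev(per_config_seconds)  # sample standard deviation
    total_mean = n_configs * mean_t
    total_std = math.sqrt(n_configs) * std_t
    return total_mean + n_std * total_std


# Made-up timings, for illustration only.
sample_times = [1.0, 2.0, 3.0]
total_budget = estimate_total_budget(sample_times, n_configs=981)
```

A more conservative variant would multiply `n_configs` by `mean + 3 * std` of a single fit; which one matches the intended estimator is an open choice.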
Example: for the search space below:
- CBLOF: 8800.0
- COPOD: 10.0
- IForest: 6800.0
- kNN: 2000.0
- LOF: 2000.0
with individual search spaces < 10k configurations and a total search space size of ~20k, a reasonable assumption for the target percentage of configurations to try could be 5%.
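The arithmetic behind this example can be checked with a short sketch. The sizes are the ones listed above; the variable names are illustrative, and rounding up to a whole number of configurations is an assumption.

```python
import math

# Search space sizes (number of configurations) from the list above.
search_space_sizes = {
    "CBLOF": 8800,
    "COPOD": 10,
    "IForest": 6800,
    "kNN": 2000,
    "LOF": 2000,
}

total_size = sum(search_space_sizes.values())  # 19610, i.e. ~20k

# Target percentage of the total search space to try (5% in the example).
target_percent = 5
n_configs = math.ceil(total_size * target_percent / 100)

# Share of the total search space contributed by each algorithm.
shares = {name: size / total_size for name, size in search_space_sizes.items()}
```

Note that CBLOF and IForest together account for most of the total, so a uniform 5% sample of the whole space will mostly draw configurations of those two algorithms.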
Additionally, since all of the experiments use AutoSklearn classifiers/objects, the total budget must be estimated with them as well, because AutoSklearn execution times differ from those of the original PyOD/sklearn models. There's a trick that uses the `fit_pipeline` function to fit a single configuration; more details can be found here.
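A rough sketch of how a single configuration could be timed this way. The helper names `time_single_fit` and `autosklearn_fit_seconds` are hypothetical, and the time limits are placeholder values. The auto-sklearn calls (`get_configuration_space`, `sample_configuration`, `fit_pipeline`) follow the library's "fit a single configuration" example, but the signatures should be checked against the installed version.

```python
import time


def time_single_fit(fit_fn):
    """Call fit_fn() once and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    fit_fn()
    return time.perf_counter() - start


def autosklearn_fit_seconds(X, y):
    """Time one randomly sampled configuration fitted through AutoSklearn."""
    # Imported lazily so the timing helper above works without auto-sklearn installed.
    from autosklearn.classification import AutoSklearnClassifier

    # Placeholder limits; fit_pipeline fits one configuration, not a full search.
    automl = AutoSklearnClassifier(time_left_for_this_task=60, per_run_time_limit=30)
    config_space = automl.get_configuration_space(X, y)
    config = config_space.sample_configuration()
    return time_single_fit(lambda: automl.fit_pipeline(X=X, y=y, config=config))
```

Calling `autosklearn_fit_seconds` on several sampled configurations would give the per-configuration timings that the budget estimate is based on.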