johnantonn / cash-for-unsupervised-ad

Systematic Evaluation of CASH Search Strategies for Unsupervised Anomaly Detection

Modify budget estimator to use AutoSklearn classifier for fit

johnantonn opened this issue

The total_budget parameter should be estimated using the budget_estimator functionality, or at least the binding should be in place in case it is required in the future.

There are some assumptions required for this to work, such as:

  • Given the individual search space sizes and the total search space size, a target percentage of the total search space (e.g. 2% or 5% of its configurations) needs to be chosen as the basis for the total budget estimate.
  • Given that percentage, the budget estimator should yield the mean + 3 standard deviations of the estimated running time for that share of configurations; this value is the total_budget search parameter to use.
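
One possible reading of these two assumptions, as a minimal sketch rather than the repository's actual budget_estimator: time a sample of single-configuration fits, take the mean plus 3 sample standard deviations as a conservative per-configuration time, and multiply by the number of configurations to cover. The names `estimate_total_budget`, `fit_times`, `total_space_size` and `target_fraction` are hypothetical, and scaling the per-fit bound linearly is a simplifying assumption.

```python
import math
import statistics


def estimate_total_budget(fit_times, total_space_size, target_fraction=0.05):
    """Return a total budget in seconds (hypothetical helper, not the repo's API).

    fit_times: measured durations (seconds) of individual single-configuration fits.
    total_space_size: number of configurations in the whole search space.
    target_fraction: share of the search space the budget should cover.
    """
    if len(fit_times) < 2:
        raise ValueError("need at least two timed fits to estimate a deviation")
    # Number of configurations the budget should allow for.
    n_configs = math.ceil(total_space_size * target_fraction)
    mean = statistics.mean(fit_times)
    std = statistics.stdev(fit_times)  # sample standard deviation
    # Conservative per-configuration time: mean + 3 standard deviations.
    per_config_upper = mean + 3 * std
    return n_configs * per_config_upper


# Illustrative timings only, not measurements from the experiments.
sample_times = [1.0, 2.0, 3.0]
budget = estimate_total_budget(sample_times, total_space_size=1000, target_fraction=0.05)
print(f"estimated total_budget: {budget:.1f} s")
```

Multiplying the per-fit upper bound by the number of configurations overestimates the spread of the total time, which errs on the side of a larger budget.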

Example: for the search space below (number of configurations per algorithm):

  • CBLOF: 8800.0
  • COPOD: 10.0
  • IForest: 6800.0
  • kNN: 2000.0
  • LOF: 2000.0

With each individual search space below 10k configurations and a total search space of ~20k configurations, a reasonable target percentage of configurations to try could be 5%.
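
The arithmetic behind this example can be checked with a short sketch. The sizes are the ones listed above; the variable names are illustrative only.

```python
import math

# Search space sizes quoted in the example above.
space_sizes = {"CBLOF": 8800, "COPOD": 10, "IForest": 6800, "kNN": 2000, "LOF": 2000}

total_size = sum(space_sizes.values())  # 19610, i.e. roughly 20k
target_percent = 5
# Round up so the budget covers at least the target share.
n_target = math.ceil(total_size * target_percent / 100)  # 981

print(f"total search space: {total_size} configurations")
print(f"{target_percent}% target: {n_target} configurations")
```

At 5%, the budget would therefore need to cover roughly 980 configurations; at 2%, roughly 390.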

Additionally, since all experiments use AutoSklearn classifiers/objects, the total budget estimation must use them as well, because AutoSklearn execution times differ from those of the original PyOD/sklearn models. There is a trick that uses the fit_pipeline function to fit a single configuration; more details can be found here.
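
A minimal sketch of how single-configuration fits could be timed, assuming the fit is wrapped in a callable. The helper `time_single_fits` and the stand-in `dummy_fit` are hypothetical; in the setting described here, the callable would wrap the AutoSklearn fit_pipeline call for one configuration instead of sleeping.

```python
import time


def time_single_fits(fit_one, configs):
    """Call fit_one(config) for each config and return the durations in seconds."""
    durations = []
    for config in configs:
        start = time.perf_counter()
        fit_one(config)
        durations.append(time.perf_counter() - start)
    return durations


def dummy_fit(config):
    # Stand-in for a real fit: it only sleeps for the requested time.
    # Assumption: a real version would fit one sampled configuration
    # through AutoSklearn so that its overhead is included in the timing.
    time.sleep(config["sleep"])


durations = time_single_fits(dummy_fit, [{"sleep": 0.01}, {"sleep": 0.02}])
print([round(d, 3) for d in durations])
```

Timing through the same code path as the experiments keeps the measured durations comparable to what the search itself will spend per configuration.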