jmschrei / apricot

apricot implements submodular optimization for the purpose of selecting subsets of massive data sets to train machine learning models quickly. See the documentation page: https://apricot-select.readthedocs.io/en/latest/index.html

Using custom functions

sadhamanus opened this issue

Can custom submodular functions be written and optimized (minimized in particular) using the library and if so how? Thanks.

Yes, you can write your own function by following the template of the built-in functions, e.g. graph cut: https://github.com/jmschrei/apricot/blob/master/apricot/functions/graphCut.py Unfortunately, submodular minimization is not yet built into apricot.

Writing your own custom function should be simple. The gist is to make an object that inherits from either BaseSelection or BaseGraphSelection, depending on whether it's a feature-based function or a graph-based function, i.e. whether it operates on the feature values of examples directly or on pairwise similarities. You need to fill in four methods. The __init__ method takes whatever hyperparameters you want your selector to have. The _initialize method does whatever the model should do before selection starts, optionally given a list of examples to seed the selection (potentially just leave this blank if not relevant). The _calculate_gain method evaluates the gain you would see in the objective function if you added each element in idxs individually to the current selection. The _select_next method performs whatever logic should occur once you've chosen the next item to add to the growing set, such as updating the cached statistics.
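To make the division of labor between those four methods concrete, here is a minimal standalone sketch. It deliberately does not import apricot or inherit from BaseSelection, so the class name, the attribute names (ranking, current_sums, current_value), the method signatures, and the fit loop are all assumptions made for illustration; in apricot the base class supplies the optimizer and may expect different arguments and caches, so check graphCut.py for the real template. The objective used here is the sum over features of the square root of the selected feature totals, which is submodular for non-negative features.

```python
import math


class SqrtFeatureSelection:
    """Hypothetical selector mirroring the four-method template.

    Standalone sketch: in apricot this would inherit from BaseSelection
    and the library, not this class, would run the greedy loop.
    """

    def __init__(self, n_samples):
        # Hyperparameters only: how many examples to select.
        self.n_samples = n_samples

    def _initialize(self, X):
        # Set up cached statistics before any selection happens.
        # Assumes X is a list of rows with non-negative feature values.
        self.X = X
        self.ranking = []
        self.current_sums = [0.0] * len(X[0])
        self.current_value = 0.0  # objective of the empty set

    def _calculate_gain(self, idxs):
        # Gain of adding each candidate individually to the current set.
        gains = []
        for i in idxs:
            new_value = sum(
                math.sqrt(s + x) for s, x in zip(self.current_sums, self.X[i])
            )
            gains.append(new_value - self.current_value)
        return gains

    def _select_next(self, idx):
        # Commit the chosen example and update the cached statistics.
        self.current_sums = [
            s + x for s, x in zip(self.current_sums, self.X[idx])
        ]
        self.current_value = sum(math.sqrt(s) for s in self.current_sums)
        self.ranking.append(idx)

    def fit(self, X):
        # Stand-in for the greedy optimizer that the library would provide.
        self._initialize(X)
        remaining = list(range(len(X)))
        while remaining and len(self.ranking) < self.n_samples:
            gains = self._calculate_gain(remaining)
            best = remaining[gains.index(max(gains))]
            self._select_next(best)
            remaining.remove(best)
        return self


X = [[9, 0], [0, 1], [4, 0], [1, 1]]
selector = SqrtFeatureSelection(n_samples=2).fit(X)
print(selector.ranking)  # [0, 3]
```

Row 2 overlaps with row 0 on the first feature, so its gain shrinks once row 0 is selected and row 3 is picked instead. That is the diminishing-returns behavior the cached statistics in _select_next are there to track.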