jmschrei / apricot

apricot implements submodular optimization for the purpose of selecting subsets of massive data sets to train machine learning models quickly. See the documentation page: https://apricot-select.readthedocs.io/en/latest/index.html

Selection with pre-selected set

M-A-Hassan opened this issue

Hi,

Is there a way to pass a preselected set of data to the optimizer to be considered while calculating the gain?
The preselected set is user-defined input to the optimizer and will not be altered or modified in any way by the optimizer.

Thanks for the effort and for making this package available.
Mohamed

Sorry for taking so long to reply.
I have now tried it, but this approach seems to work only if the initial set is a subset of the data.
Please take a look at this:

elif self.initial_subset.ndim == 2:

When I try to pass a 2-dimensional array as my initial set, this line of code raises the following ValueError:
ValueError: When using facility location, the initial subset must be a one dimensional array of indices.
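Since the error message asks for a one-dimensional array of indices, one possible workaround is to append the external pre-selected rows to the data and refer to them by their positions in the combined array. The sketch below illustrates that idea with plain NumPy only. It does not call apricot: `stack_preselected` and `greedy_facility_location` are hypothetical helpers written for this illustration, and the RBF similarity is an assumption, not necessarily what the library uses.

```python
import numpy as np

def stack_preselected(X, preselected):
    """Append external rows to X and return (combined, indices of appended rows)."""
    X = np.asarray(X, dtype=float)
    P = np.asarray(preselected, dtype=float)
    combined = np.vstack([X, P])
    idx = np.arange(len(X), len(combined))  # 1-D index array into `combined`
    return combined, idx

def greedy_facility_location(X, k, initial_subset=()):
    """Naive greedy facility location; rows in `initial_subset` are fixed, never re-picked."""
    X = np.asarray(X, dtype=float)
    # Pairwise RBF similarity (an assumption made for this sketch).
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    sim = np.exp(-sq)
    fixed = [int(i) for i in initial_subset]
    # Best similarity each row already has to the fixed, pre-selected rows.
    best = sim[:, fixed].max(axis=1) if fixed else np.zeros(len(X))
    picked = []
    for _ in range(k):
        # Marginal gain of candidate j: sum_i max(sim[i, j] - best[i], 0).
        gains = np.maximum(sim - best[:, None], 0.0).sum(axis=0)
        taken = fixed + picked
        if taken:
            gains[taken] = -np.inf  # never pick a fixed or already chosen row
        j = int(np.argmax(gains))
        picked.append(j)
        best = np.maximum(best, sim[:, j])
    return picked

# Two clusters; the external pre-selected point sits inside the first cluster.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [10.0, 10.0], [10.1, 10.0]])
pre = np.array([[0.0, 0.05]])
combined, idx = stack_preselected(X, pre)
picked = greedy_facility_location(combined, 2, initial_subset=idx)
print(idx, picked)
```

With apricot itself, the analogous step would presumably be to fit on `combined` and pass `idx` as the initial subset. Note one caveat of this trick: the appended rows also become points that the objective tries to cover, so the result is not strictly identical to treating the pre-selected set as purely external.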