jmschrei / apricot

apricot implements submodular optimization for the purpose of selecting subsets of massive data sets to train machine learning models quickly. See the documentation page: https://apricot-select.readthedocs.io/en/latest/index.html

Not scalable

saurabh11baghel opened this issue · comments

@jmschrei @domoritz I want to select a subset of 100 samples from a dataset of 100,000 samples with 25 features. The FeatureBasedSelection method runs indefinitely without making any visible progress.

data_subset, labels_subset = FeatureBasedSelection(100, verbose=True).fit_transform(data, labels)

The verbose output has shown only the following for the past hour:
0%| | 0/100 [00:00<?, ?it/s]

What do you think is wrong?

Sorry for missing this. I don't know what is wrong. Can you download the latest patch and try again? Also try using optimizer='stochastic', which should be significantly faster but will not return the exact greedy solution.
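To illustrate why a stochastic optimizer trades exactness for speed, here is a minimal pure-Python sketch of the general stochastic greedy idea. At each step it scores only a random sample of the remaining candidates instead of all of them. This is not apricot's implementation. The function names, the `epsilon` parameter, the sample-size rule, and the square-root feature-based objective are assumptions chosen for illustration, and the toy data is far smaller than the 100,000 rows in this issue.

```python
import math
import random


def feature_based_gain(totals, row):
    """Gain in sum_d sqrt(total_d) from adding `row` to the running feature totals."""
    return sum(math.sqrt(t + x) - math.sqrt(t) for t, x in zip(totals, row))


def stochastic_greedy(data, k, epsilon=0.1, seed=0):
    """Pick k row indices; each step scores only a random sample of candidates."""
    rng = random.Random(seed)
    n = len(data)
    # Assumed sample-size rule: (n / k) * log(1 / epsilon), capped to [1, n].
    sample_size = max(1, min(n, math.ceil(n / k * math.log(1.0 / epsilon))))
    totals = [0.0] * len(data[0])
    remaining = list(range(n))
    selected = []
    for _ in range(k):
        # Exact greedy would score every index in `remaining` here.
        candidates = rng.sample(remaining, min(sample_size, len(remaining)))
        best = max(candidates, key=lambda i: feature_based_gain(totals, data[i]))
        selected.append(best)
        remaining.remove(best)
        totals = [t + x for t, x in zip(totals, data[best])]
    return selected


# Toy non-negative data: 2,000 rows with 25 features.
data_rng = random.Random(42)
data = [[data_rng.random() for _ in range(25)] for _ in range(2000)]
subset = stochastic_greedy(data, k=100)
print(len(subset), len(set(subset)))
```

With these settings each step scores 47 candidates rather than up to 2,000, which is where the speedup comes from. The selected subset can differ from what exact greedy would pick, since the best remaining candidate may not be in the sample.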

Please re-open if you are still encountering issues.