WillKoehrsen / feature-selector

Feature selector is a tool for dimensionality reduction of machine learning datasets

ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2.

NiDHanWang opened this issue

Hi there! Thanks so much for such a good piece of work, it really helps!

But recently an error was raised when I used identify_zero_importance. It works well when I turn off early_stopping, and the error is raised when I turn it on.

Here's my code:
from feature_selector import FeatureSelector
select_label = train_fill['SalePrice']
select_featrue = train_fill.drop(columns=['SalePrice', 'Id'])
fs = FeatureSelector(data=select_featrue, labels=select_label)
fs.identify_zero_importance(task='regression', eval_metric='L2', n_iterations=10, early_stopping=True)

Here's the error:

ValueError Traceback (most recent call last)
in
----> 1 fs.identify_zero_importance(task='regression',eval_metric='L2',n_iterations=10,early_stopping=True)

D:\anaconda\lib\site-packages\feature_selector.py in identify_zero_importance(self, task, eval_metric, n_iterations, early_stopping)
304 if early_stopping:
305
--> 306 train_features, valid_features, train_labels, valid_labels = train_test_split(features, labels, test_size = 0.15, stratify=labels)
307
308 # Train the model with early stopping

D:\anaconda\lib\site-packages\sklearn\model_selection\_split.py in train_test_split(*arrays, **options)
2119 random_state=random_state)
2120
-> 2121 train, test = next(cv.split(X=arrays[0], y=stratify))
2122
2123 return list(chain.from_iterable((safe_indexing(a, train),

D:\anaconda\lib\site-packages\sklearn\model_selection\_split.py in split(self, X, y, groups)
1321 """
1322 X, y, groups = indexable(X, y, groups)
-> 1323 for train, test in self._iter_indices(X, y, groups):
1324 yield train, test
1325

D:\anaconda\lib\site-packages\sklearn\model_selection\_split.py in _iter_indices(self, X, y, groups)
1634 class_counts = np.bincount(y_indices)
1635 if np.min(class_counts) < 2:
-> 1636 raise ValueError("The least populated class in y has only 1"
1637 " member, which is too few. The minimum"
1638 " number of groups for any class cannot"

ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class cannot be less than 2

It seems like something goes wrong when it tries to split the data into train and test sets at line 306. How can I fix it?

I solved this problem by removing the stratify argument from the train_test_split call at line 306.

Same issue here. I tried all alternatives for task= and eval_metric= and always get the same error when early_stopping is set to True.
I also tried providing Y in different formats (array, pandas DataFrame, pandas Series) and got the same error.
@DeckerDai: removing the stratify argument did not solve it for me.
No error when early_stopping=False.
I'm also not sure why the error even comes up, given that I'm trying to do a regression problem (task='regression', eval_metric='l2').
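
For what it's worth, the traceback suggests why task and eval_metric make no difference: line 306 passes stratify=labels to train_test_split before any model is trained, and scikit-learn then treats every distinct label value as its own class. With a continuous target most values occur only once, so the check at line 1635 fails. A minimal sketch that reproduces this on its own, using made-up numbers rather than the data above:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Illustrative data only: 20 rows with a continuous target in which
# every value is unique, as is typical for a regression label.
rng = np.random.default_rng(0)
features = rng.normal(size=(20, 3))
labels = np.linspace(100_000.0, 290_000.0, num=20)

# Stratifying on a continuous target: each unique value counts as a
# class with a single member, so scikit-learn refuses to split.
try:
    train_test_split(features, labels, test_size=0.15, stratify=labels)
    stratified_error = None
except ValueError as err:
    stratified_error = str(err)

print(stratified_error)

# The same split without stratify succeeds.
train_x, valid_x, train_y, valid_y = train_test_split(
    features, labels, test_size=0.15, random_state=0
)
print(len(train_y), len(valid_y))
```

If the error persists after editing line 306, it may be worth checking that the edited file is the one actually being imported and that the kernel was restarted, though that is only a guess.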

I looked into line 306 and came up with this solution, which keeps the stratify argument for 'classification' but drops it for 'regression':

if early_stopping:
    # Stratify only for classification; a continuous regression target
    # has mostly unique values, which makes stratification fail.
    if task == 'classification':
        train_features, valid_features, train_labels, valid_labels = train_test_split(features, labels, test_size=0.15, stratify=labels)
    else:
        train_features, valid_features, train_labels, valid_labels = train_test_split(features, labels, test_size=0.15)
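
The same idea as a standalone helper, in case it is easier to try outside the library. This is only a sketch: split_for_early_stopping is a hypothetical name, not part of feature_selector, and the random_state parameter is an addition for reproducibility. The 0.15 test size and the task strings follow the code above, and the sample data is made up.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_for_early_stopping(features, labels, task, test_size=0.15, random_state=None):
    """Hypothetical helper: hold out a validation set for early stopping.

    Stratifies on the labels only when task == 'classification'.
    """
    stratify = labels if task == 'classification' else None
    return train_test_split(
        features, labels,
        test_size=test_size,
        stratify=stratify,
        random_state=random_state,
    )


# Made-up data: 40 rows, 2 features.
features = np.arange(80, dtype=float).reshape(40, 2)

# Regression: every label is unique, so no stratification is applied.
reg_labels = np.linspace(0.0, 1.0, num=40)
_, _, train_reg, valid_reg = split_for_early_stopping(
    features, reg_labels, task='regression', random_state=0
)

# Classification: two balanced classes, split with stratification.
cls_labels = np.array([0, 1] * 20)
_, _, train_cls, valid_cls = split_for_early_stopping(
    features, cls_labels, task='classification', random_state=0
)

print(len(train_reg), len(valid_reg))
print(sorted(set(valid_cls.tolist())))
```

Note that a classification dataset with a class that has a single member would still hit the same ValueError with this approach, since stratification is kept for that task.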