mdipietro09 / DataScience_ArtificialIntelligence_Utils

Examples of Data Science projects and Artificial Intelligence use-cases

features_selection function returns the same features for different target classes

MathewKevin opened this issue.

Hi, I'm building a binary classifier that takes text data as input. I tried to select features with the features_selection function, but it returns the same number of features, and the same top features, for both target classes, which looks incorrect to me. Am I supposed to run feature selection separately for each class?

```python
# Feature selection
X_names, df_selection = features_selection(X_train, df_train["Target"], X_names, top=None, print_top=25)
```

Output:

features selection: from 10,000 to 7,026
 
# Curate:
  . selected features: 7026
  . top features: aa, acid, acid sequence, activity, advance, affinity, allergen, alpha, amino, amino acid, antibody, antigen, antigenic, antimicrobial, antimicrobial peptide, application, approach, area, article, aspect, assay, assessment, attention, autoantibody, autoimmune
 
# Discard:
  . selected features: 7026
  . top features: aa, acid, acid sequence, activity, advance, affinity, allergen, alpha, amino, amino acid, antibody, antigen, antigenic, antimicrobial, antimicrobial peptide, application, approach, area, article, aspect, assay, assessment, attention, autoantibody, autoimmune

```python
df_selection[df_selection['feature'] == 'protein']
```

```
feature  score        y
protein    1.0   Curate
protein    1.0  Discard
```
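
The identical score for `protein` follows from the same assumption (score = 1 - p from scikit-learn's `chi2` against the indicator `y == c`; the function body is not shown in this issue). Let $G_c$ be the documents labeled $c$, $G_{\bar c}$ the remaining documents, $n_k = |G_k|$, and $x_{ij}$ the value of feature $j$ in document $i$ out of $n$ documents. The statistic for feature $j$ sums over those two groups:

$$
O_{kj} = \sum_{i \in G_k} x_{ij}, \qquad
E_{kj} = \frac{n_k}{n} \sum_{i=1}^{n} x_{ij}, \qquad
\chi^2_j(c) = \sum_{k \in \{c,\ \bar{c}\}} \frac{(O_{kj} - E_{kj})^2}{E_{kj}}
$$

With only two classes, $G_{\overline{\text{Curate}}} = G_{\text{Discard}}$ and $G_{\overline{\text{Discard}}} = G_{\text{Curate}}$, so both sums contain the same two terms in a different order. Hence $\chi^2_j(\text{Curate}) = \chi^2_j(\text{Discard})$, the p-values match (one degree of freedom in both cases), and so does the score $1 - p$.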