features_selection function returns the same features for different target classes

MathewKevin opened this issue 2 years ago

Hi, I'm building a binary classifier that takes text data as input. I tried to generate features with the features_selection function, but it returns the same number of features, and the same top features, for the two target classes, which I believe is incorrect. Am I supposed to generate the features separately for each class?

#Feature Selection
X_names, df_selection = features_selection(X_train, df_train["Target"], X_names, top=None, print_top=25)

Output:

features selection: from 10,000 to 7,026
 
# Curate:
  . selected features: 7026
  . top features: aa, acid, acid sequence, activity, advance, affinity, allergen, alpha, amino, amino acid, antibody, antigen, antigenic, antimicrobial, antimicrobial peptide, application, approach, area, article, aspect, assay, assessment, attention, autoantibody, autoimmune
 
# Discard:
  . selected features: 7026
  . top features: aa, acid, acid sequence, activity, advance, affinity, allergen, alpha, amino, amino acid, antibody, antigen, antigenic, antimicrobial, antimicrobial peptide, application, approach, area, article, aspect, assay, assessment, attention, autoantibody, autoimmune
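One possible explanation, offered as an assumption because the thread does not show the source of features_selection: if it scores each feature with a one-vs-rest chi-square test per class (for example, sklearn.feature_selection.chi2 on y == cat), then with only two classes the tests for "Curate" and "Discard" compare the same per-class observed and expected counts with the classes swapped. Every feature would then get the same statistic and p-value for both classes, so identical selections would be expected rather than a bug. The standard-library sketch below mirrors the statistic sklearn's chi2 computes; the helper chi2_one_vs_rest and the toy matrix are hypothetical.

```python
import math


def chi2_one_vs_rest(X, y, positive):
    """Chi-square statistic and p-value of each column of X against y == positive.

    Hypothetical helper mirroring the formulation of sklearn.feature_selection.chi2:
    observed = sum of a feature's values within each class,
    expected = that class's share of documents * the feature's column total.
    """
    n = len(y)
    is_pos = [label == positive for label in y]
    n_pos = sum(is_pos)
    # Build the shares from integer counts so both label orientations see identical floats.
    share = {True: n_pos / n, False: (n - n_pos) / n}
    stats, pvalues = [], []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        stat = 0.0
        for cls in (True, False):
            observed = sum(row[j] for row, flag in zip(X, is_pos) if flag == cls)
            expected = share[cls] * total
            if expected > 0:  # skip all-zero columns instead of dividing by zero
                stat += (observed - expected) ** 2 / expected
        stats.append(stat)
        # Two classes -> 1 degree of freedom, whose survival function is erfc(sqrt(x / 2)).
        pvalues.append(math.erfc(math.sqrt(stat / 2)))
    return stats, pvalues


# Toy term-count matrix (4 documents x 3 terms), made up for illustration.
X = [[3, 0, 1],
     [2, 0, 1],
     [0, 4, 1],
     [0, 3, 1]]
y = ["Curate", "Curate", "Discard", "Discard"]

curate = chi2_one_vs_rest(X, y, "Curate")
discard = chi2_one_vs_rest(X, y, "Discard")
print(curate[0])          # [5.0, 7.0, 0.0]
print(curate == discard)  # True: same statistics and p-values for both classes
```

If features_selection works this way, re-running it separately for each class of a binary target would reproduce the same list, because the test cannot tell which class a feature favours, only that it is associated with the split.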

df_selection[df_selection['feature'] == 'protein']

feature	score	y
protein	1.0	Curate
protein	1.0	Discard
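If the goal is a separate feature list per class, one option (a suggestion, not something features_selection is known to do) is to keep the single chi-square ranking and label each selected feature with the class in which it occurs more often than that class's share of documents would predict. For "protein", which scores 1.0 for both classes, this would show which class actually drives the association. The helper leaning_class, the term names, and the toy data are hypothetical; in practice X would be the document-term matrix restricted to the selected X_names.

```python
def leaning_class(X, y):
    """Hypothetical helper: for each column of X, return the class whose observed
    total most exceeds its expected share (None if no class is over-represented)."""
    n = len(y)
    classes = sorted(set(y))
    leaning = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        best, best_excess = None, 0.0
        for cls in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == cls)
            expected = y.count(cls) / n * total
            if observed - expected > best_excess:
                best, best_excess = cls, observed - expected
        leaning.append(best)
    return leaning


# Made-up toy matrix, term names, and labels for illustration only.
names = ["term_a", "term_b", "term_c"]
X = [[3, 0, 1],
     [2, 0, 1],
     [0, 4, 1],
     [0, 3, 1]]
y = ["Curate", "Curate", "Discard", "Discard"]

leaning = leaning_class(X, y)
print(leaning)  # ['Curate', 'Discard', None]

# Split the selected vocabulary into per-class lists from a single test.
per_class = {cls: [name for name, c in zip(names, leaning) if c == cls] for cls in sorted(set(y))}
print(per_class)  # {'Curate': ['term_a'], 'Discard': ['term_b']}
```

A term spread evenly across both classes (term_c here) leans toward neither class and is left out of both lists.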

Mauro Di Pietro · Answer 1 · Tue May 10 2022 23:04:22 GMT+0800 (China Standard Time)

Hi, please contact me on LinkedIn and I'll try to help you.
