features_selection function is returning same features for different target classes
MathewKevin opened this issue
Mathew Kevin commented
Hi, I'm building a binary classifier that takes text data as input. I tried to select features with the features_selection function, but it returns the same number of features, and the same top features, for the two target classes, which looks incorrect. Am I supposed to run the feature selection separately for each class?
#Feature Selection
X_names, df_selection = features_selection(X_train, df_train["Target"], X_names, top=None, print_top=25)
Output:
features selection: from 10,000 to 7,026
# Curate:
. selected features: 7026
. top features: aa, acid, acid sequence, activity, advance, affinity, allergen, alpha, amino, amino acid, antibody, antigen, antigenic, antimicrobial, antimicrobial peptide, application, approach, area, article, aspect, assay, assessment, attention, autoantibody, autoimmune
# Discard:
. selected features: 7026
. top features: aa, acid, acid sequence, activity, advance, affinity, allergen, alpha, amino, amino acid, antibody, antigen, antigenic, antimicrobial, antimicrobial peptide, application, approach, area, article, aspect, assay, assessment, attention, autoantibody, autoimmune
df_selection[df_selection['feature'] == 'protein']
feature | score | y |
---|---|---|
protein | 1.0 | Curate |
protein | 1.0 | Discard |
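One possible explanation, offered as a sketch rather than a diagnosis: if features_selection scores each class one-vs-rest with a chi-square test (an assumption, since its source is not shown in this thread), then with only two classes the indicator for Curate is the exact complement of the indicator for Discard. The chi-square statistic is symmetric under that swap, so both classes would get identical scores and identical selected features. The pure-Python example below uses hypothetical toy counts and a hypothetical helper, `chi2_one_vs_rest`, to show the symmetry, then uses per-class means to see which class each feature leans toward.

```python
import math

# Hypothetical toy document-term counts (rows = documents, columns = terms).
names = ["antibody", "peptide", "article"]
X = [
    [3, 0, 1],
    [2, 0, 2],
    [4, 1, 1],
    [0, 3, 1],
    [0, 2, 1],
    [1, 4, 1],
]
y = ["Curate", "Curate", "Curate", "Discard", "Discard", "Discard"]


def chi2_one_vs_rest(X, y, positive):
    """Chi-square statistic of each feature against the indicator (y == positive).

    Assumes non-negative counts, both classes present, and no all-zero column.
    """
    in_class = [label == positive for label in y]
    p_pos = sum(in_class) / len(y)  # share of documents in the positive class
    stats = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        obs_pos = sum(row[j] for row, flag in zip(X, in_class) if flag)
        obs_neg = total - obs_pos
        exp_pos = p_pos * total
        exp_neg = (1 - p_pos) * total
        stats.append((obs_pos - exp_pos) ** 2 / exp_pos
                     + (obs_neg - exp_neg) ** 2 / exp_neg)
    return stats


scores_curate = chi2_one_vs_rest(X, y, "Curate")
scores_discard = chi2_one_vs_rest(X, y, "Discard")

# In the binary case the two score vectors coincide.
same_scores = all(math.isclose(a, b) for a, b in zip(scores_curate, scores_discard))


def class_mean(X, y, label, j):
    """Mean count of feature j over the documents labelled `label`."""
    values = [row[j] for row, lab in zip(X, y) if lab == label]
    return sum(values) / len(values)


# The statistic carries no direction, so compare per-class means to see
# which class each feature is more frequent in.
leaning = [
    "Curate" if class_mean(X, y, "Curate", j) > class_mean(X, y, "Discard", j) else "Discard"
    for j in range(len(names))
]

print(same_scores)
print(dict(zip(names, leaning)))
```

If this is what is happening, the identical lists are expected for a binary target, and running the selection separately per class would not change them. A direction-aware summary such as the per-class means above is one way to tell the classes apart.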
Mauro Di Pietro commented
Hi,
please contact me on LinkedIn and I'll try to help you.
(Sent by email in reply to Mathew Kevin's message of Tue, 3 May 2022 at 12:09.)