alexandreguichet / MFS

Modular Feature Selection (Mutual-Information-based Feature Selection)

MFS

This toolbox provides mutual-information-based feature-selection modules for use in any machine-learning or deep-learning application.

Mutual information is computed with the k-nearest-neighbour estimators of Kraskov, Ross, and Gao.
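As a rough illustration of the k-nearest-neighbour idea, the sketch below implements the first Kraskov estimator, I(X;Y) ≈ ψ(k) + ψ(N) − mean(ψ(n_x + 1) + ψ(n_y + 1)), for continuous variables using NumPy and SciPy. It is not this toolbox's code: the function name ksg_mutual_information and the default k = 3 are assumptions made for the example, and the Ross and Gao variants for discrete or mixed data are not covered.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma


def ksg_mutual_information(x, y, k=3):
    """Kraskov (estimator 1) mutual information in nats; hypothetical helper."""
    x = np.asarray(x, dtype=float).reshape(len(x), -1)
    y = np.asarray(y, dtype=float).reshape(len(y), -1)
    n = x.shape[0]

    # Chebyshev distance to the k-th neighbour in the joint (x, y) space.
    # k + 1 is requested because each point is its own nearest neighbour.
    joint = np.hstack([x, y])
    dist, _ = cKDTree(joint).query(joint, k=k + 1, p=np.inf)

    # Shrink the radius by one float step so the marginal counts use a
    # strict inequality, as in the original estimator.
    radius = np.nextafter(dist[:, -1], 0)

    # Count marginal neighbours inside that radius, excluding the point itself.
    nx = cKDTree(x).query_ball_point(x, radius, p=np.inf, return_length=True) - 1
    ny = cKDTree(y).query_ball_point(y, radius, p=np.inf, return_length=True) - 1

    mi = digamma(k) + digamma(n) - np.mean(digamma(nx + 1) + digamma(ny + 1))
    return max(float(mi), 0.0)  # small negative estimates are clipped to zero


# Synthetic check: correlated Gaussians (rho = 0.9) versus independent noise.
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = 0.9 * x + np.sqrt(1 - 0.9 ** 2) * rng.normal(size=2000)
noise = rng.normal(size=2000)

mi_dependent = ksg_mutual_information(x, y)
mi_independent = ksg_mutual_information(x, noise)
print(mi_dependent, mi_independent)
```

For the correlated pair the analytic value is -0.5 * ln(1 - 0.9^2), roughly 0.83 nats, so the first estimate should land near that figure and the second near zero.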

The core routine is a simple function that takes N-by-Mx features and N-by-My labels and returns an Mx-by-My pd.DataFrame holding the mutual information of every feature/label pair.
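The input and output shapes described above can be sketched as follows. The example uses scikit-learn's mutual_info_regression (also a k-nearest-neighbour estimator) as a stand-in; mi_table is a hypothetical helper rather than this toolbox's function, and continuous labels are assumed.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression


def mi_table(features, labels, n_neighbors=3, seed=0):
    """Return an Mx-by-My DataFrame of feature/label mutual information."""
    columns = {
        label: mutual_info_regression(
            features.to_numpy(),
            labels[label].to_numpy(),
            n_neighbors=n_neighbors,
            random_state=seed,
        )
        for label in labels.columns
    }
    # Rows are features, columns are labels.
    return pd.DataFrame(columns, index=features.columns)


# Synthetic data: N = 500 samples, Mx = 3 features, My = 2 labels.
rng = np.random.default_rng(0)
n = 500
features = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "x3": rng.normal(size=n),
})
labels = pd.DataFrame({
    "y1": features["x1"] + 0.1 * rng.normal(size=n),       # depends on x1
    "y2": features["x2"] ** 2 + 0.1 * rng.normal(size=n),  # depends on x2
})

table = mi_table(features, labels)
print(table)
```

Because y2 depends on x2 only through its square, a linear correlation would miss it, while the mutual information in the x2/y2 cell should still stand out.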

Planned next steps:

  • Perform feature selection by minimising feature redundancy and maximising mutual information.
  • Implement partial mutual information / conditional mutual information.
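One common way to trade relevance against redundancy is a greedy rule in the style of mRMR: pick the most relevant feature first, then repeatedly add the candidate whose relevance minus its mean redundancy with the already selected features is largest. The sketch below assumes the mutual-information values are already computed; greedy_mrmr and the toy numbers are hypothetical, and this is not necessarily the approach the toolbox will adopt.

```python
import pandas as pd


def greedy_mrmr(relevance, redundancy, n_select):
    """Greedy relevance-minus-redundancy selection (hypothetical helper).

    relevance:  Series, feature -> MI with the label
    redundancy: DataFrame, feature x feature MI
    """
    selected = [relevance.idxmax()]
    candidates = [f for f in relevance.index if f != selected[0]]
    while candidates and len(selected) < n_select:
        # Score = relevance minus mean MI with the features chosen so far.
        scores = {
            f: relevance[f] - redundancy.loc[f, selected].mean()
            for f in candidates
        }
        best = max(scores, key=scores.get)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy values: "a" and "b" are both relevant but nearly duplicate each other.
names = ["a", "b", "c"]
relevance = pd.Series([0.90, 0.85, 0.40], index=names)
redundancy = pd.DataFrame(
    [[1.00, 0.80, 0.05],
     [0.80, 1.00, 0.05],
     [0.05, 0.05, 1.00]],
    index=names,
    columns=names,
)

chosen = greedy_mrmr(relevance, redundancy, n_select=2)
print(chosen)  # ['a', 'c']
```

Here "b" is skipped despite its high relevance, because most of what it says about the label is already carried by "a".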

Run script:

  • See mifs_example.py for examples of how to use the framework for different tasks.
  • See the documentation in mifs.py and mutual_information.py for all parameters and their possible values.

Usage Example:

from mifs import MIFS  # assumption: the MIFS class is defined in mifs.py

mifs = MIFS()
mifs.load_file("datasets\\IPODataFull.csv")

# Extract features/labels
features = mifs.df.drop(columns=["Survived"])
labels = mifs.df["Survived"].to_frame()

# Convert categorical columns to integer codes
cat_columns = features.select_dtypes(['category', 'object']).columns
features[cat_columns] = features[cat_columns].astype('category').apply(lambda x: x.cat.codes)

# Calculate mutual information; the results are returned as a dictionary:
#   results['selected']  : the final answer, unique features with most redundancies removed (final value is normalized)
#   results['threshold'] : all features above a threshold (n = 50 here); redundancies are still present
#   results['all']       : the mutual information matrix of the 50 selected features (those above the threshold)
#   results['labels']    : the mutual information value of all features
results = mifs.select_n_features(n=50, downsample=True)
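The categorical conversion step in the usage example can be tried in isolation with plain pandas. The tiny frame below is made up for illustration (the column names "age" and "port" are hypothetical); it shows that category codes follow the sorted category order and that missing values become -1, which is worth keeping in mind before feeding the codes to a distance-based estimator.

```python
import pandas as pd

# Hypothetical toy frame: one numeric column, one string column with a gap.
features = pd.DataFrame({
    "age": [22.0, 38.0, 26.0, 35.0],
    "port": pd.Series(["S", "C", "Q", None], dtype=object),
})

encoded = features.copy()
cat_columns = encoded.select_dtypes(["category", "object"]).columns

# Same conversion as in the usage example: each category becomes an integer code.
encoded[cat_columns] = (
    encoded[cat_columns].astype("category").apply(lambda col: col.cat.codes)
)

print(encoded)
# Categories are sorted as C, Q, S, so S -> 2, C -> 0, Q -> 1; None -> -1.
```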

License: MIT License

Language: Python 100.0%