susan1314 / feature_selection

This repo is to house all the algorithms related to feature selection

Feature selection algorithm

This repository houses algorithms related to feature selection. Feature selection has several benefits, such as faster training and inference, a lower chance of overfitting, and less exposure to upstream data outages, since the model depends on fewer input features.

Summary

The feature_importance scores from random forest and gradient boosting tree models are a good baseline. Features can then be selected further with a more advanced algorithm such as mRMR, for example its FCQ variant (F-statistic for relevance, Pearson correlation for redundancy).
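A minimal sketch of the tree-based baseline, assuming scikit-learn is available. The synthetic dataset, the helper name rank_by_importance, and the hyperparameters are illustrative assumptions and are not taken from this repository's notebooks.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# Synthetic data stands in for a real dataset (assumption for illustration).
# With shuffle=False the 5 informative features are columns 0-4; the
# remaining 15 columns are pure noise.
X, y = make_classification(
    n_samples=500,
    n_features=20,
    n_informative=5,
    n_redundant=0,
    shuffle=False,
    random_state=0,
)


def rank_by_importance(model, X, y):
    """Fit a tree ensemble and return feature indices, most important first."""
    model.fit(X, y)
    return np.argsort(model.feature_importances_)[::-1]


rf_rank = rank_by_importance(
    RandomForestClassifier(n_estimators=200, random_state=0), X, y
)
gb_rank = rank_by_importance(GradientBoostingClassifier(random_state=0), X, y)

top_k = 5  # hypothetical cut-off; tune it for the dataset at hand
print("random forest top features:", rf_rank[:top_k])
print("gradient boosting top features:", gb_rank[:top_k])
```

Impurity-based importances only score each feature's usefulness to the model. They do not penalise features that duplicate one another, which is the gap that redundancy-aware methods such as mRMR aim to close.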

Algorithms

  1. Minimum Redundancy Maximum Relevance (mRMR)
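The following is a rough sketch of the FCQ variant of mRMR, written with NumPy only. It assumes a numeric feature matrix and a continuous target, uses the F-statistic of a univariate linear regression as relevance, and uses the mean absolute Pearson correlation with the already selected features as redundancy. The function names, the redundancy floor of 1e-3, and the toy data are assumptions made for illustration, not the repository's implementation.

```python
import numpy as np


def f_statistic(X, y):
    """F-statistic of a univariate linear regression of y on each column of X."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Pearson correlation between each feature and the target.
    r = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum())
    )
    r2 = np.clip(r ** 2, 0.0, 1.0 - 1e-12)  # avoid division by zero
    return r2 / (1.0 - r2) * (n - 2)


def mrmr_fcq(X, y, k):
    """Greedily select k column indices by relevance / redundancy."""
    relevance = f_statistic(X, y)
    abs_corr = np.abs(np.corrcoef(X, rowvar=False))
    # Start with the single most relevant feature.
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        # Mean absolute correlation of each candidate with the selected set.
        redundancy = abs_corr[np.ix_(candidates, selected)].mean(axis=1)
        # The floor (an assumption) keeps near-zero redundancy from
        # producing unbounded scores.
        scores = relevance[candidates] / np.maximum(redundancy, 1e-3)
        selected.append(candidates[int(np.argmax(scores))])
    return selected


# Toy data: column 1 is a near-duplicate of column 0, column 2 carries
# independent signal, column 3 is noise.
rng = np.random.default_rng(0)
n = 1000
x0 = rng.normal(size=n)
x1 = x0 + 0.1 * rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x0, x1, x2, x3])
y = x0 + 0.8 * x2 + 0.5 * rng.normal(size=n)

selected = mrmr_fcq(X, y, k=2)
print("mRMR (FCQ) selection:", selected)
```

Ranking by relevance alone would tend to pick both near-duplicate columns here. Dividing by redundancy steers the second pick away from the duplicate of the first.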

References

Minimum Redundancy Maximum Relevance (mRMR)

  1. https://towardsdatascience.com/mrmr-explained-exactly-how-you-wished-someone-explained-to-you-9cf4ed27458b
  2. https://eng.uber.com/optimal-feature-discovery-ml/

Papers

mRMR

  1. https://arxiv.org/pdf/1908.05376.pdf - Uber mRMR

Languages

Jupyter Notebook 97.0%, Python 2.7%, Shell 0.3%