This repository houses algorithms related to feature selection. Feature selection has several benefits, such as faster training and inference, a lower chance of overfitting, and less exposure to upstream data outages (fewer features means fewer upstream data dependencies).
Random forest and gradient boosting tree feature_importance scores are a good baseline. Features can then be selected further with more advanced algorithms such as FCQ (F-statistic for relevance, Pearson correlation for redundancy).
- Minimum Redundancy Maximum Relevance (mRMR)
- https://towardsdatascience.com/mrmr-explained-exactly-how-you-wished-someone-explained-to-you-9cf4ed27458b
- https://eng.uber.com/optimal-feature-discovery-ml/
- https://arxiv.org/pdf/1908.05376.pdf - Uber mRMR
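
The FCQ variant of mRMR described in the references above can be sketched roughly as follows. This is a minimal illustration, not this repository's implementation: the function names `f_statistic` and `mrmr_fcq`, the synthetic data, and the `1e-6` redundancy floor are all assumptions made for the example. It assumes a numeric target, scores relevance with the F-statistic of a univariate linear regression, scores redundancy with the mean absolute Pearson correlation to the features already selected, and greedily picks the feature with the highest relevance/redundancy quotient.

```python
import numpy as np


def f_statistic(x, y):
    """F-statistic of a univariate linear regression of y on x."""
    r = np.corrcoef(x, y)[0, 1]
    r2 = min(r * r, 1.0 - 1e-12)  # guard against division by zero when |r| == 1
    return r2 / (1.0 - r2) * (len(y) - 2)


def mrmr_fcq(X, y, k):
    """Greedy mRMR (FCQ variant); returns the indices of k selected columns of X.

    Hypothetical helper written for illustration only.
    """
    n_features = X.shape[1]
    # Relevance: F-statistic between each feature and the target.
    relevance = np.array([f_statistic(X[:, j], y) for j in range(n_features)])
    # Pairwise absolute Pearson correlation between features.
    corr = np.abs(np.corrcoef(X, rowvar=False))

    selected = [int(np.argmax(relevance))]  # first pick: most relevant feature
    while len(selected) < k:
        remaining = [j for j in range(n_features) if j not in selected]
        # Redundancy: mean |correlation| with the features already selected.
        redundancy = corr[np.ix_(remaining, selected)].mean(axis=1)
        # Quotient score; the floor (an assumption) avoids dividing by ~0.
        scores = relevance[remaining] / np.maximum(redundancy, 1e-6)
        selected.append(remaining[int(np.argmax(scores))])
    return selected


# Synthetic data (assumption): column 1 is a near-duplicate of column 0,
# column 2 is an independent feature that also drives the target.
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = x0 + 0.01 * rng.normal(size=500)
x2 = rng.normal(size=500)
X = np.column_stack([x0, x1, x2])
y = x0 + x2 + 0.1 * rng.normal(size=500)

selected = mrmr_fcq(X, y, k=2)
print(selected)
```

With this data, a ranking based on relevance alone could keep both near-duplicate columns, whereas the quotient penalizes the second copy, so the selection should contain column 2 plus only one of columns 0 and 1.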