A tool that uses a Genetic Algorithm for FEature Selection.
Feature selection is the process of finding the most relevant variables for a predictive model. It can be used to identify and remove unneeded, irrelevant and redundant features that do not contribute to, or even decrease, the accuracy of the model. In nature, the genes of organisms tend to evolve over successive generations to better adapt to the environment. The Genetic Algorithm is a heuristic optimization method inspired by this process of natural evolution. In feature selection, the function to optimize is the generalization performance of a predictive model. More specifically, we want to minimize the error of the model on an independent data set that was not used to create the model.
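To make this objective concrete, here is a minimal sketch in plain Python of how one candidate feature subset could be scored on held-out data. The nearest-centroid classifier, the function names (`select`, `fit_centroids`, `predict`, `holdout_error`) and the toy data are illustrative assumptions only; they are not part of gafes, which evaluates subsets with sklearn models.

```python
# Sketch: score a feature subset by its error on data not used for fitting.
# All names and the toy data are hypothetical; gafes itself uses sklearn models.

def select(rows, mask):
    """Keep only the columns whose mask bit is 1."""
    return [[v for v, keep in zip(row, mask) if keep] for row in rows]

def fit_centroids(X, y):
    """Compute the per-class mean vector (a nearest-centroid 'model')."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

def predict(centroids, row):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
    )

def holdout_error(mask, X_train, y_train, X_test, y_test):
    """Fraction of held-out rows misclassified using only the masked features."""
    if not any(mask):
        return 1.0  # an empty subset cannot predict anything
    centroids = fit_centroids(select(X_train, mask), y_train)
    wrong = sum(
        predict(centroids, row) != label
        for row, label in zip(select(X_test, mask), y_test)
    )
    return wrong / len(y_test)

# Toy data: column 0 separates the classes, column 1 is noise.
X_train = [[0.0, 5.0], [0.2, -4.0], [1.0, -5.0], [0.8, 4.0]]
y_train = [0, 0, 1, 1]
X_test = [[0.1, -3.0], [0.9, 3.0]]
y_test = [0, 1]

err_informative = holdout_error([1, 0], X_train, y_train, X_test, y_test)
err_noise = holdout_error([0, 1], X_train, y_train, X_test, y_test)
err_all = holdout_error([1, 1], X_train, y_train, X_test, y_test)
```

On this toy data the informative column alone gives a lower held-out error than the full feature set, which is the kind of difference a feature selection search tries to exploit.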
This project uses deap to create individuals with 'mutations' (subsets of columns) and to select the best individuals (those with the highest accuracy) using sklearn models. To use it, read your dataset with pandas, encode your class labels, create a Gafes object with X, y, the population size (n_pop) and the number of generations (n_gen), and run it. At the end, you get the subset of features with the best accuracy among the individuals created.
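The search loop described above can be sketched in plain Python as follows. This is not the gafes implementation, which relies on deap; the function `evolve`, the parameter `mut_prob`, the tournament size and the `toy_fitness` stand-in (used here in place of model accuracy) are all assumptions made for illustration. Only the names `n_pop` and `n_gen` mirror the Gafes parameters.

```python
import random

def evolve(fitness, n_features, n_pop=20, n_gen=6, mut_prob=0.1, seed=0):
    """Return the best bit mask found (1 = keep the feature).

    Hypothetical sketch: requires n_pop >= 3 and n_features >= 2.
    """
    rng = random.Random(seed)
    # Initial population: random feature subsets encoded as bit masks.
    population = [
        [rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_pop)
    ]
    best = max(population, key=fitness)

    def tournament():
        # Pick the fittest of three randomly drawn individuals.
        return max(rng.sample(population, 3), key=fitness)

    for _ in range(n_gen):
        children = []
        while len(children) < n_pop:
            a, b = tournament(), tournament()
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: toggle each feature with probability mut_prob.
            child = [1 - g if rng.random() < mut_prob else g for g in child]
            children.append(child)
        population = children
        # Remember the best individual seen in any generation.
        best = max(population + [best], key=fitness)
    return best

# Stand-in for model accuracy: reward two "informative" columns and
# penalize every extra column. A real run would score a trained model.
INFORMATIVE = {0, 3}

def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    extras = sum(mask) - hits
    return hits - 0.25 * extras

best_mask = evolve(toy_fitness, n_features=6, n_pop=20, n_gen=6)
selected_columns = [i for i, bit in enumerate(best_mask) if bit]
```

Encoding each individual as a bit mask keeps crossover and mutation simple, and the fitness function can be swapped for any score computed on held-out data.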
gafes has the following system requirements:
Please install all dependencies manually with:
curl https://raw.githubusercontent.com/anunciado/ICE1047-Gafes/master/requirements.txt | xargs -n 1 -L 1 pip install
Then install gafes:
!pip install git+https://github.com/anunciado/ICE1047-Gafes.git@master
import pandas as pd
from gafes.gafes import Gafes
from gafes.gafes import Utils
# read dataframe from csv
df = pd.read_csv('dataset.csv')
# encode labels
X, y = Utils(df).encode('class')
# initialize gafes
gf = Gafes(X=X, y=y, n_pop=20, n_gen=6)
gf.run()
See a full usage example in the examples folder.
- Luís Eduardo Anunciado Silva (cruxiu@ufrn.edu.br)
- Sandro Jose De Souza (sandro@neuro.ufrn.br)
See also the list of contributors who participated in this project.
This project is licensed under the MIT License. See the LICENSE file for details.
Feel free to fork the repository, add your changes and give back by issuing a pull request.
- Genetic Algorithm For Feature Selection
- A program that searches for the best feature subset for your classification model
- Genetic Algorithm Feature Selection
- A program that searches for the best feature subset for your classification model
- Genetic algorithms for feature selection in Data Analytics
- A text that explains the use of genetic algorithms for feature selection
- Breast Cancer Wisconsin (Diagnostic) Data Set
- The Breast Cancer Wisconsin (Diagnostic) data set