simple-sklearn-classifiers

This repository contains a small collection of classifiers built with the scikit-learn (sklearn) package.

Data Preparation

The data is split into a training set (60%) and a testing set (40%) using stratified sampling, and models are evaluated with cross-validation (k=5). The split settings can be changed through the arguments to the model_selection.train_test_split() function.
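The split described above can be sketched as follows. This is a minimal illustration and not the repository's own code: the toy data, the variable names and the random_state values are assumptions, and StratifiedKFold is one plausible way to get stratified 5-fold cross-validation.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

# Toy data standing in for a real dataset (assumption): 100 samples, 2 balanced classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = np.array([0, 1] * 50)

# 60/40 split; stratify=y keeps the class proportions the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)

# Five stratified folds over the training part only (k=5).
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_sizes = [len(val_idx) for _, val_idx in cv.split(X_train, y_train)]
```

Changing test_size or dropping stratify=y alters the split; the number of folds is set separately through n_splits.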

Classifiers

Seven algorithms are implemented, each tested over multiple configurations using grid search:

SVM with Linear Kernel. Configurations: C = [0.1, 0.5, 1, 5, 10, 50, 100].
SVM with Polynomial Kernel. Configurations: C = [0.1, 1, 3] and gamma = [0.1, 0.5].
SVM with RBF Kernel. Configurations: C = [0.1, 0.5, 1, 5, 10, 50, 100] and gamma = [0.1, 0.5, 1, 3, 6, 10].
Logistic Regression. Configurations: C = [0.1, 0.5, 1, 5, 10, 50, 100].
k-Nearest Neighbors. Configurations: n_neighbors = [1, 2, 3, ..., 50] and leaf_size = [5, 10, 15, ..., 60].
Decision Tree. Configurations: max_depth = [1, 2, 3, ..., 50] and min_samples_split = [2, 3, 4, ..., 10].
Random Forest. Configurations: max_depth = [1, 2, 3, ..., 50] and min_samples_split = [2, 3, 4, ..., 10].

Any configuration can be edited using the corresponding parameters variable. The best configuration scores for each algorithm are stacked and written into an output file.
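A grid search over two of the configurations listed above might look like the sketch below. The names param_grids, best_scores and write_scores, the output column layout and the toy data are all hypothetical; the repository's actual parameters variables and output format may differ.

```python
import csv

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Hypothetical grids using the C values listed above for two of the seven algorithms.
param_grids = {
    "svm_linear": (SVC(kernel="linear"), {"C": [0.1, 0.5, 1, 5, 10, 50, 100]}),
    "logistic_regression": (
        LogisticRegression(max_iter=1000),
        {"C": [0.1, 0.5, 1, 5, 10, 50, 100]},
    ),
}


def best_scores(X_train, y_train, X_test, y_test, grids=param_grids, k=5):
    """Run one grid search per algorithm and stack the best scores as rows."""
    rows = []
    for name, (estimator, grid) in grids.items():
        search = GridSearchCV(estimator, grid, cv=k)
        search.fit(X_train, y_train)
        # best_score_ is the mean cross-validated score of the best configuration;
        # search.score() evaluates the refitted best model on the held-out test set.
        rows.append([name, search.best_score_, search.score(X_test, y_test)])
    return rows


def write_scores(rows, path):
    """Write the stacked rows to a CSV file (column names are an assumption)."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["method", "best_cv_score", "test_score"])
        writer.writerows(rows)


# Toy, well-separated data (assumption) to exercise the functions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, size=(50, 2)), rng.normal(2, 1, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)
rows = best_scores(X_train, y_train, X_test, y_test)
```

Adding another algorithm in this sketch means adding one more entry to param_grids; write_scores(rows, "output_filename.csv") would then save the stacked results.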

Running the algorithms

All algorithms can be applied to any dataset provided through the command line. A single input dataset Xy is expected, with the response variable in the last column. To run the classifiers, use the following command:

$ python3 main.py Xy_train.csv output_filename.csv
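The input convention above (features first, response in the last column) could be handled as in the following sketch. The helper names load_xy and parse_args are hypothetical, and the sketch assumes a purely numeric, comma-separated file without a header row, which the source does not confirm.

```python
import io

import numpy as np


def parse_args(argv):
    """Return (input_path, output_path) from an argv-style list."""
    if len(argv) != 3:
        raise ValueError("usage: python3 main.py <input.csv> <output.csv>")
    return argv[1], argv[2]


def load_xy(source):
    """Load a numeric CSV and split it into features X and response y.

    Assumes no header row; the last column is taken as the response.
    """
    data = np.atleast_2d(np.loadtxt(source, delimiter=","))
    return data[:, :-1], data[:, -1]


# In-memory example standing in for Xy_train.csv (assumption).
example = io.StringIO("1.0,2.0,0\n3.0,4.0,1\n5.0,6.0,0\n")
X, y = load_xy(example)
input_path, output_path = parse_args(
    ["main.py", "Xy_train.csv", "output_filename.csv"]
)
```

In a script, parse_args(sys.argv) would supply the two paths, and load_xy(input_path) would read the dataset from disk.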
