vhsven / vhsven-sklearn

A template for scikit-learn extensions


vhsven-sklearn - My scikit-learn extensions

This Python 3.6 package contains an extension class for scikit-learn's decision tree algorithms. By default, scikit-learn relies on early stopping criteria to prevent a tree from growing too large. This package provides a PruneableDecisionTreeClassifier, which keeps the tree size in check with one of two well-known pruning techniques: Reduced Error Pruning or Error Based Pruning.
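To give a feel for what Reduced Error Pruning does, here is a minimal, self-contained sketch on a hand-built toy tree. It is not the code used by PruneableDecisionTreeClassifier: the Node class and the predict, errors and reduced_error_prune helpers are hypothetical names invented for this illustration, and the sketch assumes the common convention that a subtree is collapsed into a leaf whenever doing so does not increase the error on a held-out validation set.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], int]  # (feature vector, class label)


@dataclass
class Node:
    """Hypothetical toy tree node; not part of pruneabletree or scikit-learn."""
    majority: int                  # majority training class at this node
    feature: Optional[int] = None  # index of the split feature; None for a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.feature is None


def predict(node: Node, x: Sequence[float]) -> int:
    """Walk down the tree until a leaf is reached and return its class."""
    while not node.is_leaf:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.majority


def errors(node: Node, samples: List[Sample]) -> int:
    """Count the samples that the (sub)tree rooted at node misclassifies."""
    return sum(predict(node, x) != y for x, y in samples)


def reduced_error_prune(node: Node, val: List[Sample]) -> Node:
    """Prune bottom-up, using only the validation samples that reach each node."""
    if node.is_leaf:
        return node
    left_val = [s for s in val if s[0][node.feature] <= node.threshold]
    right_val = [s for s in val if s[0][node.feature] > node.threshold]
    node.left = reduced_error_prune(node.left, left_val)
    node.right = reduced_error_prune(node.right, right_val)
    # Errors if this node were replaced by a leaf predicting its majority class.
    leaf_errors = sum(y != node.majority for _, y in val)
    # Assumed tie-break: prune when the leaf is at least as good as the subtree.
    if leaf_errors <= errors(node, val):
        return Node(majority=node.majority)
    return node


# Toy tree: the split on feature 1 in the right branch is an overfit.
tree = Node(
    majority=0, feature=0, threshold=5.0,
    left=Node(majority=0),
    right=Node(majority=1, feature=1, threshold=2.0,
               left=Node(majority=1), right=Node(majority=2)),
)
validation = [
    ((4.0, 1.0), 0), ((4.5, 3.0), 0),
    ((6.0, 1.0), 1), ((6.5, 3.0), 1), ((7.0, 3.5), 1),
]

before = errors(tree, validation)
pruned = reduced_error_prune(tree, validation)
after = errors(pruned, validation)
print(f"validation errors before: {before}, after: {after}")
```

In this toy example the right-hand subtree is collapsed into a single leaf, while the root split is kept because removing it would increase the validation error. Error Based Pruning differs in that it needs no separate validation set: it prunes based on a pessimistic estimate of the error computed from the training data.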

This package was created from project-template, a template project for scikit-learn compatible extensions. See also: http://contrib.scikit-learn.org/project-template/.

Installation and Usage

The package itself comes with a single module and a classifier. Before installing the module, you will need numpy and scipy. To install the module, execute:

python setup.py install

or

pip install pruneabletree

or (when using Anaconda)

conda develop /path/to/vhsven-sklearn

If the installation is successful, and scikit-learn is correctly installed, you should be able to execute the following in Python:

>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import cross_val_score
>>> from pruneabletree import PruneableDecisionTreeClassifier
>>> clf = PruneableDecisionTreeClassifier(random_state=0, prune='rep')
>>> iris = load_iris()
>>> cross_val_score(clf, iris.data, iris.target, cv=10)

Developers will need to install additional packages to contribute to the code. When using Anaconda, create a new environment based on the environment_dev.yml file. Otherwise, take a look at the Makefile to see which packages were originally installed.

Documentation

The documentation is built using sphinx. It incorporates narrative documentation from the doc/ directory, standalone examples from the examples/ directory, and API reference compiled from estimator docstrings.

The online documentation can be found here.

Building the documentation locally requires sphinx, sphinx-gallery and matplotlib. Install them by executing:

pip install sphinx matplotlib sphinx-gallery

You can also add code examples to the examples/ folder. Every file in that folder whose name matches plot_*.py is executed during the build, and the plots it generates are available for viewing under the /auto_examples URL.

To build the documentation locally, execute:

cd doc
make html

The project uses CircleCI to build its documentation from the master branch and host it using GitHub Pages.


License: BSD 3-Clause "New" or "Revised" License

