sklearn-porter

Transpile trained scikit-learn estimators to C, Java, JavaScript and others.
It is intended for resource-constrained embedded systems and critical applications where performance matters most.

Machine learning algorithms

Algorithm	Programming language
Classifier	Java *	JS	C	Go	PHP	Ruby
svm.SVC	✓, ✓ ᴵ	✓	✓		✓	✓
svm.NuSVC	✓, ✓ ᴵ	✓	✓		✓	✓
svm.LinearSVC	✓, ✓ ᴵ	✓	✓	✓	✓	✓
tree.DecisionTreeClassifier	✓, ✓ ᴱ, ✓ ᴵ	✓, ✓ ᴱ	✓, ✓ ᴱ	✓, ✓ ᴱ	✓, ✓ ᴱ	✓, ✓ ᴱ
ensemble.RandomForestClassifier	✓ ᴱ, ✓ ᴵ	✓ ᴱ	✓ ᴱ	✓ ᴱ	✓ ᴱ	✓ ᴱ
ensemble.ExtraTreesClassifier	✓ ᴱ, ✓ ᴵ	✓ ᴱ	✓ ᴱ		✓ ᴱ	✓ ᴱ
ensemble.AdaBoostClassifier	✓ ᴱ, ✓ ᴵ	✓ ᴱ, ✓ ᴵ	✓ ᴱ
neighbors.KNeighborsClassifier	✓, ✓ ᴵ	✓, ✓ ᴵ
naive_bayes.GaussianNB	✓, ✓ ᴵ	✓
naive_bayes.BernoulliNB	✓, ✓ ᴵ	✓
neural_network.MLPClassifier	✓, ✓ ᴵ	✓, ✓ ᴵ
Regressor
neural_network.MLPRegressor		✓

✓ = is full-featured, ᴱ = with embedded model data, ᴵ = with imported model data, * = default language

Installation

$ pip install sklearn-porter

If you want the latest changes, you can install the module from the master branch:

$ pip uninstall -y sklearn-porter
$ pip install --no-cache-dir https://github.com/nok/sklearn-porter/zipball/master

Minimum requirements

The minimum requirements to use the module are defined in the requirements.txt:

- numpy>=1.8.2
- scipy>=0.14.0
- scikit-learn>=0.14.1

Usage

Export

The following example demonstrates how you can transpile a decision tree estimator to Java:

from sklearn.datasets import load_iris
from sklearn import tree
from sklearn_porter import Porter

# load data and train the classifier:
samples = load_iris()
X, y = samples.data, samples.target
clf = tree.DecisionTreeClassifier()
clf.fit(X, y)

# export:
porter = Porter(clf, language='java')
output = porter.export(embed_data=True)
print(output)

The exported result matches the official human-readable version of the decision tree.
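
To use the transpiled source in a build, the string returned by the export can be written to disk. The following is a minimal sketch that only assumes the source is held in a string. The helper write_source, the directory and the file name are hypothetical and not part of sklearn-porter:

```python
import os


def write_source(source, dest_dir, file_name):
    """Write transpiled source code to dest_dir/file_name and return the path.

    Hypothetical helper for illustration; not part of sklearn-porter.
    """
    # Create the destination directory if it does not exist yet (Python 3).
    os.makedirs(dest_dir, exist_ok=True)
    path = os.path.join(dest_dir, file_name)
    with open(path, 'w', encoding='utf-8') as handle:
        handle.write(source)
    return path
```

With the export example above, a call could look like write_source(output, 'build', 'Estimator.java'), where both 'build' and 'Estimator.java' are placeholder names.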

Prediction

Run the prediction(s) in the target programming language directly:

# ...
porter = Porter(clf, language='java')

# prediction(s):
Y_java = porter.predict(X)
y_java = porter.predict(X[0])
y_java = porter.predict([1., 2., 3., 4.])

Integrity

Always compute and check the integrity between the original and the transpiled estimator:

# ...
porter = Porter(clf, language='java')

# accuracy:
integrity = porter.integrity_score(X)
print(integrity)  # 1.0

Please note that the integrity check isn't supported on Windows operating systems.
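
Conceptually, such a check compares the predictions of the original estimator with those of the transpiled one. The sketch below illustrates the idea under the assumption that the score is the share of samples with matching predictions. The helper agreement_score is hypothetical, and the library's own computation may differ:

```python
def agreement_score(y_original, y_transpiled):
    """Return the fraction of samples on which both predictions agree."""
    if len(y_original) != len(y_transpiled):
        raise ValueError('Both prediction lists must have the same length.')
    if len(y_original) == 0:
        raise ValueError('At least one prediction is required.')
    matches = sum(1 for a, b in zip(y_original, y_transpiled) if a == b)
    return matches / len(y_original)


# Made-up predictions: three of four labels agree.
print(agreement_score([0, 1, 2, 2], [0, 1, 2, 1]))  # 0.75
```

A value below 1.0 indicates that at least one sample is classified differently and that the transpiled estimator should not be deployed without further inspection.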

Command-line interface

First, take a quick look at the available arguments:

$ python -m sklearn_porter [-h] --input <PICKLE_FILE> [--output <DEST_DIR>] \
                           [--class_name <CLASS_NAME>] [--method_name <METHOD_NAME>] \
                           [--c] [--java] [--js] [--go] [--php] [--ruby] \
                           [--export] [--checksum] [--data] [--pipe]

The following example shows how you can save a trained estimator in the pickle format:

# ...

# save the estimator (on older scikit-learn versions: from sklearn.externals import joblib):
import joblib
joblib.dump(clf, 'estimator.pkl', compress=0)

After that, the estimator can be transpiled to JavaScript with the following command:

$ python -m sklearn_porter -i estimator.pkl --js

The target programming language can be switched by changing a single flag:

$ python -m sklearn_porter -i estimator.pkl --c
$ python -m sklearn_porter -i estimator.pkl --java
$ python -m sklearn_porter -i estimator.pkl --php
$ python -m sklearn_porter -i estimator.pkl --go
$ python -m sklearn_porter -i estimator.pkl --ruby

For further processing, the --pipe argument can be used to pass the result on:

$ python -m sklearn_porter -i estimator.pkl --js --pipe > estimator.js

For instance, the result can be minified with UglifyJS:

$ python -m sklearn_porter -i estimator.pkl --js --pipe | uglifyjs --compress -o estimator.min.js

Further information is shown with the --help argument:

$ python -m sklearn_porter --help
$ python -m sklearn_porter -h
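
When the transpilation is part of a Python build script, the same commands can be assembled programmatically. The following sketch only uses flags listed in the argument overview above. The helper build_porter_command is hypothetical and not part of sklearn-porter:

```python
import sys

# Language flags as listed in the argument overview above.
LANGUAGE_FLAGS = ('c', 'java', 'js', 'go', 'php', 'ruby')


def build_porter_command(pickle_file, language, pipe=False):
    """Return the argument list for one `python -m sklearn_porter` call."""
    if language not in LANGUAGE_FLAGS:
        raise ValueError('Unsupported language: %r' % (language,))
    command = [sys.executable, '-m', 'sklearn_porter',
               '--input', pickle_file, '--' + language]
    if pipe:
        command.append('--pipe')
    return command


# Print everything after the interpreter path, which differs per machine.
print(build_porter_command('estimator.pkl', 'js', pipe=True)[1:])
# ['-m', 'sklearn_porter', '--input', 'estimator.pkl', '--js', '--pipe']
```

The returned list can be passed to subprocess.run(command, check=True). Using sys.executable keeps the call inside the currently active environment.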

Tip: You can install a handy function to use the porter directly:

$ cat scripts/alias.sh >> ~/.bash_profile && source ~/.bash_profile

$ porter [-h] --input <PICKLE_FILE> [--output <DEST_DIR>] \
         [--class_name <CLASS_NAME>] [--method_name <METHOD_NAME>] \
         [--c] [--java] [--js] [--go] [--php] [--ruby] \
         [--export] [--checksum] [--data] [--pipe]

Remember to activate the environment in which the porter has been installed.

Development

Environment

Either install just the minimum requirements (see requirements.txt) for testing:

$ conda create -n sklearn-porter python=2  # or python=3
$ source activate sklearn-porter
$ pip install -U pip
$ pip install -r requirements.txt

Or install all recommended packages (see environment.yml) for broader development:

$ conda env create -n sklearn-porter -c conda-forge python=2 -f environment.yml  # for macOS users
$ # conda create -n sklearn-porter -c conda-forge python=2 scikit-learn pylint jupyter nb_conda twine
$ source activate sklearn-porter

Independently, the following compilers and interpreters are required to cover all tests:

Name	Version	Command
GCC	`>=4.2`	`gcc --version`
Java	`>=1.6`	`java -version`
PHP	`>=7`	`php --version`
Ruby	`>=2.4.1`	`ruby --version`
Go	`>=1.7.4`	`go version`
Node.js	`>=6`	`node --version`
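
Before running the full test suite, it can help to check which of these tools are reachable. The sketch below looks up the executables from the "Command" column on the PATH. It only checks presence, not the minimum versions, and find_tools is a hypothetical helper:

```python
import shutil

# Executables taken from the "Command" column of the table above.
REQUIRED_TOOLS = ('gcc', 'java', 'php', 'ruby', 'go', 'node')


def find_tools(tools=REQUIRED_TOOLS):
    """Map each executable name to its path on PATH, or None if missing."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == '__main__':
    for tool, path in find_tools().items():
        print('%-5s %s' % (tool, path or 'NOT FOUND'))
```

Tests for a language whose tool is reported as missing can then be skipped with a narrower test pattern.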

Testing

The tests cover module functions as well as matching predictions of transpiled estimators. Run all tests:

$ bash scripts/test.sh

#!/usr/bin/env bash

# activate the relevant environment:
source activate sklearn-porter

# start local server which is required for the JavaScript tests:
if [[ $(python -c "import sys; print(sys.version_info[:1][0]);") == "2" ]]; then
  python -m SimpleHTTPServer 8080 &>/dev/null & serve_pid=$!
else
  python -m http.server 8080 &>/dev/null & serve_pid=$!
fi

# run all tests:
python -m unittest discover -vp '*Test.py'

# stop the previously started server:
kill $serve_pid

# deactivate the previously activated environment:
source deactivate &>/dev/null

The test files follow the naming pattern '[Algorithm][Language]Test.py', so a subset of the tests can be selected by pattern:

$ python -m unittest discover -vp 'RandomForest*Test.py'
$ python -m unittest discover -vp '*JavaTest.py'
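
The patterns above can also be generated from their parts. The following sketch builds the pattern string and uses fnmatch, the shell-style matching that unittest discovery applies to file names, to show which files it selects. The helper build_test_pattern and the sample file name are hypothetical:

```python
from fnmatch import fnmatch


def build_test_pattern(algorithm=None, language=None):
    """Build a file pattern for `python -m unittest discover -p`."""
    return '%s*%sTest.py' % (algorithm or '', language or '')


print(build_test_pattern(algorithm='RandomForest'))  # RandomForest*Test.py
print(build_test_pattern(language='Java'))           # *JavaTest.py

# The file name below is made up to show how a pattern selects files.
print(fnmatch('RandomForestClassifierJavaTest.py',
              build_test_pattern(language='Java')))  # True
```

Calling build_test_pattern() without arguments yields '*Test.py', the pattern used to run all tests.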

While you are developing new features or fixes, you can reduce the test duration by changing the number of tests:

$ N_RANDOM_FEATURE_SETS=15 N_EXISTING_FEATURE_SETS=30 python -m unittest discover -vp '*Test.py'
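
As an illustration of how a test module could pick up such variables, the sketch below reads a positive integer from the environment and falls back to a default. This is an assumption about the mechanism. The helper read_test_count and the default values are placeholders, and the project's own tests may read the variables differently:

```python
import os


def read_test_count(name, default):
    """Read a positive integer from the environment or return default."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    value = int(raw)  # raises ValueError for non-numeric input
    if value < 1:
        raise ValueError('%s must be a positive integer, got %r' % (name, raw))
    return value


# The defaults below are placeholders, not the project's real defaults.
n_random = read_test_count('N_RANDOM_FEATURE_SETS', 30)
n_existing = read_test_count('N_EXISTING_FEATURE_SETS', 60)
```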

Quality

Keeping the code quality high is strongly recommended. I use Pylint for that. Run the linter:

$ bash scripts/lint.sh

#!/usr/bin/env bash

find sklearn_porter -name '*.py' -exec pylint {} \;

Citation

If you use this implementation in your work, please add a reference/citation to the paper. You can use the following BibTeX entry:

@unpublished{skpodamo,
  author = {Darius Morawiec},
  title = {sklearn-porter},
  note = {Transpile trained scikit-learn estimators to C, Java, JavaScript and others},
  url = {https://github.com/nok/sklearn-porter}
}

License

The module is Open Source Software released under the MIT license.

Questions?

Don't be shy and feel free to contact me on Twitter or Gitter.
