scikit-lego
We love scikit-learn, but very often we find ourselves writing custom transformers, metrics and models. The goal of this project is to consolidate these into a package that offers code quality and testing. This project is a collaboration between multiple companies in the Netherlands. Note that we're not formally affiliated with the scikit-learn project at all.
Installation
Install scikit-lego via pip with:
pip install scikit-lego
Alternatively, to edit and contribute you can fork/clone and run:
pip install -e ".[dev]"
python setup.py develop
Documentation
The documentation can be found here.
Usage
from sklego.preprocessing import RandomAdder
from sklego.mixture import GMMClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
...
mod = Pipeline([
    ("scale", StandardScaler()),
    ("random_noise", RandomAdder()),
    ("model", GMMClassifier())
])
...
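A pipeline assembled this way is fitted and queried like any other scikit-learn estimator. The snippet below is a minimal sketch of that surrounding workflow, not the project's own example. To stay runnable with scikit-learn alone, it uses scikit-learn's `GaussianNB` as a stand-in for the final step and leaves out `RandomAdder`; with scikit-lego installed, `GMMClassifier()` would take the stand-in's place. The synthetic dataset and all parameters are illustrative assumptions.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic two-class data, purely illustrative.
X, y = make_blobs(n_samples=200, centers=2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

mod = Pipeline([
    ("scale", StandardScaler()),
    # Stand-in for GMMClassifier so the sketch runs with scikit-learn alone.
    ("model", GaussianNB()),
])

# Fit every step on the training split, then predict on held-out data.
mod.fit(X_train, y_train)
predictions = mod.predict(X_test)
accuracy = mod.score(X_test, y_test)
```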
Features
Here's a list of features that this library currently offers:
- sklego.preprocessing.PatsyTransformer: applies a patsy formula
- sklego.preprocessing.RandomAdder: adds randomness in training
- sklego.preprocessing.PandasTypeSelector: selects columns based on pandas type
- sklego.preprocessing.ColumnSelector: selects columns based on column name
- sklego.dummy.RandomRegressor: benchmark that predicts random values
- sklego.mixture.GMMClassifier: classifies by training a GMM per class
- sklego.mixture.GMMOutlierDetector: detects outliers based on a trained GMM
- sklego.pandas_utils.log_step: a simple logger-decorator for pandas pipeline steps
- sklego.pandas_utils.add_lags: adds lag values of certain columns in pandas
- sklego.pipeline.DebugPipeline: adds debug information to make debugging easier
- sklego.meta.GroupedEstimator: can split the data into runs and run a model on each
- sklego.meta.EstimatorTransformer: adds a model output as a feature
- sklego.datasets.load_chicken: loads in the joyful chickweight dataset
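To make the idea of "training a GMM per class" concrete, here is a rough sketch built only on scikit-learn's `GaussianMixture`. It is not the scikit-lego implementation: the class name `PerClassGMM`, its parameters, and the choice to weight each class by its training frequency are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture


class PerClassGMM:
    """Illustrative only: fit one GaussianMixture per class label."""

    def __init__(self, n_components=1, random_state=0):
        self.n_components = n_components
        self.random_state = random_state

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.gmms_ = {}
        self.log_priors_ = {}
        for label in self.classes_:
            X_label = X[y == label]
            gmm = GaussianMixture(
                n_components=self.n_components,
                random_state=self.random_state,
            )
            # One mixture per class, fitted on that class's rows only.
            self.gmms_[label] = gmm.fit(X_label)
            # Assumption: weight each class by its share of the training data.
            self.log_priors_[label] = np.log(len(X_label) / len(X))
        return self

    def predict(self, X):
        # score_samples gives the log-likelihood of each row under a mixture.
        scores = np.column_stack([
            self.gmms_[label].score_samples(X) + self.log_priors_[label]
            for label in self.classes_
        ])
        # Pick the class whose mixture explains each row best.
        return self.classes_[scores.argmax(axis=1)]


X, y = make_blobs(n_samples=300, centers=3, random_state=42)
clf = PerClassGMM().fit(X, y)
preds = clf.predict(X)
```

Each row is assigned to the class whose mixture gives it the highest log-likelihood plus log-prior, so classes with very different shapes can each get their own density model.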
New Features
We want to be rather open here in what we accept, but we do demand three things before a feature is added to the project:
- any new feature contributes towards a demonstrable real-world use case
- any new feature passes standard unit tests (we have a few for transformers and predictors)
- the feature has been discussed in the issue list beforehand
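As a rough idea of what the second requirement can look like in practice, the sketch below pairs a small transformer with a few generic checks. It does not reproduce the project's actual test suite: `MeanCenterer` and the three test functions are hypothetical names invented here, and only scikit-learn's base classes and validation helpers are real.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.utils.validation import check_array, check_is_fitted


class MeanCenterer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: subtracts the per-column training mean."""

    def fit(self, X, y=None):
        X = check_array(X)
        self.means_ = X.mean(axis=0)
        return self

    def transform(self, X):
        check_is_fitted(self, "means_")
        X = check_array(X)
        return X - self.means_


def test_output_shape_is_preserved():
    X = np.arange(12, dtype=float).reshape(4, 3)
    assert MeanCenterer().fit_transform(X).shape == X.shape


def test_transform_does_not_modify_input():
    X = np.arange(12, dtype=float).reshape(4, 3)
    X_before = X.copy()
    MeanCenterer().fit(X).transform(X)
    assert np.array_equal(X, X_before)


def test_estimator_can_be_cloned():
    # clone relies on get_params, which BaseEstimator provides.
    assert isinstance(clone(MeanCenterer()), MeanCenterer)
```

Checks of this kind are independent of what a particular transformer computes, which is why they can be shared across many transformers.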