rubik / sklearn-pdtransform

Sklearn transformers that work with Pandas dataframes

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

sklearn-pdtransform

Installation:

$ pip install pdtransform

A little package with a few transformers to work with Pandas dataframes in the Sklearn pipeline, which I found myself writing quite frequently. Example usage:

from pdtransform import DFTransform, DFFeatureUnion

pipeline = Pipeline([
    ('ordinal_to_nums', DFTransform(_ordinal_to_nums, copy=True)),
    ('union', DFFeatureUnion([
        ('categorical', Pipeline([
            ('select', DFTransform(lambda X: X.select_dtypes(include=['object']))),
            ('fill_na', DFTransform(lambda X: X.fillna('NA'))),
            ('one_hot', DFTransform(_one_hot_encode)),
        ])),
        ('numerical', Pipeline([
            ('select', DFTransform(lambda X: X.select_dtypes(exclude=['object']))),
            ('fill_median', DFTransform(lambda X: X.fillna(X.median()))),
            ('add_features', DFTransform(_add_features, copy=True)),
            ('remove_skew', DFTransform(_remove_skew, copy=True)),
            ('find_outliers', DFTransform(_find_outliers, copy=True)),
            ('normalize', DFTransform(lambda X: X.div(X.max())))
        ])),
    ])),
])

For more information read this blog post.

About

Sklearn transformers that work with Pandas dataframes

License:MIT License


Languages

Language:Python 94.6%Language:Makefile 5.4%