kdubovikov / blazing-encoders

Blazing-fast categorical feature encoding

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

🔥 blazing-encoders

build

Blazing-fast categorical feature encoding.

This is a Python library written in Rust that allows you to encode categorical variables using smoothed mean target. In particular, this algorithm is used: A Preprocessing Scheme for High-Cardinality Categorical Attributes in Classification and Prediction Problems.

Installation

pip install blazing_encoders

Usage

from blazing_encoders import TargetEncoder_f64
import numpy as np

if __name__ == '__main__':
    np.random.seed(42)
    data = np.random.randint(0, 10, (5, 5)).astype('float')
    target = np.random.rand(5)

    encoder = TargetEncoder_f64.fit(data, target, smoothing=1.0, min_samples_leaf=1)
    encoded_data = encoder.transform(data)
    print(encoded_data)

You can use two of the available classes: TargetEncoder_f64, and TargetEncoder_f32 to control the balance between memory usage and numerical precision of your target encoding process.

Underneath, the library will share as much memory as possible so that overhead should be minimal. Also, it will parallelize target encoding computation so that the overall process will complete much faster.

Documentation and examples

You can find out how to use the library here. Also, you can read more at this blog post.

About

Blazing-fast categorical feature encoding

License:Apache License 2.0


Languages

Language:Rust 100.0%