4austinpowers / greenpyce

Utilities for the Python data analysis library Pandas

greenpyce

Utilities for the Python data analysis library Pandas.

  1. Feature Engineering
    1. Categorical
      1. One-Hot
      2. Label Count
      3. Rank Categories
      4. Target Encoding
    2. Dates
      1. Time Delta
      2. Expand Date
      3. Day of Week
      4. Period of Day

Feature Engineering

Categorical

One-Hot Encoding

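Applies one-hot encoding to the given columns: each distinct value gets its own 0/1 indicator column, named after the source column and the value.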

onehot(df, columns, new_column=False)
    names  names_George  names_John  names_Paul  names_Ringo
0    Paul             0           0           1            0
1  George             1           0           0            0
2   Ringo             0           0           0            1
3   Ringo             0           0           0            1
4    John             0           1           0            0
5    John             0           1           0            0
6    John             0           1           0            0
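
For comparison, the same table can be reproduced with plain pandas. This is a minimal sketch using `pd.get_dummies`, not greenpyce's own implementation; that the original `names` column is kept next to the indicator columns is read off the output above.

```python
import pandas as pd

df = pd.DataFrame(
    {"names": ["Paul", "George", "Ringo", "Ringo", "John", "John", "John"]}
)

# One 0/1 indicator column per distinct value, prefixed with the column name.
dummies = pd.get_dummies(df["names"], prefix="names", dtype=int)

# Keep the original column alongside the indicator columns.
encoded = pd.concat([df, dummies], axis=1)
print(encoded)
```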

Label Count

Encodes each categorical value as the number of times it occurs in the column.

lc = LabelCount(["names"])
lc.fit(df)
lc.transform(df)
    names  names_labelcount
0    Paul                 1
1  George                 1
2   Ringo                 2
3   Ringo                 2
4    John                 3
5    John                 3
6    John                 3
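
The encoding itself is small enough to sketch in plain pandas. The snippet below is an illustration of the idea, not the `LabelCount` source: the counts play the role of what `fit` learns, and the mapping plays the role of `transform`.

```python
import pandas as pd

df = pd.DataFrame(
    {"names": ["Paul", "George", "Ringo", "Ringo", "John", "John", "John"]}
)

# "fit": count how often each category occurs.
counts = df["names"].value_counts()

# "transform": replace each value with its count, in a new column.
df["names_labelcount"] = df["names"].map(counts)
print(df)
```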

Rank Categories

Encodes each category as the rank of its count in the column. In the example below, the most frequent category gets rank 1.

rc = RankCategorical(["names"], inverse=False, new_column=False)
rc.fit(df)
rc.transform(df)
    names  names_rankcategorical
0    Paul                      4
1  George                      3
2   Ringo                      2
3   Ringo                      2
4    John                      1
5    John                      1
6    John                      1
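
A minimal plain-pandas sketch of the same ranking follows. It is not the `RankCategorical` source. George and Paul both occur once, so some tie-breaking rule is needed; breaking ties alphabetically is an assumption that happens to match the table above, and the library's actual rule is not stated here.

```python
import pandas as pd

df = pd.DataFrame(
    {"names": ["Paul", "George", "Ringo", "Ringo", "John", "John", "John"]}
)

# "fit": count each category, then order by descending count.
counts = df["names"].value_counts()

# Assumption: ties in count are broken alphabetically by category name.
ordered = sorted(counts.index, key=lambda name: (-counts[name], name))
ranks = {name: position + 1 for position, name in enumerate(ordered)}

# "transform": replace each value with its rank (1 = most frequent).
df["names_rankcategorical"] = df["names"].map(ranks)
print(df)
```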

Target Encoding

Encodes each category as the mean of the target column over the rows belonging to that category.

te = TargetEncoder(["names"], "target")
te.fit(df)
te.transform(df)
    names  target  names_target_encoding
0    Paul      10                   10.0
1  George       2                    2.0
2   Ringo       4                    4.5
3   Ringo       5                    4.5
4    John       1                    2.0
5    John       3                    2.0
6    John       2                    2.0
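
The per-category means in the table can be reproduced with a `groupby` in plain pandas. This is a sketch of the idea rather than the `TargetEncoder` source, and it applies no smoothing or regularisation.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "names": ["Paul", "George", "Ringo", "Ringo", "John", "John", "John"],
        "target": [10, 2, 4, 5, 1, 3, 2],
    }
)

# "fit": mean of the target for each category.
means = df.groupby("names")["target"].mean()

# "transform": replace each value with its category's target mean.
df["names_target_encoding"] = df["names"].map(means)
print(df)
```

Learning the means on training rows only and then applying them to other data is the usual way to keep target information from leaking into validation or test sets.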

Dates

Creates features from date information.

Time Delta

Expand Date
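
greenpyce's own interface for this step is not shown here. As a rough sketch in plain pandas, and assuming "expand" means splitting a datetime column into year, month, and day parts, the `.dt` accessor does the work. The column names below are hypothetical.

```python
import pandas as pd

df = pd.DataFrame(
    {"date": pd.to_datetime(["2018-01-15", "2018-02-28", "2019-12-31"])}
)

# Hypothetical output column names; one new column per date component.
df["date_year"] = df["date"].dt.year
df["date_month"] = df["date"].dt.month
df["date_day"] = df["date"].dt.day
print(df)
```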
