pandas lightning

pandas-lightning is an API that abstracts common patterns and idioms in pandas. It aims to:

Reduce repeated code
Make code more readable
Manage sequences of operations
Prototype features more quickly
Access methods intuitively using convenience functions

Install

pip install git+https://github.com/remykarem/pandas-lightning#egg=pandas-lightning

Features

>>> import pandas as pd
>>> import pandas_lightning

After import pandas_lightning, DataFrame accessors such as .lambdas and Series accessors such as .scaler become available on your DataFrame and Series objects.
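Accessors like these are normally built on pandas' extension API, pd.api.extensions.register_dataframe_accessor. The sketch below shows that mechanism with a made-up accessor named demo and a made-up method drop_constant_columns. It illustrates the pandas API only and is not taken from pandas-lightning's source.

```python
import pandas as pd


# "demo" is a hypothetical accessor name used only for this illustration.
@pd.api.extensions.register_dataframe_accessor("demo")
class DemoAccessor:
    def __init__(self, pandas_obj: pd.DataFrame):
        # pandas passes in the DataFrame the accessor was reached from.
        self._obj = pandas_obj

    def drop_constant_columns(self) -> pd.DataFrame:
        """Return the DataFrame without columns that hold a single distinct value."""
        keep = [col for col in self._obj.columns
                if self._obj[col].nunique(dropna=False) > 1]
        return self._obj[keep]


df = pd.DataFrame({"a": [1, 1, 1], "b": [1, 2, 3]})
print(df.demo.drop_constant_columns().columns.tolist())  # ['b']
```

The decorator registers the accessor as a side effect of running the module, which is presumably why a plain import is enough to enable a package's accessors. Series accessors use the analogous pd.api.extensions.register_series_accessor.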

Some features include:

Change types
Create new features
Apply a sequence of functions to a DataFrame
Drop columns with rules
Categorical binning

Change types

Previously:

>>> from pandas.api.types import CategoricalDtype
>>> df = df.set_index("PassengerId")
>>> df["Name"] = df["Name"].astype(str)
>>> df["Sex"] = df["Sex"].astype("category")
>>> df["Embarked"] = df["Embarked"].astype("category")
>>> df["Pclass"] = df["Pclass"].astype(CategoricalDtype([3, 2, 1], ordered=True))

Now:

>>> df = df.cast(
...     PassengerId="index",
...     Name=str,
...     Sex="category",
...     Embarked="category",
...     Pclass=[3, 2, 1])
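Comparing the two versions suggests that "index" means "move this column into the index" and that a list such as [3, 2, 1] stands for an ordered categorical, lowest category first. The hypothetical cast_columns helper below reproduces those inferred rules with plain pandas. It is a sketch under those assumptions, not the library's implementation.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype


def cast_columns(df: pd.DataFrame, **spec) -> pd.DataFrame:
    """Hypothetical plain-pandas version of the cast idiom above."""
    out = df.copy()
    index_cols = []
    for col, target in spec.items():
        if isinstance(target, str) and target == "index":
            # Collect index columns and set them once, after all casts.
            index_cols.append(col)
            continue
        if isinstance(target, list):
            # A list is read as an ordered categorical, lowest category first.
            target = CategoricalDtype(target, ordered=True)
        out[col] = out[col].astype(target)
    if index_cols:
        out = out.set_index(index_cols)
    return out


df = pd.DataFrame({"PassengerId": [1, 2, 3],
                   "Sex": ["male", "female", "male"],
                   "Pclass": [3, 1, 2]})
df = cast_columns(df, PassengerId="index", Sex="category", Pclass=[3, 2, 1])
print(df["Pclass"].max())  # 1, the highest class under the order 3 < 2 < 1
```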

Create new features

Previously:

>>> import string
>>> df["Cabin"] = df["Cabin"].str[0]
>>> df["HasCabinCode"] = ~df["Cabin"].isna()
>>> df["HasDep"] = df["SibSp"] + df["Parch"] > 0
>>> df["HasLetters"] = df["Ticket"].str.startswith(tuple(string.ascii_letters))

Now:

>>> df = df.add_columns(
...     Cabin=lambda s: s.str[0],
...     HasCabinCode=("Cabin", lambda s: ~s.isna()),
...     HasDep=(["SibSp", "Parch"], lambda s, t: (s+t) > 0),
...     HasLetters=("Ticket", lambda s: s.str.startswith(tuple(string.ascii_letters)))
... )
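From the example, each keyword argument appears to accept one of three forms: a bare function (applied to the column of the same name), a (column, function) pair, or a ([columns], function) pair whose function receives one Series per column. The hypothetical add_columns helper below implements those inferred rules with plain pandas; the real method may behave differently.

```python
import pandas as pd


def add_columns(df: pd.DataFrame, **specs) -> pd.DataFrame:
    """Hypothetical plain-pandas version of the add_columns idiom above."""
    out = df.copy()
    # Keyword arguments keep their order, so later specs can use earlier results.
    for new_col, spec in specs.items():
        if callable(spec):
            # Bare function: transform the column with the same name.
            out[new_col] = spec(out[new_col])
            continue
        source, func = spec
        if isinstance(source, str):
            # (column, func): func receives one Series.
            out[new_col] = func(out[source])
        else:
            # ([col1, col2, ...], func): func receives one Series per column.
            out[new_col] = func(*(out[col] for col in source))
    return out


df = pd.DataFrame({"Cabin": ["C85", None, "E46"],
                   "SibSp": [1, 0, 0],
                   "Parch": [0, 0, 2]})
df = add_columns(
    df,
    Cabin=lambda s: s.str[0],
    HasCabinCode=("Cabin", lambda s: ~s.isna()),
    HasDep=(["SibSp", "Parch"], lambda s, t: (s + t) > 0),
)
print(df["HasCabinCode"].tolist())  # [True, False, True]
print(df["HasDep"].tolist())        # [True, False, True]
```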

Apply a sequence of functions

Define some functions, each taking a DataFrame and returning a DataFrame:

>>> def drop_some_columns(data):
...     ...
...     return data
>>> def reindex(data):
...     ...
...     return data
>>> def rename_columns(data):
...     ...
...     return data

Previously:

>>> df = drop_some_columns(df)
>>> df = rename_columns(df)
>>> df = reindex(df)

Now:

>>> df = df.lambdas.dapply(
...     drop_some_columns,
...     rename_columns,
...     reindex
... )
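Running functions in a fixed order is a left fold over the list of functions. The sketch below expresses that with functools.reduce in a hypothetical apply_in_order helper, next to pandas' own DataFrame.pipe. The bodies of drop_some_columns and rename_columns are made up for illustration, since the originals are elided.

```python
from functools import reduce

import pandas as pd


def apply_in_order(data, *funcs):
    """Hypothetical helper: feed data through funcs from left to right."""
    return reduce(lambda acc, func: func(acc), funcs, data)


def drop_some_columns(data: pd.DataFrame) -> pd.DataFrame:
    # Made-up body: drop a scratch column.
    return data.drop(columns=["tmp"])


def rename_columns(data: pd.DataFrame) -> pd.DataFrame:
    # Made-up body: lowercase every column name.
    return data.rename(columns=str.lower)


df = pd.DataFrame({"A": [1, 2], "tmp": [0, 0]})
result = apply_in_order(df, drop_some_columns, rename_columns)
# The same pipeline written with pandas' built-in DataFrame.pipe:
same = df.pipe(drop_some_columns).pipe(rename_columns)
print(result.columns.tolist())  # ['a']
```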

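Drop columns with rules

Each rule is a function that takes a column (a Series) and returns True or False. A column is dropped if any rule returns True for it: below, X is dropped because 80% of its values are NaN, and Y because 60% of its values are zero.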

>>> import numpy as np
>>> df = pd.DataFrame({"X": [np.nan, np.nan, np.nan, np.nan, "hey"],
...                    "Y": [0, np.nan, 0, 0, 1],
...                    "Z": [1, 9, 5, 4, 2]})

>>> df.lambdas.drop_columns_with_rules(
...     lambda s: s.pctg.nans > 0.75,
...     lambda s: s.pctg.zeros > 0.5)
   Z
0  1
1  9
2  5
3  4
4  2
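Assuming s.pctg.nans and s.pctg.zeros return the fraction of NaN and zero values in a column (which matches the output above), the same rules can be written in plain pandas as s.isna().mean() and (s == 0).mean(). A hypothetical drop_columns_where helper then only needs to drop the columns that at least one rule flags:

```python
import numpy as np
import pandas as pd


def drop_columns_where(df: pd.DataFrame, *rules) -> pd.DataFrame:
    """Hypothetical helper: drop every column for which any rule returns True."""
    to_drop = [col for col in df.columns if any(rule(df[col]) for rule in rules)]
    return df.drop(columns=to_drop)


df = pd.DataFrame({"X": [np.nan, np.nan, np.nan, np.nan, "hey"],
                   "Y": [0, np.nan, 0, 0, 1],
                   "Z": [1, 9, 5, 4, 2]})
result = drop_columns_where(
    df,
    lambda s: s.isna().mean() > 0.75,  # fraction of NaN values
    lambda s: (s == 0).mean() > 0.5,   # fraction of zero values
)
print(result.columns.tolist())  # ['Z']
```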

Categorical binning

>>> sr = pd.Series(["apple", "spinach", "cashew", "pear", "kailan",
...                 "macadamia", "orange"])
>>> sr
0        apple
1      spinach
2       cashew
3         pear
4       kailan
5    macadamia
6       orange
dtype: object

>>> GROUPS = {
...     "fruits": ["apple", "pear", "orange"],
...     "vegetables": ["kailan", "spinach"],
...     "nuts": ["cashew", "macadamia"]}
>>> sr.map_categorical_binning(GROUPS)
0        fruits
1    vegetables
2          nuts
3        fruits
4    vegetables
5          nuts
6        fruits
dtype: category
Categories (3, object): [fruits, vegetables, nuts]
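The binning amounts to inverting the GROUPS dictionary into a value-to-group lookup and mapping the Series through it. Below is a hypothetical bin_categories helper in plain pandas, assuming the category order should follow the dictionary's key order as in the output above.

```python
import pandas as pd


def bin_categories(sr: pd.Series, groups: dict) -> pd.Series:
    """Hypothetical helper: map each value to the name of the group that lists it."""
    # Invert {group: [values]} into {value: group}.
    lookup = {value: group for group, values in groups.items() for value in values}
    # Use the groups' insertion order as the category order.
    dtype = pd.CategoricalDtype(categories=list(groups))
    return sr.map(lookup).astype(dtype)


sr = pd.Series(["apple", "spinach", "cashew", "pear", "kailan",
                "macadamia", "orange"])
GROUPS = {
    "fruits": ["apple", "pear", "orange"],
    "vegetables": ["kailan", "spinach"],
    "nuts": ["cashew", "macadamia"]}
print(bin_categories(sr, GROUPS).tolist())
# ['fruits', 'vegetables', 'nuts', 'fruits', 'vegetables', 'nuts', 'fruits']
```

In this sketch, values that appear in no group become NaN; how map_categorical_binning treats such values is not covered above.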

Roadmap

Hashing
Pipelining
