rfsan / summo

Summo

Summo is a Python package that summarizes information about a dataset.

import summo
import pandas as pd

df = pd.DataFrame(
    {
        "a": [1, 2, None, 2, None],
        "b": [4, 5, 6, 5, None],
        "c": ["a", "b", None, "d", None],
    }
)
summary = summo.summary(df)

summary is a dict that looks like this:

{
    "table": {
        "rows": 5,
        "columns": 3,
        "rows_duplicated": 0,
        "rows_all_na_count": 1,
        "rows_all_na_pct": 0.2,
    },
    "columns": {
        "a": {
            "na_count": 2,
            "na_pct": 0.4,
            "unique": False,
            "dtype": "float64",
            "median": 2.0,
            "mean": 1.666666, 
        },
        "b": {
            "na_count": 1,
            "na_pct": 0.2,
            "unique": False,
            "dtype": "float64",
            "median": 5.0,
            "mean": 5.0, 
        },
        "c": {
            "na_count": 2,
            "na_pct": 0.4,
            "unique": False,
            "dtype": "object",
        },
    },
}
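Because the result is a plain nested dict, it can be inspected with ordinary Python. The sketch below is one way this might look: it flags columns with a high share of missing values and lists the columns that carry numeric statistics. The helper names columns_above_na_threshold and numeric_columns are hypothetical and not part of summo; the code only assumes the dict layout shown above, and it uses a trimmed copy of that example so it runs without summo or pandas installed.

```python
# Hypothetical helpers (not part of summo). They assume only the dict
# layout shown in the example above.

def columns_above_na_threshold(summary, threshold=0.3):
    """Return the names of columns whose na_pct is strictly above threshold."""
    return [
        name
        for name, stats in summary["columns"].items()
        if stats["na_pct"] > threshold
    ]


def numeric_columns(summary):
    """Return the names of columns that report a mean.

    In the example above, only the float64 columns have a "mean" key.
    """
    return [
        name
        for name, stats in summary["columns"].items()
        if "mean" in stats
    ]


# Trimmed copy of the example summary, keeping only the fields used here.
summary = {
    "table": {"rows": 5, "columns": 3},
    "columns": {
        "a": {"na_count": 2, "na_pct": 0.4, "dtype": "float64", "mean": 1.666666},
        "b": {"na_count": 1, "na_pct": 0.2, "dtype": "float64", "mean": 5.0},
        "c": {"na_count": 2, "na_pct": 0.4, "dtype": "object"},
    },
}

print(columns_above_na_threshold(summary))  # ['a', 'c']
print(numeric_columns(summary))             # ['a', 'b']
```

The default threshold of 0.3 is an arbitrary choice for illustration; pick whatever cutoff suits your data.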

Installation

  • pip install summo

About

License: MIT License