bigsnarfdude / kleaner

Some simple utilities for cleaning data automatically

Kleaner

A set of simple utilities for cleaning up data frames.

Install with pip

sudo pip install git+git://github.com/sketchytechky/kleaner.git

Getting started

import pandas as pd
from kleaner.kleaner import Kleaner

df = pd.read_csv('kaggle.csv')

kdf = Kleaner(df)

# get the healthiness of the kaggle.csv file
kdf.healthiness()

NOTES

  • Completeness - the share of key information that is missing

    • % of null values in a column
  • Consistency - whether each value has a single representation

    • % of distinct values in a column (diversity)
  • Time series with anomaly detection would be a useful addition to the DQ (data quality) stats
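
The two metrics in the notes above can be sketched with plain pandas. This is a minimal illustration, not Kleaner's actual implementation: the function names `completeness` and `diversity` are hypothetical, and "diversity" is assumed here to mean distinct non-null values as a percentage of non-null rows.

```python
import pandas as pd


def completeness(df: pd.DataFrame) -> pd.Series:
    """Percentage of null values per column (lower means more complete)."""
    return df.isna().mean() * 100


def diversity(df: pd.DataFrame) -> pd.Series:
    """Distinct non-null values as a percentage of non-null rows, per column.

    Assumed definition for illustration; Kleaner may compute this differently.
    """
    non_null = df.count()
    # Columns that are entirely null yield NaN instead of a division by zero.
    return df.nunique(dropna=True) / non_null.where(non_null > 0) * 100


# Small made-up frame: "city" has three spellings of the same place.
df = pd.DataFrame({
    "city": ["NYC", "nyc", "New York", None],
    "age": [30, 30, None, None],
})

null_pct = completeness(df)
distinct_pct = diversity(df)
print(null_pct)
print(distinct_pct)
```

A high distinct-value percentage on a column that should hold only a few categories, such as `city` here, hints at inconsistent representations of the same value.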
