ericfourrier / auto-clean

Provide helpers in python (pandas and numpy) to perform automated cleaning and pre-processing steps.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

auto-clean

Travis-CI Build Status

Provide helpers in python (pandas, numpy, scipy) to perform automated cleaning and pre-processing steps.

CONTENTS OF THIS FILE

  • Introduction
  • Installation
  • Version
  • Usage
  • Thanks

INTRODUCTION

This package provide different classes for data cleaning

In this module you have :

  • An DataExploration class with methods to count missing values, detect constant col the DataExploration.structure provide a nice exploration summary.

  • An OutliersDetection class which is a simple class designed to detect 1d outliers

  • An NaImputer class which is a simple class designed to focus on missing values exploration and comprehension (Rubbin theory MCAR/MAR/MNAR)

  • A examples folder to find notebooks illustrating the package

  • A test.py file containing tests (you can run the test with $python -m unittest -v test)

INSTALLATION

Installation via pip is not available now (coming soon)

  1. Clone the project on your local computer.

  2. Run the following command

    • $ python setup.py install

VERSION

The current version is 0.1 (early release version). The module will be improved over time.

USAGE

To complete

Contributing to auto-clean

Anybody is welcome to do pull-request and check the existing issues for bugs or enhancements to work on. If you have an idea for an extension to auto-clean, create a new issue so we can discuss it.

THANKS

Thanks to all the creator and contributors of the package we are using.

About

Provide helpers in python (pandas and numpy) to perform automated cleaning and pre-processing steps.

License:MIT License


Languages

Language:Python 100.0%