a-ozbek / impyute

Data imputations library to preprocess datasets with missing data

Home Page:http://impyute.readthedocs.io/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

https://travis-ci.org/eltonlaw/impyute.svg?branch=master

Impyute

Impyute is a library of missing data imputation algorithms written in Python 3. This library was designed to be super lightweight, here's a sneak peak at what impyute can do.

>>> from impyute.datasets import random_uniform
>>> raw_data = random_uniform(shape=(5, 5), missingness="mcar", th=0.2)
>>> print(raw_data)
[[  1.   0.   4.   0.   1.]
 [  1.  nan   6.   4.  nan]
 [  5.   0.  nan   1.   3.]
 [  2.   1.   5.   4.   6.]
 [  2.   1.   0.   0.   6.]]
>>> from impyute.imputations.cs import mean_imputation
>>> complete_data = random_imputation(raw_data)
>>> print(complete_data)
[[ 1.    0.    4.    0.    1.  ]
 [ 1.    0.5   6.    4.    4.  ]
 [ 5.    0.    3.75  1.    3.  ]
 [ 2.    1.    5.    4.    6.  ]
 [ 2.    1.    0.    0.    6.  ]]

Feature Support

  • Imputation of Cross Sectional Data
    • Multivariate Imputation by Chained Equations
    • Expectation Maximization
    • Mean Imputation
    • Mode Imputation
    • Median Imputation
    • Random Imputation
  • Imputation of Time Series Data
    • Last Observation Carried Forward
    • Autoregressive Integrated Moving Average (WIP)
    • Expectation Maximization with the Kalman Filter (WIP)
  • Dataset Generation
    • Datasets
      • MNIST
      • Random uniform distribution
      • Random gaussian distribution
    • Missingness Corruptors
      • MCAR
      • MAR (WIP)
      • MNAR (WIP)
  • Diagnostic Tools
    • Loggers
    • Distribution of Null Values
    • Comparison of imputations
    • Little's MCAR Test (WIP)

Installation

To install impyute, run the following:

$ pip install impyute

Or to get the most current version:

$ git clone https://github.com/eltonlaw/impyute
$ cd impyute
$ python setup.py install

Documentation

Documentation is available here: http://impyute.readthedocs.io/

How to Contribute

Check out CONTRIBUTING

About

Data imputations library to preprocess datasets with missing data

http://impyute.readthedocs.io/

License:GNU General Public License v3.0


Languages

Language:Python 100.0%