eltonlaw / impyute

Data imputations library to preprocess datasets with missing data

Home Page: http://impyute.readthedocs.io/

[DDFG] Complete MNAR missingness generation

mm-abogdan opened this issue · comments

Complete the mnar method in the Corruptor class.

Put simply, MNAR (Missing Not at Random) is a type of missingness in which the probability of a value being missing depends, in whole or in part, on unobserved data. Missingness may also depend on observed data at the same time.
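As a small illustration of this definition, here is a sketch in which a value is more likely to be removed the larger it is, so the cause of the missingness is hidden exactly where the data is missing. The function name self_masking_mnar and its steepness parameter are hypothetical, chosen only for this example; they are not part of impyute.

```python
import numpy as np


def self_masking_mnar(values, steepness=1.0, seed=None):
    """Hypothetical example: drop values with probability that grows with the value."""
    rng = np.random.default_rng(seed)
    values = np.array(values, dtype=float)  # copy so the caller's data is untouched

    # Standardize so that steepness has a comparable effect on any scale.
    std = values.std()
    z = (values - values.mean()) / std if std > 0 else np.zeros_like(values)

    # Logistic link: larger values get a higher probability of being removed.
    probs = 1.0 / (1.0 + np.exp(-steepness * z))

    mask = rng.random(values.shape) < probs
    values[mask] = np.nan
    return values
```

Because the removed entries are mostly the large ones, the mean of what remains is biased low, and nothing in the observed data alone reveals why.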

Implementation: Generate a random selection of new features and base missingness on these features. The number of features to generate may be some fraction of the number of existing features, or a random integer between 1 and n_features. These features could (should?) be a mix of continuous and categorical; the mix could follow the proportion of each feature type among the existing features. Once generated, impose missingness based on these new features.
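One possible shape for this approach is sketched below. It is only an illustration under stated assumptions, not impyute's implementation: the name mnar_sketch, the parameters latent_frac, missing_rate and seed, the half-and-half split between continuous and categorical latent features, and the logistic link are all hypothetical choices made for this example.

```python
import numpy as np


def mnar_sketch(data, latent_frac=0.5, missing_rate=0.2, seed=None):
    """Hypothetical helper: return a float copy of `data` with MNAR-style NaNs."""
    rng = np.random.default_rng(seed)
    data = np.array(data, dtype=float)  # copy so the input matrix is untouched
    n_rows, n_cols = data.shape

    # Number of latent (never observed) features: a fraction of the existing ones.
    n_latent = max(1, int(round(latent_frac * n_cols)))

    # Assumed mix: roughly half categorical (3 levels), the rest continuous.
    n_cat = n_latent // 2
    n_cont = n_latent - n_cat
    continuous = rng.normal(size=(n_rows, n_cont))
    categorical = rng.integers(0, 3, size=(n_rows, n_cat)).astype(float)
    latent = np.hstack([continuous, categorical])

    # Random weights turn the latent features into one score per cell.
    weights = rng.normal(size=(n_latent, n_cols))
    scores = latent @ weights

    # Logistic link. The offset sets a baseline near missing_rate; the realized
    # fraction of NaNs will differ somewhat because the link is nonlinear.
    offset = np.log(missing_rate / (1.0 - missing_rate))
    probs = 1.0 / (1.0 + np.exp(-(scores - scores.mean() + offset)))

    # Impose missingness; the latent features are discarded, hence unobserved.
    mask = rng.random(size=data.shape) < probs
    data[mask] = np.nan
    return data
```

The function accepts a matrix and returns a matrix of the same shape, and the latent features never leave the function, so the missingness cannot be explained from the returned data.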

Be sure that functions accept and return matrices.
Be sure to follow the 4 steps outlined in contributing.md.

The below labels are for DDFG (Data Days for Good) participant reference:
Priority: High
Difficulty: Medium

def mnar(self):
    """ Overwrites values with MNAR placed NaN's """
    pass