[DDFG] Complete MNAR missingness generation
mm-abogdan opened this issue · comments
Complete the mnar method in the Corruptor class.
Put simply, MNAR (Missing Not at Random) is a type of missingness in which the probability that a value is missing depends, in whole or in part, on unobserved data. The missingness may also depend on observed data at the same time.
Implementation: Generate a random set of new features, which stand in for the unobserved data, and base missingness on them. The number of features to generate may be some fraction of the number of existing features, or a random integer between 1 and n_features. These features could (should?) be a mix of continuous and categorical; the mix could follow the fraction of each feature type among the existing features. Once generated, impose missingness based on these new features.
Be sure that functions accept & return matrices.
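A rough sketch of the approach described above, for discussion only. It is not the existing Corruptor API: the function name mnar_sketch and the parameters frac_latent, missing_rate and seed are hypothetical, as are the choices of a half-and-half continuous/categorical split and a logistic link from the generated features to the missingness probability. The generated features are thrown away after use, so the missingness depends on data the caller never sees.

```python
import numpy as np


def mnar_sketch(data, frac_latent=0.5, missing_rate=0.2, seed=None):
    """Hypothetical MNAR corruptor: takes a 2-D matrix, returns a float copy with NaNs."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n_rows, n_cols = data.shape

    # Number of latent (unobserved) features: a fraction of the existing
    # feature count, but always at least one.
    n_latent = max(1, int(round(frac_latent * n_cols)))

    # Assumption: split the latent features roughly half continuous,
    # half categorical (3 levels). The issue suggests mirroring the mix
    # found in the real data instead.
    n_cat = n_latent // 2
    continuous = rng.normal(size=(n_rows, n_latent - n_cat))
    categorical = rng.integers(0, 3, size=(n_rows, n_cat)).astype(float)
    latent = np.hstack([continuous, categorical])

    # Random weights map the latent features to one score per cell.
    weights = rng.normal(size=(n_latent, n_cols))
    scores = latent @ weights

    # Standardize each column of scores so the offset below has a
    # comparable effect regardless of the weights drawn.
    scores = (scores - scores.mean(axis=0)) / (scores.std(axis=0) + 1e-12)

    # Logistic link: the offset is logit(missing_rate), so the average
    # missingness probability lands near (not exactly at) missing_rate.
    offset = np.log(missing_rate / (1.0 - missing_rate))
    prob_missing = 1.0 / (1.0 + np.exp(-(scores + offset)))

    # Draw the mask and apply it to a copy; the input is left untouched
    # and the latent features are discarded.
    mask = rng.random((n_rows, n_cols)) < prob_missing
    corrupted = data.copy()
    corrupted[mask] = np.nan
    return corrupted
```

Example call: mnar_sketch(np.random.rand(100, 5), seed=0) returns a 100 x 5 matrix with some entries set to NaN. To also condition on observed data, the scores could include a term computed from the input matrix itself; that is left out here to keep the sketch short.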
Be sure to follow the 4 steps outlined in contributing.md
The below labels are for DDFG (Data Days for Good) participant reference:
Priority: High
Difficulty: Medium
impyute/impyute/dataset/corrupt.py
Lines 48 to 50 in 2c25368