HI-VAE on ICU data

This repository is a fork of the HI-VAE model built by Nazabal et al. and contains the modifed implementation of the Heterogeneous Incomplete Variational Autoendoder model (HI-VAE) for ICU data from the WiDS Datathon 2020.

Databse description

There are three different datasets considered in the experiments (Wine, Adult and Default Credit). Each dataset has each own folder, containing:

data.csv: the dataset
data_types.csv: a csv containing the types of that particular dataset. Every line is a different attribute containing three paramenters:
- type: real, pos (positive), cat (categorical), ord (ordinal), count
- dim: dimension of the variable
- nclass: number of categories (for cat and ord)
Missingxx_y.csv: a csv containing the positions of the different missing values in the data. Each "y" mask was generated randomly, containing a "xx" % of missing values.

You can add your own datasets as long as they follow this structure.

Files description

script_HIVAE.sh: A script with a simple example on how to run the models.
main_scripts.py: Contains the main code for the HIVAE models.
loglik_ models_ missing_normalize.py: In this file, the different likelihood models for the different types of variables considered (real, positive, count, categorical and ordinal) are included.
model_ HIVAE_inputDropout.py: Contains the HI-VAE with input dropout encoder model.
model_ HIVAE_factorized.py: Contains the HI-VAE with factorized encoder model
hospital/scripts.py: Generates the required files. Change line 136 for different sets of variables.

Code Pre-requisites

First,

$ git clone https://github.com/amirhk/mace.git
$ pip install virtualenv
$ cd mace
$ virtualenv -p python3 _venv
$ source _venv/bin/activate
$ pip install -r pip_requirements.txt
$ chmod +x script_HIVAE.sh

Then, run

$ ./script_HIVAE.sh

jonatanvm / HI-VAE

HI-VAE on ICU data

Databse description

Files description

Code Pre-requisites

About

Languages