Oracen-zz / MIDAS

Multiple imputation utilising denoising autoencoder for approximate Bayesian inference

Regarding the background of the model

MarKo9 opened this issue

Hi,

First of all, thanks for your work.
I am in the process of doing some testing and, if possible, would like a quick clarification. Is your model the same as, or influenced by, the one in the paper below? "MIDA: Multiple Imputation using Denoising Autoencoders" by Lovedeep Gondara and Ke Wang.
https://arxiv.org/pdf/1705.02737.pdf

No, although I discovered that paper about two months after I started working on MIDAS. That concept has more in common with a chained-equations approach (i.e. MICE, Hmisc's aregImpute function) than with MIDAS. In fact, it's more like an ensemble of denoising autoencoders, meaning your training time scales with the number of imputations you require. MIDAS, by contrast, draws on principles of variational inference to compute an approximate posterior: train the model once, then draw as many imputations as you require (see the sketch below).
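To make that contrast concrete, here's a minimal sketch of the "train once, draw many" idea using Monte Carlo dropout in modern tf.keras. This is an illustration rather than the actual MIDAS code: the network shape and the `draw_imputations` helper are assumptions, but the mechanic (keeping dropout active at prediction time so each forward pass yields a different draw) is the relevant principle.

```python
# Sketch only, not the MIDAS codebase: "train once, draw many"
# completed datasets via Monte Carlo dropout.
import numpy as np
import tensorflow as tf

n_cols = 10  # assumed number of columns in the (scaled) data

model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_cols,)),
    tf.keras.layers.Dense(128, activation="elu"),
    tf.keras.layers.Dropout(0.5),  # stays stochastic at draw time
    tf.keras.layers.Dense(128, activation="elu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(n_cols),  # reconstruct every column
])

# ... assume the model has been trained ONCE on the incomplete data ...

def draw_imputations(model, x, mask, m=10):
    """Draw m completed datasets from a single trained model.

    x    : array with missing cells filled by a placeholder (e.g. 0)
    mask : array of the same shape, 1 where observed, 0 where missing
    """
    draws = []
    for _ in range(m):
        # training=True keeps dropout active, so each forward pass is a
        # different sample from the approximate posterior predictive
        y_hat = model(x, training=True).numpy()
        # keep observed values, fill missing cells with this draw
        draws.append(mask * x + (1.0 - mask) * y_hat)
    return draws
```

The point is that m imputations cost m forward passes, not m training runs, which is what separates this family of approaches from the per-imputation ensemble described above.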

It's worth noting that Gondara and Wang don't exactly publicise their code or results. When I did eventually track down some code, it was on Gondara's GitHub:
https://github.com/lgondara/loss-to-followup-DAE/blob/master/programs/Impute_LTF_DAE.py

If you look there, there's just an oblique Keras model, and it looks like the noise isn't even dynamic for the representation they learn. Further, the loss function appears to be plain MSE, but the output layer is ReLU'd (i.e. the output is constrained to be non-negative). Combined with the StandardScaler transform they're using, which zero-centres every column so roughly half the targets are negative, they're going to have some difficulties. In the paper you mentioned, they don't even address how missingness affects the loss computation, which (from experience) is a key challenge, and there's no mention of how they bypassed that challenge with the softmax loss. One way to handle it is sketched below.
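For anyone hitting the same issue: the usual trick is to exclude the missing cells from the loss entirely by multiplying the per-cell error with an observedness mask. A minimal sketch (my own illustration, not taken from either codebase):

```python
# Sketch of a masked reconstruction loss: missing cells contribute
# nothing to the gradient, and the mean is taken over observed cells only.
import tensorflow as tf

def masked_mse(y_true, y_pred, mask):
    """mask: float tensor, 1.0 where a value was observed, 0.0 where missing."""
    sq_err = tf.square(y_true - y_pred) * mask
    # normalise by the number of observed cells, not the full matrix size
    return tf.reduce_sum(sq_err) / tf.maximum(tf.reduce_sum(mask), 1.0)
```

The same masking applies to a softmax/cross-entropy term for categorical columns: multiply each row's cross-entropy by that row's observed indicator before averaging.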

Also, it's worth noting that they changed the name of their paper in February 2018, by the looks of things. It used to be called "Multiple Imputation Using Deep Denoising Autoencoders". Here's some correspondence we had with Andrew Gelman in January:
http://andrewgelman.com/2018/01/10/python-program-multivariate-missing-data-imputation-works-large-datasets/

In short: no, we're not associated with them.

Hi there @Oracen and @ranjitlall!

Your repo is awesome! I really appreciate the work you put into it.

Following up on the question above: do you have a peer-reviewed (or preprint) paper about MIDAS? I'm wondering how I can cite your research to give you credit.

Thanks!

Hi @Oracen,

Is the paper you both were authoring finalized? I could not find it online.

Hi @asthallTD, @charuj,

We have now migrated MIDAS to this new repo where all future releases will be made available.

The current paper explaining the background of the model can be found on APSA Preprints here: https://doi.org/10.33774/apsa-2020-3tk40-v3

Best,
Tom