Oracen-zz / MIDAS

Multiple imputation utilising denoising autoencoder for approximate Bayesian inference

Regarding the background of the model

MarKo9 opened this issue

Hi,

First of all, thanks for your work.
I am in the process of doing some testing and, if possible, would like a quick clarification. Is your model the same as, or influenced by, the one in the paper below? "MIDA: Multiple Imputation using Denoising Autoencoders" by Lovedeep Gondara and Ke Wang.
https://arxiv.org/pdf/1705.02737.pdf

No, although I discovered that paper about two months after I started working on MIDAS. That concept has more in common with a chained-equations approach (i.e. MICE, Hmisc's aregImpute function) than with MIDAS. In fact, it's more like an ensemble of denoising autoencoders, meaning your training time scales with the number of imputations you require. MIDAS, by contrast, draws on principles of variational inference to compute an approximate posterior: train the model once, then draw as many imputations as you require (see the sketch below).
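To make that contrast concrete, here's a minimal sketch of the "train once, draw many" idea using Monte Carlo dropout in modern tf.keras. This is an illustration rather than the actual MIDAS code: the network shape and the `draw_imputations` helper are assumptions, but the mechanic (keeping dropout active at prediction time so each forward pass yields a different draw) is the relevant principle.

```python
# Sketch only, not the MIDAS codebase: "train once, draw many"
# completed datasets via Monte Carlo dropout.
import numpy as np
import tensorflow as tf

n_cols = 10  # assumed number of columns in the (scaled) data

model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_cols,)),
    tf.keras.layers.Dense(128, activation="elu"),
    tf.keras.layers.Dropout(0.5),  # stays stochastic at draw time
    tf.keras.layers.Dense(128, activation="elu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(n_cols),  # reconstruct every column
])

# ... assume the model has been trained ONCE on the incomplete data ...

def draw_imputations(model, x, mask, m=10):
    """Draw m completed datasets from a single trained model.

    x    : array with missing cells filled by a placeholder (e.g. 0)
    mask : array of the same shape, 1 where observed, 0 where missing
    """
    draws = []
    for _ in range(m):
        # training=True keeps dropout active, so each forward pass is a
        # different sample from the approximate posterior predictive
        y_hat = model(x, training=True).numpy()
        # keep observed values, fill missing cells with this draw
        draws.append(mask * x + (1.0 - mask) * y_hat)
    return draws
```

The point is that m imputations cost m forward passes, not m training runs, which is what separates this family of approaches from the per-imputation ensemble described above.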

It's worth noting that Gondara and Wang don't exactly publicise their code or results. When I did eventually track down some code, it was on Gondara's GitHub:
https://github.com/lgondara/loss-to-followup-DAE/blob/master/programs/Impute_LTF_DAE.py

If you look there, there's just an oblique Keras model, and it looks like the noise isn't even dynamic for the representation they learn. Further, the loss function appears to be plain MSE, but the output layer is ReLU'd (i.e. the output is constrained to be non-negative). Combined with the StandardScaler transform they're using, which zero-centres every column so roughly half the targets are negative, they're going to have some difficulties. In the paper you mentioned, they don't even address how missingness affects the loss computation, which (from experience) is a key challenge, and there's no mention of how they bypassed that challenge with the softmax loss. One way to handle it is sketched below.
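For anyone hitting the same issue: the usual trick is to exclude the missing cells from the loss entirely by multiplying the per-cell error with an observedness mask. A minimal sketch (my own illustration, not taken from either codebase):

```python
# Sketch of a masked reconstruction loss: missing cells contribute
# nothing to the gradient, and the mean is taken over observed cells only.
import tensorflow as tf

def masked_mse(y_true, y_pred, mask):
    """mask: float tensor, 1.0 where a value was observed, 0.0 where missing."""
    sq_err = tf.square(y_true - y_pred) * mask
    # normalise by the number of observed cells, not the full matrix size
    return tf.reduce_sum(sq_err) / tf.maximum(tf.reduce_sum(mask), 1.0)
```

The same masking applies to a softmax/cross-entropy term for categorical columns: multiply each row's cross-entropy by that row's observed indicator before averaging.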

Also, it's worth noting that they changed the name of their paper in February 2018, by the looks of things. It used to be called "Multiple Imputation Using Deep Denoising Autoencoders". Here's some correspondence we had with Andrew Gelman in January:
http://andrewgelman.com/2018/01/10/python-program-multivariate-missing-data-imputation-works-large-datasets/

In short: no, we're not associated with them.

Hi there @Oracen and @ranjitlall!

Your repo is awesome! I really appreciate the work you put into it.

Following up on the question above: do you have a peer-reviewed (or preprint) paper about MIDAS? I'm wondering how I can cite your research to give you credit.

Thanks!

Hi @Oracen,

Is the paper you both were authoring finalized? I could not find it online.

Hi @asthallTD, @charuj,

We have now migrated MIDAS to this new repo where all future releases will be made available.

The current paper explaining the background of the model can be found on APSA Preprints here: https://doi.org/10.33774/apsa-2020-3tk40-v3

Best,
Tom