snowkylin / ntm

TensorFlow implementation of Neural Turing Machines (NTM), with its application to one-shot learning (MANN)

Using MANN in A3C (reinforcement learning model)

TheMnBN opened this issue

Hi there,

I'm trying to integrate a memory network into an A3C agent. For reference, I followed closely this implementation of A3C: https://github.com/awjuliani/DeepRL-Agents/blob/master/A3C-Doom.ipynb

My aim is to replace the LSTM layer with a MANN module. This might be a far-fetched question, but do you have any advice for me on refactoring your MANN implementation for this purpose?
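For context, the recurrent part of that notebook looks roughly like the sketch below (TF 1.x; the shapes and placeholder names here are illustrative, not copied from either repo). If the MANN module can be wrapped as a `tf.nn.rnn_cell.RNNCell` subclass, the swap might be close to a drop-in replacement, since `tf.nn.dynamic_rnn` handles the unrolling:

```python
import tensorflow as tf

batch_size, max_steps, n_features, n_hidden = 1, 20, 288, 256

# Inputs shaped [batch, time, features], as in the A3C notebook.
rnn_in = tf.placeholder(tf.float32, [batch_size, max_steps, n_features])
step_size = tf.placeholder(tf.int32, [batch_size])  # per-episode lengths

lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden)
state_in = lstm_cell.zero_state(batch_size, tf.float32)
rnn_out, rnn_state = tf.nn.dynamic_rnn(
    lstm_cell, rnn_in, initial_state=state_in,
    sequence_length=step_size, time_major=False)

# Hypothetical swap -- assumes the MANN is packaged as an RNNCell subclass
# (the class name MANNCell is illustrative; check the actual class in this repo):
#   mann_cell = MANNCell(...)
#   rnn_out, rnn_state = tf.nn.dynamic_rnn(
#       mann_cell, rnn_in,
#       initial_state=mann_cell.zero_state(batch_size, tf.float32),
#       sequence_length=step_size, time_major=False)
```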

Generally speaking, a MANN is not as easy to get to converge as other RNN models, and a blind combination can result in severe training instability. It took me a long time to finally get it to converge on the Omniglot dataset demonstrated in the original paper. So please allow yourself enough time and patience, and you may need to adjust the model to fit your task. Good luck!

Thanks so much for replying!
You're absolutely correct. RL by itself can already go horribly wrong under various (and usually unknown) circumstances. I couldn't find any working implementation of memory-augmented RL models (either open-sourced or from the authors of the original papers), so I have to build it myself. Naively combining a memory network with RL is not a technically well-motivated approach, but I'm still implementing it as a baseline for my research.

If you don't mind keeping this issue thread open, I would like to continue this discussion here.

I have one tf.nn.dynamic_rnn op in my computation graph. I'm thinking of replacing that op with a tf.while_loop whose body is the MANN operations. Do you think this approach makes sense?
I'm aware that you used a Python 'for' loop in your model, so I will try both and see which works. Either way, I need a way to terminate the loop, i.e. define a condition for tf.while_loop or a sequence length for the 'for' loop. A sketch of what I have in mind is below.
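For concreteness, here is a minimal sketch of the tf.while_loop idea (TF 1.x), using a scalar sequence length as the termination condition. A BasicLSTMCell stands in for the MANN step, and all names and shapes are illustrative rather than taken from either repo:

```python
import tensorflow as tf

batch_size, max_steps, n_features, n_hidden = 1, 20, 288, 256

# Time-major inputs: [time, batch, features].
inputs = tf.placeholder(tf.float32, [max_steps, batch_size, n_features])
seq_len = tf.placeholder(tf.int32, [])  # scalar: how many steps to run

# Stand-in cell; the real body would call the MANN step here and thread
# its full state (controller state, memory matrix, usage weights, ...)
# through the loop variables.
cell = tf.nn.rnn_cell.BasicLSTMCell(n_hidden)
state0 = cell.zero_state(batch_size, tf.float32)
outputs0 = tf.TensorArray(tf.float32, size=0, dynamic_size=True)

def cond(t, state, outputs):
    return t < seq_len  # terminate once seq_len steps have run

def body(t, state, outputs):
    out_t, new_state = cell(inputs[t], state)
    return t + 1, new_state, outputs.write(t, out_t)

_, final_state, outputs_ta = tf.while_loop(
    cond, body, loop_vars=[tf.constant(0), state0, outputs0])
rnn_out = outputs_ta.stack()  # [seq_len, batch, n_hidden]
```

This is essentially what tf.nn.dynamic_rnn builds internally, so if the MANN step can be expressed as a single-timestep call with an explicit state tuple, either route should work.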