MarcoMeter / episodic-transformer-memory-ppo

Clean baseline implementation of PPO using an episodic TransformerXL memory


Baseline results on GRU and LSTM on MemoryGym

subho406 opened this issue

Hi,

Thanks for the amazing implementation. I was wondering if it would be possible to release the baseline implementations, along with the hyperparameters used, for GRU and LSTM on the Memory Gym environments (https://openreview.net/pdf?id=jHc8dCx6DDr)? I am hoping to use MemoryGym for my thesis work, and this would be extremely helpful. Thanks!

Hello!

The results were produced using neroRL (develop branch).
We are currently updating our GRU baseline repository to support Memory Gym. The develop branch should be functional, but we still need to reproduce our results, which is the last step before merging it into the main branch. This should be done within the next two weeks. So feel free to use neroRL for training now; afterwards, you can use the other repository to follow our implementation concept more easily.

Also, we found better hyperparameters for MMGrid and MPGrid using Optuna, which we have just implemented in neroRL (develop):

| Hyperparameter | MM Grid | MP Grid |
| --- | --- | --- |
| Workers | 32 | 32 |
| Worker steps | 512 | 512 |
| Epochs | 3 | 3 |
| Num minibatches | 8 | 8 |
| Gamma | 0.995 | 0.995 |
| Lambda | 0.95 | 0.95 |
| Value loss coefficient | 0.5 | 0.5 |
| Advantage normalization | batch | none |
| Max grad norm | 0.25 | 0.25 |
| Clip range | 0.1 | 0.2 |
| Initial learning rate | 2.5e-4 | 2.75e-4 |
| Final learning rate | 1.0e-5 | 1.0e-5 |
| Initial entropy coefficient | 1e-4 | 1e-3 |
| Final entropy coefficient | 1e-6 | 1e-6 |
| **Recurrence** | | |
| Num layers | 1 | 1 |
| Layer type | GRU | GRU |
| Sequence length | -1 | -1 |
| Hidden state size | 512 | 512 |
| Residual | True | False |
| Updates | 5000 | 10000 |
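
For reference, here is a minimal sketch of the MM Grid column as a plain Python dict, together with a linear annealing helper for the initial/final learning rate and entropy coefficient schedules. The key names and the linear decay are illustrative assumptions only and do not reflect neroRL's actual config schema or decay function; check the neroRL develop branch for the real format.

```python
def linear_decay(initial: float, final: float, update: int, max_updates: int) -> float:
    """Linearly anneal a value from `initial` to `final` over `max_updates` updates.

    Assumption for illustration: neroRL may use a different decay schedule.
    """
    fraction = min(update / max_updates, 1.0)
    return initial + fraction * (final - initial)


# MM Grid hyperparameters restated from the table above.
# Key names are hypothetical, not neroRL's config schema.
mm_grid_config = {
    "workers": 32,
    "worker_steps": 512,
    "epochs": 3,
    "num_minibatches": 8,
    "gamma": 0.995,
    "lamda": 0.95,                        # GAE lambda
    "value_loss_coefficient": 0.5,
    "advantage_normalization": "batch",   # MP Grid: "none"
    "max_grad_norm": 0.25,
    "clip_range": 0.1,                    # MP Grid: 0.2
    "init_learning_rate": 2.5e-4,         # MP Grid: 2.75e-4
    "final_learning_rate": 1.0e-5,
    "init_entropy_coefficient": 1e-4,     # MP Grid: 1e-3
    "final_entropy_coefficient": 1e-6,
    "recurrence": {
        "num_layers": 1,
        "layer_type": "gru",
        "sequence_length": -1,            # -1 presumably means whole episodes
        "hidden_state_size": 512,
        "residual": True,                 # MP Grid: False
    },
    "updates": 5000,                      # MP Grid: 10000
}

# Example: learning rate halfway through the MM Grid schedule (update 2500).
lr = linear_decay(
    mm_grid_config["init_learning_rate"],
    mm_grid_config["final_learning_rate"],
    update=2500,
    max_updates=mm_grid_config["updates"],
)
print(f"learning rate at update 2500: {lr:.2e}")
```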