danijar / dreamerv2

Mastering Atari with Discrete World Models

Home Page: https://danijar.com/dreamerv2

Default setting doesn't seem to be learning

nickuncaged1201 opened this issue

Thanks for the updated release. I just downloaded the code and set up a fresh environment as detailed in the readme. I tried to train with everything set to the defaults by simply running "python dreamerv2/train.py --logdir ./logdir/atari_pong --configs defaults atari --task atari_pong". After 50k steps, the return doesn't seem to increase at all. The Atari Pong task should have a random-policy return of around -20, and that is all I am getting so far. Any suggestions as to why this is the case?

Here is the configs.yaml in case you need it. The only change I made is to the steps values on lines 8 and 77, which I reduced to 1e7 (how these config blocks and the command-line flags are composed is sketched after the file). Even at this smaller number of steps, I would expect to see some improvement in return.

defaults:

  # Train Script

  logdir: /dev/null
  seed: 0
  task: dmc_walker_walk
  num_envs: 1
  steps: 1e7
  eval_every: 1e5
  action_repeat: 1
  time_limit: 0
  prefill: 10000
  image_size: [64, 64]
  grayscale: False
  replay_size: 2e6
  dataset: {batch: 50, length: 50, oversample_ends: True}
  train_gifs: False
  precision: 16
  jit: True

  # Agent

  log_every: 1e4
  train_every: 5
  train_steps: 1
  pretrain: 0
  clip_rewards: identity
  expl_noise: 0.0
  expl_behavior: greedy
  expl_until: 0
  eval_noise: 0.0
  eval_state_mean: False

  # World Model

  pred_discount: True
  grad_heads: [image, reward, discount]
  rssm: {hidden: 400, deter: 400, stoch: 32, discrete: 32, act: elu, std_act: sigmoid2, min_std: 0.1}
  encoder: {depth: 48, act: elu, kernels: [4, 4, 4, 4], keys: [image]}
  decoder: {depth: 48, act: elu, kernels: [5, 5, 6, 6]}
  reward_head: {layers: 4, units: 400, act: elu, dist: mse}
  discount_head: {layers: 4, units: 400, act: elu, dist: binary}
  loss_scales: {kl: 1, reward: 1, discount: 1}
  kl: {free: 0.0, forward: False, balance: 0.8, free_avg: True}
  model_opt: {opt: adam, lr: 3e-4, eps: 1e-5, clip: 100, wd: 1e-6}

  # Actor Critic

  actor: {layers: 4, units: 400, act: elu, dist: trunc_normal, min_std: 0.1}
  critic: {layers: 4, units: 400, act: elu, dist: mse}
  actor_opt: {opt: adam, lr: 1e-4, eps: 1e-5, clip: 100, wd: 1e-6}
  critic_opt: {opt: adam, lr: 1e-4, eps: 1e-5, clip: 100, wd: 1e-6}
  discount: 0.99
  discount_lambda: 0.95
  imag_horizon: 15
  actor_grad: both
  actor_grad_mix: '0.1'
  actor_ent: '1e-4'
  slow_target: True
  slow_target_update: 100
  slow_target_fraction: 1

  # Exploration

  expl_extr_scale: 0.0
  expl_intr_scale: 1.0
  expl_opt: {opt: adam, lr: 3e-4, eps: 1e-5, clip: 100, wd: 1e-6}
  expl_head: {layers: 4, units: 400, act: elu, dist: mse}
  disag_target: stoch
  disag_log: True
  disag_models: 10
  disag_offset: 1
  disag_action_cond: True
  expl_model_loss: kl

atari:

  task: atari_pong
  time_limit: 108000  # 30 minutes of game play.
  action_repeat: 4
  steps: 1e7
  eval_every: 1e5
  log_every: 1e5
  prefill: 200000
  grayscale: True
  train_every: 16
  clip_rewards: tanh
  rssm: {hidden: 600, deter: 600, stoch: 32, discrete: 32}
  actor.dist: onehot
  model_opt.lr: 2e-4
  actor_opt.lr: 4e-5
  critic_opt.lr: 1e-4
  actor_ent: 1e-3
  discount: 0.999
  actor_grad: reinforce
  actor_grad_mix: 0
  loss_scales.kl: 0.1
  loss_scales.discount: 5.0
  .*.wd$: 1e-6

dmc:

  task: dmc_walker_walk
  time_limit: 1000
  action_repeat: 2
  eval_every: 1e4
  log_every: 1e4
  prefill: 5000
  train_every: 5
  pretrain: 100
  pred_discount: False
  grad_heads: [image, reward]
  rssm: {hidden: 200, deter: 200}
  model_opt.lr: 3e-4
  actor_opt.lr: 8e-5
  critic_opt.lr: 8e-5
  actor_ent: 1e-4
  discount: 0.99
  actor_grad: dynamics
  kl.free: 1.0
  dataset.oversample_ends: False

debug:

  jit: False
  time_limit: 100
  eval_every: 300
  log_every: 300
  prefill: 100
  pretrain: 1
  train_steps: 1
  dataset.batch: 10
  dataset.length: 10
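
As I understand the train script, "--configs defaults atari" layers the atari block on top of defaults, and any remaining command-line flags (such as --task atari_pong, or --steps 1e7 instead of editing the file) override the merged result. Below is a minimal sketch of that precedence only; load_config is a hypothetical helper, not the repository's actual loader, and it skips the dotted keys (e.g. actor.dist) and regex patterns (e.g. .*.wd$) that the real loader resolves.

import pathlib
import yaml  # assumption: PyYAML is available; the repo may use a different YAML library

def load_config(path, names, overrides):
    # Hypothetical helper for illustration: defaults < named configs < CLI flags.
    blocks = yaml.safe_load(pathlib.Path(path).read_text())
    config = dict(blocks['defaults'])
    for name in names:
        if name != 'defaults':
            config.update(blocks[name])  # e.g. atari sets prefill: 200000, train_every: 16
    config.update(overrides)  # flags such as --task atari_pong are applied last
    return config

# Roughly what "--configs defaults atari --task atari_pong --steps 1e7" resolves to:
config = load_config('dreamerv2/configs.yaml', ['defaults', 'atari'],
                     {'task': 'atari_pong', 'steps': 1e7})
print(config['task'], config['steps'], config['prefill'], config['train_every'])

Note the precedence: relative to the defaults, the atari block raises prefill to 200000 and train_every to 16 before any command-line flags are applied.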

Can confirm we're seeing the same issue. @nickuncaged1201 please report back if you figure out any settings that actually learn... Thanks.

Hi, you need to train for more than 50k steps. Try at least a few million steps. In case it still doesn't train, report back and I'll reopen the issue.

By the time of reporting this, I have trained it for 5 million steps. The training return right now is about -20; -19 is the highest I have seen so far. The essential training settings are still the same, with only minor changes to the log and eval frequencies. Does this count as the expected improvement for this number of steps?

Here is what mine looks like after 2.9M steps. My returns are consistent with what you're reporting, @nickuncaged1201:

[Attachment: pong2_9M — return curve at 2.9M steps]

Not a bad strategy, actually, if it can start connecting. Will update if/when I get to 5M+ steps.
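
If it helps anyone compare curves, here is a small plotting sketch. It assumes the logger writes newline-delimited JSON to a metrics.jsonl file inside the logdir and that the episode return appears under a key like train_return; both the filename and the key are assumptions on my part, so check your own logdir for the exact names.

import json
import pathlib

import matplotlib.pyplot as plt

# Assumptions: the logdir contains newline-delimited JSON metrics and the
# episode return is logged as 'train_return'; adjust both names if they differ.
logdir = pathlib.Path('logdir/atari_pong')
steps, returns = [], []
for line in (logdir / 'metrics.jsonl').read_text().splitlines():
    record = json.loads(line)
    if 'train_return' in record:
        steps.append(record.get('step', len(steps)))
        returns.append(record['train_return'])

plt.plot(steps, returns)
plt.xlabel('step')
plt.ylabel('train_return')
plt.title('Pong training return')
plt.savefig('pong_return.png')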

@danijar If you wouldn't mind advising: my team and I have now trained two separate models to 8M+ steps with the default settings on Pong and are still seeing no improvement in game score. Inferring from the chart in Appendix F of the paper, it appears that by 8M steps we should be close to the slope of rapid improvement on Pong. Are we seeing the expected behavior? I realize we're still at only 4% of the 200M frames reported in the paper; however, Appendix F makes it appear that we should already be seeing results on Pong by this point. Would appreciate your input, thank you. Screenshots of a few of the graphs and eval videos are attached (at the current timestep the agent has again begun holding the "down" button).

[Attachment: Screen Shot 2021-05-24 at 1 16 39 PM — training graphs]

[Attachments: pong8_0M — return curve at 8.0M steps; eval_openl — evaluation video]

Discussion continuing here: #8