danijar / dreamerv2

Mastering Atari with Discrete World Models

Home Page: https://danijar.com/dreamerv2

No Improvement in Pong Scores after 18M+ Steps

[Edited to update to 18M steps; images below are from 12M]

Starting a new thread with more relevant detail here. Please feel free to close if you don't think it's appropriate.

We've now trained several instances to 10M+ steps with no improvement in Pong scores. This is using the default Pong settings on V100 machines in Colab Pro.

All training settings are the defaults from the repo; no modifications were made to the code base, as this was a first "test run" of Dreamer.

Below are performance graphs. Happy to provide a copy of the Colab notebook or log files if that would be helpful. Would appreciate any insight, even if it's that we need to allow longer training (though the chart in Appendix F appears to show Pong improving by this point in training?).

Will keep training in the meantime and update if anything changes.

Thank you.

[The images below are from 12M steps; however, the issue persists beyond 18M steps.]

[Screenshots: training curves and scores at 12-13M steps]

Update: no improvement in scores at 18M+ steps. (We have two separate training instances seeing this same result, so it's not isolated to a single system.)

[Screenshot: scores at 18M+ steps]

You're right, it should have improved by now. Let's dig into this:

  • Are you using the latest commit of the repository? If not, which commit are you using?
  • Are you running the command line from the README or are you adding any additional flags?
  • I see that the FPS suddenly jumped up at 6M steps. What happened there?
  • Could you try a Pong run without mixed precision by setting --precision 32?
  • Could you give atari_name_this_game a try? Similar to Pong, it should learn pretty quickly. (Command sketches for both of these runs are below.)
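
For reference, the two runs suggested above might be launched as follows. This is a hedged sketch that reuses the README-style command; the log directories are placeholders, not paths from this thread.

# Sketch only: Pong without mixed precision (placeholder logdir).
!python3 dreamerv2/train.py --logdir '/content/drive/MyDrive/logdir/atari_pong/dreamerv2/fp32' --configs defaults atari --task atari_pong --precision 32
# Sketch only: the same setup on Name This Game (placeholder logdir).
!python3 dreamerv2/train.py --logdir '/content/drive/MyDrive/logdir/atari_name_this_game/dreamerv2/1' --configs defaults atari --task atari_name_this_game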

Great!

  • Should be. Cloning via !git clone https://github.com/danijar/dreamerv2.git
  • Yes, I'm running via !python3 dreamerv2/train.py --logdir '/content/drive/MyDrive/logdir/atari_pong/dreamerv2/1' --configs defaults atari --task atari_pong
  • I noticed that as well. I suspect that was when I switched to a V100 instance on Colab, but I can't be positive. Given that I'm training in Colab (Pro), there were several times when I had to reconnect and resume training from the cache/log files. (Perhaps something in that process caused issues with the model?)
  • Sure, I will try that today.
  • Yes, I will attempt that as well.

If it's helpful, here's a copy of the notebook (very straightforward): https://colab.research.google.com/drive/1iB9G5fNnrxfWZfplynU70RMkjbKxSv_k?usp=sharing

Sounds good, let me know how it goes. I didn't see anything suspicious in your Colab notebook, except that the section for the train_openl image summary shows no images for me.

Looking at the episode length plot, it seems like the agent is learning something. Maybe it's really just taking a while to start making progress on Pong. Of course Pong can be solved much faster, but the hyperparameters were tuned to work well across all games at 200M steps, without a focus on data efficiency or easy games.

If the above ideas don't help find the problem, the next idea would be to train an agent at the first commit of the repository rather than after the refactoring. That said, I've tested the refactoring on Google machines and everything works fine there.

Early results (~2M steps) suggest that the system is behaving more stably with the --precision 32 flag. Will report back.

Update at 8M steps: despite some promising early behavior, the agent has now settled into zero-score, zero-hit play for the past ~4M steps:
[Screenshots: training curves at 8M steps with --precision 32]

Here is what I'm getting when training DreamerV2 on Pong 10 times (this uses mixed precision, so all flags at their defaults):

[Plot: returns for 10 Pong runs with default settings]

Yeah, so there's definitely something going on. None of our training runs in Colab (up to 18M+ steps) achieved returns greater than about -19...

If you have a MuJoCo license, you could try running on a simple DMC task, e.g. dmc_walker_walk to see if the general algorithm works for you in Colab.
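
A hedged sketch of such a run, assuming the repo's DMC config is named dmc as in the README-style commands (adjust the config name and logdir to your setup):

# Sketch only: a simple DMC control task to sanity-check the setup in Colab.
!python3 dreamerv2/train.py --logdir '/content/drive/MyDrive/logdir/dmc_walker_walk/dreamerv2/1' --configs defaults dmc --task dmc_walker_walk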

I probably have the same problem when running on my own computer:

Latest commit from git: running python3 dreamerv2/train.py --logdir logdir/atari_pong/dreamerv2/3 --configs defaults atari --task atari_pong => a single run to 10M steps with eval/train_return of -21; 2 runs to 2M steps, also with return -21.

Initial commit from git: running python3 dreamer.py --logdir logdir/atari_pong/dreamerv2/04_initial_commit --configs defaults atari --task atari_pong => so far a single run to 2M steps with return -7. It's a short run, but notable considering that returns from the latest commit never went above -19.

These are only a few small runs, so it might be a random glitch, but it would seem that for some reason the latest commit (after the refactoring) won't train. The conda environment also changed between those commits (one had TF 2.3 and the other 2.5). I might run some more trials next week.

OK, I've at least made some progress here. I don't know if it's the full answer, but my agent is finally training (albeit slowly). Note this is Colab-specific:

Don't pip install anything except ruamel.yaml and elements. For everything else, use the Colab default installs. (You'll have to install the Atari ROMs too.)

The issue appears to be the one noted here: Tensorflow versions in Colab. It looks like Colab uses a custom-compiled version of TensorFlow, so doing !pip3 install tensorflow can lead to a poor-performing or non-functioning TF install in Colab.

My model is still training much slower than the results @danijar posted above (I've now trained to 10M frames with a mean eval return of about -16), but this is the first time I've gotten the agent to escape -21 after many, many attempts. This suggests that the Colab TensorFlow issue is a real one.

Again, this is still a far cry from the positive Pong scores by 4M steps shown in your plots above, @danijar, but by using the built-in Colab installs my model finally appears to be learning.

[Screenshots: training curves at 10M frames using the Colab default installs]

Concretely, in Colab my only installs are now:

!pip3 install ruamel.yaml
!pip3 install elements
# Install ROMs if necessary
!curl http://www.atarimania.com/roms/Roms.rar -O
!pip install unrar
!unrar x Roms.rar
!python -m atari_py.import_roms .
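
A quick sanity check (my own addition, not from the repo) that the Colab-provided TensorFlow build is still the one in use after the installs above, and that it sees the GPU:

# Print the active TensorFlow version, where it was imported from, and the visible GPUs.
!python3 -c "import tensorflow as tf; print(tf.__version__, tf.__file__); print(tf.config.list_physical_devices('GPU'))"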

@holli That's good to know, thanks!

To both of you, if you could, it'd be great to know if the commit right before the refactoring commit still works for you (i.e. train at commit 1d4868f).
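
For anyone reproducing this, checking out that commit might look like the following sketch (the clone URL is from earlier in the thread; the training entry point may differ at older commits):

git clone https://github.com/danijar/dreamerv2.git
cd dreamerv2 && git checkout 1d4868f  # commit right before the refactoring
# Then launch training as before; note that the initial commit used dreamer.py
# rather than dreamerv2/train.py, so the script path may vary by commit.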

@danijar yep, the commit just before the refactoring works well; it trains to +15 after 4M steps. So before the refactoring everything trains like in your #8 (comment) example, but after the refactoring nothing seems to train on my computer. It's either the refactoring or some library change. All the stats in TensorBoard seemed to start from similar points, so I'm not sure if those help.

I think I found the reason. Could you both retry with the current commit, please?

Yay, after a quick test it seems to train now.

What was the problem/fix?

Awesome!

It was a stupid mistake that sneaked in when I simplified the configs for the GitHub codebase. The default KL scale was defined as an integer, so the Atari config that sets it to 0.1 got floored to 0. This commit fixed it. I also updated elements to raise an error instead of converting floats to ints, so the same mistake can't happen in the future.
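
As a minimal illustration of the pitfall (not the actual config code): when a config system casts overrides to the type of the default value, an integer default silently turns the 0.1 override into 0:

# Hypothetical illustration only: casting the float override to the integer default's type floors it to 0.
python3 -c "default = 1; override = type(default)(0.1); print(override)"  # prints 0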

Sorry for the slow reply, @danijar. I have also tested and verified that the new codebase works in Colab and learns as expected!

(Note this was using the Colab-provided versions of TensorFlow, etc., as noted above.)

Thank you for your help!