danijar / dreamerv2

Mastering Atari with Discrete World Models

Home Page: https://danijar.com/dreamerv2

No Improvement in Pong Scores after 18M+ Steps

[Edited to update to 18M steps; images below are from 12M]

Starting a new thread with more relevant detail here. Please feel free to close if you don't think it's appropriate.

We've now trained several instances to 10M+ steps with no improvement in Pong scores. This is using the default Pong settings on V100 machines in Colab Pro.

All training settings are the defaults from the repo; no modifications were made to the code base, as this was a first "test run" of Dreamer.

Below are performance graphs. Happy to provide a copy of the Colab notebook or log files if that would be helpful. Would appreciate any insight, even if it's that we need to allow longer training (though the chart in Appendix F appears to show Pong improving by this point in training?).

Will keep training in the meantime and update if anything changes.

Thank you.

[The images below are from 12M steps; however, the issue persists beyond 18M steps.]

[Screenshots: training curves and scores at 12-13M steps]

Update: no improvement in scores at 18M+ steps. (We have two separate training instances seeing this same result, so it's not isolated to a single system.)

[Screenshot: scores at 18M+ steps]

You're right, it should have improved by now. Let's dig into this:

  • Are you using the latest commit of the repository? If not, which commit are you using?
  • Are you running the command line from the README or are you adding any additional flags?
  • I see that the FPS suddenly jumped up at 6M steps. What happened there?
  • Could you try a Pong run without mixed precision by setting --precision 32?
  • Could you give atari_name_this_game a try? Similar to Pong, it should learn pretty quickly. (Command sketches for both of these runs are below.)
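
For reference, the two runs suggested above might be launched as follows. This is a hedged sketch that reuses the README-style command; the log directories are placeholders, not paths from this thread.

# Sketch only: Pong without mixed precision (placeholder logdir).
!python3 dreamerv2/train.py --logdir '/content/drive/MyDrive/logdir/atari_pong/dreamerv2/fp32' --configs defaults atari --task atari_pong --precision 32
# Sketch only: the same setup on Name This Game (placeholder logdir).
!python3 dreamerv2/train.py --logdir '/content/drive/MyDrive/logdir/atari_name_this_game/dreamerv2/1' --configs defaults atari --task atari_name_this_game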

Great!

  • Should be. Cloning via !git clone https://github.com/danijar/dreamerv2.git
  • Yes, I'm running via !python3 dreamerv2/train.py --logdir '/content/drive/MyDrive/logdir/atari_pong/dreamerv2/1' --configs defaults atari --task atari_pong
  • I noticed that as well. I suspect that was when I switched to a V100 instance on Colab, but I can't be positive. Given that I'm training in Colab (Pro), there were several times when I had to reconnect and resume training from the cache/log files. (Perhaps something in that process caused issues with the model?)
  • Sure, I will try that today.
  • Yes, I will attempt that as well.

If it's helpful, here's a copy of the notebook (very straightforward): https://colab.research.google.com/drive/1iB9G5fNnrxfWZfplynU70RMkjbKxSv_k?usp=sharing

Sounds good, let me know how it goes. I didn't see anything suspicious in your Colab notebook, except that the section for the train_openl image summary shows no images for me.

Looking at the episode length plot, it seems like the agent is learning something. Maybe it's really just taking a while to start making progress on Pong. Of course Pong can be solved much faster, but the hyperparameters were tuned to work well across all games at 200M steps, without a focus on data efficiency or easy games.

If the above ideas don't help find the problem, the next idea would be to train an agent at the first commit of the repository rather than after the refactoring. That said, I've tested the refactoring on Google machines and everything works fine there.

Early results (~2M steps) suggest that the system is behaving more stably with the --precision 32 flag. Will report back.

Update at 8M steps: despite some promising early behavior, the agent has now settled into zero-score, zero-hit play for the past ~4M steps:
[Screenshots: training curves at 8M steps with --precision 32]

Here is what I'm getting when training DreamerV2 on Pong 10 times (this uses mixed precision, so all flags at their defaults):

[Plot: returns for 10 Pong runs with default settings]

Yeah, so there's definitely something going on. None of our training runs in Colab (up to 18M+ steps) achieved returns greater than about -19...

If you have a MuJoCo license, you could try running on a simple DMC task, e.g. dmc_walker_walk to see if the general algorithm works for you in Colab.
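
A hedged sketch of such a run, assuming the repo's DMC config is named dmc as in the README-style commands (adjust the config name and logdir to your setup):

# Sketch only: a simple DMC control task to sanity-check the setup in Colab.
!python3 dreamerv2/train.py --logdir '/content/drive/MyDrive/logdir/dmc_walker_walk/dreamerv2/1' --configs defaults dmc --task dmc_walker_walk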

I probably have the same problem when running on my own computer:

Latest commit from git: running python3 dreamerv2/train.py --logdir logdir/atari_pong/dreamerv2/3 --configs defaults atari --task atari_pong => a single run to 10M steps with eval/train_return of -21; 2 runs to 2M steps, also with return -21.

Initial commit from git: running python3 dreamer.py --logdir logdir/atari_pong/dreamerv2/04_initial_commit --configs defaults atari --task atari_pong => so far a single run to 2M steps with return -7. It's a short run, but notable considering that returns from the latest commit never went above -19.

These are only a few small runs, so it might be a random glitch, but it would seem that for some reason the latest commit (after the refactoring) won't train. The conda environment also changed between those commits (one had TF 2.3 and the other 2.5). I might run some more trials next week.

OK, I've at least made some progress here. I don't know if it's the full answer, but my agent is finally training (albeit slowly). Note this is Colab-specific:

Don't pip install anything except ruamel.yaml and elements. For everything else, use the Colab default installs. (You'll have to install the Atari ROMs too.)

The issue appears to be the one noted here: Tensorflow versions in Colab. It looks like Colab uses a custom-compiled version of TensorFlow, so doing !pip3 install tensorflow can lead to a poor-performing or non-functioning TF install in Colab.

My model is still training much slower than the results @danijar posted above (I've now trained to 10M frames with a mean eval return of about -16), but this is the first time I've gotten the agent to escape -21 after many, many attempts. This suggests that the Colab TensorFlow issue is a real one.

Again, this is still a far cry from the positive Pong scores by 4M steps shown in your plots above, @danijar, but by using the built-in Colab installs my model finally appears to be learning.

[Screenshots: training curves at 10M frames using the Colab default installs]

Concretely, in Colab my only installs are now:

!pip3 install ruamel.yaml
!pip3 install elements
# Install ROMs if necessary
!curl http://www.atarimania.com/roms/Roms.rar -O
!pip install unrar
!unrar x Roms.rar
!python -m atari_py.import_roms .
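
A quick sanity check (my own addition, not from the repo) that the Colab-provided TensorFlow build is still the one in use after the installs above, and that it sees the GPU:

# Print the active TensorFlow version, where it was imported from, and the visible GPUs.
!python3 -c "import tensorflow as tf; print(tf.__version__, tf.__file__); print(tf.config.list_physical_devices('GPU'))"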

@holli That's good to know, thanks!

To both of you, if you could, it'd be great to know if the commit right before the refactoring commit still works for you (i.e. train at commit 1d4868f).
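
For anyone reproducing this, checking out that commit might look like the following sketch (the clone URL is from earlier in the thread; the training entry point may differ at older commits):

git clone https://github.com/danijar/dreamerv2.git
cd dreamerv2 && git checkout 1d4868f  # commit right before the refactoring
# Then launch training as before; note that the initial commit used dreamer.py
# rather than dreamerv2/train.py, so the script path may vary by commit.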

@danijar yep, the commit just before the refactoring works well; it trains to +15 after 4M steps. So before the refactoring everything trains like in your #8 (comment) example, but after the refactoring nothing seems to train on my computer. It's either the refactoring or some library change. All the stats in TensorBoard seemed to start from similar points, so I'm not sure if those help.

I think I found the reason. Could you both retry with the current commit, please?

Yay, after a quick test it seems to train now.

What was the problem/fix?

Awesome!

It was a stupid mistake that sneaked in when I simplified the configs for the GitHub codebase. The default KL scale was defined as an integer, so the Atari config that sets it to 0.1 got floored to 0. This commit fixed it. I also updated elements to raise an error instead of converting floats to ints, so the same mistake can't happen in the future.
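
As a minimal illustration of the pitfall (not the actual config code): when a config system casts overrides to the type of the default value, an integer default silently turns the 0.1 override into 0:

# Hypothetical illustration only: casting the float override to the integer default's type floors it to 0.
python3 -c "default = 1; override = type(default)(0.1); print(override)"  # prints 0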

Sorry for the slow reply, @danijar. I have also tested and verified that the new codebase works in Colab and learns as expected!

(Note this was using the Colab-provided versions of TensorFlow, etc., as noted above.)

Thank you for your help!