google-deepmind / acme

A library of reinforcement learning components and agents


Low SPS with `run_impala.py`

vwxyzjn opened this issue

Hello, I was trying to run the IMPALA example with Atari on my personal machine with two GPUs and a CPU with 24 cores, but the Steps Per Second (SPS) looked suspiciously low to me (around 100 SPS). I am probably missing something obvious and would appreciate your help.

Reproduction

I cloned the latest repo and ran

pip install .[jax,tf,testing,envs]
pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
XLA_PYTHON_CLIENT_MEM_FRACTION=0.7 python examples/baselines/rl_discrete/run_impala.py

The output is as follows:

XLA_PYTHON_CLIENT_MEM_FRACTION=0.7 python examples/baselines/rl_discrete/run_impala.py                 
/home/costa/.cache/pypoetry/virtualenvs/acme-s3qBA_6h-py3.9/lib/python3.9/site-packages/tensorflow_probability/python/__init__.py:57: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if (distutils.version.LooseVersion(tf.__version__) <
/home/costa/.cache/pypoetry/virtualenvs/acme-s3qBA_6h-py3.9/lib/python3.9/site-packages/sonnet/src/types.py:34: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  BoolLike = Union[bool, np.bool, TensorLike]
I0925 18:51:13.223871 140563283464576 xla_bridge.py:350] Unable to initialize backend 'tpu_driver': NOT_FOUND: Unable to find driver in registry given worker: 
I0925 18:51:13.294728 140563283464576 xla_bridge.py:350] Unable to initialize backend 'rocm': NOT_FOUND: Could not find registered platform with name: "rocm". Available platform names are: CUDA Interpreter Host
I0925 18:51:13.294956 140563283464576 xla_bridge.py:350] Unable to initialize backend 'tpu': module 'jaxlib.xla_extension' has no attribute 'get_tpu_client'
/home/costa/.cache/pypoetry/virtualenvs/acme-s3qBA_6h-py3.9/lib/python3.9/site-packages/gym/envs/registration.py:592: UserWarning: WARN: The environment PongNoFrameskip-v0 is out of date. You should consider upgrading to version `v4`.
  logger.warn(
A.L.E: Arcade Learning Environment (version 0.7.5+db37282)
[Powered by Stella]
/home/costa/.cache/pypoetry/virtualenvs/acme-s3qBA_6h-py3.9/lib/python3.9/site-packages/gym/core.py:329: DeprecationWarning: WARN: Initializing wrapper in old step API which returns one bool instead of two. It is recommended to set `new_step_api=True` to use new step API. This will be the default behaviour in future.
  deprecation(
/home/costa/.cache/pypoetry/virtualenvs/acme-s3qBA_6h-py3.9/lib/python3.9/site-packages/gym/wrappers/step_api_compatibility.py:39: DeprecationWarning: WARN: Initializing environment in old step API which returns one bool instead of two. It is recommended to set `new_step_api=True` to use new step API. This will be the default behaviour in future.
  deprecation(
[reverb/cc/platform/tfrecord_checkpointer.cc:162]  Initializing TFRecordCheckpointer in /tmp/tmps0ey6yu4.
[reverb/cc/platform/tfrecord_checkpointer.cc:552] Loading latest checkpoint from /tmp/tmps0ey6yu4
[reverb/cc/platform/default/server.cc:71] Started replay server on port 41887
[reverb/cc/client.cc:165] Sampler and server are owned by the same process (3559406) so Table priority_table is accessed directly without gRPC.
I0925 18:51:14.566777 140563283464576 csv.py:76] Logging to /home/costa/acme/20220925-185110/logs/learner/logs.csv
I0925 18:51:14.567045 140563283464576 learning.py:63] Learner process id: 0. Devices passed: None
I0925 18:51:14.567092 140563283464576 learning.py:65] Learner process id: 0. Local devices from JAX API: [GpuDevice(id=0, process_index=0)]
I0925 18:51:24.940803 140563283464576 csv.py:76] Logging to /home/costa/acme/20220925-185110/logs/actor/logs.csv
I0925 18:51:24.941244 140563283464576 savers.py:164] Attempting to restore checkpoint: None
I0925 18:51:24.942273 140563283464576 csv.py:76] Logging to /home/costa/acme/20220925-185110/logs/evaluator/logs.csv
/home/costa/.cache/pypoetry/virtualenvs/acme-s3qBA_6h-py3.9/lib/python3.9/site-packages/gym/utils/passive_env_checker.py:227: DeprecationWarning: WARN: Core environment is written in old step API which returns one bool instead of two. It is recommended to rewrite the environment with new step API. 
  logger.deprecation(
I0925 18:51:33.700813 140563283464576 terminal.py:91] [Evaluator] Episode Length = 765 | Episode Return = -21.0 | Evaluator Episodes = 1 | Evaluator Steps = 765 | Steps Per Second = 87.362
I0925 18:51:34.939988 140563283464576 savers.py:155] Saving checkpoint: /home/costa/acme/20220925-185110/checkpoints/default
I0925 18:51:52.013082 140563283464576 terminal.py:91] [Learner] Critic Loss = 0.14232924580574036 | Entropy = -2.742913007736206 | Entropy Loss = -0.01371456403285265 | Evaluator Episodes = 1 | Evaluator Steps = 765 | Learner Steps = 1 | Learner Time Elapsed = 10.717 | Loss = -0.727422833442688 | Param Norm = 48.737998962402344 | Param Updates Norm = 0.1969769448041916 | Policy Loss = -0.856037437915802
I0925 18:51:54.062487 140563283464576 terminal.py:91] [Actor] Actor Episodes = 1 | Actor Steps = 825 | Episode Length = 825 | Episode Return = -21.0 | Evaluator Episodes = 1 | Evaluator Steps = 765 | Learner Steps = 1 | Learner Time Elapsed = 10.717 | Steps Per Second = 40.519
I0925 18:52:01.527921 140563283464576 terminal.py:91] [Evaluator] Actor Episodes = 1 | Actor Steps = 825 | Episode Length = 765 | Episode Return = -21.0 | Evaluator Episodes = 2 | Evaluator Steps = 1530 | Learner Steps = 1 | Learner Time Elapsed = 10.717 | Steps Per Second = 102.483
I0925 18:52:06.416487 140563283464576 terminal.py:91] [Learner] Actor Episodes = 1 | Actor Steps = 825 | Critic Loss = 0.09712472558021545 | Entropy = -2.76863431930542 | Entropy Loss = -0.013843171298503876 | Evaluator Episodes = 2 | Evaluator Steps = 1530 | Learner Steps = 2 | Learner Time Elapsed = 10.830 | Loss = -0.37379246950149536 | Param Norm = 48.738494873046875 | Param Updates Norm = 0.19555175304412842 | Policy Loss = -0.45707404613494873
I0925 18:52:10.536827 140563283464576 terminal.py:91] [Actor] Actor Episodes = 2 | Actor Steps = 1678 | Episode Length = 853 | Episode Return = -21.0 | Evaluator Episodes = 2 | Evaluator Steps = 1530 | Learner Steps = 2 | Learner Time Elapsed = 10.830 | Steps Per Second = 94.689
I0925 18:52:18.137068 140563283464576 terminal.py:91] [Evaluator] Actor Episodes = 2 | Actor Steps = 1678 | Episode Length = 765 | Episode Return = -21.0 | Evaluator Episodes = 3 | Evaluator Steps = 2295 | Learner Steps = 2 | Learner Time Elapsed = 10.830 | Steps Per Second = 100.660
I0925 18:52:20.667684 140563283464576 terminal.py:91] [Learner] Actor Episodes = 2 | Actor Steps = 1678 | Critic Loss = 0.125137060880661 | Entropy = -2.776602268218994 | Entropy Loss = -0.013883009552955627 | Evaluator Episodes = 3 | Evaluator Steps = 2295 | Learner Steps = 3 | Learner Time Elapsed = 10.930 | Loss = -0.27460673451423645 | Param Norm = 48.73942184448242 | Param Updates Norm = 0.18237090110778809 | Policy Loss = -0.385860800743103
I0925 18:52:26.613179 140563283464576 terminal.py:91] [Actor] Actor Episodes = 3 | Actor Steps = 2490 | Episode Length = 812 | Episode Return = -21.0 | Evaluator Episodes = 3 | Evaluator Steps = 2295 | Learner Steps = 3 | Learner Time Elapsed = 10.930 | Steps Per Second = 95.804
I0925 18:52:33.834763 140563283464576 terminal.py:91] [Evaluator] Actor Episodes = 3 | Actor Steps = 2490 | Episode Length = 765 | Episode Return = -21.0 | Evaluator Episodes = 4 | Evaluator Steps = 3060 | Learner Steps = 3 | Learner Time Elapsed = 10.930 | Steps Per Second = 105.939
I0925 18:52:34.521620 140563283464576 terminal.py:91] [Learner] Actor Episodes = 3 | Actor Steps = 2490 | Critic Loss = 0.09985552728176117 | Entropy = -2.751511812210083 | Entropy Loss = -0.013757558539509773 | Evaluator Episodes = 4 | Evaluator Steps = 3060 | Learner Steps = 4 | Learner Time Elapsed = 11.038 | Loss = -0.14621829986572266 | Param Norm = 48.74060821533203 | Param Updates Norm = 0.1674019694328308 | Policy Loss = -0.232316255569458
I0925 18:52:41.398951 140563283464576 terminal.py:91] [Learner] Actor Episodes = 3 | Actor Steps = 2490 | Critic Loss = 0.03946797549724579 | Entropy = -2.6504082679748535 | Entropy Loss = -0.01325204037129879 | Evaluator Episodes = 4 | Evaluator Steps = 3060 | Learner Steps = 5 | Learner Time Elapsed = 11.150 | Loss = -0.08941790461540222 | Param Norm = 48.74118423461914 | Param Updates Norm = 0.09982965141534805 | Policy Loss = -0.11563383042812347
I0925 18:52:42.866006 140563283464576 terminal.py:91] [Actor] Actor Episodes = 4 | Actor Steps = 3328 | Episode Length = 838 | Episode Return = -20.0 | Evaluator Episodes = 4 | Evaluator Steps = 3060 | Learner Steps = 5 | Learner Time Elapsed = 11.150 | Steps Per Second = 92.794
I0925 18:52:50.691852 140563283464576 terminal.py:91] [Evaluator] Actor Episodes = 4 | Actor Steps = 3328 | Episode Length = 765 | Episode Return = -21.0 | Evaluator Episodes = 5 | Evaluator Steps = 3825 | Learner Steps = 5 | Learner Time Elapsed = 11.150 | Steps Per Second = 97.758
I0925 18:52:55.859575 140563283464576 terminal.py:91] [Learner] Actor Episodes = 4 | Actor Steps = 3328 | Critic Loss = 0.055803023278713226 | Entropy = -2.689145565032959 | Entropy Loss = -0.013445727527141571 | Evaluator Episodes = 5 | Evaluator Steps = 3825 | Learner Steps = 6 | Learner Time Elapsed = 11.260 | Loss = 0.10277498513460159 | Param Norm = 48.74171829223633 | Param Updates Norm = 0.16001644730567932 | Policy Loss = 0.06041767820715904
I0925 18:53:00.496995 140563283464576 terminal.py:91] [Actor] Actor Episodes = 5 | Actor Steps = 4253 | Episode Length = 925 | Episode Return = -20.0 | Evaluator Episodes = 5 | Evaluator Steps = 3825 | Learner Steps = 6 | Learner Time Elapsed = 11.260 | Steps Per Second = 94.342
I0925 18:53:07.796396 140563283464576 terminal.py:91] [Evaluator] Actor Episodes = 5 | Actor Steps = 4253 | Episode Length = 765 | Episode Return = -21.0 | Evaluator Episodes = 6 | Evaluator Steps = 4590 | Learner Steps = 6 | Learner Time Elapsed = 11.260 | Steps Per Second = 104.809
I0925 18:53:09.856697 140563283464576 terminal.py:91] [Learner] Actor Episodes = 5 | Actor Steps = 4253 | Critic Loss = 0.026257839053869247 | Entropy = -2.663625717163086 | Entropy Loss = -0.013318127952516079 | Evaluator Episodes = 6 | Evaluator Steps = 4590 | Learner Steps = 7 | Learner Time Elapsed = 11.371 | Loss = 0.17126703262329102 | Param Norm = 48.741798400878906 | Param Updates Norm = 0.15110263228416443 | Policy Loss = 0.15832732617855072
I0925 18:53:16.609760 140563283464576 terminal.py:91] [Learner] Actor Episodes = 5 | Actor Steps = 4253 | Critic Loss = 0.017853129655122757 | Entropy = -2.6683619022369385 | Entropy Loss = -0.013341809622943401 | Evaluator Episodes = 6 | Evaluator Steps = 4590 | Learner Steps = 8 | Learner Time Elapsed = 11.478 | Loss = 0.02541973441839218 | Param Norm = 48.7418212890625 | Param Updates Norm = 0.07357873767614365 | Policy Loss = 0.02090841718018055
I0925 18:53:18.058462 140563283464576 terminal.py:91] [Actor] Actor Episodes = 6 | Actor Steps = 5230 | Episode Length = 977 | Episode Return = -21.0 | Evaluator Episodes = 6 | Evaluator Steps = 4590 | Learner Steps = 8 | Learner Time Elapsed = 11.478 | Steps Per Second = 95.209
I0925 18:53:25.833591 140563283464576 terminal.py:91] [Evaluator] Actor Episodes = 6 | Actor Steps = 5230 | Episode Length = 765 | Episode Return = -21.0 | Evaluator Episodes = 7 | Evaluator Steps = 5355 | Learner Steps = 8 | Learner Time Elapsed = 11.478 | Steps Per Second = 98.396
I0925 18:53:30.970637 140563283464576 terminal.py:91] [Learner] Actor Episodes = 6 | Actor Steps = 5230 | Critic Loss = 0.017343368381261826 | Entropy = -2.6426947116851807 | Entropy Loss = -0.013213474303483963 | Evaluator Episodes = 7 | Evaluator Steps = 5355 | Learner Steps = 9 | Learner Time Elapsed = 11.579 | Loss = -0.03186701238155365 | Param Norm = 48.74181365966797 | Param Updates Norm = 0.07371831685304642 | Policy Loss = -0.03599691018462181
I0925 18:53:37.760166 140563283464576 terminal.py:91] [Actor] Actor Episodes = 7 | Actor Steps = 6360 | Episode Length = 1130 | Episode Return = -19.0 | Evaluator Episodes = 7 | Evaluator Steps = 5355 | Learner Steps = 9 | Learner Time Elapsed = 11.579 | Steps Per Second = 94.753
I0925 18:53:45.083409 140563283464576 terminal.py:91] [Evaluator] Actor Episodes = 7 | Actor Steps = 6360 | Episode Length = 765 | Episode Return = -21.0 | Evaluator Episodes = 8 | Evaluator Steps = 6120 | Learner Steps = 9 | Learner Time Elapsed = 11.579 | Steps Per Second = 104.469
I0925 18:53:45.203235 140563283464576 terminal.py:91] [Learner] Actor Episodes = 7 | Actor Steps = 6360 | Critic Loss = 0.01546546071767807 | Entropy = -2.646545648574829 | Entropy Loss = -0.013232728466391563 | Evaluator Episodes = 8 | Evaluator Steps = 6120 | Learner Steps = 10 | Learner Time Elapsed = 11.676 | Loss = -0.07646849751472473 | Param Norm = 48.74181365966797 | Param Updates Norm = 0.0658826231956482 | Policy Loss = -0.07870122790336609
I0925 18:53:52.000295 140563283464576 terminal.py:91] [Learner] Actor Episodes = 7 | Actor Steps = 6360 | Critic Loss = 0.03162887319922447 | Entropy = -2.635777473449707 | Entropy Loss = -0.013178886845707893 | Evaluator Episodes = 8 | Evaluator Steps = 6120 | Learner Steps = 11 | Learner Time Elapsed = 11.786 | Loss = -0.08587456494569778 | Param Norm = 48.74217224121094 | Param Updates Norm = 0.13578905165195465 | Policy Loss = -0.10432454943656921
I0925 18:53:53.628688 140563283464576 terminal.py:91] [Actor] Actor Episodes = 8 | Actor Steps = 7153 | Episode Length = 793 | Episode Return = -21.0 | Evaluator Episodes = 8 | Evaluator Steps = 6120 | Learner Steps = 11 | Learner Time Elapsed = 11.786 | Steps Per Second = 92.806
I0925 18:54:01.624238 140563283464576 terminal.py:91] [Evaluator] Actor Episodes = 8 | Actor Steps = 7153 | Episode Length = 765 | Episode Return = -21.0 | Evaluator Episodes = 9 | Evaluator Steps = 6885 | Learner Steps = 11 | Learner Time Elapsed = 11.786 | Steps Per Second = 95.686
I0925 18:54:06.669786 140563283464576 terminal.py:91] [Learner] Actor Episodes = 8 | Actor Steps = 7153 | Critic Loss = 0.038241975009441376 | Entropy = -2.6358108520507812 | Entropy Loss = -0.013179054483771324 | Evaluator Episodes = 9 | Evaluator Steps = 6885 | Learner Steps = 12 | Learner Time Elapsed = 11.897 | Loss = 0.020664770156145096 | Param Norm = 48.742374420166016 | Param Updates Norm = 0.09126840531826019 | Policy Loss = -0.004398146644234657
I0925 18:54:11.105047 140563283464576 terminal.py:91] [Actor] Actor Episodes = 9 | Actor Steps = 8038 | Episode Length = 885 | Episode Return = -21.0 | Evaluator Episodes = 9 | Evaluator Steps = 6885 | Learner Steps = 12 | Learner Time Elapsed = 11.897 | Steps Per Second = 93.353
I0925 18:54:18.346851 140563283464576 terminal.py:91] [Evaluator] Actor Episodes = 9 | Actor Steps = 8038 | Episode Length = 765 | Episode Return = -21.0 | Evaluator Episodes = 10 | Evaluator Steps = 7650 | Learner Steps = 12 | Learner Time Elapsed = 11.897 | Steps Per Second = 105.643
I0925 18:54:20.439688 140563283464576 terminal.py:91] [Learner] Actor Episodes = 9 | Actor Steps = 8038 | Critic Loss = 0.028006331995129585 | Entropy = -2.6305150985717773 | Entropy Loss = -0.013152575120329857 | Evaluator Episodes = 10 | Evaluator Steps = 7650 | Learner Steps = 13 | Learner Time Elapsed = 12.010 | Loss = 0.022677822038531303 | Param Norm = 48.742523193359375 | Param Updates Norm = 0.0655471459031105 | Policy Loss = 0.007824068889021873
I0925 18:54:26.436676 140563283464576 terminal.py:91] [Actor] Actor Episodes = 10 | Actor Steps = 8849 | Episode Length = 811 | Episode Return = -21.0 | Evaluator Episodes = 10 | Evaluator Steps = 7650 | Learner Steps = 13 | Learner Time Elapsed = 12.010 | Steps Per Second = 100.254
I0925 18:54:34.040755 140563283464576 terminal.py:91] [Evaluator] Actor Episodes = 10 | Actor Steps = 8849 | Episode Length = 765 | Episode Return = -21.0 | Evaluator Episodes = 11 | Evaluator Steps = 8415 | Learner Steps = 13 | Learner Time Elapsed = 12.010 | Steps Per Second = 100.609
I0925 18:54:34.543608 140563283464576 terminal.py:91] [Learner] Actor Episodes = 10 | Actor Steps = 8849 | Critic Loss = 0.02078539878129959 | Entropy = -2.614872932434082 | Entropy Loss = -0.013074364513158798 | Evaluator Episodes = 11 | Evaluator Steps = 8415 | Learner Steps = 14 | Learner Time Elapsed = 12.120 | Loss = 0.01250423863530159 | Param Norm = 48.7426643371582 | Param Updates Norm = 0.05942913889884949 | Policy Loss = 0.004793208092451096
I0925 18:54:41.282555 140563283464576 terminal.py:91] [Learner] Actor Episodes = 10 | Actor Steps = 8849 | Critic Loss = 0.023472841829061508 | Entropy = -2.6376442909240723 | Entropy Loss = -0.013188222423195839 | Evaluator Episodes = 11 | Evaluator Steps = 8415 | Learner Steps = 15 | Learner Time Elapsed = 12.234 | Loss = -0.0007724124006927013 | Param Norm = 48.742557525634766 | Param Updates Norm = 0.06477858126163483 | Policy Loss = -0.011057032272219658
I0925 18:54:44.060443 140563283464576 terminal.py:91] [Actor] Actor Episodes = 11 | Actor Steps = 9824 | Episode Length = 975 | Episode Return = -21.0 | Evaluator Episodes = 11 | Evaluator Steps = 8415 | Learner Steps = 15 | Learner Time Elapsed = 12.234 | Steps Per Second = 97.312
I0925 18:54:51.395720 140563283464576 terminal.py:91] [Evaluator] Actor Episodes = 11 | Actor Steps = 9824 | Episode Length = 765 | Episode Return = -21.0 | Evaluator Episodes = 12 | Evaluator Steps = 9180 | Learner Steps = 15 | Learner Time Elapsed = 12.234 | Steps Per Second = 104.296
^C[reverb/cc/platform/default/server.cc:84] Shutting down replay server
E0925 18:55:09.683425 140563283464576 base.py:130] Timeout (10000 ms) exceeded when flushing the writer before deleting it. Caught Reverb exception: Flush call did not complete within provided timeout of 0:00:10

Neofetch

neofetch
costa@pop-os
------------
OS: Pop!_OS 21.10 x86_64
Kernel: 5.17.5-76051705-generic
Uptime: 45 days, 7 hours, 27 mins
Packages: 3025 (dpkg), 6 (flatpak), 2 (snap)
Shell: zsh 5.8
Terminal: poetry
CPU: AMD Ryzen 9 3900X (24) @ 3.800GHz
GPU: NVIDIA GeForce RTX 3060 Ti
GPU: NVIDIA GeForce RTX 3060 Ti
Memory: 5500MiB / 64237MiB

Thank you!

Hi vwxyzjn,

By default the Impala agent is rate limited so that (in expectation) every sampled experience is trained on. This is what the samples_per_insert = 1.0 parameter does. I'm wondering if this is what is limiting you in this case. Can you try reducing it to, e.g., 0.25 to see if it has any effect on your training speed?
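
For concreteness, here is a minimal sketch of what that change might look like on the JAX IMPALA config. The impala.IMPALAConfig name and the samples_per_insert field are assumptions based on the description above; check acme/agents/jax/impala/config.py in your checkout for the exact names.

from acme.agents.jax import impala

# NOTE: config class and field names are assumed from the discussion above.
# Lowering samples_per_insert relaxes the rate limiter so the learner no longer
# waits for (in expectation) one training sample per inserted experience.
config = impala.IMPALAConfig(
    samples_per_insert=0.25,  # default discussed above is 1.0
)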

Thanks for your interest and question, happy Acming!

Bobak

Hi @bshahr, thanks for the suggestion. I gave it a try, but it did not make a difference. My collaborator suggested this might be because the actors only run on CPUs, so action sampling through the ResNet could be slow. Do you think this may be the case?
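
One quick way to check this is to print which backend and devices each process actually sees; this is a generic JAX diagnostic rather than anything Acme-specific:

import jax

# Reports the backend JAX selected for this process ('gpu', 'cpu', or 'tpu')
# and the devices visible to it.
print(jax.default_backend())
print(jax.devices())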

On a related note, were the IMPALA results shown in the new paper generated by run_impala.py? If not, how could I reproduce them?

Btw, I really liked how the preliminaries and background sections are written in the new paper :)


Hi vwxyzjn,

Thanks for the kind words re: the papers!

As for your Impala question, our results were indeed generated with the run_impala.py script, but run in a distributed way, i.e. with the --run_distributed flag. If the default number of actors is too high for your machine, you could potentially reduce it without impacting speed too much.

Hope this helps reproduce our results!

Bobak

Hello, I ran into the same problem. Even with the --run_distributed flag (which is enabled by default in the latest version), I get something like 120 SPS. Did you find any solution for this?
Thank you.

Hey @Elameri, I did try --run_distributed but encountered low SPS as well.

@bshahr thank you for your reply. Quick follow-up question: what hardware resources were used to run the IMPALA experiments? Maybe IMPALA produces different results depending on the machine configuration.

Hi vwxyzjn and Elameri,

So the --run_distributed flag will distribute the computation, but that will only help if you have a corresponding amount of compute. Roughly speaking, you should be able to get to 50M actor steps (200M frames) in 2 days if you have 256 actors running on 60 dedicated CPUs and a learner with a dedicated modern GPU (e.g. a V100). (If you're running the actors and the learner on the same machine, make sure the actors are not using the GPU; see the sketch below.)
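
A minimal sketch of one way to keep an actor process off the GPU, using standard CUDA/JAX environment variables rather than any Acme-specific mechanism; treat it as an assumption about your setup:

import os

# Assumption: standard env vars, not an Acme-specific mechanism.
# This must run before JAX initializes its backends in the actor process.
os.environ['CUDA_VISIBLE_DEVICES'] = ''   # hide CUDA devices from this process
# Alternatively: os.environ['JAX_PLATFORMS'] = 'cpu'

import jax
print(jax.default_backend())  # expected to report 'cpu' for actor processes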

As for producing different results on hardware with different capabilities, we've tried very hard to minimize this. This is why we use Reverb's rate-limitation feature in our agents (see the paper for a more detailed discussion). It ensures that you get similar results regardless of the relative speeds of your learner and actors.
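
For illustration, this is roughly what Reverb's rate limitation looks like when a table is constructed directly; a conceptual sketch of the feature described above, not the exact table that run_impala.py creates:

import reverb

# Conceptual sketch only; not the exact wiring Acme's IMPALA builder uses.
# The limiter blocks whichever side is running ahead so that, in expectation,
# each inserted item is sampled `samples_per_insert` times.
rate_limiter = reverb.rate_limiters.SampleToInsertRatio(
    samples_per_insert=1.0,   # the default mentioned earlier in the thread
    min_size_to_sample=1,
    error_buffer=100.0,       # slack before inserts/samples start blocking
)

table = reverb.Table(
    name='priority_table',
    sampler=reverb.selectors.Uniform(),
    remover=reverb.selectors.Fifo(),
    max_size=100_000,
    rate_limiter=rate_limiter,
)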

Hope this helps! I'll close the issue but feel free to reopen.

Bobak

Hi @bshahr, happy new year! Thanks for your response.

I put up a table below comparing ACME's results with the original IMPALA paper. The original IMPALA paper reported finishing 200M frames in under an hour (for the shallow model that uses the Nature DQN net) and reached a similar level of results to those reported in the ACME papers. Would you mind looking into the runtime difference?

| env | Espeholt et al., 2018 (IMPALA shallow model; "Note that the shallow IMPALA experiment completes training over 200 million frames in less than one hour") | Espeholt et al., 2018 (IMPALA deep model; runtime unspecified, maybe a bit more than 1 hour?) | ACME (20 Sep 2022 arXiv version) | ACME (Jun 2020 arXiv version) |
| --- | --- | --- | --- | --- |
| Asterix | 29692.50 | 300732.00 | ~200000 ± 50000 | - |
| Breakout | 640.43 | 787.34 | ~450 ± 50 | 700 (12 hours) |
| MsPacman | 6501.71 | 7342.32 | 2000 ± 300 | 5000 (12 hours) |
| SpaceInvaders | 1726.28 | 43595.78 | 12000 ± 12000 | 20000 (12 hours) |