takuseno / d3rlpy

An offline deep reinforcement learning library

Home Page: https://takuseno.github.io/d3rlpy

[BUG] examples/distributed_offline_training.py fails when running with GPUs

wenxuhaskell opened this issue · comments

Describe the bug

First of all, I am not sure whether this is a bug or not.

I was training my own model with DDP enabled and ran into this error. I then took examples/distributed_offline_training.py and tried to run the GPU version of it, i.e., enabling "nccl" and commenting out "gloo", as shown below.

It failed with the same error I encountered while running my own model.

To Reproduce

Environment: d3rlpy 2.3.0, pytorch 2.0.1, nvidia_nccl_cu11-2.14.3, nvidia_nccl_cu12-2.18.1, python 3.8

Step 1: enable the GPU version in examples/distributed_offline_training.py (as below)

......

def main() -> None:
    # GPU version:
    rank = d3rlpy.distributed.init_process_group("nccl")
    # rank = d3rlpy.distributed.init_process_group("gloo")
    print(f"Start running on rank={rank}.")

    # GPU version:
    device = f"cuda:{rank}"
    # device = "cpu:0"
    print(f"device: {device}")

    # setup algorithm
    cql = d3rlpy.algos.CQLConfig(
        actor_learning_rate=1e-3,
        critic_learning_rate=1e-3,
        alpha_learning_rate=1e-3,
    ).create(device=device)

......

Step 2: run the command below from a terminal,
(.venv) root@:/home/code/xxx# torchrun --nnodes=1 --nproc_per_node=3 --rdzv_id=100 --rdzv_backend=c10d --rdzv_endpoint=localhost:29400 distributed_offline_training.py

Step 3: it fails as below,
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1110, in _run_ddp_forward
return module_to_run(*inputs[0], **kwargs[0]) # type: ignore[index]
IndexError: tuple index out of range

The complete error output is included at the bottom.

Expected behavior
Distributed training should work.

Additional context
Complete output including the error.

(.venv) root@:/home/code/xxx# torchrun --nnodes=1 --nproc_per_node=3 --rdzv_id=100 --rdzv_backend=c10d --rdzv_endpoint=localhost:29400 distributed_offline_training.py

master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
WARNING:torch.distributed.run:


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


Start running on rank=1.
device: cuda:1
Start running on rank=2.
device: cuda:2
Start running on rank=0.
device: cuda:0
2024-01-01 10:42.13 [info ] Signatures have been automatically determined. action_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]) distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=3) observation_signature=Signature(dtype=[dtype('float32')], shape=[(3,)]) reward_signature=Signature(dtype=[dtype('float32')], shape=[(1,)])
2024-01-01 10:42.13 [info ] Action-space has been automatically determined. action_space=<ActionSpace.CONTINUOUS: 1> distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=3)
2024-01-01 10:42.13 [info ] Action size has been automatically determined. action_size=1 distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=3)
2024-01-01 10:42.13 [info ] Signatures have been automatically determined. action_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]) distributed=DistributedWorkerInfo(rank=2, backend='nccl', world_size=3) observation_signature=Signature(dtype=[dtype('float32')], shape=[(3,)]) reward_signature=Signature(dtype=[dtype('float32')], shape=[(1,)])
2024-01-01 10:42.13 [info ] Action-space has been automatically determined. action_space=<ActionSpace.CONTINUOUS: 1> distributed=DistributedWorkerInfo(rank=2, backend='nccl', world_size=3)
2024-01-01 10:42.13 [info ] Action size has been automatically determined. action_size=1 distributed=DistributedWorkerInfo(rank=2, backend='nccl', world_size=3)
2024-01-01 10:42.13 [info ] Signatures have been automatically determined. action_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]) distributed=DistributedWorkerInfo(rank=1, backend='nccl', world_size=3) observation_signature=Signature(dtype=[dtype('float32')], shape=[(3,)]) reward_signature=Signature(dtype=[dtype('float32')], shape=[(1,)])
2024-01-01 10:42.13 [info ] Action-space has been automatically determined. action_space=<ActionSpace.CONTINUOUS: 1> distributed=DistributedWorkerInfo(rank=1, backend='nccl', world_size=3)
2024-01-01 10:42.13 [info ] Action size has been automatically determined. action_size=1 distributed=DistributedWorkerInfo(rank=1, backend='nccl', world_size=3)
2024-01-01 10:42.14 [info ] dataset info dataset_info=DatasetInfo(observation_signature=Signature(dtype=[dtype('float32')], shape=[(3,)]), action_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]), reward_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]), action_space=<ActionSpace.CONTINUOUS: 1>, action_size=1) distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=3)
2024-01-01 10:42.14 [info ] Directory is created at d3rlpy_logs/CQL_20240101104214 distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=3)
2024-01-01 10:42.14 [debug ] Building models... distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=3)
2024-01-01 10:42.14 [info ] dataset info dataset_info=DatasetInfo(observation_signature=Signature(dtype=[dtype('float32')], shape=[(3,)]), action_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]), reward_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]), action_space=<ActionSpace.CONTINUOUS: 1>, action_size=1) distributed=DistributedWorkerInfo(rank=2, backend='nccl', world_size=3)
2024-01-01 10:42.14 [debug ] Building models... distributed=DistributedWorkerInfo(rank=2, backend='nccl', world_size=3)
2024-01-01 10:42.14 [info ] dataset info dataset_info=DatasetInfo(observation_signature=Signature(dtype=[dtype('float32')], shape=[(3,)]), action_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]), reward_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]), action_space=<ActionSpace.CONTINUOUS: 1>, action_size=1) distributed=DistributedWorkerInfo(rank=1, backend='nccl', world_size=3)
2024-01-01 10:42.14 [debug ] Building models... distributed=DistributedWorkerInfo(rank=1, backend='nccl', world_size=3)
2024-01-01 10:42.17 [debug ] Models have been built. distributed=DistributedWorkerInfo(rank=1, backend='nccl', world_size=3)
2024-01-01 10:42.17 [debug ] Models have been built. distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=3)
2024-01-01 10:42.17 [debug ] Models have been built. distributed=DistributedWorkerInfo(rank=2, backend='nccl', world_size=3)
2024-01-01 10:42.20 [info ] Parameters distributed=DistributedWorkerInfo(rank=2, backend='nccl', world_size=3) params={'observation_shape': [3], 'action_size': 1, 'config': {'type': 'cql', 'params': {'batch_size': 256, 'gamma': 0.99, 'observation_scaler': {'type': 'none', 'params': {}}, 'action_scaler': {'type': 'none', 'params': {}}, 'reward_scaler': {'type': 'none', 'params': {}}, 'actor_learning_rate': 0.001, 'critic_learning_rate': 0.001, 'temp_learning_rate': 0.0001, 'alpha_learning_rate': 0.001, 'actor_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'critic_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'temp_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'alpha_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'actor_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'critic_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'q_func_factory': {'type': 'mean', 'params': {'share_encoder': False}}, 'tau': 0.005, 'n_critics': 2, 'initial_temperature': 1.0, 'initial_alpha': 1.0, 'alpha_threshold': 10.0, 'conservative_weight': 5.0, 'n_action_samples': 10, 'soft_q_backup': False}}}
2024-01-01 10:42.20 [info ] Parameters distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=3) params={'observation_shape': [3], 'action_size': 1, 'config': {'type': 'cql', 'params': {'batch_size': 256, 'gamma': 0.99, 'observation_scaler': {'type': 'none', 'params': {}}, 'action_scaler': {'type': 'none', 'params': {}}, 'reward_scaler': {'type': 'none', 'params': {}}, 'actor_learning_rate': 0.001, 'critic_learning_rate': 0.001, 'temp_learning_rate': 0.0001, 'alpha_learning_rate': 0.001, 'actor_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'critic_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'temp_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'alpha_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'actor_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'critic_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'q_func_factory': {'type': 'mean', 'params': {'share_encoder': False}}, 'tau': 0.005, 'n_critics': 2, 'initial_temperature': 1.0, 'initial_alpha': 1.0, 'alpha_threshold': 10.0, 'conservative_weight': 5.0, 'n_action_samples': 10, 'soft_q_backup': False}}}
2024-01-01 10:42.20 [info ] Parameters distributed=DistributedWorkerInfo(rank=1, backend='nccl', world_size=3) params={'observation_shape': [3], 'action_size': 1, 'config': {'type': 'cql', 'params': {'batch_size': 256, 'gamma': 0.99, 'observation_scaler': {'type': 'none', 'params': {}}, 'action_scaler': {'type': 'none', 'params': {}}, 'reward_scaler': {'type': 'none', 'params': {}}, 'actor_learning_rate': 0.001, 'critic_learning_rate': 0.001, 'temp_learning_rate': 0.0001, 'alpha_learning_rate': 0.001, 'actor_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'critic_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'temp_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'alpha_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'actor_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'critic_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'q_func_factory': {'type': 'mean', 'params': {'share_encoder': False}}, 'tau': 0.005, 'n_critics': 2, 'initial_temperature': 1.0, 'initial_alpha': 1.0, 'alpha_threshold': 10.0, 'conservative_weight': 5.0, 'n_action_samples': 10, 'soft_q_backup': False}}}
Epoch 1/10: 0%| | 0/1000 [00:00<?, ?it/s]Traceback (most recent call last):
File "distributed_offline_training.py", line 62, in
Traceback (most recent call last):
File "distributed_offline_training.py", line 62, in
main()
File "distributed_offline_training.py", line 48, in main
cql.fit(
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/base.py", line 412, in fit
main()
File "distributed_offline_training.py", line 48, in main
cql.fit(
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/base.py", line 412, in fit
results = list(
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/base.py", line 537, in fitter
loss = self.update(batch)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/base.py", line 838, in update
loss = self._impl.inner_update(torch_batch, self._grad_step)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/torch/ddpg_impl.py", line 118, in inner_update
results = list(
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/base.py", line 537, in fitter
loss = self.update(batch)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/base.py", line 838, in update
loss = self._impl.inner_update(torch_batch, self._grad_step)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/torch/ddpg_impl.py", line 118, in inner_update
metrics.update(self.update_critic(batch))
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/torch/ddpg_impl.py", line 84, in update_critic
loss = self.compute_critic_loss(batch, q_tpn)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/torch/cql_impl.py", line 77, in compute_critic_loss
conservative_loss = self._compute_conservative_loss(
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/torch/cql_impl.py", line 175, in _compute_conservative_loss
clipped_alpha = self._modules.log_alpha().exp().clamp(0, 1e6)[0][0]
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1156, in forward
output = self._run_ddp_forward(*inputs, **kwargs)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1110, in _run_ddp_forward
return module_to_run(*inputs[0], **kwargs[0]) # type: ignore[index]
IndexError: tuple index out of range
metrics.update(self.update_critic(batch))
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/torch/ddpg_impl.py", line 84, in update_critic
loss = self.compute_critic_loss(batch, q_tpn)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/torch/cql_impl.py", line 77, in compute_critic_loss
conservative_loss = self._compute_conservative_loss(
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/torch/cql_impl.py", line 175, in _compute_conservative_loss
clipped_alpha = self._modules.log_alpha().exp().clamp(0, 1e6)[0][0]
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1156, in forward
output = self._run_ddp_forward(*inputs, **kwargs)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1110, in _run_ddp_forward
return module_to_run(*inputs[0], **kwargs[0]) # type: ignore[index]
IndexError: tuple index out of range
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 89038 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 89039) of binary: /home/code/xxx/.venv/bin/python
Traceback (most recent call last):
File "/home/code/xxx/.venv/bin/torchrun", line 8, in
sys.exit(main())
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 346, in wrapper
return f(*args, **kwargs)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

distributed_offline_training.py FAILED

Failures:
[1]:
time : 2024-01-01_10:42:30
host :
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 89040)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
time : 2024-01-01_10:42:30
host :
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 89039)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

@wenxuhaskell Thanks for reporting this. In my environment, I actually have only one GPU. So I tested the following script and command:

from typing import Dict

import d3rlpy

def main() -> None:
    # GPU version:
    rank = d3rlpy.distributed.init_process_group("nccl")
    print(f"Start running on rank={rank}.")

    # GPU version:
    device = f"cuda:{rank}"

    # setup algorithm
    cql = d3rlpy.algos.CQLConfig(
        actor_learning_rate=1e-3,
        critic_learning_rate=1e-3,
        alpha_learning_rate=1e-3,
    ).create(device=device)

    # prepare dataset
    dataset, env = d3rlpy.datasets.get_pendulum()

    # disable logging on rank != 0 workers
    logger_adapter: d3rlpy.logging.LoggerAdapterFactory
    evaluators: Dict[str, d3rlpy.metrics.EvaluatorProtocol]
    if rank == 0:
        evaluators = {"environment": d3rlpy.metrics.EnvironmentEvaluator(env)}
        logger_adapter = d3rlpy.logging.FileAdapterFactory()
    else:
        evaluators = {}
        logger_adapter = d3rlpy.logging.NoopAdapterFactory()

    # start training
    cql.fit(
        dataset,
        n_steps=10000,
        n_steps_per_epoch=1000,
        evaluators=evaluators,
        logger_adapter=logger_adapter,
        show_progress=rank == 0,
        enable_ddp=True,
    )

    d3rlpy.distributed.destroy_process_group()


if __name__ == "__main__":
    main()

and execute:

torchrun \
   --nnodes=1 \
   --nproc_per_node=1 \
   --rdzv_id=100 \
   --rdzv_backend=c10d \
   --rdzv_endpoint=localhost:29400 \
   examples/distributed_offline_training.py

This works in my environment. Could you share what you see with this? If you see a different result when you set --nproc_per_node=3, it'll be another useful datapoint for me to investigate. Thanks!

@takuseno, thanks for your quick support!

I ran your script using the command below,

torchrun \
   --nnodes=1 \
   --nproc_per_node=1 \
   --rdzv_id=100 \
   --rdzv_backend=c10d \
   --rdzv_endpoint=localhost:29400 \
   examples/distributed_offline_training.py

This worked correctly!

Terminal output:

2024-01-03 08:11.54 [debug ] Building models... distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=1)
2024-01-03 08:11.56 [debug ] Models have been built. distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=1)
2024-01-03 08:11.57 [info ] Parameters distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=1) params={'observation_shape': [3], 'action_size': 1, 'config': {'type': 'cql', 'params': {'batch_size': 256, 'gamma': 0.99, 'observation_scaler': {'type': 'none', 'params': {}}, 'action_scaler': {'type': 'none', 'params': {}}, 'reward_scaler': {'type': 'none', 'params': {}}, 'actor_learning_rate': 0.001, 'critic_learning_rate': 0.001, 'temp_learning_rate': 0.0001, 'alpha_learning_rate': 0.001, 'actor_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'critic_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'temp_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'alpha_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'actor_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'critic_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'q_func_factory': {'type': 'mean', 'params': {'share_encoder': False}}, 'tau': 0.005, 'n_critics': 2, 'initial_temperature': 1.0, 'initial_alpha': 1.0, 'alpha_threshold': 10.0, 'conservative_weight': 5.0, 'n_action_samples': 10, 'soft_q_backup': False}}}
Epoch 1/10: 100%|?| 1000/1000 [00:42<00:00, 23.44it/s, critic_loss=20, conservative_loss=19.8, alpha=1
/home/code/xxx/.venv/lib/python3.8/site-packages/gym/utils/passive_env_checker.py:233: DeprecationWarning: np.bool8 is a deprecated alias for np.bool_. (Deprecated NumPy 1.24)
if not isinstance(terminated, (bool, np.bool8)):
2024-01-03 08:12.42 [info ] CQL_20240103081154: epoch=1 step=1000 distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=1) epoch=1 metrics={'time_sample_batch': 0.012395153760910034, 'time_algorithm_update': 0.030102912187576294, 'critic_loss': 20.15855849933624, 'conservative_loss': 19.929117486953736, 'alpha': 1.9045982695817947, 'actor_loss': 2.203475551314652, 'temp': 0.9521893063187599, 'temp_loss': 1.5981101567745208, 'time_step': 0.04258691430091858, 'environment': -1533.6259341015468} step=1000
2024-01-03 08:12.42 [info ] Model parameters are saved to d3rlpy_logs/CQL_20240103081154/model_1000.d3 distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=1)
Epoch 2/10: 100%|?| 1000/1000 [00:39<00:00, 25.34it/s, critic_loss=83.8, conservative_loss=83.7, alpha
2024-01-03 08:13.23 [info ] CQL_20240103081154: epoch=2 step=2000 distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=1) epoch=2 metrics={'time_sample_batch': 0.011816077709197998, 'time_algorithm_update': 0.027487682580947876, 'critic_loss': 84.58266665267945, 'conservative_loss': 84.43833563613892, 'alpha': 9.096701746702195, 'actor_loss': 6.113099158763886, 'temp': 0.8653962141871452, 'temp_loss': 1.4112674503326417, 'time_step': 0.03938366413116455, 'environment': -1276.2902138216282} step=2000
2024-01-03 08:13.23 [info ] Model parameters are saved to d3rlpy_logs/CQL_20240103081154/model_2000.d3 distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=1)
Epoch 3/10: 100%|?| 1000/1000 [00:39<00:00, 25.34it/s, critic_loss=471, conservative_loss=471, alpha=6
2024-01-03 08:14.04 [info ] CQL_20240103081154: epoch=3 step=3000 distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=1) epoch=3 metrics={'time_sample_batch': 0.011869361400604248, 'time_algorithm_update': 0.027448972702026366, 'critic_loss': 476.25039488220216, 'conservative_loss': 475.9792143096924, 'alpha': 61.736044958114626, 'actor_loss': 9.908996294498444, 'temp': 0.7919914703965187, 'temp_loss': 1.1301461706757545, 'time_step': 0.039396677732467654, 'environment': -591.845528793355} step=3000
2024-01-03 08:14.04 [info ] Model parameters are saved to d3rlpy_logs/CQL_20240103081154/model_3000.d3 distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=1)
Epoch 4/10: 100%|?| 1000/1000 [00:39<00:00, 25.62it/s, critic_loss=3.38e+3, conservative_loss=3.38e+3,
2024-01-03 08:14.45 [info ] CQL_20240103081154: epoch=4 step=4000 distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=1) epoch=4 metrics={'time_sample_batch': 0.011302191495895385, 'time_algorithm_update': 0.02758713960647583, 'critic_loss': 3420.142323059082, 'conservative_loss': 3419.6362731933596, 'alpha': 509.28681718444824, 'actor_loss': 12.969302095890045, 'temp': 0.7301937859654427, 'temp_loss': 0.8906212852597236, 'time_step': 0.03896606421470642, 'environment': -308.5507707269207} step=4000
2024-01-03 08:14.45 [info ] Model parameters are saved to d3rlpy_logs/CQL_20240103081154/model_4000.d3 distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=1)
Epoch 5/10: 100%|?| 1000/1000 [00:40<00:00, 24.90it/s, critic_loss=2.76e+4, conservative_loss=2.76e+4,
2024-01-03 08:15.28 [info ] CQL_20240103081154: epoch=5 step=5000 distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=1) epoch=5 metrics={'time_sample_batch': 0.011790130138397217, 'time_algorithm_update': 0.028200243949890135, 'critic_loss': 28059.851187988283, 'conservative_loss': 28058.72673388672, 'alpha': 4575.802740356446, 'actor_loss': 15.330485224723816, 'temp': 0.67492482483387, 'temp_loss': 0.723234123468399, 'time_step': 0.04007934713363647, 'environment': -264.5921235686523} step=5000
2024-01-03 08:15.28 [info ] Model parameters are saved to d3rlpy_logs/CQL_20240103081154/model_5000.d3 distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=1)
Epoch 6/10: 100%|?| 1000/1000 [00:41<00:00, 24.02it/s, critic_loss=2.2e+5, conservative_loss=2.2e+5, a
2024-01-03 08:16.11 [info ] CQL_20240103081154: epoch=6 step=6000 distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=1) epoch=6 metrics={'time_sample_batch': 0.011838379859924317, 'time_algorithm_update': 0.029622458934783935, 'critic_loss': 221924.761796875, 'conservative_loss': 221922.75696484375, 'alpha': 40970.942463867184, 'actor_loss': 17.77835116481781, 'temp': 0.6240606836676598, 'temp_loss': 0.5938283017277718, 'time_step': 0.041547907114028934, 'environment': -334.650864965825} step=6000
2024-01-03 08:16.11 [info ] Model parameters are saved to d3rlpy_logs/CQL_20240103081154/model_6000.d3 distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=1)
Epoch 7/10: 100%|?| 1000/1000 [00:44<00:00, 22.39it/s, critic_loss=1.68e+6, conservative_loss=1.68e+6,
2024-01-03 08:16.58 [info ] CQL_20240103081154: epoch=7 step=7000 distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=1) epoch=7 metrics={'time_sample_batch': 0.013400982618331909, 'time_algorithm_update': 0.031069987773895264, 'critic_loss': 1704878.5362890626, 'conservative_loss': 1704875.40825, 'alpha': 344611.88071875, 'actor_loss': 19.32239887905121, 'temp': 0.5774227921962738, 'temp_loss': 0.4852319558560848, 'time_step': 0.04456610012054443, 'environment': -355.3276103991667} step=7000
2024-01-03 08:16.58 [info ] Model parameters are saved to d3rlpy_logs/CQL_20240103081154/model_7000.d3 distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=1)
Epoch 8/10: 100%|?| 1000/1000 [00:42<00:00, 23.57it/s, critic_loss=4.46e+6, conservative_loss=4.46e+6,
2024-01-03 08:17.43 [info ] CQL_20240103081154: epoch=8 step=8000 distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=1) epoch=8 metrics={'time_sample_batch': 0.011540153741836548, 'time_algorithm_update': 0.030728346824645997, 'critic_loss': 4436083.412925782, 'conservative_loss': 4436079.434464844, 'alpha': 1009635.9241875, 'actor_loss': 21.114278861045836, 'temp': 0.5341905326843261, 'temp_loss': 0.39875492970645426, 'time_step': 0.04235520720481872, 'environment': -160.24287454195152} step=8000
2024-01-03 08:17.43 [info ] Model parameters are saved to d3rlpy_logs/CQL_20240103081154/model_8000.d3 distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=1)
Epoch 9/10: 100%|?| 1000/1000 [02:58<00:00, 5.62it/s, critic_loss=4.16e+6, conservative_loss=4.16e+6,
2024-01-03 08:20.43 [info ] CQL_20240103081154: epoch=9 step=9000 distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=1) epoch=9 metrics={'time_sample_batch': 0.012782118320465087, 'time_algorithm_update': 0.16487976717948913, 'critic_loss': 4143829.9341894533, 'conservative_loss': 4143825.091899414, 'alpha': 1020683.6875, 'actor_loss': 22.620734875679016, 'temp': 0.49467102319002154, 'temp_loss': 0.32190170960128306, 'time_step': 0.17786574125289917, 'environment': -280.6733508634595} step=9000
2024-01-03 08:20.43 [info ] Model parameters are saved to d3rlpy_logs/CQL_20240103081154/model_9000.d3 distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=1)
Epoch 10/10: 100%|?| 1000/1000 [00:40<00:00, 24.79it/s, critic_loss=3.59e+6, conservative_loss=3.59e+6
2024-01-03 08:21.26 [info ] CQL_20240103081154: epoch=10 step=10000 distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=1) epoch=10 metrics={'time_sample_batch': 0.01116110110282898, 'time_algorithm_update': 0.029050002098083495, 'critic_loss': 3611097.8822402344, 'conservative_loss': 3611091.5501923827, 'alpha': 1020683.6875, 'actor_loss': 24.451555154800413, 'temp': 0.45856087732315065, 'temp_loss': 0.2581713905483484, 'time_step': 0.040280909061431884, 'environment': -243.6361065123586} step=10000
2024-01-03 08:21.26 [info ] Model parameters are saved to d3rlpy_logs/CQL_20240103081154/model_10000.d3 distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=1)

I then ran it with --nproc_per_node=3 (as below),

torchrun \
   --nnodes=1 \
   --nproc_per_node=3 \
   --rdzv_id=100 \
   --rdzv_backend=c10d \
   --rdzv_endpoint=localhost:29400 \
   examples/distributed_offline_training.py

It ended up with the same error I reported at the beginning.

Terminal output:

(.venv) root@xxx:/home/code/xxx# torchrun --nnodes=1 --nproc_per_node=3 --rdzv_id=100 --rdzv_backend=
c10d --rdzv_endpoint=localhost:29400 distributed_offline_training.py

master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
WARNING:torch.distributed.run:

Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.

Start running on rank=0.
device: cuda:0
Start running on rank=2.
device: cuda:2
Start running on rank=1.
device: cuda:1
2024-01-03 08:34.24 [info ] Signatures have been automatically determined. action_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]) distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=3) observation_signature=Signature(dtype=[dtype('float32')], shape=[(3,)]) reward_signature=Signature(dtype=[dtype('float32')], shape=[(1,)])
2024-01-03 08:34.24 [info ] Action-space has been automatically determined. action_space=<ActionSpace.CONTINUOUS: 1> distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=3)
2024-01-03 08:34.24 [info ] Action size has been automatically determined. action_size=1 distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=3)
2024-01-03 08:34.24 [info ] Signatures have been automatically determined. action_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]) distributed=DistributedWorkerInfo(rank=2, backend='nccl', world_size=3) observation_signature=Signature(dtype=[dtype('float32')], shape=[(3,)]) reward_signature=Signature(dtype=[dtype('float32')], shape=[(1,)])
2024-01-03 08:34.24 [info ] Action-space has been automatically determined. action_space=<ActionSpace.CONTINUOUS: 1> distributed=DistributedWorkerInfo(rank=2, backend='nccl', world_size=3)
2024-01-03 08:34.24 [info ] Action size has been automatically determined. action_size=1 distributed=DistributedWorkerInfo(rank=2, backend='nccl', world_size=3)
2024-01-03 08:34.24 [info ] Signatures have been automatically determined. action_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]) distributed=DistributedWorkerInfo(rank=1, backend='nccl', world_size=3) observation_signature=Signature(dtype=[dtype('float32')], shape=[(3,)]) reward_signature=Signature(dtype=[dtype('float32')], shape=[(1,)])
2024-01-03 08:34.24 [info ] Action-space has been automatically determined. action_space=<ActionSpace.CONTINUOUS: 1> distributed=DistributedWorkerInfo(rank=1, backend='nccl', world_size=3)
2024-01-03 08:34.24 [info ] Action size has been automatically determined. action_size=1 distributed=DistributedWorkerInfo(rank=1, backend='nccl', world_size=3)
2024-01-03 08:34.24 [info ] dataset info dataset_info=DatasetInfo(observation_signature=Signature(dtype=[dtype('float32')], shape=[(3,)]), action_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]), reward_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]), action_space=<ActionSpace.CONTINUOUS: 1>, action_size=1) distributed=DistributedWorkerInfo(rank=2, backend='nccl', world_size=3)
2024-01-03 08:34.24 [debug ] Building models... distributed=DistributedWorkerInfo(rank=2, backend='nccl', world_size=3)
2024-01-03 08:34.24 [info ] dataset info dataset_info=DatasetInfo(observation_signature=Signature(dtype=[dtype('float32')], shape=[(3,)]), action_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]), reward_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]), action_space=<ActionSpace.CONTINUOUS: 1>, action_size=1) distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=3)
2024-01-03 08:34.24 [info ] Directory is created at d3rlpy_logs/CQL_20240103083424 distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=3)
2024-01-03 08:34.24 [debug ] Building models... distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=3)
2024-01-03 08:34.24 [info ] dataset info dataset_info=DatasetInfo(observation_signature=Signature(dtype=[dtype('float32')], shape=[(3,)]), action_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]), reward_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]), action_space=<ActionSpace.CONTINUOUS: 1>, action_size=1) distributed=DistributedWorkerInfo(rank=1, backend='nccl', world_size=3)
2024-01-03 08:34.24 [debug ] Building models... distributed=DistributedWorkerInfo(rank=1, backend='nccl', world_size=3)
2024-01-03 08:34.26 [debug ] Models have been built. distributed=DistributedWorkerInfo(rank=1, backend='nccl', world_size=3)
2024-01-03 08:34.26 [debug ] Models have been built. distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=3)
2024-01-03 08:34.26 [debug ] Models have been built. distributed=DistributedWorkerInfo(rank=2, backend='nccl', world_size=3)
2024-01-03 08:34.28 [info ] Parameters distributed=DistributedWorkerInfo(rank=2, backend='nccl', world_size=3) params={'observation_shape': [3], 'action_size': 1, 'config': {'type': 'cql', 'params': {'batch_size': 256, 'gamma': 0.99, 'observation_scaler': {'type': 'none', 'params': {}}, 'action_scaler': {'type': 'none', 'params': {}}, 'reward_scaler': {'type': 'none', 'params': {}}, 'actor_learning_rate': 0.001, 'critic_learning_rate': 0.001, 'temp_learning_rate': 0.0001, 'alpha_learning_rate': 0.001, 'actor_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'critic_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'temp_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'alpha_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'actor_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'critic_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'q_func_factory': {'type': 'mean', 'params': {'share_encoder': False}}, 'tau': 0.005, 'n_critics': 2, 'initial_temperature': 1.0, 'initial_alpha': 1.0, 'alpha_threshold': 10.0, 'conservative_weight': 5.0, 'n_action_samples': 10, 'soft_q_backup': False}}}
2024-01-03 08:34.28 [info ] Parameters distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=3) params={'observation_shape': [3], 'action_size': 1, 'config': {'type': 'cql', 'params': {'batch_size': 256, 'gamma': 0.99, 'observation_scaler': {'type': 'none', 'params': {}}, 'action_scaler': {'type': 'none', 'params': {}}, 'reward_scaler': {'type': 'none', 'params': {}}, 'actor_learning_rate': 0.001, 'critic_learning_rate': 0.001, 'temp_learning_rate': 0.0001, 'alpha_learning_rate': 0.001, 'actor_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'critic_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'temp_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'alpha_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'actor_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'critic_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'q_func_factory': {'type': 'mean', 'params': {'share_encoder': False}}, 'tau': 0.005, 'n_critics': 2, 'initial_temperature': 1.0, 'initial_alpha': 1.0, 'alpha_threshold': 10.0, 'conservative_weight': 5.0, 'n_action_samples': 10, 'soft_q_backup': False}}}
2024-01-03 08:34.28 [info ] Parameters distributed=DistributedWorkerInfo(rank=1, backend='nccl', world_size=3) params={'observation_shape': [3], 'action_size': 1, 'config': {'type': 'cql', 'params': {'batch_size': 256, 'gamma': 0.99, 'observation_scaler': {'type': 'none', 'params': {}}, 'action_scaler': {'type': 'none', 'params': {}}, 'reward_scaler': {'type': 'none', 'params': {}}, 'actor_learning_rate': 0.001, 'critic_learning_rate': 0.001, 'temp_learning_rate': 0.0001, 'alpha_learning_rate': 0.001, 'actor_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'critic_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'temp_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'alpha_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'actor_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'critic_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'q_func_factory': {'type': 'mean', 'params': {'share_encoder': False}}, 'tau': 0.005, 'n_critics': 2, 'initial_temperature': 1.0, 'initial_alpha': 1.0, 'alpha_threshold': 10.0, 'conservative_weight': 5.0, 'n_action_samples': 10, 'soft_q_backup': False}}}
Epoch 1/10: 0%| | 0/1000 [00:00<?, ?it/s]Traceback (most recent call last):
File "distributed_offline_training.py", line 62, in
main()
File "distributed_offline_training.py", line 48, in main
cql.fit(
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/base.py", line 412, in fit
results = list(
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/base.py", line 537, in fitter
loss = self.update(batch)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/base.py", line 838, in update
loss = self._impl.inner_update(torch_batch, self._grad_step)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/torch/ddpg_impl.py", line 118, in inner_update
metrics.update(self.update_critic(batch))
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/torch/ddpg_impl.py", line 84, in update_critic
loss = self.compute_critic_loss(batch, q_tpn)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/torch/cql_impl.py", line 77, in compute_critic_loss
conservative_loss = self._compute_conservative_loss(
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/torch/cql_impl.py", line 175, in _compute_conservative_loss
clipped_alpha = self._modules.log_alpha().exp().clamp(0, 1e6)[0][0]
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1156, in forward
output = self._run_ddp_forward(*inputs, **kwargs)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1110, in _run_ddp_forward
return module_to_run(*inputs[0], **kwargs[0]) # type: ignore[index]
IndexError: tuple index out of range
Traceback (most recent call last):
File "distributed_offline_training.py", line 62, in
main()
File "distributed_offline_training.py", line 48, in main
cql.fit(
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/base.py", line 412, in fit
results = list(
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/base.py", line 537, in fitter
loss = self.update(batch)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/base.py", line 838, in update
loss = self._impl.inner_update(torch_batch, self._grad_step)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/torch/ddpg_impl.py", line 118, in inner_update
metrics.update(self.update_critic(batch))
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/torch/ddpg_impl.py", line 84, in update_critic
loss = self.compute_critic_loss(batch, q_tpn)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/torch/cql_impl.py", line 77, in compute_critic_loss
conservative_loss = self._compute_conservative_loss(
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/torch/cql_impl.py", line 175, in _compute_conservative_loss
clipped_alpha = self._modules.log_alpha().exp().clamp(0, 1e6)[0][0]
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1156, in forward
output = self._run_ddp_forward(*inputs, **kwargs)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/nn/parallel/distributed.py", line 1110, in _run_ddp_forward
return module_to_run(*inputs[0], **kwargs[0]) # type: ignore[index]
IndexError: tuple index out of range
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 33424 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 33425) of binary: /home/code/xxx/.venv/bin/python
Traceback (most recent call last):
File "/home/code/xxx/.venv/bin/torchrun", line 8, in
sys.exit(main())
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 346, in wrapper
return f(*args, **kwargs)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
distributed_offline_training.py FAILED

Failures:
[1]:
time : 2024-01-03_08:34:31
host :
rank : 2 (local_rank: 2)
exitcode : 1 (pid: 33426)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure):
[0]:
time : 2024-01-03_08:34:31
host :
rank : 1 (local_rank: 1)
exitcode : 1 (pid: 33425)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

I also ran it using --nproc_per_node=2 and it led to a similar error.

Please let me know if you want me to run more experiments to further investigate the behavior. Thanks!

@wenxuhaskell Thanks for the test! It seems that forward calls without any arguments, such as log_alpha, are the problem here. I'll make a fix sometime soon and get back to you when it's done.
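
A minimal sketch of the failing pattern, assuming plain PyTorch rather than the exact d3rlpy internals: a module whose forward takes no inputs, like the log_alpha holder, breaks once it is wrapped in DistributedDataParallel, because DDP's forward indexes the scattered inputs before calling the wrapped module.

import torch
import torch.nn as nn


class LogAlpha(nn.Module):
    """Parameter holder with a no-argument forward, as in the traceback."""

    def __init__(self) -> None:
        super().__init__()
        self.parameter = nn.Parameter(torch.zeros(1, 1))

    def forward(self) -> torch.Tensor:
        # No positional inputs at all.
        return self.parameter


log_alpha = LogAlpha()
print(log_alpha())  # fine on the plain module

# Wrapping it and calling it with no arguments reaches
# `module_to_run(*inputs[0], **kwargs[0])` with an empty inputs tuple
# (requires an initialized process group, omitted here):
#   ddp_log_alpha = nn.parallel.DistributedDataParallel(log_alpha, device_ids=[rank])
#   ddp_log_alpha()  # IndexError: tuple index out of range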

@takuseno Thank you!

@wenxuhaskell Sorry it took time, but I think I've fixed the issue in the latest master branch.
9467793

You can try this by using the latest master source.

@takuseno: Thanks. I tried the latest master branch, but it ran into a new error:

File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DistributedDataParallel' object has no attribute 'parameter'

However, I found a workaround for it (at the bottom of this thread). Please review it.


> root@:/home/code/xxx# torchrun --nnodes=1 --nproc_per_node=3 --rdzv_id=100 --rdzv_backend=c10d --rdzv_endpoint=localhost:29400 distributed_offline_training.py

Terminal output

master_addr is only used for static rdzv_backend and when rdzv_endpoint is not specified.
WARNING:torch.distributed.run:


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


Start running on rank=0.
device: cuda:0
Start running on rank=1.
device: cuda:1
Start running on rank=2.
device: cuda:2
2024-01-08 08:37.31 [info ] Signatures have been automatically determined. action_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]) distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=3) observation_signature=Signature(dtype=[dtype('float32')], shape=[(3,)]) reward_signature=Signature(dtype=[dtype('float32')], shape=[(1,)])
2024-01-08 08:37.31 [info ] Action-space has been automatically determined. action_space=<ActionSpace.CONTINUOUS: 1> distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=3)
2024-01-08 08:37.31 [info ] Action size has been automatically determined. action_size=1 distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=3)
2024-01-08 08:37.31 [info ] Signatures have been automatically determined. action_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]) distributed=DistributedWorkerInfo(rank=1, backend='nccl', world_size=3) observation_signature=Signature(dtype=[dtype('float32')], shape=[(3,)]) reward_signature=Signature(dtype=[dtype('float32')], shape=[(1,)])
2024-01-08 08:37.31 [info ] Action-space has been automatically determined. action_space=<ActionSpace.CONTINUOUS: 1> distributed=DistributedWorkerInfo(rank=1, backend='nccl', world_size=3)
2024-01-08 08:37.31 [info ] Action size has been automatically determined. action_size=1 distributed=DistributedWorkerInfo(rank=1, backend='nccl', world_size=3)
2024-01-08 08:37.31 [info ] Signatures have been automatically determined. action_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]) distributed=DistributedWorkerInfo(rank=2, backend='nccl', world_size=3) observation_signature=Signature(dtype=[dtype('float32')], shape=[(3,)]) reward_signature=Signature(dtype=[dtype('float32')], shape=[(1,)])
2024-01-08 08:37.31 [info ] Action-space has been automatically determined. action_space=<ActionSpace.CONTINUOUS: 1> distributed=DistributedWorkerInfo(rank=2, backend='nccl', world_size=3)
2024-01-08 08:37.31 [info ] Action size has been automatically determined. action_size=1 distributed=DistributedWorkerInfo(rank=2, backend='nccl', world_size=3)
2024-01-08 08:37.31 [info ] dataset info dataset_info=DatasetInfo(observation_signature=Signature(dtype=[dtype('float32')], shape=[(3,)]), action_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]), reward_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]), action_space=<ActionSpace.CONTINUOUS: 1>, action_size=1) distributed=DistributedWorkerInfo(rank=1, backend='nccl', world_size=3)
2024-01-08 08:37.31 [debug ] Building models... distributed=DistributedWorkerInfo(rank=1, backend='nccl', world_size=3)
2024-01-08 08:37.31 [info ] dataset info dataset_info=DatasetInfo(observation_signature=Signature(dtype=[dtype('float32')], shape=[(3,)]), action_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]), reward_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]), action_space=<ActionSpace.CONTINUOUS: 1>, action_size=1) distributed=DistributedWorkerInfo(rank=2, backend='nccl', world_size=3)
2024-01-08 08:37.31 [debug ] Building models... distributed=DistributedWorkerInfo(rank=2, backend='nccl', world_size=3)
2024-01-08 08:37.31 [info ] dataset info dataset_info=DatasetInfo(observation_signature=Signature(dtype=[dtype('float32')], shape=[(3,)]), action_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]), reward_signature=Signature(dtype=[dtype('float32')], shape=[(1,)]), action_space=<ActionSpace.CONTINUOUS: 1>, action_size=1) distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=3)
2024-01-08 08:37.31 [info ] Directory is created at d3rlpy_logs/CQL_20240108083731 distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=3)
2024-01-08 08:37.31 [debug ] Building models... distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=3)
2024-01-08 08:37.32 [debug ] Models have been built. distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=3)
2024-01-08 08:37.32 [debug ] Models have been built. distributed=DistributedWorkerInfo(rank=1, backend='nccl', world_size=3)
2024-01-08 08:37.32 [debug ] Models have been built. distributed=DistributedWorkerInfo(rank=2, backend='nccl', world_size=3)
2024-01-08 08:37.33 [info ] Parameters distributed=DistributedWorkerInfo(rank=1, backend='nccl', world_size=3) params={'observation_shape': [3], 'action_size': 1, 'config': {'type': 'cql', 'params': {'batch_size': 256, 'gamma': 0.99, 'observation_scaler': {'type': 'none', 'params': {}}, 'action_scaler': {'type': 'none', 'params': {}}, 'reward_scaler': {'type': 'none', 'params': {}}, 'actor_learning_rate': 0.001, 'critic_learning_rate': 0.001, 'temp_learning_rate': 0.0001, 'alpha_learning_rate': 0.001, 'actor_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'critic_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'temp_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'alpha_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'actor_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'critic_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'q_func_factory': {'type': 'mean', 'params': {'share_encoder': False}}, 'tau': 0.005, 'n_critics': 2, 'initial_temperature': 1.0, 'initial_alpha': 1.0, 'alpha_threshold': 10.0, 'conservative_weight': 5.0, 'n_action_samples': 10, 'soft_q_backup': False}}}
2024-01-08 08:37.33 [info ] Parameters distributed=DistributedWorkerInfo(rank=2, backend='nccl', world_size=3) params={'observation_shape': [3], 'action_size': 1, 'config': {'type': 'cql', 'params': {'batch_size': 256, 'gamma': 0.99, 'observation_scaler': {'type': 'none', 'params': {}}, 'action_scaler': {'type': 'none', 'params': {}}, 'reward_scaler': {'type': 'none', 'params': {}}, 'actor_learning_rate': 0.001, 'critic_learning_rate': 0.001, 'temp_learning_rate': 0.0001, 'alpha_learning_rate': 0.001, 'actor_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'critic_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'temp_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'alpha_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'actor_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'critic_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'q_func_factory': {'type': 'mean', 'params': {'share_encoder': False}}, 'tau': 0.005, 'n_critics': 2, 'initial_temperature': 1.0, 'initial_alpha': 1.0, 'alpha_threshold': 10.0, 'conservative_weight': 5.0, 'n_action_samples': 10, 'soft_q_backup': False}}}
2024-01-08 08:37.33 [info ] Parameters distributed=DistributedWorkerInfo(rank=0, backend='nccl', world_size=3) params={'observation_shape': [3], 'action_size': 1, 'config': {'type': 'cql', 'params': {'batch_size': 256, 'gamma': 0.99, 'observation_scaler': {'type': 'none', 'params': {}}, 'action_scaler': {'type': 'none', 'params': {}}, 'reward_scaler': {'type': 'none', 'params': {}}, 'actor_learning_rate': 0.001, 'critic_learning_rate': 0.001, 'temp_learning_rate': 0.0001, 'alpha_learning_rate': 0.001, 'actor_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'critic_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'temp_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'alpha_optim_factory': {'type': 'adam', 'params': {'betas': [0.9, 0.999], 'eps': 1e-08, 'weight_decay': 0, 'amsgrad': False}}, 'actor_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'critic_encoder_factory': {'type': 'default', 'params': {'activation': 'relu', 'use_batch_norm': False, 'dropout_rate': None}}, 'q_func_factory': {'type': 'mean', 'params': {'share_encoder': False}}, 'tau': 0.005, 'n_critics': 2, 'initial_temperature': 1.0, 'initial_alpha': 1.0, 'alpha_threshold': 10.0, 'conservative_weight': 5.0, 'n_action_samples': 10, 'soft_q_backup': False}}}
Epoch 1/10: 0%| | 0/1000 [00:00<?, ?it/s]
Traceback (most recent call last):
File "distributed_offline_training.py", line 62, in
main()
File "distributed_offline_training.py", line 48, in main
cql.fit(
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/base.py", line 400, in fit
results = list(
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/base.py", line 527, in fitter
loss = self.update(batch)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/base.py", line 828, in update
loss = self._impl.update(torch_batch, self._grad_step)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/torch_utility.py", line 365, in wrapper
return f(self, *args, **kwargs) # type: ignore
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/base.py", line 66, in update
return self.inner_update(batch, grad_step)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/torch/ddpg_impl.py", line 118, in inner_update
metrics.update(self.update_critic(batch))
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/torch/ddpg_impl.py", line 84, in update_critic
loss = self.compute_critic_loss(batch, q_tpn)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/torch/cql_impl.py", line 81, in compute_critic_loss
conservative_loss = self._compute_conservative_loss(
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/torch/cql_impl.py", line 190, in _compute_conservative_loss
clipped_alpha = self._modules.log_alpha.parameter.exp().clamp(0, 1e6)[
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1614, in getattr
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DistributedDataParallel' object has no attribute 'parameter'
Traceback (most recent call last):
File "distributed_offline_training.py", line 62, in
main()
File "distributed_offline_training.py", line 48, in main
cql.fit(
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/base.py", line 400, in fit
results = list(
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/base.py", line 527, in fitter
loss = self.update(batch)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/base.py", line 828, in update
loss = self._impl.update(torch_batch, self._grad_step)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/torch_utility.py", line 365, in wrapper
return f(self, *args, **kwargs) # type: ignore
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/base.py", line 66, in update
return self.inner_update(batch, grad_step)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/torch/ddpg_impl.py", line 118, in inner_update
metrics.update(self.update_critic(batch))
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/torch/ddpg_impl.py", line 84, in update_critic
loss = self.compute_critic_loss(batch, q_tpn)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/torch/cql_impl.py", line 81, in compute_critic_loss
conservative_loss = self._compute_conservative_loss(
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/torch/cql_impl.py", line 190, in _compute_conservative_loss
clipped_alpha = self._modules.log_alpha.parameter.exp().clamp(0, 1e6)[
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1614, in getattr
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DistributedDataParallel' object has no attribute 'parameter'
Traceback (most recent call last):
File "distributed_offline_training.py", line 62, in
main()
File "distributed_offline_training.py", line 48, in main
cql.fit(
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/base.py", line 400, in fit
results = list(
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/base.py", line 527, in fitter
loss = self.update(batch)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/base.py", line 828, in update
loss = self._impl.update(torch_batch, self._grad_step)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/torch_utility.py", line 365, in wrapper
return f(self, *args, **kwargs) # type: ignore
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/base.py", line 66, in update
return self.inner_update(batch, grad_step)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/torch/ddpg_impl.py", line 118, in inner_update
metrics.update(self.update_critic(batch))
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/torch/ddpg_impl.py", line 84, in update_critic
loss = self.compute_critic_loss(batch, q_tpn)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/torch/cql_impl.py", line 81, in compute_critic_loss
conservative_loss = self._compute_conservative_loss(
File "/home/code/xxx/.venv/lib/python3.8/site-packages/d3rlpy/algos/qlearning/torch/cql_impl.py", line 190, in _compute_conservative_loss
clipped_alpha = self._modules.log_alpha.parameter.exp().clamp(0, 1e6)[
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1614, in getattr
raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DistributedDataParallel' object has no attribute 'parameter'
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 40649 closing signal SIGTERM
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 40650 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 40648) of binary: /home/code/xxx/.venv/bin/python
Traceback (most recent call last):
File "/home/code/xxx/.venv/bin/torchrun", line 8, in
sys.exit(main())
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 346, in wrapper
return f(*args, **kwargs)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/distributed/run.py", line 794, in main
run(args)
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
elastic_launch(
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in call
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/code/xxx/.venv/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

distributed_offline_training.py FAILED

### My workaround

# In cql_impl.py, change all occurrences of
self._modules.log_alpha.parameter.exp()
# to
self._modules.log_alpha.module.parameter.exp()

# In sac_impl.py, change all occurrences of
self._modules.log_temp.parameter
# to
self._modules.log_temp.module.parameter

This workaround makes it work. I think it is because DDP wraps the original torch model, so its attributes have to be accessed through .module.
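
A hedged illustration with plain PyTorch (not d3rlpy code) of why the extra .module hop is needed: DistributedDataParallel keeps the wrapped model under its module attribute, and attributes of the original module are only reachable through it.

import torch.nn as nn

linear = nn.Linear(3, 1)
print(linear.weight.shape)  # attribute of the plain module

# After wrapping (requires an initialized process group, omitted here):
#   ddp_linear = nn.parallel.DistributedDataParallel(linear)
#   ddp_linear.weight         # AttributeError: 'DistributedDataParallel' object has no attribute 'weight'
#   ddp_linear.module.weight  # reaches the underlying nn.Linear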

I have only verified it with cql_impl.py and sac_impl.py, but I guess it might also be relevant to the other files included in your fix.

@wenxuhaskell Thanks for checking and debugging the problem! I made another change to fix this issue in the latest commit: 7d18d16. Now I think the master branch is ready for distributed training.

@takuseno I took the master branch and verified it using 3 GPUs. I can confirm that your latest fix solved the reported error and distributed_offline_training.py worked!

One thing though: when I ran distributed_offline_training.py with 3 workers, each worker trained on the same complete dataset. For example, if the dataset consists of 3000 records, every worker was trained on all 3000 records in parallel. But isn't it the case that, to facilitate multi-GPU training, the dataset should usually be divided into non-overlapping partitions so that each worker trains only on the partition dedicated to it? PyTorch DDP provides things like DistributedSampler for this purpose. Maybe d3rlpy already supports it?

By the way, the error reported at the beginning of this thread is resolved, so the thread may be closed. Thanks again for your very quick support!

Glad it worked!

For the dataset division, the easiest solution is to simply divide the dataset based on rank.

import d3rlpy

# example
rank = 0
world_size = 3

dataset, env = d3rlpy.datasets.get_d4rl("hopper-expert-v2")

num_episodes = len(dataset.episodes)
num_episodes_per_worker = num_episodes // world_size
start = rank * num_episodes_per_worker
end = (rank + 1) * num_episodes_per_worker
partial_dataset = d3rlpy.dataset.create_infinite_replay_buffer(dataset.episodes[start:end])
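
As a hedged usage sketch (reusing the NCCL setup from the script above, with get_pendulum standing in for the real dataset), the placeholder rank and world_size can come from the process group, and the per-worker buffer then replaces the full dataset in cql.fit:

import torch.distributed

import d3rlpy

# Each worker slices its own non-overlapping range of episodes.
rank = d3rlpy.distributed.init_process_group("nccl")
world_size = torch.distributed.get_world_size()

dataset, env = d3rlpy.datasets.get_pendulum()
episodes_per_worker = len(dataset.episodes) // world_size
start = rank * episodes_per_worker
end = (rank + 1) * episodes_per_worker
partial_dataset = d3rlpy.dataset.create_infinite_replay_buffer(
    dataset.episodes[start:end]
)
# partial_dataset can then be passed to cql.fit(...) in place of dataset.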

As you mentioned, since the initial issue seems resolved, let me close this issue. Feel free to reopen this if there is any further discussion. Also, feel free to open a new issue to discuss another topic.

I appreciate all the checks you've done!