AttributeError: module 'collections' has no attribute 'Container'

Question

AttributeError: module 'collections' has no attribute 'Container'

rahulxie opened this issue 2 years ago

傻笑 · Answer 1 · Mon Apr 18 2022 16:41:29 GMT+0800 (China Standard Time)

I ran into the following problem when running train.py in the examples folder:
Traceback (most recent call last):
  File "/home/trojanzoo/examples/train.py", line 34, in <module>
    model._train(**trainer)
  File "/home/trojanzoo/trojanvision/models/imagemodel.py", line 560, in _train
    return super()._train(epochs=epochs, optimizer=optimizer, lr_scheduler=lr_scheduler,
  File "/home/trojanzoo/trojanzoo/models.py", line 989, in _train
    return train(module=self._model, num_classes=self.num_classes,
  File "/home/trojanzoo/trojanzoo/utils/train.py", line 133, in train
    loss.backward()
  File "/home/itl/anaconda3/envs/trojan/lib/python3.10/site-packages/torch/_tensor.py", line 363, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/itl/anaconda3/envs/trojan/lib/python3.10/site-packages/torch/autograd/__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/itl/anaconda3/envs/trojan/lib/python3.10/site-packages/torch/autograd/function.py", line 253, in apply
    return user_fn(self, *args)
  File "/home/itl/anaconda3/envs/trojan/lib/python3.10/site-packages/torch/nn/parallel/_functions.py", line 34, in backward
    return (None,) + ReduceAddCoalesced.apply(ctx.input_device, ctx.num_inputs, *grad_outputs)
  File "/home/itl/anaconda3/envs/trojan/lib/python3.10/site-packages/torch/nn/parallel/_functions.py", line 45, in forward
    return comm.reduce_add_coalesced(grads, destination)
  File "/home/itl/anaconda3/envs/trojan/lib/python3.10/site-packages/torch/nn/parallel/comm.py", line 143, in reduce_add_coalesced
    flat_result = reduce_add(flat_tensors, destination)
  File "/home/itl/anaconda3/envs/trojan/lib/python3.10/site-packages/torch/nn/parallel/comm.py", line 96, in reduce_add
    nccl.reduce(inputs, output=result, root=root_index)
  File "/home/itl/anaconda3/envs/trojan/lib/python3.10/site-packages/torch/cuda/nccl.py", line 72, in reduce
    _check_sequence_type(inputs)
  File "/home/itl/anaconda3/envs/trojan/lib/python3.10/site-packages/torch/cuda/nccl.py", line 51, in _check_sequence_type
    if not isinstance(inputs, collections.Container) or isinstance(inputs, torch.Tensor):
AttributeError: module 'collections' has no attribute 'Container'

Ren Pang · Answer 2 · Sat Apr 23 2022 08:13:45 GMT+0800 (China Standard Time)

Sorry for missing this issue. This seems to be a PyTorch version problem. Please make sure you are using the most up-to-date version.

傻笑 · Answer 3 · Mon Apr 25 2022 13:51:14 GMT+0800 (China Standard Time)

I'm sure that my PyTorch version is 1.11.0, and it still doesn't work.

傻笑 · Answer 4 · Mon Apr 25 2022 13:56:26 GMT+0800 (China Standard Time)

I'm sure that my PyTorch version is 1.11.0, and it still doesn't work.

And my Python version is 3.10.4.

Ren Pang · Answer 5 · Mon Apr 25 2022 22:42:15 GMT+0800 (China Standard Time)

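Hmm, that is strange. But this is clearly a PyTorch and Python version issue: PyTorch 1.11.0 looks up Container on the collections module, and Python 3.10 no longer provides that alias (it now lives only in collections.abc). I can't guarantee a fix since it's an upstream issue.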

Maybe you can refer to https://discuss.pytorch.org/t/issues-on-using-nn-dataparallel-with-python-3-10-and-pytorch-1-11/146745

But please note that TrojanZoo doesn't support Python 3.9, so you can't solve it by downgrading Python.
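To see the version mismatch behind the traceback without involving PyTorch at all, a minimal check along these lines may help. It only uses the standard library; the variable names are illustrative and not taken from TrojanZoo or PyTorch.

```python
import collections
import collections.abc
import sys

# The old alias on the top-level module is what the failing check relies on.
has_old_alias = hasattr(collections, "Container")
# The abstract base class itself lives in collections.abc.
has_abc_name = hasattr(collections.abc, "Container")

print("Python", ".".join(map(str, sys.version_info[:3])))
print("collections.Container available:", has_old_alias)
print("collections.abc.Container available:", has_abc_name)

# The same kind of isinstance check works against the collections.abc name.
print("list is a Container:", isinstance([1, 2, 3], collections.abc.Container))
```

On Python 3.10 and later the first lookup should report the alias as unavailable, which matches the AttributeError in the traceback.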

Ren Pang · Answer 6 · Mon Apr 25 2022 23:01:18 GMT+0800 (China Standard Time)

I just figured it out.
The PR with the fix didn't land in PyTorch 1.11.0:
pytorch/pytorch#72239

So currently you have two workarounds:

1. Use only one GPU, which avoids DataParallel, by setting CUDA_VISIBLE_DEVICES=0.
2. Use a nightly PyTorch build that uses collections.abc.Container rather than collections.Container.
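The single-GPU workaround can also be applied from inside a script, as a rough sketch. The helper name restrict_to_single_gpu is hypothetical and not part of TrojanZoo or PyTorch; the assumption is that the variable is set before torch is imported, so that CUDA has not been initialized yet.

```python
import os


def restrict_to_single_gpu(device_index: int = 0) -> str:
    """Expose a single GPU to this process (hypothetical helper, not a library API)."""
    value = str(device_index)
    # CUDA reads this variable when it initializes, so set it as early as possible.
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


visible = restrict_to_single_gpu(0)
print(f"CUDA_VISIBLE_DEVICES={visible}")

# Import torch (and anything that imports it) only after this point, so that
# only one device is visible and the multi-GPU DataParallel path is not taken.
```

Setting the variable in the shell when launching the script, for example CUDA_VISIBLE_DEVICES=0 python examples/train.py, should have the same effect without changing any code.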

傻笑 · Answer 7 · Tue Apr 26 2022 10:53:14 GMT+0800 (China Standard Time)

Thank you again for your reply. I tried the first workaround (using a single GPU), and it worked!