Library pytorch_lightning.utilities.distributed problem
pierre1618 opened this issue · comments
Issue Description
Hi,
After creating the ccrsr virtual environment and running python3 inference_ccsr.py, I encountered the following issue:
ModuleNotFoundError: No module named 'pytorch_lightning.utilities.distributed'
Environment Information
- pytorch-lightning Version: 2.1.3
- torch Version: 2.0.1
- Python Version: 3.9.18
Resolution
To resolve the issue, I made the following modification in the code:
In CCSR/ldm/models/diffusion/ddpm_ccsr_stage2.py and /home/pierre/CCSR/ldm/models/diffusion/ddpm_ccsr_stage1.py, I changed:
from pytorch_lightning.utilities.distributed import rank_zero_only
to:
from pytorch_lightning.utilities.rank_zero import rank_zero_only
This modification allowed me to make it work.
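If the same code has to run under more than one pytorch-lightning release, one option is to try both import paths from this thread instead of hard-coding one. The sketch below is only an illustration: `import_first`, `RANK_ZERO_MODULES`, and `load_rank_zero_only` are hypothetical names, not part of CCSR or pytorch-lightning, and the only module paths used are the two quoted above.

```python
import importlib

# The two locations of rank_zero_only mentioned in this thread, newest first.
RANK_ZERO_MODULES = (
    "pytorch_lightning.utilities.rank_zero",
    "pytorch_lightning.utilities.distributed",
)


def import_first(name, module_paths):
    """Return attribute `name` from the first module in `module_paths` that provides it."""
    last_error = None
    for path in module_paths:
        try:
            module = importlib.import_module(path)
            return getattr(module, name)
        except (ImportError, AttributeError) as err:
            # Module is missing or does not define the name; try the next candidate.
            last_error = err
    raise ImportError(
        f"could not import {name!r} from any of {list(module_paths)}"
    ) from last_error


def load_rank_zero_only():
    """Hypothetical helper: resolve rank_zero_only for whichever version is installed."""
    return import_first("rank_zero_only", RANK_ZERO_MODULES)
```

In ddpm_ccsr_stage1.py and ddpm_ccsr_stage2.py, the import line could then become `rank_zero_only = load_rank_zero_only()`. This keeps one code path for both layouts, at the cost of hiding which version is actually in use.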
Steps to Reproduce
- Create the ccrsr virtual environment.
- Run python3 inference_ccsr.py.
I've encountered an additional error that I suspect might be related to the previously mentioned issue:
OSError: /root/miniconda3/envs/ccsr/lib/python3.9/site-packages/torchaudio/lib/libtorchaudio.so: undefined symbol: _ZN3c104cuda9SetDeviceEi
I've also received this error after installing pytorch with: pip3 install torch torchvision torchaudio
from pytorch_lightning.utilities.distributed import rank_zero_only
ModuleNotFoundError: No module named 'pytorch_lightning.utilities.distributed'
Hi, the problem seems to be caused by the version of pytorch_lightning. My environment information: pytorch-lightning Version: 1.4.2, torch Version: 2.0.1+cu118, Python Version: 3.10.10. You can re-install the corresponding version to see if the problem can be resolved.
Hi @csslc
Thanks for your fast reply. I have installed PyTorch Lightning with pip install pytorch-lightning==1.4.2 and PyTorch with conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia in a Python 3.10.10 environment. However, I am still encountering the following error:
ImportError: cannot import name 'get_num_classes' from 'torchmetrics.utilities.data' (/mnt///anaconda3/envs/ccsr/lib/python3.10/site-packages/torchmetrics/utilities/data.py)
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
Any assistance would be greatly appreciated.
I believe the following steps resolved the issue for me:
pip uninstall torchmetrics
pip install torchmetrics==0.7
In my environment, torchmetrics is 0.6.0 and torchvision is 0.15.2+cu118. You can try these versions.
I resolved the issue by executing the following commands:
pip uninstall pytorch-lightning torch torchvision torchmetrics
pip install torch==2.0.1+cu118 -f https://download.pytorch.org/whl/torch_stable.html
pip install torchvision==0.15.2+cu118 -f https://download.pytorch.org/whl/torch_stable.html
pip install torchmetrics==0.6.0
pip install pytorch-lightning==1.4.2
pip3 install -U xformers --index-url https://download.pytorch.org/whl/cu118
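To confirm that an environment actually ended up with the versions pinned in the commands above, a small check along these lines may help. It is a sketch, not part of CCSR: `EXPECTED` and `find_mismatches` are hypothetical names, the pins are copied from this thread, and it assumes Python 3.8+ for `importlib.metadata`. Note that a conda-installed torch may report its version without the `+cu118` suffix, in which case it would be flagged here.

```python
from importlib import metadata

# Version pins copied from the pip commands in this thread.
EXPECTED = {
    "torch": "2.0.1+cu118",
    "torchvision": "0.15.2+cu118",
    "torchmetrics": "0.6.0",
    "pytorch-lightning": "1.4.2",
}


def find_mismatches(expected, lookup=metadata.version):
    """Return {package: (wanted, found)} for packages whose installed version differs.

    `found` is None when the package is not installed. `lookup` is injectable
    so the function can be exercised without the real packages.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

Running it in the activated environment prints one line per package that deviates from the pins and prints nothing when everything matches.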