ClearML does not find all packages
terbed opened this issue
Describe the bug
I cannot reproduce experiments remotely, because the environment is improperly constructed.
The recognized packages:
# Python 3.11.8 (main, Feb 12 2024, 14:50:05) [GCC 13.2.1 20230801]
clearml == 1.15.1
kiwisolver == 1.4.5
lightning == 2.2.0.post0
torch == 2.2.0+cu118
Actual packages in the environment:
numpy==1.26.3
PyYAML==6.0.1
torch==2.2.0+cu118
torchmetrics==1.3.1
torchvision==0.17.0+cu118
tqdm==4.66.2
lightning==2.2.0.post0
lightning[pytorch-extra]
matplotlib
pandas
So the remotely reproduced training fails because torchvision is not installed in the env:
- nvidia-nccl-cu12==2.19.3
- nvidia-nvjitlink-cu12==12.4.127
- nvidia-nvtx-cu12==12.1.105
- orderedmultidict==1.0.1
- packaging==24.0
- pathlib2==2.3.7.post1
- pillow==10.3.0
- platformdirs==4.2.0
- psutil==5.9.8
- PyJWT==2.8.0
- pyparsing==3.1.2
- python-dateutil==2.8.2
- pytorch-lightning==2.2.1
- PyYAML==6.0.1
- referencing==0.34.0
- requests==2.31.0
- rpds-py==0.18.0
- six==1.16.0
- sympy==1.12
- torch==2.2.0+cu121
- torchmetrics==1.3.2
- tqdm==4.66.2
- triton==2.2.0
- typing_extensions==4.11.0
- urllib3==1.26.18
- virtualenv==20.25.1
- yarl==1.9.4
Environment setup completed successfully
Starting Task Execution:
2024-04-11 22:42:01
Traceback (most recent call last):
File "/root/.clearml/venvs-builds/3.10/task_repository/PhaseReconstruction.git/main.py", line 2, in <module>
from src.data import PRDataModule
File "/root/.clearml/venvs-builds/3.10/task_repository/PhaseReconstruction.git/src/data.py", line 3, in <module>
import torchvision.transforms.functional as tvf
ModuleNotFoundError: No module named 'torchvision'
2024-04-11 22:42:01
Process failed, exit code 1
Environment
- Self-hosted
- WebApp: 1.15.0-472 • Server: 1.15.0-472 • API: 2.29
- Python Version 3.11
- Linux Ubuntu 22
Related Discussion
I have a similar issue, #1198, which has also not been resolved yet. In my case, though, I found a very strange workaround: adding the line import tmp to the main.py that I execute, where tmp.py is an empty file (no code inside it) in the same folder.
Hi @wxdrizzle,
Thanks for linking in your similar issue.
Some updates:
- I realized that the __init__.py file was missing in my src folder. After I added this file, ClearML successfully recognized the torchvision package and some additional missing packages.
- It still could not recognize the lightning[pytorch-extra] package.
- @wxdrizzle's workaround does not help with this (that could be related to the missing __init__.py).
Hi @terbed @wxdrizzle! We are using a pigar fork to auto-fetch the requirements. Only top-level imports will be fetched (see the FAQ: https://github.com/damnever/pigar#faq). Also, note that only packages will be inspected, so the __init__.py file is mandatory if you wish local files to be inspected.
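Since local folders are only inspected when they are packages, a quick check for folders that contain Python files but no __init__.py can help before running remotely. The following is a minimal sketch using only the standard library; the helper name find_dirs_missing_init and the list of skipped folder names are assumptions made for illustration and are not part of ClearML or pigar.

```python
import os


def find_dirs_missing_init(root):
    """Return subfolders of `root` that hold .py files but no __init__.py.

    Hypothetical helper: it only mirrors the rule described in this thread
    (local files are inspected only when their folder is a package).
    """
    missing = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into hidden folders, caches, or a local virtualenv.
        dirnames[:] = [
            d for d in dirnames
            if not d.startswith(".") and d not in ("venv", "__pycache__")
        ]
        if dirpath == root:
            # The repository root holds the entry script and is not checked.
            continue
        has_python_files = any(name.endswith(".py") for name in filenames)
        if has_python_files and "__init__.py" not in filenames:
            missing.append(os.path.relpath(dirpath, root))
    return sorted(missing)
```

For the layout in this issue (main.py next to a src folder containing data.py), calling find_dirs_missing_init on the repository root would be expected to report src until src/__init__.py is added.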
There are a few ways to specify other packages or other auto-detection mechanisms:
- Use one of these functions to explicitly mention packages:
- https://clear.ml/docs/latest/docs/references/sdk/task#taskadd_requirements
- https://clear.ml/docs/latest/docs/references/sdk/task#set_packages
- Use https://clear.ml/docs/latest/docs/references/sdk/task#taskforce_requirements_env_freeze to use pip freeze or conda list for package detection (or set sdk.development.detect_with_pip_freeze or development.detect_with_conda_freeze to true in clearml.conf to achieve the same thing)
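As a rough illustration of the options listed above, the sketch below wraps the two class methods from the linked SDK reference, Task.add_requirements and Task.force_requirements_env_freeze, in a small helper. The helper name init_task_with_requirements, the EXTRA_REQUIREMENTS list, and the project and task names in the usage note are assumptions made for this example. Both methods are called before Task.init, and the linked documentation should be checked for the exact behavior in your ClearML version.

```python
# Packages the import scanner missed in this thread (hypothetical list for
# illustration; adjust it to your own project).
EXTRA_REQUIREMENTS = ["torchvision", "torchmetrics"]


def init_task_with_requirements(project_name, task_name, freeze=False):
    """Create a ClearML task with requirements declared explicitly.

    Hypothetical wrapper around the SDK calls linked in the list above.
    """
    # Imported lazily so this module can be loaded without clearml installed.
    from clearml import Task

    if freeze:
        # Record the full environment instead of scanning imports.
        Task.force_requirements_env_freeze(force=True)
    else:
        for package_name in EXTRA_REQUIREMENTS:
            # With no version given, the locally installed version is recorded.
            Task.add_requirements(package_name)

    # The requirement calls above have to happen before Task.init().
    return Task.init(project_name=project_name, task_name=task_name)
```

A call such as init_task_with_requirements("PhaseReconstruction", "train") at the top of main.py would then register the extra packages with the task; passing freeze=True switches to the full environment freeze instead.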
Hi @eugen-ajechiloae-clearml,
Thank you for the information, this explains everything! :)
Best wishes,
Daniel
Thank you very much for the detailed explanation!