Unable to use StorageManager to cache files on NFS storage
mkurczew opened this issue · comments
Describe the bug
I am unable to use StorageManager to download and cache data from mounted NFS storage.
My use case:
I have a lot of data stored on quite slow NFS storage mounted under /mnt/xyz.
I am using NFS to store datasets because I manage them with our in-house tools and need them accessible on a per-file basis (I can't use ClearML Datasets because it stores files in chunks).
I would like to leverage local dataset caching by using StorageManager.download_folder(). However, it doesn't seem to download anything, even though it returns the path to the local cache where the files should have been downloaded.
When I use StorageManager.download_files(), it just returns the NFS path, because it treats the files as local and skips the download.
To reproduce
- Remove/comment out the following line under sdk.storage.direct_access in my clearml.conf:
{ url: "file://*" } # file-urls are always directly referenced
- Open a Python terminal and try to download the directory (it is 700 MiB):
from clearml import StorageManager
StorageManager.download_folder("/mnt/xyz/dataset_y")
download_folder() returns my local cache path, ~/.clearml/cache/storage_manager/global, but no data is there; nothing was downloaded.
Expected behaviour
I expected the files to be copied from NFS share and locally cached.
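To make the expected behaviour concrete, here is a minimal sketch of "copy from NFS once, then reuse the local copy", written with the Python standard library only. It is not ClearML's implementation and does not go through StorageManager; the function name cache_folder_locally and the cache layout are hypothetical, and the sketch assumes the source folder does not change after the first copy (it never re-validates the cache).

```python
import hashlib
import shutil
from pathlib import Path


def cache_folder_locally(src, cache_root):
    """Copy `src` into `cache_root` once and return the path of the local copy.

    Hypothetical helper for illustration only; not part of the ClearML SDK.
    """
    src = Path(src).resolve()
    # Key the cache entry on the full source path so that two folders with
    # the same name in different locations do not collide.
    key = hashlib.sha256(str(src).encode("utf-8")).hexdigest()[:16]
    dest = Path(cache_root).expanduser() / f"{key}.{src.name}"

    if not dest.exists():
        # Copy into a temporary sibling first, then rename, so an
        # interrupted copy is never mistaken for a complete cache entry.
        tmp = dest.with_name(dest.name + ".partial")
        shutil.rmtree(tmp, ignore_errors=True)
        shutil.copytree(src, tmp)
        tmp.rename(dest)

    return dest
```

Calling cache_folder_locally("/mnt/xyz/dataset_y", "~/.clearml/cache/manual") (the second argument is an arbitrary example location) would copy the folder on the first call and return the existing local copy on later calls.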
Environment
- Server type - self hosted
- ClearML SDK Version - 1.14.4
- ClearML Server Version - 1.15.0-472
- Python Version - 3.11.8
- OS - Linux (ubuntu 22.04)
@mkurczew Thanks for pointing this out - we'll take a look.