pull: "Fetching" step takes forever
zhf231298 opened this issue · comments
Description
Since updating to version 3.45, dvc pull spends a very long time in the "Fetching" step.
I can't tell precisely what the cause is, but at least the md5 of a large file is recomputed on every dvc pull run, even though it is stated that the computation is done only once.
Reproduce
- dvc pull
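To put a number on the slowdown, the command could be timed on both machines with a small helper like the one below. This is only a sketch: time_command is a hypothetical helper name and not part of DVC, and it simply wraps Python's subprocess module.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd (a list of arguments) and return (elapsed_seconds, return_code)."""
    start = time.perf_counter()
    # Output is not captured, so progress bars such as "Fetching" stay visible.
    completed = subprocess.run(cmd, check=False)
    elapsed = time.perf_counter() - start
    return elapsed, completed.returncode


# Hypothetical usage inside a DVC repository (-v enables verbose output):
# elapsed, code = time_command(["dvc", "pull", "-v"])
# print(f"dvc pull took {elapsed:.1f}s (exit code {code})")
```

Running the same call under 3.38.1 and 3.45.0 against the same remote would give two directly comparable timings.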
Expected
The "Fetching" step should be very short, as it is on another device where DVC 3.38.1 is used.
Environment information
Problematic environment:
- OS: macOS Sonoma 14.3
- DVC: 3.45.0 (brew)
- Remote storage: S3 bucket
Properly working environment:
- OS: Ubuntu 22.04.3 LTS
- DVC: 3.38.1 (pip)
- Remote storage: S3 bucket (the same as above)
Output of dvc doctor:
$ dvc doctor
DVC version: 3.45.0 (brew)
--------------------------
Platform: Python 3.12.2 on macOS-14.3-arm64-arm-64bit
Subprojects:
dvc_data = 3.13.0
dvc_objects = 5.0.0
dvc_render = 1.0.1
dvc_task = 0.3.0
scmrepo = 3.1.0
Supports:
azure (adlfs = 2024.2.0, knack = 0.11.0, azure-identity = 1.15.0),
gdrive (pydrive2 = 1.19.0),
gs (gcsfs = 2024.2.0),
http (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
https (aiohttp = 3.9.3, aiohttp-retry = 2.8.3),
oss (ossfs = 2023.12.0),
s3 (s3fs = 2024.2.0, boto3 = 1.34.34),
ssh (sshfs = 2023.10.0),
webdav (webdav4 = 0.9.8),
webdavs (webdav4 = 0.9.8),
webhdfs (fsspec = 2024.2.0)
Config:
Global: /Users/zhf231298/Library/Application Support/dvc
System: /opt/homebrew/share/dvc
Could you also share dvc config -l?
Sure, the output of dvc config -l is:
remote.s3-bucket.url=s3://bucket-name
remote.s3-bucket.version_aware=true
core.autostage=true
core.remote=s3-bucket
The bucket name here has been replaced with a dummy name.
Confirmed this is slow for version-aware remotes, although it seems like cache remotes are not impacted.
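For anyone wanting to check whether their own setup falls into the affected category, the dvc config -l listing can be inspected for a version-aware default remote. The following is a minimal sketch that assumes the key=value format shown above; parse_config_listing and default_remote_is_version_aware are hypothetical helper names, not DVC APIs.

```python
def parse_config_listing(text):
    """Turn `key=value` lines (as printed by `dvc config -l`) into a dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def default_remote_is_version_aware(settings):
    """Return True if the default remote (core.remote) has version_aware=true."""
    remote = settings.get("core.remote")
    if remote is None:
        return False
    flag = settings.get(f"remote.{remote}.version_aware", "false")
    return flag.lower() == "true"


# The listing reported in this thread (bucket name is a dummy).
listing = """\
remote.s3-bucket.url=s3://bucket-name
remote.s3-bucket.version_aware=true
core.autostage=true
core.remote=s3-bucket
"""
settings = parse_config_listing(listing)
print(default_remote_is_version_aware(settings))
```

With the configuration from this report, the check identifies the default remote as version-aware, which matches the case confirmed as slow.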