Metadata download error - OSError: Consistency check failed
ch-shin opened this issue · comments
Hi, team!
I am trying to download the medium-scale dataset of the filtering track, but it keeps failing with the following error.
OSError: Consistency check failed: file should be of size 122218957 but has size 56690589 ((…)f11adbfc933c.parquet).
We are sorry for the inconvenience. Please retry download and pass `force_download=True, resume_download=False` as argument.
If the issue persists, please let us know by opening an issue on https://github.com/huggingface/huggingface_hub.
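For context, the check behind this error is essentially a file-size comparison between what the server advertised and what actually landed on disk. A simplified, stdlib-only sketch of that idea (not the actual `huggingface_hub` implementation):

```python
import os
import tempfile

def check_size(path: str, expected_size: int) -> None:
    """Raise OSError if the file on disk does not match the size
    the server advertised (i.e. a truncated/corrupted download)."""
    actual = os.path.getsize(path)
    if actual != expected_size:
        raise OSError(
            f"Consistency check failed: file should be of size "
            f"{expected_size} but has size {actual} ({path})."
        )

# Demo: a 10-byte file checked against a 20-byte expectation fails.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 10)
try:
    check_size(f.name, 20)
except OSError as err:
    print("caught:", err)
```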
It seems related to this issue huggingface/huggingface_hub#1498
Is there any bypass for downloading metadata, without using huggingface_hub?
Thanks.
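Regarding a bypass: the Hub serves repository files at predictable `resolve` URLs, so a file can be fetched directly with `wget`, `curl`, or `urllib` without going through `huggingface_hub`. A small sketch that builds such a URL (the repo id and filename below are placeholders, not the actual dataset):

```python
def resolve_url(repo_id: str, filename: str,
                revision: str = "main", repo_type: str = "dataset") -> str:
    """Build the direct download URL the Hub exposes for a file
    (the /resolve/ endpoint), usable with plain HTTP clients."""
    prefix = "datasets/" if repo_type == "dataset" else ""
    return f"https://huggingface.co/{prefix}{repo_id}/resolve/{revision}/{filename}"

# usage (placeholders):
print(resolve_url("some-org/some-dataset", "metadata/part-0.parquet"))
```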
Hi @ch-shin, do you also get this error with `force_download=True, resume_download=False`? In the issue you linked it seems this could also be due to running out of storage; do you have enough? Alternatively, have you tried using `snapshot_download`?
- It looks like `snapshot_download` is used in `download_upstream.py` by default, right?
- I got the same error with `force_download=True, resume_download=False` as input arguments to `snapshot_download`.
- Yes, storage is enough.
Oh, I found they fixed the `force_download` flag very recently (huggingface/huggingface_hub#1549 (comment)). I will check it out and let you know how it goes 😇.
Hi @ch-shin, sorry you're experiencing this issue. Maintainer of `huggingface_hub` here. Which version of `huggingface_hub` are you using? If the error is still happening, it would be good to update to the latest release (0.16.4) and retry. To be honest, we are actively tracking down this issue, but we haven't found a reliable way to trigger it, which makes it very hard to debug (I personally never experienced it, even after a lot of attempts 😕).
@Wauplin Hi! Thank you for the follow-up on this. I updated to `0.17.0.dev0` and still got the same error. And if I pass the `force_download=True, resume_download=False` options, I get the following error:
ValueError(
"We have no connection or you passed local_files_only, so force_download is not an accepted option."
)
from https://github.com/huggingface/huggingface_hub/blob/2940a65b22e9552b0dd40f0b61f502f66896d46d/src/huggingface_hub/file_download.py#L1253
I guess it happens when network bandwidth is insufficient while downloading big files and the etag gets lost (but the download somehow proceeds through some exception handling, and the consistency check then fails later? I don't know 😇).
@ch-shin Thanks for your feedback. Would you have time for another test? If possible, can you install `huggingface_hub` from this PR (huggingface/huggingface_hub#1561)? It will not solve the error, but the stacktrace will be more detailed.
To install it:
pip install git+https://github.com/jiamings/huggingface_hub@main
Then retry your failing script (by the way, which file from which repo are you downloading?) and copy-paste the full error stacktrace printed in your terminal, both with and without `force_download`. Thanks a lot in advance!
@Wauplin Sorry that I missed your comment 😓. Actually, I just upgraded my internet connection (25 Mbps --> 500 Mbps) and the problem is gone.
We are also experiencing this bug in our company, with `huggingface_hub` 0.16.4 installed.