Downloading the FLAN-v2 dataset
taehyunzzz opened this issue · comments
I am having a hard time downloading the flanv2 dataset provided in this repo.
The git clone command just downloads the metadata for each dataset as shown below.
Using huggingface load_dataset downloads a single unified dataset that comes from the P3 dataset (source).
The Hugging Face CLI method seems to run into authentication issues.
I could go through the Hugging Face flanv2 dataset repo manually and download each dataset's LFS files one by one, but that is time-consuming.
Is there a more graceful way to download the datasets used?
For now, the script below did the trick. I still do not understand how "git lfs pull" is supposed to be used for the flanv2 dataset.
from huggingface_hub import HfApi, hf_hub_download

repo_id = "lorahub/flanv2"
api = HfApi()

# List every file in the dataset repo, then fetch only the JSON files.
files = api.list_repo_files(repo_id, repo_type="dataset")
for file in files:
    if file.endswith(".json"):
        print(file)
        # Download into the local flanv2/ directory.
        hf_hub_download(repo_id=repo_id, filename=file, repo_type="dataset", local_dir="flanv2/")
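
A possible alternative to the per-file loop, sketched here and not tested against this repo: huggingface_hub also provides snapshot_download, which accepts an allow_patterns glob filter and fetches all matching files in one call. The helper names select_files and download_json_snapshot are made up for illustration, and the assumption that "*.json" covers every file you need is carried over from the script above.

```python
from fnmatch import fnmatchcase


def select_files(files, pattern="*.json"):
    """Preview which repo paths a glob pattern would keep (hypothetical helper)."""
    return [f for f in files if fnmatchcase(f, pattern)]


def download_json_snapshot(repo_id="lorahub/flanv2", local_dir="flanv2/"):
    """Download only the JSON files of a dataset repo in a single call."""
    # Imported lazily so select_files works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        repo_type="dataset",
        allow_patterns=["*.json"],  # assumption: only the .json files are needed
        local_dir=local_dir,
    )
```

Calling download_json_snapshot() should return the local folder path once the download finishes. select_files can be run first on the output of api.list_repo_files(...) to check what the pattern would match.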