sail-sg / lorahub

The official repository of paper "LoraHub: Efficient Cross-Task Generalization via Dynamic LoRA Composition".

Downloading the FLAN-v2 dataset

taehyunzzz opened this issue

I am having a hard time downloading the flanv2 dataset provided in this repo.

The git clone command downloads only the metadata for each dataset, as shown below.
[image]
Using Hugging Face load_dataset downloads a single unified dataset that comes from the P3 dataset (source).
The Hugging Face CLI method seems to have authentication issues.

I could manually go through the Hugging Face flanv2 dataset repo and download each dataset's LFS file, but that is time-consuming.

Is there any graceful way to download the datasets used?

For now, this did the trick. I did not understand how "git lfs pull" is used for the flanv2 dataset.

from huggingface_hub import HfApi, hf_hub_download

api = HfApi()
repo_id = "lorahub/flanv2"

# List every file in the dataset repo.
files = api.list_repo_files(repo_id, repo_type="dataset")

# Download only the JSON data files, keeping the repo layout under flanv2/.
for file in files:
    if file.endswith(".json"):
        print(file)
        hf_hub_download(repo_id=repo_id, filename=file, repo_type="dataset", local_dir="flanv2/")
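
A possibly shorter variant of the loop above, as a sketch only: huggingface_hub also provides snapshot_download, which accepts an allow_patterns glob and fetches all matching files in one call. The function names select_files and download_flanv2 are hypothetical, and select_files only approximates how the pattern is matched so the selection can be previewed without downloading anything. I have not verified this against the lorahub/flanv2 repo.

```python
from fnmatch import fnmatchcase


def select_files(files, pattern="*.json"):
    """Preview which repo paths a glob pattern would select.

    Hypothetical helper: it approximates glob filtering with fnmatchcase,
    where "*" also matches "/" so nested paths are included.
    """
    return [f for f in files if fnmatchcase(f, pattern)]


def download_flanv2(local_dir="flanv2", pattern="*.json"):
    """Download every file matching `pattern` from the dataset repo.

    Returns the local folder path reported by snapshot_download.
    """
    # Imported lazily so select_files works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="lorahub/flanv2",   # same repo id as in the loop above
        repo_type="dataset",
        allow_patterns=pattern,     # only files matching the glob are fetched
        local_dir=local_dir,
    )
```

Calling download_flanv2() should place the JSON files under flanv2/, much like the loop does. To check a pattern first, pass the result of api.list_repo_files(...) to select_files.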