I can't load data from Huggingface
deliciouscat opened this issue
I ran the following train command:
python -m tevatron.driver.train --output_dir ./retriever_model_s1 --model_name_or_path Luyu/co-condenser-marco --save_steps 10000 --dataset_name Tevatron/msmarco-passage-corpus --train_dir ./marco/bert/train --fp16 --per_device_train_batch_size 8 --learning_rate 5e-6 --num_train_epochs 3 --dataloader_num_workers 2
but it fails while loading the dataset from Hugging Face:
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/datasets/Tevatron/msmarco-passage-corpus/resolve/main/marco/bert/train
There is no data at that URL, so I would like to ask whether there is an alternative way to get the MS MARCO data.
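For reference, the failing URL looks like the dataset repo's `resolve/main` prefix with the `--train_dir` value appended, as if the local path were being treated as a file inside the dataset repo. A minimal sketch of that guess follows; the `build_resolve_url` helper is hypothetical and only rebuilds the string from the error message. It is not Tevatron or `datasets` code.

```python
# Hypothetical helper: rebuilds the URL from the error message to show
# which command-line arguments it appears to be composed from.
HUB_PREFIX = "https://huggingface.co/datasets"


def build_resolve_url(dataset_name: str, data_file: str, revision: str = "main") -> str:
    # Drop a leading "./" so the relative path joins cleanly onto the prefix.
    relative = data_file[2:] if data_file.startswith("./") else data_file
    return f"{HUB_PREFIX}/{dataset_name}/resolve/{revision}/{relative}"


# Values taken from --dataset_name and --train_dir in the command above.
url = build_resolve_url("Tevatron/msmarco-passage-corpus", "./marco/bert/train")
print(url)
```

If that guess is right, the 404 may come from combining `--dataset_name` with `--train_dir` rather than from the dataset repo itself being unavailable.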