texttron / tevatron

Tevatron - A flexible toolkit for neural retrieval research and development.

Home Page: http://tevatron.ai

I can't load data from Hugging Face

deliciouscat opened this issue

I ran the training command:
python -m tevatron.driver.train --output_dir ./retriever_model_s1 --model_name_or_path Luyu/co-condenser-marco --save_steps 10000 --dataset_name Tevatron/msmarco-passage-corpus --train_dir ./marco/bert/train --fp16 --per_device_train_batch_size 8 --learning_rate 5e-6 --num_train_epochs 3 --dataloader_num_workers 2

but loading the dataset from Hugging Face fails with this error:
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/datasets/Tevatron/msmarco-passage-corpus/resolve/main/marco/bert/train

There is no data at that URL, so is there an alternative way to get the MS MARCO data?
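Comparing the failing URL with the command suggests that the local `--train_dir ./marco/bert/train` value was resolved as a file path inside the `Tevatron/msmarco-passage-corpus` dataset repository on the Hub, not on the local disk. The sketch below reproduces that URL composition, assuming it is simply the repo id, then `resolve/main/`, then the normalized path. `hub_resolve_url` is a hypothetical helper written for illustration; it is not part of Tevatron or the `datasets` library.

```python
import posixpath

HUB_ROOT = "https://huggingface.co/datasets"


def hub_resolve_url(repo_id: str, relative_path: str, revision: str = "main") -> str:
    # Hypothetical helper: joins a dataset repo id, a revision, and a file path
    # into a "resolve" URL, matching the shape of the 404 URL in the traceback.
    # posixpath.normpath drops the leading "./" from a local-style path.
    clean = posixpath.normpath(relative_path)
    return f"{HUB_ROOT}/{repo_id}/resolve/{revision}/{clean}"


url = hub_resolve_url("Tevatron/msmarco-passage-corpus", "./marco/bert/train")
print(url)
```

The printed URL matches the one in the error, which is consistent with `./marco/bert/train` being looked up on the Hub rather than read from the local machine.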