n-waves / multifit

Code to reproduce the results from the paper "MultiFiT: Efficient Multi-lingual Language Model Fine-tuning" https://arxiv.org/abs/1909.04761

Download music/books data in the German version

suryapa1 opened this issue

prepare_cls.py:

Could you please share a public URL for fetching the German version of the CLS books/music data?

import urllib.request
from pathlib import Path

import fire


def fetch_cls(url_prefix, cls_path="data/cls"):
    """ Fetch CLS from server using basic auth
    url_prefix should point to CLS stored as follows
    "https://user:passwd@server/path/[en|fr|de|jp]/[dvd|music|books].[test|train|unlabeled].csv"
    data/cls/de-music/models/sp15k
    """
    def fetch(url, CLS):
        # Create the target directory if needed, then download the file
        CLS.parent.mkdir(parents=True, exist_ok=True)
        print("fetching", url, CLS)
        urllib.request.urlretrieve(url, CLS)

    # NOTE: lang_codes is not defined in this excerpt; it has to exist before this loop runs
    for code in lang_codes:
        for category in ['music']:
            dir = Path(cls_path)/f'{code}-{category}'
            fetch(f"{url_prefix}/{code}/{category}/train.csv", dir / f"{code}.train.csv")
            fetch(f"{url_prefix}/{code}/{category}/test.csv", dir / f"{code}.test.csv")
            fetch(f"{url_prefix}/{code}/{category}/unlabeled.csv", dir / f"{code}.unsup.csv")


if __name__ == "__main__":
    fire.Fire(fetch_cls)
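
To make it easier to see which files the snippet above expects on the server, here is a minimal dry-run sketch that only builds the URL and destination pairs and downloads nothing. The helper name `plan_cls_downloads`, its `lang_codes` and `categories` defaults, and the `https://example.invalid/cls` prefix are assumptions made for illustration; they are not part of the multifit repository, and no real public URL is implied.

```python
from pathlib import Path


def plan_cls_downloads(url_prefix, lang_codes=("de",), categories=("music",),
                       cls_path="data/cls"):
    """Hypothetical helper: list (url, destination) pairs without downloading.

    Mirrors the URL and file naming used in the snippet above:
    {url_prefix}/{code}/{category}/{split}.csv -> {cls_path}/{code}-{category}/{code}.{split}.csv
    """
    # Remote split name -> local split name ("unlabeled" is saved as "unsup")
    splits = {"train": "train", "test": "test", "unlabeled": "unsup"}
    plan = []
    for code in lang_codes:
        for category in categories:
            target_dir = Path(cls_path) / f"{code}-{category}"
            for remote, local in splits.items():
                url = f"{url_prefix}/{code}/{category}/{remote}.csv"
                plan.append((url, target_dir / f"{code}.{local}.csv"))
    return plan


if __name__ == "__main__":
    # Placeholder prefix (assumption); replace it with a server you have access to
    for url, dest in plan_cls_downloads("https://example.invalid/cls"):
        print(url, "->", dest)
```

Printing the plan first shows that, as written, the code requests `{code}/{category}/train.csv` style paths, which differs from the `[dvd|music|books].[test|train|unlabeled].csv` layout described in the docstring, so the server layout would need to match one of the two.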