pytorch / text

Models, data loaders and abstractions for language processing, powered by PyTorch

Home page: https://pytorch.org/text

Wikitext-103 URL is down

albertz opened this issue:

URL = "https://s3.amazonaws.com/research.metamind.io/wikitext/wikitext-103-v1.zip"

None of the links under https://s3.amazonaws.com/research.metamind.io work anymore. Requests to them return "Access Denied".

For reference, I found one copy via pardata:
https://github.com/CODAIT/pardata/blob/1d1600ad3eed6894da7dbddc451cd38aa03c770c/tests/schemata/datasets.yaml#L42C21-L42C99
It is not exactly the same file (a tar.gz instead of a zip), but it appears to have the same content: LICENSE.txt, README.txt, wiki.test.tokens, wiki.train.tokens and wiki.valid.tokens.
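A minimal sketch of using such a tar.gz as a stand-in, assuming it has already been downloaded to a local path (the path in the usage note is a placeholder, not the mirror's real URL). It extracts the archive with Python's standard `tarfile` module and checks that the five files listed above are present. Because the archive's internal layout is not stated here, it searches by file name rather than assuming a fixed folder structure.

```python
import tarfile
from pathlib import Path

# File names listed in this thread as the archive's contents.
EXPECTED_FILES = {
    "LICENSE.txt",
    "README.txt",
    "wiki.test.tokens",
    "wiki.train.tokens",
    "wiki.valid.tokens",
}


def extract_and_check(archive_path, dest_dir):
    """Extract a .tar.gz archive and return {file name: path} for the expected files.

    Assumption: the files may be nested inside a top-level folder, so the
    extracted tree is searched recursively by file name.
    Raises FileNotFoundError if any expected file is missing.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        # Only extract archives from sources you trust: older Python
        # versions do not sanitize member paths during extraction.
        tar.extractall(dest)

    found = {}
    for path in dest.rglob("*"):
        if path.is_file() and path.name in EXPECTED_FILES:
            found[path.name] = path

    missing = EXPECTED_FILES - found.keys()
    if missing:
        raise FileNotFoundError(f"missing from archive: {sorted(missing)}")
    return found
```

Usage would be something like `extract_and_check("wikitext-103.tar.gz", "data")`, where both paths are placeholders for wherever the archive was saved and should be unpacked.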

Another copy of the data is on HuggingFace in various forms, for example: https://huggingface.co/datasets/wikitext

Hi albertz, I ran into exactly the same issue on torchtext 0.17.2. Have you found a clean solution? Datasets from other sources seem to need adapting one by one.

I did not find the zip files anywhere. Instead, I used the tar.gz files linked above, which seem to contain the same content.
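Once the token files are extracted, a small adapter can stand in for the torchtext dataset. The sketch below is hypothetical: `iter_wikitext_split` is not a torchtext API, and it only assumes the split file names from the archive. It yields each line of a split as a string with its line ending preserved. It does not claim to reproduce torchtext's datapipe behavior exactly.

```python
from pathlib import Path
from typing import Iterator

# Split name -> file name inside the extracted archive.
SPLIT_FILES = {
    "train": "wiki.train.tokens",
    "valid": "wiki.valid.tokens",
    "test": "wiki.test.tokens",
}


def iter_wikitext_split(root, split) -> Iterator[str]:
    """Yield the lines of one WikiText-103 split (hypothetical helper).

    `root` is the directory containing the extracted .tokens files.
    Empty lines are kept, and each line keeps its trailing newline.
    Because this is a generator, the split-name check runs on first iteration.
    """
    if split not in SPLIT_FILES:
        raise ValueError(f"unknown split {split!r}; expected one of {sorted(SPLIT_FILES)}")
    path = Path(root) / SPLIT_FILES[split]
    with path.open(encoding="utf-8") as f:
        yield from f
```

Keeping the lines unmodified leaves tokenization and any filtering of blank or heading lines to the caller, so the adapter stays a thin replacement for the download step only.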