rogervaas / xtreme_demo

Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark

Steps to download and configure XTREME

  1. Clone the XTREME Repo
git clone https://github.com/google-research/xtreme.git
  1. Install the XTREME tools
    This step is only needed if you are running experiments on XTREME, i.e. to evaluate your models. We will just be looking through the XTREME data to better understand its layout, so we do not need most of the tools installed here. It is still best to follow the steps as outlined in the XTREME repo, so it is worth running this script.
    You can refer to the XTREME repo for more details on these steps.
cd xtreme
bash install_tools.sh
  1. Install dependencies
    XTREME has a few dependencies that you will need in order to use its datasets.
    Check out the XTREME repo for the full list.
    I only needed to install the transformers library, but you may need more.
pip install transformers
  1. Manually download Panx
    One dataset, panx_dataset (for NER), has to be downloaded manually:
  • Create a download folder with mkdir -p download in the root of the XTREME repo
  • Manually download the dataset from here.
    The file downloads as AmazonPhotos.zip. Make sure this zip file is in the download directory within the XTREME repo.
  1. Download the remaining datasets
    Finally, run this in the root of the XTREME repo to download the remaining datasets.
bash scripts/download_data.sh
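
Because the Panx archive is the one manual step, it can be worth confirming it is in place before starting the bulk download. The following is a minimal sketch, not part of XTREME: check_panx_zip is a hypothetical helper name, and it only assumes the file name and download folder described above.

```shell
# Hypothetical helper (not part of XTREME): verify that the manually
# downloaded Panx archive is where this guide says it should be.
check_panx_zip() {
  # $1: path to the root of the cloned XTREME repo (defaults to the current directory)
  panx_zip="${1:-.}/download/AmazonPhotos.zip"
  if [ -f "$panx_zip" ]; then
    echo "found: $panx_zip"
  else
    echo "missing: $panx_zip" >&2
    return 1
  fi
}

# Example, from the root of the XTREME repo:
#   check_panx_zip . && bash scripts/download_data.sh
```

With this in place, the download script only starts if the zip file is present, which avoids finding out about a missing archive at the very end of a long download.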

Troubleshooting

If you have any issues, try downloading the individual datasets to see where the problem is.
At the end of the download script you can see the tasks it calls:

download_xnli
download_pawsx
download_tatoeba
download_bucc18
download_squad
download_xquad
download_mlqa
download_tydiqa
download_udpos
download_panx

So try working through these one by one to see where the issue is.
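
One way to narrow this down without editing the script is to run it with bash tracing and look for the last download task that was started. This is a rough sketch under assumptions: last_download_step and download.log are hypothetical names, and it assumes the tasks above are shell functions called at the top level of the script, so that bash -x prints them as lines like "+ download_xnli".

```shell
# First capture a trace of the run (from the root of the XTREME repo):
#   bash -x scripts/download_data.sh 2>&1 | tee download.log

# Hypothetical helper: print the last download_* task that appears as a
# top-level call in a bash -x trace log.
last_download_step() {
  # $1: path to the trace log
  grep -E '^\+ download_[a-z0-9]+$' "$1" | tail -n 1 | sed 's/^+ //'
}

# Example:
#   last_download_step download.log
```

The task it prints is the one that was running when the script stopped, which is a reasonable place to start looking.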
