BestActionNow / SemiSupBLI

Implementation of the paper accepted at EMNLP 2020:

Semi-Supervised Bilingual Lexicon Induction with Two-Way Message Passing Mechanisms

In this repository, we present the implementation of our two proposed semi-supervised approaches, CSS and PSS, for bilingual lexicon induction (BLI).

Dependencies

Python 3.7
PyTorch
NumPy
Faiss
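
A quick way to confirm that the packages listed above can be found in the current environment is sketched below. It assumes the usual import names torch, numpy, and faiss; the helper name check_dependencies is made up for this example and is not part of the repository.

```python
import importlib.util
import sys

# Import names assumed for the packages listed above.
REQUIRED_MODULES = ("torch", "numpy", "faiss")


def check_dependencies(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print("Python", ".".join(map(str, sys.version_info[:3])))
    for name, found in check_dependencies().items():
        print(f"{name}: {'ok' if found else 'missing'}")
```

The check only locates the modules without importing them, so it stays fast even when PyTorch is installed.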

How to get the datasets

Download the MUSE dataset from here and place it in the ./muse_data directory.

Download the VecMap dataset from here and place it in the ./vecmap_data directory.
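
Before running anything, it can help to verify that both dataset directories exist and are not empty. The following is a minimal sketch under that assumption; the helper name missing_dataset_dirs is hypothetical, and the check does not validate the files inside the directories.

```python
from pathlib import Path

# Directory names taken from the instructions above.
DATASET_DIRS = ("muse_data", "vecmap_data")


def missing_dataset_dirs(root="."):
    """Return the expected dataset directories that are absent or empty under root."""
    missing = []
    for name in DATASET_DIRS:
        path = Path(root) / name
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_dataset_dirs():
        print(f"./{name} is missing or empty; download the dataset first")
```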

How to run

Run the following command to evaluate CSS on the MUSE dataset with the "5k all" annotated lexicon:

python main.py --config_file ./configs/config-CSS-muse-en-es-5kall.yaml

Run the following command to evaluate PSS on the VecMap dataset with the "5k all" annotated lexicon:

python main.py --config_file ./configs/config-PSS-vecmap-en-es-5kall.yaml

Configuration

We briefly describe some important fields in the configuration file:

"method" specifies the model to evaluate: "CSSBli" for CSS or "PSSBli" for PSS.
"src" and "tgt" indicate the source and target languages of the BLI task.
"data_params/data_dir" specifies which dataset to use: "./muse_data/" for MUSE or "./vecmap_data/" for VecMap.
"supervised/max_count" indicates the size of the annotated lexicon: "-1" for "5k all", "100" for "100 unique", and "5000" for "5000 unique".

Other fields specify the hyperparameters for CSS and PSS.
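
To make the field descriptions above concrete, the sketch below represents a configuration as a nested Python dict and resolves slash-separated field names such as "data_params/data_dir". The nesting, the language codes "en" and "es" (inferred from the example config file names), and the helper names get_field and describe_lexicon are assumptions made for illustration; the actual YAML layout and loading code in the repository may differ.

```python
# Hypothetical configuration mirroring the fields described above.
EXAMPLE_CONFIG = {
    "method": "CSSBli",
    "src": "en",
    "tgt": "es",
    "data_params": {"data_dir": "./muse_data/"},
    "supervised": {"max_count": -1},
}

# Lexicon sizes named in the description of "supervised/max_count".
LEXICON_NAMES = {-1: "5k all", 100: "100 unique", 5000: "5000 unique"}


def get_field(config, path):
    """Look up a slash-separated field name such as 'supervised/max_count'."""
    value = config
    for key in path.split("/"):
        value = value[key]
    return value


def describe_lexicon(config):
    """Map supervised/max_count to the lexicon name used in the text."""
    max_count = get_field(config, "supervised/max_count")
    return LEXICON_NAMES.get(max_count, f"{max_count} entries (not described above)")


if __name__ == "__main__":
    print(get_field(EXAMPLE_CONFIG, "method"), describe_lexicon(EXAMPLE_CONFIG))
```

Switching to PSS on VecMap would, under the same assumed layout, mean setting "method" to "PSSBli" and "data_params/data_dir" to "./vecmap_data/".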
