poaboagye / ZeroShot-CrossLing-Parsing


Zero-Shot Cross-Lingual Dependency Parsing through Contextual Embedding Transformation

If you find anything useful in this work, please cite our paper:

@inproceedings{xu-koehn-2021-zero,
    title = "Zero-Shot Cross-Lingual Dependency Parsing through Contextual Embedding Transformation",
    author = "Xu, Haoran  and
      Koehn, Philipp",
    booktitle = "Proceedings of the Second Workshop on Domain Adaptation for NLP",
    month = apr,
    year = "2021",
    address = "Kyiv, Ukraine",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.adaptnlp-1.21",
    pages = "204--213",
    abstract = "Linear embedding transformation has been shown to be effective for zero-shot cross-lingual transfer tasks and achieve surprisingly promising results. However, cross-lingual embedding space mapping is usually studied in static word-level embeddings, where a space transformation is derived by aligning representations of translation pairs that are referred from dictionaries. We move further from this line and investigate a contextual embedding alignment approach which is sense-level and dictionary-free. To enhance the quality of the mapping, we also provide a deep view of properties of contextual embeddings, i.e., the anisotropy problem and its solution. Experiments on zero-shot dependency parsing through the concept-shared space built by our embedding transformation substantially outperform state-of-the-art methods using multilingual embeddings.",
}

Prerequisites

First, create a virtual environment and install the required packages.

conda create --name clce python=3.7
conda activate clce
pip install -r requirements.txt

Pre-Trained Model and Mapping

To reproduce the numbers in the paper, download our pre-trained model and mappings from the tables below. Note that the pre-trained model and mappings have gone through the iterative normalization preprocessing and live in a near-isotropic space.
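Iterative normalization alternates mean-centering and length normalization until both properties hold at once, which pushes the embedding space toward isotropy. A minimal NumPy sketch of the idea (the function name and iteration count are illustrative, not taken from this repo):

```python
import numpy as np

def iterative_normalization(X, n_iters=5):
    """Alternately make every vector unit-length and the whole set zero-mean."""
    X = np.asarray(X, dtype=np.float64).copy()
    for _ in range(n_iters):
        X /= np.linalg.norm(X, axis=1, keepdims=True)  # step 1: unit length
        X -= X.mean(axis=0, keepdims=True)             # step 2: zero mean
    return X

emb = np.random.default_rng(0).standard_normal((100, 32))
emb_in = iterative_normalization(emb)
```

After a few rounds the two steps barely disturb each other, so the result is simultaneously (approximately) unit-length and exactly zero-mean.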

Pre-Trained Parser

Pre-trained English parser: model.zip

Pre-Trained Cross-Lingual Space Mapping:

| Language | word-level mapping | sense-level mapping |
| --- | --- | --- |
| es | iter-norm-mean_es-en.th | iter-norm-multi_es-en.th |
| pt | iter-norm-mean_pt-en.th | iter-norm-multi_pt-en.th |
| ro | iter-norm-mean_ro-en.th | iter-norm-multi_ro-en.th |
| pl | iter-norm-mean_pl-en.th | iter-norm-multi_pl-en.th |
| fi | iter-norm-mean_fi-en.th | iter-norm-multi_fi-en.th |
| el | iter-norm-mean_el-en.th | iter-norm-multi_el-en.th |
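Each mapping is applied as a single linear transformation that projects source-language contextual embeddings into the shared English space. A minimal sketch of the idea (the dimensionality and the random orthogonal matrix standing in for a learned mapping are assumptions; the released `.th` files would be loaded with `torch.load` instead):

```python
import numpy as np

# The released mappings are serialized PyTorch tensors, e.g.
#   W = torch.load("iter-norm-mean_es-en.th")
# Here, a random orthogonal matrix stands in for the learned mapping.
dim = 768  # assumed contextual embedding dimensionality
rng = np.random.default_rng(0)
W, _ = np.linalg.qr(rng.standard_normal((dim, dim)))

src_emb = rng.standard_normal((10, dim))  # source-language contextual embeddings
mapped = src_emb @ W                      # project into the shared space
```

An orthogonal mapping preserves vector lengths and angles, which is why a single matrix multiply suffices at parse time.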

Data

The zero-shot dependency parsing task is evaluated on the Universal Dependencies treebanks v2.6, which are available for free download.

Zero-Shot Dependency Parsing

Before using the English pre-trained model to parse treebanks in other languages, you have to specify the paths to the pre-trained model, the pre-trained mappings, and the treebanks in evaluate.sh. After that, you can simply run:

./evaluate.sh lang   # e.g., ./evaluate.sh fi
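The exact variable names inside evaluate.sh may differ; the snippet below only illustrates the kinds of paths the script needs (all paths here are placeholders):

```shell
# Placeholder paths — edit the corresponding variables inside evaluate.sh.
MODEL_PATH=PATH/TO/model.zip                    # pre-trained English parser
MAPPING_PATH=PATH/TO/iter-norm-multi_fi-en.th   # pre-trained cross-lingual mapping
TREEBANK_PATH=PATH/TO/ud-treebanks-v2.6         # UD 2.6 treebank directory
```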

If you want to train an English parser yourself

You may need to change the paths to the train and dev datasets in the config file allen_configs/enbert_IN.jsonnet.

allennlp train allen_configs/enbert_IN.jsonnet -s PATH/TO/STORE/MODEL  --include-package src

If you want to derive your own cross-lingual mappings

Please follow the instructions here.
