Lavine24 / ArbEngVec

Arabic-English Cross-Lingual Word Embedding Model (includes the Arabic preprocessing)

ArbEngVec is an open-source project that provides several Arabic-English cross-lingual word embedding models. To train our bilingual models, we use a large dataset of more than 93 million Arabic-English parallel sentence pairs, mainly extracted from the Open Parallel Corpus Project (OPUS) (Tiedemann, 2012).

These scripts were used to train the ArbEngVec variants. The Arabic preprocessing is provided alongside the different alignment methods.
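As a rough illustration of what Arabic preprocessing for embedding training often involves, here is a minimal sketch. The function name `normalize_arabic` and the specific steps (stripping diacritics, removing tatweel, unifying alef variants) are assumptions based on common practice, not necessarily the steps implemented in this repository.

```python
import re

# Arabic diacritics (tashkeel, U+064B..U+0652) plus the superscript alef (U+0670).
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
# Alef with madda, hamza above, or hamza below.
_ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")
_TATWEEL = "\u0640"      # elongation character
_BARE_ALEF = "\u0627"


def normalize_arabic(text: str) -> str:
    """Hypothetical normalizer: an assumed set of steps, not the repo's own."""
    text = _DIACRITICS.sub("", text)            # drop short-vowel marks
    text = text.replace(_TATWEEL, "")           # drop elongation characters
    text = _ALEF_VARIANTS.sub(_BARE_ALEF, text)  # unify alef forms
    return " ".join(text.split())               # collapse whitespace
```

Normalizing in this way reduces the number of surface forms per word, which generally helps when vocabulary counts are sparse.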

To change the alignment method before training, change the alignment function that is called when sentences are appended to the list of training documents.
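The idea of swapping the alignment function can be sketched as follows. All names here (`align_parallel`, `align_word_by_word`, `align_random_shuffle`, `build_documents`) are hypothetical, and the three strategies are illustrative ways of merging a sentence pair into one training document. They are not taken from the repository's code. In particular, the word-by-word variant simply interleaves tokens by position, which is a simplification and not a true word alignment.

```python
import random
from itertools import zip_longest


def align_parallel(ar_tokens, en_tokens):
    """Concatenate the Arabic tokens followed by the English tokens."""
    return ar_tokens + en_tokens


def align_word_by_word(ar_tokens, en_tokens):
    """Interleave tokens by position (assumed simplification)."""
    merged = []
    for ar, en in zip_longest(ar_tokens, en_tokens):
        if ar is not None:
            merged.append(ar)
        if en is not None:
            merged.append(en)
    return merged


def align_random_shuffle(ar_tokens, en_tokens, seed=None):
    """Mix the tokens of both sentences in random order."""
    merged = ar_tokens + en_tokens
    random.Random(seed).shuffle(merged)
    return merged


def build_documents(sentence_pairs, align=align_parallel):
    """Build the training documents list; `align` is the function to swap."""
    documents = []
    for ar_sentence, en_sentence in sentence_pairs:
        documents.append(align(ar_sentence.split(), en_sentence.split()))
    return documents
```

Switching methods then amounts to passing a different function, for example `build_documents(pairs, align=align_word_by_word)`. The resulting list of token lists has the shape that word2vec-style trainers typically expect as input.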

For further reading, see the full paper: https://hal.archives-ouvertes.fr/hal-02150003/file/Lachraf-el-al-WANLP.pdf

If you use these scripts in further research, please cite:

@inproceedings{lachraf:hal-02150003,
  TITLE = {{ArbEngVec : Arabic-English Cross-Lingual Word Embedding Model}},
  AUTHOR = {Lachraf, Raki and Nagoudi, El Moatez Billah and Ayachi, Youcef and Abdelali, Ahmed and Schwab, Didier},
  URL = {https://hal.archives-ouvertes.fr/hal-02150003},
  BOOKTITLE = {{The Fourth Arabic Natural Language Processing Workshop, co-located with ACL}},
  ADDRESS = {Florence, Italy},
  YEAR = {2019},
  MONTH = Jul,
  PDF = {https://hal.archives-ouvertes.fr/hal-02150003/file/Lachraf-el-al-WANLP.pdf},
  HAL_ID = {hal-02150003},
  HAL_VERSION = {v1},
}
