Dialectal Arabic POS Tagger

Dialectal Arabic POS Tagger is a freeware module developed by the ALT team at Qatar Computing Research Institute (QCRI) to process Dialectal Arabic. The tagger was trained on a collection of dialectal Arabic tweets from four regions: Egypt, the Gulf, the Maghreb and the Levant.

The tagger is implemented in Keras using a BiLSTM network with a ChainCRF layer.

Requirements

The tagger requires the following packages:

Installation

Installing Dialectal Arabic POS Tagger from GitHub

Clone the repository from GitHub using the following command:

git clone https://github.com/qcri/dialectal_arabic_pos_tagger

Alternatively, download the compressed archive of the project and extract it.

Getting started

Dialectal Arabic POS Tagger reads an Arabic text file as input and produces the POS tags, one segment per line. The input file must be encoded in UTF-8:

python arabic_pos_tagger.py -i [in-file] -o [out-file] 

To use a specific model:

python arabic_pos_tagger.py -m [model-dir] -i [in-file] -o [out-file] 

For more details, see the built-in help:

python arabic_pos_tagger.py -h
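
The commands above can also be driven from a Python script, for example when tagging many files in a batch. The following is a minimal sketch, not part of the tagger itself: the helper names `ensure_utf8`, `build_command` and `run_tagger` are hypothetical, and the sketch assumes it is run from the cloned repository directory so that `arabic_pos_tagger.py` can be found. It relies only on the `-i`, `-o` and `-m` options shown above, and uses the current Python interpreter in place of the `python` command.

```python
import subprocess
import sys
from pathlib import Path


def ensure_utf8(path):
    """Return True if the file at `path` decodes cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def build_command(in_file, out_file, model_dir=None, script="arabic_pos_tagger.py"):
    """Build the argument list for the tagger command line.

    Only the -i, -o and (optional) -m options from this README are used.
    """
    cmd = [sys.executable, script]
    if model_dir is not None:
        cmd += ["-m", str(model_dir)]
    cmd += ["-i", str(in_file), "-o", str(out_file)]
    return cmd


def run_tagger(in_file, out_file, model_dir=None):
    """Check the input encoding, then run the tagger as a subprocess."""
    if not ensure_utf8(in_file):
        raise ValueError(f"{in_file} is not valid UTF-8")
    # check=True raises CalledProcessError if the tagger exits with an error.
    subprocess.run(build_command(in_file, out_file, model_dir), check=True)
```

Calling `run_tagger("tweets.txt", "tweets.pos")` would then be equivalent to the first command above, with `tweets.txt` and `tweets.pos` standing in for your own file names. Checking the encoding up front gives a clear error for a file that is not valid UTF-8 before the tagger is started.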

Publications

Randah Alharbi, Walid Magdy, Kareem Darwish, Ahmed Abdelali and Hamdy Mubarak. (2018) Part-of-Speech Tagging for Arabic Gulf Dialect Using Bi-LSTM. Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). May 7-12, 2018. Miyazaki, Japan. Pages 3925-3932.

Kareem Darwish, Hamdy Mubarak, Ahmed Abdelali, Mohamed Eldesouki, Younes Samih, Randah Alharbi, Mohammed Attia, Walid Magdy and Laura Kallmeyer. (2018) Multi-Dialect Arabic POS Tagging: A CRF Approach. Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). May 7-12, 2018. Miyazaki, Japan. Pages 93-98.

Support

You can ask questions and join the development discussion:

Bug reports and feature requests (and only those) can be posted in GitHub issues. Make sure to read our guidelines first.

License

Dialectal Arabic POS Tagger is covered by the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

