Maha-J-Althobaiti / AraNLP

A Java-based Library for the Processing of Arabic Text

AraNLP

AraNLP is a Java-based toolkit for processing Arabic text. It supports the most important preprocessing steps: diacritic and punctuation removal, tokenization, sentence segmentation, part-of-speech tagging, root stemming, light stemming, and word segmentation. These steps are usually required to prepare text for more advanced NLP tasks.
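To make the first few of these steps concrete, here is a minimal sketch of diacritic removal, punctuation removal, and whitespace tokenization written with the standard Java library only. It does not use AraNLP's API: the class name `ArabicPreprocessSketch` and its methods are hypothetical names chosen for illustration, and the character ranges are an assumption about what counts as a diacritic, not the library's actual rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: NOT part of AraNLP. All names here are hypothetical.
public class ArabicPreprocessSketch {

    // Assumed diacritic set: fathatan (U+064B) through sukun (U+0652),
    // plus the superscript alef (U+0670).
    private static final Pattern DIACRITICS =
            Pattern.compile("[\\u064B-\\u0652\\u0670]");

    // Any character in a Unicode punctuation category, which includes the
    // Arabic comma (U+060C) and the Arabic question mark (U+061F).
    private static final Pattern PUNCTUATION = Pattern.compile("\\p{P}");

    // Deletes diacritic marks and leaves the base letters untouched.
    static String removeDiacritics(String text) {
        return DIACRITICS.matcher(text).replaceAll("");
    }

    // Replaces each punctuation mark with a space so neighbouring words stay separate.
    static String removePunctuation(String text) {
        return PUNCTUATION.matcher(text).replaceAll(" ");
    }

    // Splits on runs of whitespace and drops empty tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A fully diacritized word followed by an Arabic comma and a second word.
        String input = "\u0643\u064E\u062A\u064E\u0628\u064E\u060C \u062F\u0631\u0633";
        String cleaned = removePunctuation(removeDiacritics(input));
        System.out.println(tokenize(cleaned));
    }
}
```

Whitespace tokenization is only a first approximation for Arabic, where clitics attach to words; that is why the library lists word segmentation and stemming as separate steps.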

The goal of AraNLP is to gather the most vital Arabic text preprocessing tools into a single, easily accessible library. To that end, we implemented the tools that were missing and integrated existing algorithmic resources.

AraNLP has already been used in many experiments to prepare Arabic text, and it preprocessed the corpora successfully.

Paper

Available at http://www.lrec-conf.org/proceedings/lrec2014/pdf/621_Paper.pdf.

Citation

Please cite our paper in any published work using this resource:

@inproceedings{Althobaiti14AraNLP,
  title={{AraNLP: a Java-Based Library for the Processing of Arabic Text}},
  author={M. Althobaiti and U. Kruschwitz and M. Poesio},
  booktitle={Proceedings of the 9th Language Resources and Evaluation Conference (LREC)},
  year={2014},
  address={Reykjavik}
}

License: GNU General Public License v3.0

