linuxscout / mishkal

Mishkal is Arabic text vocalization software

Comparison to Tashkeela-Model for diacritics

Kentoseth opened this issue

Salam,

How does Mishkal's diacritization compare to Tashkeela-Model?

That project uses the Tashkeela training data that you made, available on Kaggle.

Is the method you used here for diacritics more accurate than that trained model?

I was able to find the answer here:

"Neural Arabic Text Diacritization: State of the Art Results and a Novel Approach for Machine Translation", Ali Fadel, Ibraheem Tuffaha, Bara' Al-Jawarneh and Mahmoud Al-Ayyoub.

Salam,
Mishkal uses the Tashkeela dataset as an evaluation set only. Mishkal is rule-based, not machine-learning-based. It is built on a set of libraries for Arabic language processing, which provide further tools and resources such as:

  • Stemmer: Tashaphyne
  • Morphology analyzer: Qalsadi
  • Tashkeela corpus
  • Qutrub verb conjugator
  • Arramooz dictionary
  • etc...

We created Tashkeela to support the use of diacritized texts for training or testing machine learning models.

Tashkeela: Novel corpus of Arabic vocalized texts, data for auto-diacritization systems
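
To illustrate how a diacritized corpus can serve as a test set, here is a minimal sketch in Python. It is not Mishkal's own evaluation code: the function names are hypothetical, and the metric (a per-letter diacritic error rate) is an assumption chosen for illustration. The idea is to strip the diacritics from a reference sentence, pass the bare text to whichever diacritizer is under test, and compare its output with the reference letter by letter.

```python
# Arabic diacritics (harakat): U+064B (fathatan) through U+0652 (sukun).
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_diacritics(text):
    """Remove the diacritics so the text can be fed to a diacritizer."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def split_letters(text):
    """Return a list of (base_character, diacritics) pairs."""
    pairs = []
    for ch in text:
        if ch in DIACRITICS:
            if pairs:  # attach the mark to the preceding base character
                base, marks = pairs[-1]
                pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


def diacritic_error_rate(reference, hypothesis):
    """Fraction of letters whose diacritics differ from the reference.

    Hypothetical helper: assumes both strings have the same base letters.
    Marks are compared as sets, so shadda+fatha equals fatha+shadda.
    """
    ref = split_letters(reference)
    hyp = split_letters(hypothesis)
    if [b for b, _ in ref] != [b for b, _ in hyp]:
        raise ValueError("reference and hypothesis differ in base letters")
    # Score letters only; spaces, digits and punctuation are skipped.
    scored = [(r, h) for (b, r), (_, h) in zip(ref, hyp) if b.isalpha()]
    if not scored:
        return 0.0
    errors = sum(1 for r, h in scored if set(r) != set(h))
    return errors / len(scored)


# Reference: kaf+fatha, ta+fatha, ba+fatha ("kataba").
reference = "\u0643\u064E\u062A\u064E\u0628\u064E"
# Stand-in system output with a sukun on the last letter instead of a fatha.
hypothesis = "\u0643\u064E\u062A\u064E\u0628\u0652"

bare = strip_diacritics(reference)
rate = diacritic_error_rate(reference, hypothesis)
```

In a real comparison, `hypothesis` would be the output of the system under test (rule-based or trained) on `bare`, and the rate would be averaged over many sentences from the corpus.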

We also used Tashkeela to develop another product, the ML-based Shakkala.
Thanks