Comparison to Tashkeela-Model for diacritics
Kentoseth opened this issue
Salam,
How does Mishkal's diacritization compare to that of Tashkeela-Model?
That project is trained on the Tashkeela data that you made, which is available on Kaggle.
Is the method you used here for diacritics more accurate than that trained model?
I was able to find the answer here:
"Neural Arabic Text Diacritization: State of the Art Results and a Novel Approach for Machine Translation", Ali Fadel, Ibraheem Tuffaha, Bara' Al-Jawarneh and Mahmoud Al-Ayyoub.
Salam,
Mishkal uses the Tashkeela dataset only as an evaluation set. Mishkal is rule-based, not machine-learning-based. It is built on a set of libraries for Arabic language processing, which provide tools and resources such as:
- Stemmer: Tashaphyne
- Morphology analyzer: Qalsadi
- Tashkeela corpus
- Qutrub verb conjugation
- Arramooz dictionary
- etc...
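To illustrate what using a diacritized corpus purely as an evaluation set can look like, here is a minimal sketch: strip the diacritics from a reference line, run a diacritizer on the bare text, and count how many letters received the wrong marks (a diacritic error rate). This is not Mishkal's own evaluation code. The names `strip_diacritics`, `split_marks`, `diacritic_error_rate` and `evaluate`, and the `diacritize` callable, are hypothetical, and the sketch assumes the system under test returns the same base letters it was given.

```python
# Arabic diacritic marks (tashkeel): U+064B (fathatan) through U+0652 (sukun).
DIACRITICS = frozenset(chr(cp) for cp in range(0x064B, 0x0653))


def strip_diacritics(text):
    """Remove diacritic marks, leaving the bare letters."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def split_marks(text):
    """Pair each base character with the diacritics that follow it."""
    pairs = []
    for ch in text:
        if ch in DIACRITICS:
            if pairs:  # a mark with no preceding base character is ignored
                base, marks = pairs[-1]
                pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


def diacritic_error_rate(reference, predicted):
    """Fraction of Arabic letters whose diacritics differ from the reference."""
    ref = split_marks(reference)
    hyp = split_marks(predicted)
    if [base for base, _ in ref] != [base for base, _ in hyp]:
        raise ValueError("reference and prediction differ in their base letters")
    # Score only characters in U+0621..U+064A (the basic Arabic letters,
    # which also includes tatweel U+0640); spaces and punctuation are skipped.
    scored = [
        (ref_marks, hyp_marks)
        for (base, ref_marks), (_, hyp_marks) in zip(ref, hyp)
        if "\u0621" <= base <= "\u064A"
    ]
    if not scored:
        return 0.0
    # Compare marks as sorted sequences so shadda+fatha equals fatha+shadda.
    errors = sum(1 for r, h in scored if sorted(r) != sorted(h))
    return errors / len(scored)


def evaluate(reference_lines, diacritize):
    """Mean error rate of a hypothetical `diacritize(bare_text)` callable."""
    rates = [
        diacritic_error_rate(line, diacritize(strip_diacritics(line)))
        for line in reference_lines
    ]
    return sum(rates) / len(rates) if rates else 0.0


# "kataba" with three fathas, versus a prediction ending in damma.
reference = "\u0643\u064E\u062A\u064E\u0628\u064E"
predicted = "\u0643\u064E\u062A\u064E\u0628\u064F"
print(diacritic_error_rate(reference, predicted))  # one wrong letter out of three
```

Any diacritizer, rule-based or trained, can be passed as `diacritize`, so the same reference lines can be used to compare systems.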
We created Tashkeela to support the use of diacritized texts for training and testing machine learning models.
Tashkeela: Novel corpus of Arabic vocalized texts, data for auto-diacritization systems
We also used Tashkeela to develop another product, the ML-based Shakkala.
Thanks