Comparison to Tashkeela-Model for diacritics

Question

Kentoseth opened this issue 3 years ago

Mohamed H. commented 3 years ago

Salam,

How does Mishkal's diacritization compare to Tashkeela-Model?

That project uses the Tashkeela training data that you made, available on Kaggle.

Is the method you use here for diacritics more accurate than that trained model?

Mohamed H. · Answer 1 · Wed Jun 30 2021 02:57:29 GMT+0800 (China Standard Time)

I was able to find the answer here:

"Neural Arabic Text Diacritization: State of the Art Results and a Novel Approach for Machine Translation", Ali Fadel, Ibraheem Tuffaha, Bara' Al-Jawarneh and Mahmoud Al-Ayyoub.

Taha Zerrouki (طه زروقي ) · Answer 2 · Wed Jun 30 2021 06:09:06 GMT+0800 (China Standard Time)

Salam,
Mishkal uses the Tashkeela dataset as an evaluation set only. Mishkal is rule-based, not machine-learning-based. It is built on a set of libraries for Arabic language processing, which provide more tools and resources, such as:

Stemmer: Tashaphyne
Morphology analyzer: Qalsadi
Tashkeela corpus
Qutrub verb conjugator
Arramooz dictionary
etc...

We created Tashkeela to support the use of diacritized texts for training or testing machine learning models.
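For anyone who wants to compare two diacritizers on a diacritized test set themselves, here is a minimal sketch of a per-letter diacritic error rate in plain Python. It is a simplified metric written for illustration, not the exact measure used by Mishkal, Shakkala, or the paper cited above. The function names `split_diacritics` and `diacritic_error_rate` are hypothetical, and treating U+0621 to U+064A as "Arabic letters" is an assumption.

```python
# Arabic diacritic marks (tashkeel): U+064B (fathatan) through U+0652 (sukun).
DIACRITICS = frozenset(chr(c) for c in range(0x064B, 0x0653))


def split_diacritics(text):
    """Return (base, marks), where marks[i] holds the diacritics that follow base[i]."""
    base, marks = [], []
    for ch in text:
        if ch in DIACRITICS:
            if marks:  # ignore a stray mark that precedes any base character
                marks[-1] += ch
        else:
            base.append(ch)
            marks.append("")
    return base, marks


def diacritic_error_rate(reference, predicted):
    """Fraction of Arabic letters whose diacritics differ from the reference."""
    ref_base, ref_marks = split_diacritics(reference)
    pred_base, pred_marks = split_diacritics(predicted)
    if ref_base != pred_base:
        raise ValueError("texts differ in their undiacritized form")
    # Assumption: only characters in U+0621..U+064A count as letters to score.
    positions = [i for i, ch in enumerate(ref_base) if "\u0621" <= ch <= "\u064A"]
    if not positions:
        return 0.0
    # Compare marks as sets so that "shadda + vowel" order does not matter.
    errors = sum(1 for i in positions if set(ref_marks[i]) != set(pred_marks[i]))
    return errors / len(positions)


# Reference: kaf+fatha, ta+fatha, ba+fatha. Prediction: kasra on the ta instead.
reference = "\u0643\u064E\u062A\u064E\u0628\u064E"
predicted = "\u0643\u064E\u062A\u0650\u0628\u064E"
der = diacritic_error_rate(reference, predicted)  # one wrong letter out of three
```

Running the same function over the output of each system against the same reference text gives a rough basis for comparison. Published evaluations often also report a word error rate and may treat case endings separately, which this sketch does not.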

Tashkeela: Novel corpus of Arabic vocalized texts, data for auto-diacritization systems

We also used Tashkeela to develop another, ML-based product named Shakkala.
Thanks