marwah2001 / Arabic-Paraphrasing-Benchmark

The Arabic paraphrasing benchmark consists of 1,010 Arabic sentence pairs labeled for similarity and paraphrase.



Arabic_paraphrasing_benchmark

This repository contains a CSV file holding the Arabic paraphrasing benchmark.
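The sketch below is a minimal way to load the benchmark with Python's standard `csv` module and count label values. It assumes a header row with the columns `sentence1`, `sentence2`, `similarity` and `paraphrase`. These names are hypothetical placeholders, so check the header of the actual CSV file and adjust `COLUMNS` to match. The in-memory sample is made up and stands in for the real file.

```python
import csv
import io
from collections import Counter

# Hypothetical column names: the real CSV header may use different names.
COLUMNS = ("sentence1", "sentence2", "similarity", "paraphrase")


def load_pairs(stream):
    """Read sentence pairs from a CSV text stream into a list of dicts."""
    reader = csv.DictReader(stream)
    missing = [c for c in COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [{c: row[c] for c in COLUMNS} for row in reader]


def label_counts(pairs, column="paraphrase"):
    """Count how often each label value occurs in the given column."""
    return Counter(p[column] for p in pairs)


if __name__ == "__main__":
    # Made-up sample standing in for the real file. For the actual data use
    # open(path, encoding="utf-8", newline="") so the Arabic text decodes.
    sample = io.StringIO(
        "sentence1,sentence2,similarity,paraphrase\n"
        "s1 a,s1 b,0.9,1\n"
        "s2 a,s2 b,0.1,0\n"
    )
    pairs = load_pairs(sample)
    print(len(pairs), label_counts(pairs))
```

Failing early on missing columns makes a wrong guess about the header visible immediately. Otherwise it would surface as a `KeyError` deep inside later processing.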

This dataset consists of 1,010 Arabic sentence pairs. Experts collected the first sentence of each pair from various Arabic books, and some of these sentences were generated using words from the AWSS dataset (Almarsoomi et al. 2013). The second sentence of each pair was derived from the first by applying six transformation rules for the Arabic language: permutation, replacement, deletion, addition, expansion and reduction.
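The toy token-level sketch below shows what four of the rule names mean in practice. It is an illustration only and not the authors' procedure. The benchmark's second sentences were produced by experts, the function names are hypothetical, and expansion and reduction are not modeled here. The example sentence is made up and does not come from the benchmark.

```python
from typing import List

# Hypothetical helpers illustrating four of the six rule names on a token list.
# Each returns a new list and leaves the input unchanged.


def permutation(tokens: List[str], i: int, j: int) -> List[str]:
    """Swap the tokens at positions i and j (reorder words)."""
    out = list(tokens)
    out[i], out[j] = out[j], out[i]
    return out


def replacement(tokens: List[str], i: int, new_token: str) -> List[str]:
    """Replace the token at position i, e.g. with a synonym."""
    out = list(tokens)
    out[i] = new_token
    return out


def deletion(tokens: List[str], i: int) -> List[str]:
    """Remove the token at position i."""
    return tokens[:i] + tokens[i + 1:]


def addition(tokens: List[str], i: int, new_token: str) -> List[str]:
    """Insert new_token before position i (i == len(tokens) appends)."""
    return tokens[:i] + [new_token] + tokens[i:]


if __name__ == "__main__":
    # Made-up sentence: "the boy went to the school".
    s = "ذهب الولد إلى المدرسة".split()
    print(" ".join(permutation(s, 0, 1)))          # verb-subject -> subject-verb order
    print(" ".join(replacement(s, 1, "الطفل")))    # "boy" -> "child"
    print(" ".join(addition(s, len(s), "صباحا")))  # add "in the morning"
```

Because each helper copies its input, several rules can be chained on the same source sentence without side effects.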


To cite this dataset, please use the following article:

 @article{alian2019towards,
   title={Building Arabic Paraphrasing Benchmark based on Transformation Rules},
   author={Alian, Marwah and Awajan, Arafat and Al-Hasan, Ahmad and Akuzhia, Raeda},
   journal={ACM Transactions on Asian and Low-Resource Language Information Processing},
   pages={1--17},
   year={2021},
 }

https://dl.acm.org/doi/10.1145/3446770
https://dl.acm.org/doi/abs/10.1145/3368691.3368708
