iWAN Research Group's repositories
ArabicSurvey
A repository of survey and review papers in Arabic Natural Language Processing (ANLP). Arabic name: مستودع الأوراق المسحية في معالجة اللغة العربية (أسبر).
Arabic-Topic-Modeling
BERT for Arabic Topic Modeling: An Experimental Study on BERTopic Technique
Saudi-Dialect-Irony-Dataset
The Saudi irony dataset was collected using the Twitter API and consists of 19,810 tweets, of which 8,089 are labeled as ironic.
Arabic-Humor
The Arabic humor dataset was collected using Twint and Sketch Engine and consists of 10k tweets.
NLP-Patents
A repository for patents in the field of Natural Language Processing (NLP).
Saudi-Bank-Sentiment-Dataset
This dataset contains customers’ sentiments on Twitter toward four Saudi banks. Of a total of 12k tweets, 8,669 are labeled "Negative", 2,143 are labeled "Positive", and 1,236 are labeled "Neutral".
Arabic-Paraphrased-Dataset
The Arabic paraphrased parallel dataset is sourced from diverse origins and expanded through data augmentation. Its intended uses include education, search engines, content creation, social media, and domain-specific applications.
CLEANANERCorp
CLEANANERCorp is a corrected version of the classic Arabic NER benchmark ANERcorp, with updated and more consistent NER labels.
OpenTriviaQA
A Creative Commons dataset of trivia questions and answers.
Saudi_Privacy_policy
Saudi Arabic Privacy Policy Dataset