Salha Salman's repositories
Author-Identification-2
Author Identification Project using NLP methods and Machine Learning
Author-Prediction-Using-Deep-Learning-Models
Six machine learning models compared by accuracy on the training, validation, and test sets.
AuthId-RNN
Code for "Authorship Identification using Recurrent Neural Networks"
Author-Identification-using-Text-Snippets
Author identification given multiple short text snippets, using stylometric and lexicographical features.
author-predictor
Author prediction from a paragraph using Deep Neural Nets
AuthorIdentification-SentimentAnalysis-TopicModeling
Project done for the NLP course at Sorbonne University
AuthorshipAttribution-1
Models for text classification and authorship attribution based on the Higher Criticism
DatabaseTools
The MongoCollection class allows easy configuration of a MongoDB collection by providing an interface that handles authentication, index management, data conversion, and pretty-printing of collections. It can work like a Python dict if you give it at least one index.
deeplearning_twitterbotornot
Revisiting Twitter Bot or Not with Deep Learning
DeepStyle
DeepStyle provides pretrained models that aim to project text into a stylometric space. The base project consists of a new method of representation learning and a definition of writing style based on distributional properties. This repository contains the datasets, pretrained models, and other resources that were used to train and test the models.
h2database
H2 is an embeddable RDBMS written in Java.
HatefulUsersTwitter
Code for the paper "Characterizing and Detecting Hateful Users on Twitter"
NLP-AuthorPrediction-WebApp
Machine learning application that predicts the author of tweets, with a React front end that interacts with a Flask back-end service.
Obfuscation-Detection
Obfuscation detection tool. Given a document, it tells whether it was written by a human or altered by an automated authorship obfuscation tool.
RusLit
📚 A small collection of Russian literature 📚
Social-Media-Disinformation-Network-BERT
Twitter is a social networking platform where users exchange many political thoughts and views. Some users are, in fact, nation-state actors (individuals with close links to the military, intelligence, or state control apparatus of their country) who share fake news to engage in espionage, propaganda, or disinformation campaigns. Twitter has already identified many of these accounts and banned them for violating its policies. Our main goal is to build a Natural Language Processing (NLP) classification model that learns disinformation and fake news patterns from tweets and classifies them as either “Disinformation” or “Others.” This study uses the state-linked information operations (“IO”) data published by Twitter in June 2020, covering operations attributed to Russia and Turkey. We narrowed our focus to the Turkish and Russian tweets that were involved in a range of manipulative and coordinated activities spreading geopolitical narratives favorable to their respective political parties in Turkey. For our classification model we also incorporated Twitter live stream data from the Twitter archives for the same time period. Using SQL queries, we isolated the 8,392 banned Turkish & Russian accounts from the archived live stream data to create our “Others” category data. We trained a Bidirectional Encoder Representations from Transformers (BERT) model on the “Turkey” and “Russia” information operations data and the “Others” live stream archive data, then tested it against archived tweets from the month following the training period. Our model predicted 43,568 of 411,095 tweets as “Turkey” disinformation, with an accuracy of 89.4%. For the same time period Twitter banned only 26,259 disinformation tweets. Based on our prediction model, it appears that Twitter may still be missing 17,309 information operations tweets for that time period. Similarly, our model predicted 20,826 of 114,416 tweets as “Russia” disinformation, with an accuracy of 81.79%.
TwitOff
A fun web application comparing and predicting tweet authorship.
twitoff-2
Web application for guessing a Tweet's author
Twitter-Authority
A competition to build a model that detects hoax tweets, applying fundamental and practical machine learning for natural language processing to data obtained from Twitter.
Who-Tweeted-That
COMP90051 Project 1: Authorship Attribution with Limited Text on Twitter