Jessica Martin's starred repositories
corex_topic
Hierarchical unsupervised and semi-supervised topic models for sparse count data with CorEx
topic_modelling_demo
A workflow for CorEx-based topic modeling
nlp-resources
Natural language processing resources for multiple languages, with an eye towards use for digital humanities.
twitter-protest-analysis
Analysis of 10 million tweets
metis-project4
Investigating the impact of Twitter Bots on the 2020 U.S. Presidential Election's Twitter Discourse
Twitter_NLP
Metis NLP project on Twitter Customer Service data
tweet-clustering
Clustering analysis of one million tweets using scikit-learn, including basic benchmarking of various clustering algorithms
COVID-19-Arabic-Tweets-Dataset
The repository contains a collection of Arabic tweet IDs associated with the novel coronavirus COVID-19. The dataset contains tweet IDs from 2020-01-01 to 2020-04-30. The Twitter search API was used to gather real-time tweets that contained specific keywords in the Arabic language. The dataset contains almost four and a half million Arabic tweets.
WIDH_2020_Arabic_Text_Analysis
Material for the Text Analysis of Arabic course taught at the NYU Abu Dhabi Winter Institute in Digital Humanities 2020.
arabic-stop-words
Largest list of Arabic stop words on GitHub.
dldiy-practicals
Slides, Jupyter Notebooks and scripts for the Deep Learning: Do-It-Yourself! lectures at ENS
Topic-Modeling-of-Tweets-Related-to-NFL-and-National-Anthem
My fourth project at Metis, which uses topic modeling to detect structure in tweets related to the NFL and the national anthem.
arabic_word_embeddings_CNN
Word Embeddings and Convolutional Neural Network for Arabic Sentiment Classification (Coling 2016)
Arabic-Image-Captioning
Generate Arabic captions for images using Deep Learning
Arabic-Empathetic-Chatbot
Seq2Seq-based open domain empathetic conversational model for Arabic: Dataset & Model
Arabic-named-entity-recognition
Arabic named entity recognition using the AnerCorp corpus (location, organisation, person, miscellaneous)
document_cluster
A guide to document clustering in Python
Text-Scraping-Document-Clustering-Topic-modeling
The objective of this project is to scrape a corpus of news articles from a set of web pages, pre-process the corpus, and then to apply unsupervised clustering algorithms to explore and summarise the contents of the corpus.
Part 1. Text Data Scraping
This part of the project should be implemented as a Python script.
1. Identify the URLs for all news articles listed on the website: http://mlg.ucd.ie/modules/COMP41680/news/index.html
2. Retrieve all web pages corresponding to these article URLs.
3. From the web pages, extract the main body text containing the content of each news article. Save the body of each article as plain text.
Part 2. Corpus Exploration
Tasks to be completed in your IPython notebook:
1. Load the text corpus generated in Part 1. Apply any appropriate pre-processing steps and construct a document-term matrix representation of the corpus.
2. Summarise the overall corpus by identifying the most characteristic terms and phrases in the corpus.
3. Apply two alternative clustering algorithms of your choice to the document-term matrix to produce clusters of related documents. This might require applying each algorithm several times with different parameter values.
4. For each clustering generated in Step 3, summarise the contents of the clusters. Based on your summary, suggest a topic/theme for each cluster.
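As a rough illustration of Part 2, steps 1 and 2 in the description above, the following is a minimal standard-library sketch of building a document-term matrix and ranking terms by a summed TF-IDF weight. It is not taken from the repository: the toy corpus, the stop-word set, and the function names (tokenize, document_term_matrix, top_terms) are all hypothetical, and the scoring rule is just one simple way to approximate "most characteristic terms".

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus standing in for the scraped article bodies.
corpus = [
    "The election results were announced after the vote count.",
    "The football team won the league after a late goal.",
    "Voters queued for hours to cast a vote in the election.",
]

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "in", "to", "for", "were", "after"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def document_term_matrix(docs):
    """Return (vocab, matrix), where matrix[i][j] counts vocab[j] in docs[i]."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


def top_terms(vocab, matrix, k=3):
    """Rank terms by raw count times a smoothed inverse document frequency."""
    n_docs = len(matrix)
    scores = {}
    for j, term in enumerate(vocab):
        doc_freq = sum(1 for row in matrix if row[j] > 0)
        idf = math.log(n_docs / doc_freq) + 1.0
        scores[term] = idf * sum(row[j] for row in matrix)
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


vocab, dtm = document_term_matrix(corpus)
print(top_terms(vocab, dtm))
```

The resulting matrix (one row per document, one column per vocabulary term) is the kind of input the clustering algorithms in step 3 would take; in practice a library vectorizer would likely replace these hand-written helpers.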
04_biden_election_tweets_NLP
METIS PROJECT 4: NATURAL LANGUAGE PROCESSING & UNSUPERVISED LEARNING // Skills: NLTK, scikit-learn NLP tools (TF-IDF vectorizer, K-means clustering, PCA, t-SNE), Wordcloud library
nlp-in-python-tutorial
Comparing stand-up comedians using natural language processing
gt-nlp-class
Course materials for Georgia Tech CS 4650 and 7650, "Natural Language"
open-data-registry
A registry of publicly available datasets on AWS