Repositories under the text-data topic:
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/
Cleans Reddit Text Data :scroll: :broom:
Question classification on the CogComp QC dataset (http://cogcomp.org/Data/QA/QC/).
Presents an optimized Apache Beam pipeline for generating sentence embeddings (runnable on Cloud Dataflow).
Old book pages (with ground truth), formerly used for OCR studies. There are several versions of the set, differing in resolution and binarization. Noised and denoised sets (produced by several methods) are eventually going to be uploaded.
How Will Your Tweet Be Received? Predicting the Sentiment Polarity of Tweet Replies
Scrape EDGAR filings from https://www.sec.gov/
A dataset containing 30k+ so-called "self-help" tweets from 100+ authors.
For reading from and writing to parallel data files in Python
Dataset of League of Legends Voice Lines
A machine learning model that predicts tags for a given question and body.
A tutorial on using regular expressions in R
The aim of this work is to predict the number of Instagram likes. Text vectorization is done using a TF-IDF vectorizer.
Can you spot automatically generated scientific excerpts?
Analysis of text data: extracts the main topics from an Airbnb dataset using Latent Dirichlet Allocation (LDA), then applies linear regression to interpret the topics.
Rank 3/85 MachineHack
Rank 16/98 MachineHack
Applying NLP techniques to WhatsApp text to gain insights.
13-Modules-Entity-Name-Single-sentence-Annotation-Data
13000000-Groups-Man-Machine-Conversation-Interactive-Text-Data
28237-Intent-type-single-sentence-annotation-data
80000-sets-Multi-domain-Customer-Service-Dialogue-Text-Data
8178-Chinese-Social-Comments-Events-Annotation-Data