Redwing Brands's repositories
booknlp
BookNLP, a natural language processing pipeline for books
ClipShots
ClipShots is the first large-scale dataset for shot boundary detection, collected from YouTube and Weibo and covering more than 20 categories, including sports, TV shows, and animals.
gutenberg-dialog
Build a dialog dataset from online books in many languages
ScriptWriter
ACL 2020: ScriptWriter: Narrative-Guided Script Generation
bookcorpus
Crawl BookCorpus
C4_200M-synthetic-dataset-for-grammatical-error-correction
This dataset contains synthetic training data for grammatical error correction. The corpus is generated by corrupting clean sentences from C4 using a tagged corruption model. The approach and the dataset are described in more detail by Stahlberg and Kumar (2021) (https://www.aclweb.org/anthology/2021.bea-1.4/)
CondensedMovies
Story-Based Retrieval with Contextual Embeddings. Largest freely available movie video dataset.
CPM-Generate
Chinese Pre-Trained Language Models (CPM-LM) Version-I
ctrl
Conditional Transformer Language Model for Controllable Generation
data
Interesting datasets for personal projects or submissions to #TidyTuesday
DialoGPT
Large-scale pretraining for dialogue
doccano
Open source text annotation tool for machine learning practitioners.
FacEval
EMNLP 2022: Analyzing and Evaluating Faithfulness in Dialogue Summarization
Genre-Based-Story-Generator
A web application that generates stories based on genres. Created by fine-tuning GPT2 on genre-based stories.
google_books_crawler
Python crawler for getting books' metadata from the Google Books API using asyncio and aiohttp
mica-riskybehavior-identification
Code companion to Joint Estimation and Analysis of Risk Behavior Ratings in Movie Scripts
mica-text-script-parser
Code to parse movie screenplays
ner-annotator
Named Entity Recognition (NER) annotation tool for spaCy. Generates training data as JSON that can be used readily.
power-of-great-datasets
Materials for rstudio::global(2021) lightning talk
screenpy-1
Screenplay pattern base for Python automated UI test suites.
Script-Generation
Generating movie scripts by genre using the CTRL framework and GPT-2
syuzhet
An R package for the extraction of sentiment and sentiment-based plot arcs from text
text_summurization_abstractive_methods
Multiple implementations of abstractive text summarization, using Google Colab
theming
The core repository for the Theme Ontology project.