Redwing Brands's repositories
sumy
Module for automatic summarization of text documents and HTML pages.
book-dataset
This dataset contains 207,572 books from the Amazon.com, Inc. marketplace.
Book-Recommender-System
The goal of this project is to build a recommendation engine that helps users find books that may interest them, based on the books' summaries. It does this by applying the Latent Dirichlet Allocation (LDA) algorithm.
csi-corpus
Annotated screenplays for 39 episodes of CSI: Crime Scene Investigation, from the paper "Whodunnit? Crime Drama as a Case for Natural Language Understanding"
gpt-2-simple
Python package to easily retrain OpenAI's GPT-2 text-generating model on new texts
bert-extractive-summarizer
Easy-to-use extractive text summarization with BERT
gpt-2
Code for the paper "Language Models are Unsupervised Multitask Learners"
Serial-Speakers
Companion toolkit of the 'Serial Speakers' dataset.
Narrative-Smoothing
Python scripts to build a dynamic network of interacting speakers within TV series.
The_FSD_dataset
This repository contains all the data and code needed to reproduce the results demonstrated in the paper "Towards story-based classification of movie scenes", which has been submitted to the "Science of Stories" collection of the journal PLOS ONE.
shot-type-classifier
Detecting cinema shot types using a ResNet-50
github-typo-corpus
GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors
bert_recommendation
Repository for project
pytorch-pretrained-BERT
A PyTorch implementation of Google AI's BERT model provided with Google's pre-trained models, examples and utilities.
SFGram-dataset
SFGram (Science-Fiction Gram) is a dataset of public-domain science-fiction novels, books, and movie covers. It is designed for researchers studying the evolution of science-fiction literature over time and for testing machine learning algorithms on authorship attribution and document classification tasks. All the documents are now in the public domain and were obtained from Project Gutenberg or the archive.org website.
jsLDA
An implementation of latent Dirichlet allocation in JavaScript
litcliches
Code for the paper "Cliche expressions in literary and genre novels"
scriptbase
The ScriptBase Corpus
WikiPlots
A dataset containing story plots from Wikipedia (books, movies, etc.) and the code for the extractor.
rights-profile-tool
CBP Rights Profile Tool