An introductory workshop on generative text.

Machines Are Poets Too: An Introduction To Generative Text

A workshop by Brent Bailey. Accompanying slides can be viewed here, along with a recording of the most recent version of the workshop, given at the Electronic Literature Conference in July 2020.

This repo contains resources for learning about a small set of the myriad methods available for creating generative text with code. Below are a few code samples in the p5.js editor to toy with, plus a longer list of resources to explore - this is just a small taste of the wide world of text generation tools out there.

For some of the code samples (especially the ml5 stuff), it’s easiest if you have Python installed, because running them in the p5 web editor gets tricky. Don’t worry, you don’t have to learn Python for this workshop! We’ll just be using it to run a local server. If you do have Python installed, just run python -m SimpleHTTPServer 8000 (for Python 2) or python -m http.server 8000 (for Python 3) - note the port number is a plain argument, with no dash. From there, navigate to localhost:8000 in your browser (all of these work best in Chrome).

Basic Javascript Text Generation

A simple way of generating random text strings with p5 and an Amiri Baraka poem: p5 sketch

And a slightly more complicated one that loads an external text corpus taken from the works of James Baldwin: p5 sketch.
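
The core idea behind both sketches - chop a corpus into words, then recombine them at random - can be sketched in plain JavaScript. This runs anywhere, not just in p5: the one-line corpus below is a stand-in for the Baraka and Baldwin texts, and Math.random() stands in for p5’s random() helper.

```javascript
// A minimal cut-up generator: split a corpus into words,
// then reassemble a line by picking words at random.
const corpus = "the quick brown fox jumps over the lazy dog";
const words = corpus.split(/\s+/);

function randomLine(wordCount) {
  const line = [];
  for (let i = 0; i < wordCount; i++) {
    // Pick any word from the corpus, repeats allowed
    line.push(words[Math.floor(Math.random() * words.length)]);
  }
  return line.join(" ");
}

console.log(randomLine(5));
```

Swap in a bigger corpus (say, a loaded text file split on whitespace) and the same few lines produce surprisingly varied output.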

Tracery

Tracery is a JavaScript library for creating “grammars”: basically a top-level sentence structure, plus sets of words that can fill each slot in that structure. We’ll use Allison Parrish’s p5 example here.
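
Under the hood, a Tracery grammar is just a dictionary of rules whose #symbol# references get recursively expanded. Here’s a toy, hand-rolled expander that illustrates the idea (the real library wraps this up as tracery.createGrammar(rules) and grammar.flatten("#origin#"); the rules below are made up for illustration):

```javascript
// A toy version of what Tracery does: each "#symbol#" in a rule is
// recursively replaced by a randomly chosen option for that symbol.
const grammar = {
  origin: ["the #adj# #noun# #verb#"],
  adj: ["quiet", "electric"],
  noun: ["machine", "poem"],
  verb: ["sings", "dreams"]
};

function flatten(rule) {
  return rule.replace(/#(\w+)#/g, (_, symbol) => {
    const options = grammar[symbol];
    const pick = options[Math.floor(Math.random() * options.length)];
    return flatten(pick); // expand any nested symbols too
  });
}

console.log(flatten("#origin#")); // e.g. "the electric poem sings"
```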

Word vectors with ml5

Word vectors are basically a way of using math to measure how similar different words are: each word becomes a point in space, and nearby points mean related words. ml5 has a simple-to-use model built on top of TensorFlow that we’ll use here. The code sample is located in the word2vec folder.
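
The geometric intuition can be shown with a hand-rolled cosine similarity over made-up 3-dimensional vectors. Real models like the one ml5 wraps use hundreds of dimensions, and the toy numbers here are purely illustrative:

```javascript
// Word vectors reduce "similarity" to geometry: the cosine of the
// angle between two vectors measures how alike two words are
// (1 = same direction, near 0 = unrelated).
const vectors = {
  king:  [0.9, 0.8, 0.1],
  queen: [0.8, 0.9, 0.1],
  apple: [0.1, 0.2, 0.9]
};

function cosineSimilarity(a, b) {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];   // dot product
    magA += a[i] * a[i];  // squared magnitudes
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

console.log(cosineSimilarity(vectors.king, vectors.queen)); // close to 1
console.log(cosineSimilarity(vectors.king, vectors.apple)); // much lower
```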

Rita.js

Rita.js is an incredible tool for any kind of computational work with text, but we’ll be focusing specifically on some potential generative applications of it.

If you want to mess around with the examples, you may find its documentation useful, as well as its list of PENN part-of-speech tags.

These examples are made with Rita’s “full” lexicon - if you get into doing more serious work with it, you may want a smaller version.

Code samples are located in the rita folder. A few of them are also online if you have issues running them locally:

LSTM (CharRNN)

CharRNN is an LSTM (Long Short-Term Memory) neural network available in the ml5 library. RNNs are, uh, hard to explain, but there’s more information in the slides and resources. You can find code samples in the char-rnn folder, or play with a model I trained on James Baldwin here.

Transformers (GPT-2)

Transformers are even harder to explain! You can read more about them here.

The easiest way to train your own GPT-2 model is with Runway. If you want to get closer to the metal, you can use Max Woolf’s Colab Notebook or GPT-2-simple python package.

Resources

“Simple” Stuff

How To Make A Dadaist Poem

Botnik - a predictive keyboard generator.

Dadaist NLP text from in-class exercise

ConceptNet, a semantic network

Cheap Bots Done Quick - quick bot creation with Tracery.

Free AIML Bots

Runway ML - basically Photoshop for AI - plug-and-play machine learning with minimal setup. Currently in free and open beta!

Markov Chains

A visual explanation of Markov chains

Towards data science on Markov chains - honestly just start reading Towards Data Science if you’re into ML/AI.
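
As a quick illustration of the idea from those explainers, here’s a first-order Markov chain in plain JavaScript: record which words follow which in a source text, then walk that table at random (the source sentence is made up for the example):

```javascript
// Build a transition table: word -> list of words observed after it.
const source = "the cat sat on the mat and the cat ran";
const tokens = source.split(/\s+/);

const transitions = {};
for (let i = 0; i < tokens.length - 1; i++) {
  (transitions[tokens[i]] = transitions[tokens[i]] || []).push(tokens[i + 1]);
}

function generate(start, length) {
  const out = [start];
  let current = start;
  for (let i = 1; i < length; i++) {
    const nexts = transitions[current];
    if (!nexts) break; // dead end: nothing ever followed this word
    current = nexts[Math.floor(Math.random() * nexts.length)];
    out.push(current);
  }
  return out.join(" ");
}

console.log(generate("the", 8));
```

Because "the" is followed by "cat" twice and "mat" once, the chain picks "cat" twice as often - that frequency-weighting is what makes Markov output feel like its source.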

Machine Learning

Allison’s Understanding Word Vectors - it’s in Python, but the principles are the same.

Publicly available nlp datasets - if you don’t want to deal with making your own, there are tons out there!

Recurrent Neural Networks Tutorial - the most beginner-friendly explanation of RNNs I’ve seen

The Unreasonable Effectiveness Of Recurrent Neural Networks - really useful intro to RNNs

Better Language Models and Their Implications (GPT-2) - scary but cool!

GPT-2 - source code for GPT-2

Can A Machine Write For The New Yorker? - article on GPT-2 and the future of writing.

Text Resources

Project Gutenberg - hella books.

Gutenberg-dammit - A download of everything on Project Gutenberg.

Gutenberg-poetry - The above, but for poetry.

Libgen - The Pirate Bay for books. I’m not linking here because they change URLs constantly, but look hard and you’ll find it. There’s a scraper for it in the post linked below.

Data Gathering and Preparation

Inside this repo, you can use the Python script clean_text.py to quickly strip nonalphanumeric characters from a file. Just run python clean_text.py {your_file_name} in your working directory. It’s a handy little way to get started with text manipulation in Python.
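
The same cleanup step can be sketched in JavaScript if you’d rather stay in Node - assuming “nonalphanumeric” means anything outside letters, digits, and whitespace (check clean_text.py for the exact rule it applies):

```javascript
// Strip everything but letters, digits, and whitespace from a string.
function cleanText(text) {
  return text.replace(/[^a-zA-Z0-9\s]/g, "");
}

// To clean a whole file, read it, clean it, and write it back out:
// const fs = require("fs");
// fs.writeFileSync("clean.txt", cleanText(fs.readFileSync("raw.txt", "utf8")));

console.log(cleanText('He said, "Hello, world!"')); // He said Hello world
```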

My quick and dirty web scraping resources - contains links to a couple potentially useful scrapers I’ve written, plus a guide to cracking DRM protection if you’d like to get a protected corpus (fOr ReSeArcH pUrpOseS OnLy).

My intro to web scraping with scrapy

An are.na channel dedicated to web scraping

Effectively Pre-Processing Text Data

Other Fun Stuff

NaNoGenMo - national novel generation month. Every November. Generate a novel!
