EleutherAI's repositories
pile-literotica
Download, parse, and filter data from Literotica. Output is ready for use in The Pile.
pile-cc-filtering
The code used to filter Common Crawl (CC) data for The Pile
pile-uspto
A script for collecting the USPTO Backgrounds dataset in a language-modelling-friendly format.
pile-allpoetry
Scraper to gather poems from allpoetry.com
pile-ubuntu-irc
A script for collecting the Ubuntu IRC dataset in a language-modelling-friendly format.
bucket-cleaner
A small utility to clear out old model checkpoints in Google Cloud buckets whilst keeping TensorBoard event files
lang-filter
Filter text files or archives by language
pile-cord19
A script for collecting the CORD-19 dataset in a language-modelling-friendly format.
discord-role-bot
Control Discord Roles with Reactions