A bot that posts questions without context. The questions are drawn from Project Gutenberg texts.
Currently posting several times a day to Mastodon. (It used to post to Twitter, but I don't use Twitter anymore and neither do my bots.)
@obliquestions@botsin.space on Mastodon
Taking inspiration from Hugo van Kemenade's gutengrep project, the initial corpus was derived from books in the Project Gutenberg 'August 2003' CD. To make the dataset cleaner to begin with, I removed almost 200 books from the collection manually before building my corpus. These included non-English texts, poetry and dramatic texts, texts heavy with dialect, and religious, mathematical, encyclopedic, and political texts.
This left me with about 400 books. I used gutengrep to tokenize the texts into sentences.
Once tokenized, I cleaned up the corpus a bit:
- deleted duplicate lines (with Sublime Text's `Edit → Permute Lines → Unique` command)
- deleted empty lines (found `\n\n` and replaced it with `\n`).
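The same cleanup could also be scripted rather than done by hand in an editor. The following is a minimal sketch, not part of the original project: `cleanCorpus` is a hypothetical helper name, and it assumes the corpus is a single string with one sentence per line.

```javascript
// Hypothetical helper, not taken from the original project.
// Drops empty lines and exact duplicate lines, keeping the first occurrence.
function cleanCorpus(text) {
  const seen = new Set();
  const kept = [];
  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === '') continue; // skip empty (or whitespace-only) lines
    if (seen.has(line)) continue;     // skip lines we have already kept
    seen.add(line);
    kept.push(line);
  }
  return kept.join('\n');
}
```

For example, `cleanCorpus('a\n\nb\na\n')` returns `'a\nb'`.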
Then I wrote a script (build-corpus.js) to format and filter the sentences into a set of postable questions. In order:
- Removed leading and trailing quotation marks, so that questions that were quotations in the original text would be posted as though they were prose.
- Capitalized the first letter of the question, in case it wasn't already capitalized.
- Filtered out any question longer than 140 characters.
- Filtered out any question that included a proper noun. (I felt this would provide too much context.) I did this with a regular expression that searched for words preceded by a space and starting with a capital letter. This doesn't capture proper nouns at the beginning of sentences, but that's fine.
- Filtered out any question that contained non-letter characters (excluding apostrophes), as they often indicated weird formatting and non-questions:
  1 2 3 4 5 6 7 8 9 0 : ; . " “ ” ‘ ’ < > [ ] ( ) { } ` ~ # $ % ^ & _ + - = \ / |
- Filtered out any question that contained archaic language (like `thine`, `dost`, and `prithee`).
- Filtered out any question that contained religious language (like `moses`, `buddha`, and `clergy`).
- Filtered out any question that would relate to the text itself or to Project Gutenberg (like `gutenberg`, `donate`, `chapter`, and `section`).
- Filtered out any question that contained a bad word listed in Darius Kazemi's wordfilter.
- Filtered out any question that contained additional oppressive language not covered by wordfilter, or words that tend to appear in problematic sentences.
If a sentence passed all the filters, I added it to a giant JSON file.
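The formatting and filtering steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of build-corpus.js: every name here (`formatQuestion`, `isPostable`, `buildCorpus`, and the constants) is hypothetical, the word list contains only the example terms quoted above, blocked words are assumed to match as whole words regardless of case, and the wordfilter step is left out so that the sketch has no dependencies.

```javascript
// Sketch only: names and word lists are illustrative, not taken from build-corpus.js.
const MAX_LENGTH = 140;

// Only the example terms quoted in the list above; the real lists are presumably longer.
const BLOCKED_WORDS = [
  'thine', 'dost', 'prithee',                   // archaic
  'moses', 'buddha', 'clergy',                  // religious
  'gutenberg', 'donate', 'chapter', 'section',  // about the text or Project Gutenberg
];
// Assumption: match whole words, case-insensitively.
const BLOCKED = new RegExp(`\\b(?:${BLOCKED_WORDS.join('|')})\\b`, 'i');

// A space followed by a capital letter: a rough proxy for a mid-sentence proper noun.
const PROPER_NOUN = / [A-Z]/;

// ASCII characters from the disallowed list (digits, punctuation, brackets, symbols).
const DISALLOWED_CHARS = /[0-9:;."<>\[\](){}`~#$%^&_+\-=\\\/|]/;

// Strip leading and trailing double quotation marks, then capitalize the first letter.
function formatQuestion(sentence) {
  const stripped = sentence.trim().replace(/^["“”]+|["“”]+$/g, '');
  return stripped.charAt(0).toUpperCase() + stripped.slice(1);
}

// A question is postable only if it passes every filter.
function isPostable(question) {
  return (
    question.length <= MAX_LENGTH &&
    !PROPER_NOUN.test(question) &&
    !DISALLOWED_CHARS.test(question) &&
    !BLOCKED.test(question)
  );
}

// Format first, then filter, mirroring the order of the steps above.
function buildCorpus(sentences) {
  return sentences.map(formatQuestion).filter(isPostable);
}
```

The result could then be written out with something like `fs.writeFileSync('questions.json', JSON.stringify(buildCorpus(sentences)))`, where the file name is a placeholder. Note that a literal "space followed by a capital letter" test also rejects any question containing the pronoun "I" mid-sentence; whether the original regular expression behaved that way is not stated.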
After refining the script, I ended up with a JSON file of about 66K questions.
I then wrote a script (bot.js) that reads the JSON file, chooses a question from it at random, and posts the question.
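A minimal sketch of what such a script might look like, assuming the JSON file is a flat array of question strings and that posting goes through Mastodon's REST API (`POST /api/v1/statuses`). The function names, the instance URL, and the access token are all placeholders, and this is not the actual bot.js. The posting function relies on the global `fetch` available in Node 18 and later.

```javascript
const fs = require('fs');

// Hypothetical helper: load the question array from a JSON file.
// Assumes the file holds a flat array of strings.
function loadQuestions(path) {
  return JSON.parse(fs.readFileSync(path, 'utf8'));
}

// Choose one element uniformly at random.
// `random` is injectable so the choice can be made deterministic in tests.
function pickRandom(items, random = Math.random) {
  return items[Math.floor(random() * items.length)];
}

// Post a status through Mastodon's REST API.
// `instanceUrl` and `accessToken` are placeholders supplied by the caller.
async function postQuestion(question, instanceUrl, accessToken) {
  const response = await fetch(`${instanceUrl}/api/v1/statuses`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${accessToken}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ status: question }),
  });
  if (!response.ok) {
    throw new Error(`Post failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

A scheduled run would then amount to `postQuestion(pickRandom(loadQuestions('questions.json')), instanceUrl, accessToken)`, with the file name again a placeholder.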
I couldn't have created this bot without the help of the following:
- Sarah Kuehnle's 'Creating a Twitter bot with Node.js' series
- Darius Kazemi provided inspiration and personal technical assistance. I also referenced his projects examplebot and grunt-init-twitter-bot and his posts How to make a Twitter bot and Basic Twitter bot etiquette.
- Hugo van Kemenade's gutengrep project was instrumental in both providing my corpus and tokenizing it into sentences.
- Justin Falcone provided inspiration, encouragement, and personal technical assistance.
- This project was inspired in name and in concept by Brian Eno and Peter Schmidt's Oblique Strategies.
- This project was also inspired by Allison Parrish's Deep Question Bot.
- This project was also inspired by Jeremy P. Bushnell's 'Notes Minus Context' Twitter account.
This is my first bot. The idea for Oblique Questions came to me while walking the dog on Saturday, October 31, 2015. I started working on it the next day and launched the working bot on the morning of Wednesday, November 4, 2015.