amrosnik / adventure_scrape

An adventure in text mining.

adventure_scrape

This repository contains the code to build a pandas DataFrame of Adventure Time transcripts. Run the scripts in this order:
-- adventure_scrape.py
-- adventure_dataframe.py
-- adventure_cleaning.py

.pkl files of the DataFrames are included for your convenience; the ones with the most recent commit dates are the most relevant!
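
As a minimal sketch of loading one of the pickled DataFrames: the file name `example_transcripts.pkl` and the columns used here are placeholders (assumptions), not the actual names in this repo. The snippet writes a tiny stand-in file first so that it runs on its own. As with any pickle, only load files from a source you trust.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in data; the real DataFrames' columns may differ.
demo = pd.DataFrame(
    {
        "character": ["Finn", "Jake"],
        "line": ["Hello!", "Hi there."],
    }
)

# Write a placeholder .pkl so this example is self-contained.
tmp_dir = tempfile.mkdtemp()
pkl_path = os.path.join(tmp_dir, "example_transcripts.pkl")  # assumed name
demo.to_pickle(pkl_path)

# Loading a pickled DataFrame is a single call.
df = pd.read_pickle(pkl_path)
print(df.shape)
print(df.head())
```

To use it with a file from this repo, point `pd.read_pickle` at that file's path instead.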

From there, you can explore the data via text mining statistics and plots:
-- adventure_analysis.py
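
One simple text mining statistic is a word-frequency count. The sketch below uses only the standard library on a few made-up lines. It is not the code in adventure_analysis.py; the sample lines and the `word_counts` helper are hypothetical.

```python
import re
from collections import Counter


def word_counts(lines):
    """Count lowercase word occurrences across an iterable of strings."""
    counts = Counter()
    for line in lines:
        # Keep runs of letters and apostrophes; drop punctuation and digits.
        counts.update(re.findall(r"[a-z']+", line.lower()))
    return counts


# Made-up sample lines standing in for cleaned transcript dialogue.
sample_lines = [
    "Come on, Jake!",
    "Jake, where are you?",
    "I'm right here, Finn.",
]

counts = word_counts(sample_lines)
print(counts.most_common(3))
```

The same idea should carry over to a DataFrame column of dialogue by passing that column's values as `lines`.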

To generate data for InFINNerator, the character-level RNN, run:
-- get_finn_data.py
-- format_finn.py

These scripts create the Finn-related .pkl files. finn_just_dialogue.pkl is the initial dataset for InFINNerator.

Finally, the Google Colab notebook INFINNERATOR_example.ipynb shows how to build the char RNN, which is a mini-batched LSTM network. I tried several different architectures and didn't get excellent performance, but please write to me if you do!
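
To illustrate what mini-batched input to a character-level RNN can look like, here is a sketch in plain Python. It is not taken from the notebook: `build_vocab`, `make_batches`, the sequence length, and the batch size are all assumptions chosen for illustration.

```python
def build_vocab(text):
    """Map each distinct character to an integer id (sorted for stability)."""
    return {ch: i for i, ch in enumerate(sorted(set(text)))}


def make_batches(text, vocab, seq_len, batch_size):
    """Yield (inputs, targets) mini-batches of encoded character sequences.

    Each target sequence is its input sequence shifted one character ahead,
    which is the usual next-character prediction setup for a char RNN.
    """
    ids = [vocab[ch] for ch in text]
    # Non-overlapping windows; each needs seq_len + 1 characters.
    pairs = [
        (ids[i:i + seq_len], ids[i + 1:i + seq_len + 1])
        for i in range(0, len(ids) - seq_len, seq_len)
    ]
    # Group into full batches, dropping any incomplete final batch.
    for start in range(0, len(pairs) - batch_size + 1, batch_size):
        chunk = pairs[start:start + batch_size]
        yield [x for x, _ in chunk], [y for _, y in chunk]


text = "what time is it? adventure time!"  # made-up sample text
vocab = build_vocab(text)
batches = list(make_batches(text, vocab, seq_len=5, batch_size=2))
print(len(vocab), len(batches))
```

Each batch could then be converted to tensors and fed to an LSTM layer in whichever framework the notebook uses.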

For a more in-depth look at the data, I wrote a Medium post: https://medium.com/@catcherinthepi/adventure-scrape-text-mining-on-adventure-time-transcripts-8a50d09c2b6d?sk=b6a9bdd006ed9c1644ebe638ba3f01eb

Cheers, Andreana


Languages

HTML 99.9%, Jupyter Notebook 0.1%, Python 0.1%