flashback-scraper

Scraper for flashback.org that downloads all posts in a thread.

Usage:

PS C:\dev\Python\flashback-scraper\src> python .\main.py scrape thread {{THREAD_ID}} --output-path ../output.ndjson --sleep_ms 0.4

It writes one JSON object per post, one object per line (newline-delimited JSON).
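Because each line is a standalone JSON object, the output can be read back one line at a time without loading the whole file. The following is a minimal sketch for consuming it. It assumes the file is UTF-8 encoded, and it makes no assumptions about which fields a post contains, since those are not documented here. `parse_posts` and `read_posts` are hypothetical helper names, not part of this project.

```python
import json
from typing import Iterable, Iterator


def parse_posts(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line of newline-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline at end of file
            yield json.loads(line)


def read_posts(path: str) -> Iterator[dict]:
    """Stream posts from an output file (ASSUMPTION: UTF-8 encoded)."""
    with open(path, encoding="utf-8") as f:
        yield from parse_posts(f)
```

For example, `for post in read_posts("../output.ndjson"): print(post)` prints each scraped post in file order.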

The scraper is recoverable. If the script fails because of rate limiting or other errors (which will most likely happen), rerun it with the same output file. It finds the last successfully scraped page, resumes from there, and makes sure no duplicate posts are written.
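One way to implement this kind of resume is to scan the existing output file for the highest page number and the set of post IDs already written. The scraper can then restart at that page and skip any post it has already seen. The sketch below illustrates the idea. It is not this project's actual code. It assumes each entry carries hypothetical `page` (int) and `post_id` fields, and all function names are illustrative.

```python
import json
import os
from typing import Iterable, Iterator, Set, Tuple


def resume_state(lines: Iterable[str]) -> Tuple[int, Set[str]]:
    """Return (highest page seen, post IDs already written) from NDJSON lines.

    ASSUMPTION: entries have hypothetical "page" and "post_id" fields.
    """
    last_page = 0
    seen: Set[str] = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        last_page = max(last_page, entry["page"])
        seen.add(entry["post_id"])
    return last_page, seen


def load_resume_state(path: str) -> Tuple[int, Set[str]]:
    """Read resume state from an output file; a missing file means a fresh start."""
    if not os.path.exists(path):
        return 0, set()
    with open(path, encoding="utf-8") as f:
        return resume_state(f)


def new_posts(posts: Iterable[dict], seen: Set[str]) -> Iterator[dict]:
    """Yield only posts not already written, recording each one as seen."""
    for post in posts:
        if post["post_id"] not in seen:
            seen.add(post["post_id"])
            yield post
```

On restart, a scraper built this way would begin at `max(last_page, 1)`. It re-fetches the last page because that page may have been only partly written, and it relies on `new_posts` to drop the duplicates.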

Todo

Handle posts that have been changed somehow
Make a simple client for viewing and searching the output.

About

MIT License

Languages

JavaScript 45.6%, Python 32.6%, HTML 15.0%, CSS 6.8%