mgrandi / furaffinity_scrape

scripts to assist in a scrape of furaffinity.net

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

furaffinity_scrape

Scripts to assist with the scraping of furaffinity.net

They are split into modules

main usage


> python cli.py --help

usage: cli.py [-h] --config CONFIG {scrape_users} ...

utilities for scraping furaffinity.net

positional arguments:
  {scrape_users}

optional arguments:
  -h, --help       show this help message and exit
  --config CONFIG  the HOCON config file

scrape_users subcommand

a subcommand to build up a database of users by downloading every furaffinity submission in order, parse it with lxml / beautifulsoup4, and then pick out the parts that could be usernames:

  • artist
  • art descriptions, often link to other users
  • usernames of users who comment

this will download a submission:

1: checks to see if the submission has been deleted or not

1.1: if the submission still exists

1.1.1: creates a beautifulsoup4 object out of the downloaded html

1.1.2: runs CSS selector queries on it to parse out any available usernames

1.2.3: see if the extracted users exist in the database, if not, add them

1.2.4: add submission HTML to the database

2: add row to the database marking that we have processed this submission

Eventually, we will have a sizeable list of users, after which we can iterate over them to download their user pages to get even more users, through user page comments, watch lists, profile information.

About

scripts to assist in a scrape of furaffinity.net


Languages

Language:Python 100.0%