jlev / parler-etl

Parler ETL

Scripts to extract, transform and load Parler data into a database.

Requirements:

  • Python 3
  • Dependencies: pip install -r requirements.txt

Steps:

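  • Download the mirrored Parler data: posts (34.2 GB), users (1.1 GB) and video metadata (200 MB)
  • Unzip the posts and users archives (metadata.tar.gz is read as-is in a later step)
  • Create the database tables: python create-data-tables.py --host <host> --username <username> --password <password>
  • Load the posts: python load-files-to-database.py --input parler_posts --type posts --host <host> --username <username> --password <password>
  • Load the users: python load-files-to-database.py --input parler_users --type users --host <host> --username <username> --password <password>
  • Convert the video metadata to JSON Lines: python transform-video-metadata-json-to-jsonl.py --input metadata.tar.gz --output metadata.jsonl
  • Load the metadata: python load-metadata-to-database.py --input metadata.jsonl --host <host> --username <username> --password <password>
  • Run your analysis, then export a CSV (export.csv) with usernames and metadata_id
  • Export the posts for those users: python export-posts-by-user.py --input export.csv --output export --host <host> --username <username> --password <password> --aws_key <aws_key> --aws_secret <aws_secret>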
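The metadata conversion step turns a compressed archive of JSON files into JSON Lines (one JSON object per line), a format that is easy to stream into a database. The following is a hypothetical sketch of that idea, not the repository's transform-video-metadata-json-to-jsonl.py: it assumes each *.json member of metadata.tar.gz holds either one object or a list of objects, and the function name tar_json_to_jsonl is made up for illustration. It uses only the Python standard library.

```python
# Hypothetical sketch: convert a tar.gz of JSON files into a JSON Lines file.
# Assumption: every *.json member holds one JSON object or a list of objects.
# This is an illustration, not the repository's own conversion script.
import json
import tarfile


def tar_json_to_jsonl(archive_path: str, output_path: str) -> int:
    """Write one compact JSON object per line; return the number of lines written."""
    count = 0
    with tarfile.open(archive_path, mode="r:gz") as tar, \
            open(output_path, "w", encoding="utf-8") as out:
        for member in tar:
            # Skip directories, links and non-JSON files (assumed archive layout).
            if not member.isfile() or not member.name.endswith(".json"):
                continue
            fileobj = tar.extractfile(member)
            if fileobj is None:
                continue
            # json.load accepts a binary file object and detects UTF-8/16/32.
            data = json.load(fileobj)
            # A member may contain a single object or a list of objects.
            records = data if isinstance(data, list) else [data]
            for record in records:
                out.write(json.dumps(record, ensure_ascii=False) + "\n")
                count += 1
    return count
```

For example, tar_json_to_jsonl("metadata.tar.gz", "metadata.jsonl") reads the archive member by member without extracting it to disk, which keeps memory use bounded by the largest single JSON file rather than the whole archive.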

Acknowledgements: