l1kw1d / twitter-archive-parser

Python code to parse a Twitter archive and output in various ways

How do I use it?

  1. Download your Twitter archive (Settings > Your account > Download an archive of your data).
  2. Unzip to a folder.
  3. Right-click this link to parser.py, select "Save Link as", and save the file into the folder where you extracted the archive.
    • Or with wget: wget https://raw.githubusercontent.com/timhutton/twitter-archive-parser/main/parser.py
  4. Run parser.py with Python 3, e.g. by typing python parser.py at a command prompt opened in that folder.
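
If neither a browser nor wget is handy, step 3 can also be done from Python itself. The following is only a minimal sketch using the standard library; the function name download_parser is made up for illustration and is not part of the project.

```python
import shutil
import urllib.request
from pathlib import Path

# URL taken from the wget example above.
PARSER_URL = "https://raw.githubusercontent.com/timhutton/twitter-archive-parser/main/parser.py"


def download_parser(archive_dir, url=PARSER_URL):
    """Save the script found at `url` as parser.py inside archive_dir.

    Hypothetical helper for illustration; returns the path of the saved file.
    """
    target = Path(archive_dir) / "parser.py"
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


# Example (performs a network request):
#   download_parser(".")
```

After that, run the script as described in step 4.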

If you are having problems, the discussion here might be useful: https://mathstodon.xyz/@timhutton/109316834651128246

What does it do?

The Twitter archive gives you a bunch of data and an HTML file (Your archive.html). Open that file to take a look! It lets you view your tweets in a nice interface. It has some flaws, but maybe that's all you need. If so, stop here: you don't need our script.

Flaws of the Twitter archive:

  • It shows you tweets you posted with images, but if you click on one of the images to expand it then it takes you to the Twitter website. If you are offline or have deleted your account or twitter.com is down then that won't work.
  • The tweets are stored in a complex JSON structure so you can't just copy them into your blog for example.
  • The images they give you are smaller than the ones you uploaded. I don't know why they would do this to us.
  • DMs are included but don't show you who they are from.
  • The links are all obfuscated in a short form using t.co, which hides their origin and redirects traffic to Twitter, giving them analytics. Also they will stop working if t.co goes down.
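
To give a feel for the "complex JSON structure" mentioned above, here is a rough sketch of how the tweet data could be read in Python. It assumes the tweets live in a file such as data/tweets.js that starts with a JavaScript assignment (for example window.YTD.tweets.part0 = ) followed by a JSON array; file names and layout may differ between archive versions, and read_archive_js is a hypothetical helper, not necessarily how parser.py does it.

```python
import json


def read_archive_js(path):
    """Return the JSON payload stored in an archive .js file.

    Assumption: the file is a JavaScript assignment followed by a JSON
    array, e.g. `window.YTD.tweets.part0 = [ ... ]`.
    """
    with open(path, encoding="utf-8") as f:
        text = f.read()
    start = text.index("[")  # skip the JavaScript assignment prefix
    return json.loads(text[start:])


# Example (the path is an assumption about the archive layout):
#   tweets = read_archive_js("data/tweets.js")
#   print(len(tweets), "tweets found")
```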

Our script does the following:

  • Converts the tweets to markdown and also HTML, with embedded images, videos and links.
  • Replaces t.co URLs with their original versions.
  • Copies used images to an output folder, to allow them to be moved to a new home.
  • Converts DMs to markdown, adding user handles where known. Basic functionality for now, pending improvements.
  • Outputs lists of followers and following.
  • Afterwards, it asks if you want to try downloading the original size images.
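
As an illustration of the t.co replacement step, the sketch below swaps each short link in a tweet's text for its original URL. It assumes each tweet record carries a full_text field and an entities.urls list whose items have url (the t.co link) and expanded_url (the original); the function name expand_short_urls is hypothetical, and the real script may work differently.

```python
def expand_short_urls(tweet):
    """Return the tweet text with t.co links replaced by their originals.

    Assumption: `tweet` is a dict with "full_text" and, optionally,
    "entities" -> "urls", where each item has "url" and "expanded_url".
    """
    text = tweet["full_text"]
    for link in tweet.get("entities", {}).get("urls", []):
        if "url" in link and "expanded_url" in link:
            text = text.replace(link["url"], link["expanded_url"])
    return text


# Made-up sample record for illustration only.
sample_tweet = {
    "full_text": "Worth a read: https://t.co/abc123",
    "entities": {
        "urls": [
            {
                "url": "https://t.co/abc123",
                "expanded_url": "https://example.com/article",
            }
        ]
    },
}
print(expand_short_urls(sample_tweet))
```

Tweets without any links pass through unchanged, since the loop simply finds nothing to replace.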

TODO:

  • DM improvements (#80)
  • Identify more user handles (#79)
  • Likes (#22), ALT-text (#20)
  • Expand all URL shorteners (#42): bit.ly, goo.gl etc.
  • Handle reply-to-self threads (#23)

Related tools:

If our script doesn't do what you want then maybe a different tool will help:

License:GNU General Public License v3.0