timhutton / twitter-archive-parser

Python code to parse a Twitter archive and output in various ways

Error parsing "http://[linux iso]"

Niklas974 opened this issue

I have retweeted this tweet. It contains the following line:

curl http://[linux iso] | dd of=/dev/sdb

The parser tries to treat http://[linux iso] as an IPv6 URL, which fails:

Traceback (most recent call last):
  File "/home/niklas/Downloads/twitter-archive/parser.py", line 1074, in <module>
    main()
  File "/home/niklas/Downloads/twitter-archive/parser.py", line 1046, in main
    media_sources = parse_tweets(username, users, html_template, paths)
  File "/home/niklas/Downloads/twitter-archive/parser.py", line 422, in parse_tweets
    tweets.append(convert_tweet(tweet, username, media_sources, users, paths))
  File "/home/niklas/Downloads/twitter-archive/parser.py", line 157, in convert_tweet
    url = urlparse(word)
  File "/usr/lib/python3.10/urllib/parse.py", line 393, in urlparse
    splitresult = urlsplit(url, scheme, allow_fragments)
  File "/usr/lib/python3.10/urllib/parse.py", line 484, in urlsplit
    raise ValueError("Invalid IPv6 URL")
ValueError: Invalid IPv6 URL

I think this could be fixed with some error handling, so the script doesn't crash if there's a 'broken' URL somewhere in the tweet text.
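
A minimal sketch of what that error handling might look like. It assumes the value reaching urlparse is a whitespace-split word such as http://[linux (the traceback shows urlparse(word)), and safe_urlparse is a hypothetical helper name, not something that exists in parser.py:

```python
from urllib.parse import urlparse


def safe_urlparse(word):
    """Parse a word as a URL, returning None if urllib rejects it.

    Hypothetical helper for illustration; not part of parser.py.
    """
    try:
        return urlparse(word)
    except ValueError:
        # urllib raises ValueError (e.g. "Invalid IPv6 URL") when the
        # host part has an unmatched '[' or ']'.
        return None


# A word with an unmatched '[' no longer crashes the caller.
broken = safe_urlparse("http://[linux")
ok = safe_urlparse("https://example.com/page")
```

A caller could then treat a None result as "leave this word as plain text" and carry on with the rest of the tweet.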

@Niklas974 can you have a look at my version from #134 and check whether it works?