Error parsing "http://[linux iso]"
Niklas974 opened this issue · comments
I retweeted this tweet. It contains the following line:
curl http://[linux iso] | dd of=/dev/sdb
The parser tries to parse http://[linux iso] as an IPv6 URL, which obviously fails:
Traceback (most recent call last):
  File "/home/niklas/Downloads/twitter-archive/parser.py", line 1074, in <module>
    main()
  File "/home/niklas/Downloads/twitter-archive/parser.py", line 1046, in main
    media_sources = parse_tweets(username, users, html_template, paths)
  File "/home/niklas/Downloads/twitter-archive/parser.py", line 422, in parse_tweets
    tweets.append(convert_tweet(tweet, username, media_sources, users, paths))
  File "/home/niklas/Downloads/twitter-archive/parser.py", line 157, in convert_tweet
    url = urlparse(word)
  File "/usr/lib/python3.10/urllib/parse.py", line 393, in urlparse
    splitresult = urlsplit(url, scheme, allow_fragments)
  File "/usr/lib/python3.10/urllib/parse.py", line 484, in urlsplit
    raise ValueError("Invalid IPv6 URL")
ValueError: Invalid IPv6 URL
I think this could be fixed with some error handling, so the script doesn't crash if there's a 'broken' URL somewhere in the tweet text.
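Something along these lines might be enough. This is only a rough sketch: safe_urlparse is a hypothetical helper name, not something that exists in parser.py, and I am assuming the tweet text is split on whitespace before each word reaches urlparse.

```python
from urllib.parse import urlparse


def safe_urlparse(word):
    """Return the parsed URL, or None if urlparse rejects the word.

    Hypothetical helper, not part of parser.py.
    """
    try:
        return urlparse(word)
    except ValueError:
        # Raised e.g. for an unbalanced '[' in the host ("Invalid IPv6 URL").
        return None


# Assumption: the tweet text is split on whitespace before parsing.
tweet_text = "curl http://[linux iso] | dd of=/dev/sdb"
for word in tweet_text.split():
    url = safe_urlparse(word)
    if url is None:
        # Leave the word as plain text instead of crashing.
        continue
```

With this, convert_tweet could treat a word that fails to parse as ordinary text and carry on with the rest of the tweet.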
@Niklas974 can you have a look at my version from #134 and check whether it works?