NonXMLContentType exception in donthitsave crawler
jodal opened this issue
Stein Magnus Jodal commented
The donthitsave crawler fails regularly with this unhandled exception:
ERROR donthitsave/2021-03-20: text/html; charset=UTF-8 is not an XML media type
Traceback (most recent call last):
  File "/srv/comics/app/comics/comics/aggregator/command.py", line 18, in inner
    return func(*args, **kwargs)
  File "/srv/comics/app/comics/comics/aggregator/command.py", line 66, in _crawl_one_comic_one_date
    crawler_release = crawler.get_crawler_release(pub_date)
  File "/srv/comics/app/comics/comics/aggregator/crawler.py", line 106, in get_crawler_release
    results = self.crawl(pub_date)
  File "/srv/comics/app/comics/comics/comics/donthitsave.py", line 19, in crawl
    feed = self.parse_feed("http://www.donthitsave.com/donthitsavefeed.xml")
  File "/srv/comics/app/comics/comics/aggregator/crawler.py", line 183, in parse_feed
    self.feed = FeedParser(feed_url)
  File "/srv/comics/app/comics/comics/aggregator/feedparser.py", line 15, in __init__
    raise self.raw_feed["bozo_exception"]
NonXMLContentType: text/html; charset=UTF-8 is not an XML media type
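
One possible direction, as a rough sketch only: the traceback shows the wrapper re-raising whatever is in `raw_feed["bozo_exception"]`. As far as I know, the upstream `feedparser` library sets `NonXMLContentType` when the server's `Content-Type` header is not an XML type, but still attempts to parse the body, so the entries may be usable. The helper below is hypothetical (the name `raise_unless_tolerated` and the local `NonXMLContentType` stand-in are my assumptions, not code from this repository) and only illustrates tolerating that one exception type while still raising everything else.

```python
class NonXMLContentType(Exception):
    """Stand-in for feedparser.NonXMLContentType (assumption), so this
    sketch runs without feedparser installed."""


def raise_unless_tolerated(raw_feed, tolerated=(NonXMLContentType,)):
    """Re-raise a parsed feed's bozo_exception unless its type is tolerated.

    raw_feed is a dict-like parse result. Returns the tolerated exception
    (or None if there was no exception) so the caller can log it.
    """
    exc = raw_feed.get("bozo_exception")
    if exc is None:
        return None
    if isinstance(exc, tolerated):
        # Wrong Content-Type header only; the caller should still check
        # that raw_feed["entries"] is non-empty before trusting the feed.
        return exc
    raise exc


# Usage with a hand-built result that mimics the failing feed.
result = {
    "bozo": 1,
    "bozo_exception": NonXMLContentType(
        "text/html; charset=UTF-8 is not an XML media type"
    ),
    "entries": [{"title": "example entry"}],
}
ignored = raise_unless_tolerated(result)
if ignored is not None and result["entries"]:
    print("ignoring content-type warning: %s" % ignored)
```

If the server is actually returning an HTML error page rather than a mislabeled feed, `entries` would presumably be empty, so the crawler should still treat that case as a failure.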