NonXMLContentType exception in donthitsave crawler

Question


jodal opened this issue 4 years ago

Stein Magnus Jodal commented 4 years ago

The donthitsave crawler regularly fails with the following unhandled exception:

ERROR    donthitsave/2021-03-20: text/html; charset=UTF-8 is not an XML media type
Traceback (most recent call last):
  File "/srv/comics/app/comics/comics/aggregator/command.py", line 18, in inner
    return func(*args, **kwargs)
  File "/srv/comics/app/comics/comics/aggregator/command.py", line 66, in _crawl_one_comic_one_date
    crawler_release = crawler.get_crawler_release(pub_date)
  File "/srv/comics/app/comics/comics/aggregator/crawler.py", line 106, in get_crawler_release
    results = self.crawl(pub_date)
  File "/srv/comics/app/comics/comics/comics/donthitsave.py", line 19, in crawl
    feed = self.parse_feed("http://www.donthitsave.com/donthitsavefeed.xml")
  File "/srv/comics/app/comics/comics/aggregator/crawler.py", line 183, in parse_feed
    self.feed = FeedParser(feed_url)
  File "/srv/comics/app/comics/comics/aggregator/feedparser.py", line 15, in __init__
    raise self.raw_feed["bozo_exception"]
NonXMLContentType: text/html; charset=UTF-8 is not an XML media type
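A possible direction, sketched under assumptions: the traceback shows the project's `FeedParser.__init__` re-raising feedparser's `bozo_exception`. feedparser uses the bozo flag for warnings as well as for real parse failures, and it typically still returns whatever entries it could parse when a feed is merely served as `text/html`. The sketch below re-raises only exceptions outside a tolerated set. `check_bozo` and `TOLERATED` are hypothetical names, not part of the comics codebase or of feedparser, and the real wrapper may be structured differently.

```python
# Sketch: tolerate feedparser's NonXMLContentType instead of re-raising it.
# Assumption: `check_bozo` and `TOLERATED` are hypothetical helpers; the real
# FeedParser wrapper in comics/aggregator/feedparser.py may differ.
try:
    import feedparser

    TOLERATED = (feedparser.NonXMLContentType,)
except ImportError:
    # Stand-in class so the sketch runs without feedparser installed.
    class NonXMLContentType(Exception):
        pass

    TOLERATED = (NonXMLContentType,)


def check_bozo(raw_feed, tolerated=TOLERATED):
    """Re-raise the parser's bozo_exception unless it is a tolerated kind."""
    if not raw_feed.get("bozo"):
        return
    exc = raw_feed.get("bozo_exception")
    if exc is not None and not isinstance(exc, tolerated):
        raise exc


# Usage: a dict shaped like feedparser.parse() output for a feed served as text/html.
raw = {
    "bozo": 1,
    "bozo_exception": TOLERATED[0]("text/html; charset=UTF-8 is not an XML media type"),
    "entries": [],
}
check_bozo(raw)  # no exception: the content-type mismatch is tolerated
```

Keeping an explicit allowlist means genuinely malformed feeds still fail loudly, while a server that only mislabels the feed's `Content-Type` no longer aborts the crawl.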