jodal / comics

🗞️ Comics is a webcomics aggregator.

Home Page: https://comics.readthedocs.io

NonXMLContentType exception in donthitsave crawler

jodal opened this issue

The donthitsave crawler regularly fails with the following unhandled exception:

ERROR    donthitsave/2021-03-20: text/html; charset=UTF-8 is not an XML media type
Traceback (most recent call last):
  File "/srv/comics/app/comics/comics/aggregator/command.py", line 18, in inner
    return func(*args, **kwargs)
  File "/srv/comics/app/comics/comics/aggregator/command.py", line 66, in _crawl_one_comic_one_date
    crawler_release = crawler.get_crawler_release(pub_date)
  File "/srv/comics/app/comics/comics/aggregator/crawler.py", line 106, in get_crawler_release
    results = self.crawl(pub_date)
  File "/srv/comics/app/comics/comics/comics/donthitsave.py", line 19, in crawl
    feed = self.parse_feed("http://www.donthitsave.com/donthitsavefeed.xml")
  File "/srv/comics/app/comics/comics/aggregator/crawler.py", line 183, in parse_feed
    self.feed = FeedParser(feed_url)
  File "/srv/comics/app/comics/comics/aggregator/feedparser.py", line 15, in __init__
    raise self.raw_feed["bozo_exception"]
NonXMLContentType: text/html; charset=UTF-8 is not an XML media type
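
One possible direction, offered as a sketch rather than as the project's actual fix: the traceback shows the wrapper in comics/aggregator/feedparser.py re-raising `raw_feed["bozo_exception"]`. As far as I know, the feedparser library sets this particular exception when the server sends a non-XML Content-Type header, but it still attempts to parse the body, so entries may be present anyway. The helper below tolerates that one case only when entries were actually parsed. The names `check_bozo` and `tolerated_names` are hypothetical, and matching on the exception class name is a simplification that keeps the example free of third-party imports.

```python
def check_bozo(raw_feed, tolerated_names=("NonXMLContentType",)):
    """Re-raise a parsed feed's bozo exception unless it is tolerable.

    raw_feed is assumed to be a dict-like parse result with optional
    "bozo_exception" and "entries" keys (the shape seen in the traceback).
    An exception is tolerated only if its class name is listed in
    tolerated_names AND the feed still yielded at least one entry.
    """
    exc = raw_feed.get("bozo_exception")
    if exc is None:
        # Nothing went wrong while parsing.
        return
    if type(exc).__name__ in tolerated_names and raw_feed.get("entries"):
        # Wrong Content-Type header, but the body parsed into entries.
        return
    # Anything else (or an empty feed) is still treated as a failure.
    raise exc
```

With something like this in place, a feed served as text/html that still contains valid entries would be crawled, while an HTML error page (which yields no entries) would keep failing loudly.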