abrenaut / waybackscraper

Scrapes a website's archives using Python's asyncio and aiohttp.

UnicodeEncodeError on some requests

jpau opened this issue

E.g. (note that <url here> is a placeholder for the URL I was downloading)

INFO:waybackscraper.wayback:Scraping the archive http://web.archive.org/web/20170118215122/<url here>

ERROR:waybackscraper.wayback:Error while scraping the archive http://web.archive.org/web/20120927070813/<url here> : 'charmap' codec can't encode character '\ufffd' in position 89238: character maps to <undefined>
Traceback (most recent call last):
  File "...\lib\site-packages\waybackscraper\wayback.py", line 92, in scrape_archive
    scraping_result = await scrape_function(session, archive_url, archive_timestamp, archive_content)
  File "...\anaconda3\lib\site-packages\waybackscraper\scraper.py", line 60, in scrape
    output_file.write(result)
  File "...\anaconda3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
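
The last frame of the traceback shows the output file encoding text as cp1252, which has no mapping for U+FFFD. A minimal sketch of one possible workaround follows. It assumes the output file in scraper.py is opened in text mode without an explicit encoding, which the traceback suggests but does not confirm. The helper name `write_result` and the file name `archive.html` are hypothetical and are not part of waybackscraper.

```python
import os
import tempfile

# Stand-in for scraped archive content that contains U+FFFD,
# the character named in the error message.
result = "scraped text with a replacement character: \ufffd"

# Reproduce the failure on any platform: cp1252 cannot encode U+FFFD.
try:
    result.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False


def write_result(path: str, text: str) -> None:
    """Hypothetical helper: write text using an explicit UTF-8 encoding
    instead of relying on the platform's default encoding."""
    with open(path, "w", encoding="utf-8") as output_file:
        output_file.write(text)


with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "archive.html")
    write_result(out_path, result)
    # Read the file back with the same encoding to check the round trip.
    with open(out_path, encoding="utf-8") as f:
        round_trip = f.read()
```

Passing `encoding="utf-8"` makes the write independent of the Windows locale encoding. If lossy output is acceptable, `errors="replace"` on `open()` is an alternative that keeps the existing encoding.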