claytron / ever2simple

Migrate from evernote to simplenote with markdown formatting

UTF-8 Problem

chris2fr opened this issue

On Mac

MacBook-Pro-de-Christopher:data chris$ ever2simple --output everout/ cmannnotes.enex
Traceback (most recent call last):
  File "/usr/local/bin/ever2simple", line 9, in <module>
    load_entry_point('ever2simple==2.0', 'console_scripts', 'ever2simple')()
  File "/Library/Python/2.7/site-packages/ever2simple/core.py", line 21, in main
    converter.convert()
  File "/Library/Python/2.7/site-packages/ever2simple/converter.py", line 80, in convert
    notes = self.prepare_notes(xml_tree)
  File "/Library/Python/2.7/site-packages/ever2simple/converter.py", line 63, in prepare_notes
    converted_text = self._convert_html_markdown(title, raw_text)
  File "/Library/Python/2.7/site-packages/ever2simple/converter.py", line 91, in _convert_html_markdown
    html2plain.feed(text)
  File "/Library/Python/2.7/site-packages/html2text/__init__.py", line 142, in feed
    HTMLParser.HTMLParser.feed(self, data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/HTMLParser.py", line 117, in feed
    self.goahead(0)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/HTMLParser.py", line 163, in goahead
    k = self.parse_endtag(i)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/HTMLParser.py", line 401, in parse_endtag
    self.handle_endtag(elem)
  File "/Library/Python/2.7/site-packages/html2text/__init__.py", line 191, in handle_endtag
    self.handle_tag(tag, None, 0)
  File "/Library/Python/2.7/site-packages/html2text/__init__.py", line 474, in handle_tag
    link_url(self, a['href'], title)
  File "/Library/Python/2.7/site-packages/html2text/__init__.py", line 440, in link_url
    title = ' "{0}"'.format(title) if title.strip() else ''
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 11: ordinal not in range(128)

Exactly the same issue for me!

I'm also getting this exact issue. I'm running Linux Mint 18.1 with Python 2.7.9 and trying to output in JSON format.

Same issue here.

As far as I can tell, this problem mostly comes up when the source files contain non-ASCII characters such as curly quotes and apostrophes (so you won't hit it with every file). Evernote exports are UTF-8 encoded, but when Python opens a file without an explicit encoding it uses the system default (cp1252 in my case), and it chokes on any UTF-8 byte sequences that aren't valid in that encoding.
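
To make the mismatch concrete, here is a small Python 3 sketch. It is only an illustration: the sample string and the helper name try_decode are made up and are not part of ever2simple. It encodes some text as UTF-8, the way an export would store it, and then decodes the bytes with a few different codecs.

```python
# Illustration only: sample text and helper name are hypothetical.
sample = "caf\u00e9 \u201cquoted\u201d"  # e-acute plus curly double quotes
utf8_bytes = sample.encode("utf-8")      # the bytes a UTF-8 export would contain


def try_decode(data, encoding):
    """Return the decoded text, or the name of the error that was raised."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return type(exc).__name__


# ASCII cannot represent any byte above 0x7F, so this fails outright.
ascii_result = try_decode(utf8_bytes, "ascii")

# cp1252 either garbles the text or fails, depending on the bytes involved.
cp1252_result = try_decode(utf8_bytes, "cp1252")

# Only the matching codec recovers the original text.
utf8_result = try_decode(utf8_bytes, "utf-8")

print(ascii_result)
print(repr(cp1252_result))
print(utf8_result == sample)
```

The point is just that the bytes are fine; what goes wrong is the codec picked to read them.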

It looks like the script handled encoding in the wrong place: at the write step instead of the open step. I've got it working in my setup on the Python 3 branch; I haven't tested it on Python 2, but I believe the same fix applies. Once I've finished converting my own notes, I'll open a PR against my parent fork (dougdiego/ever2simple).
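
For anyone who wants to try the same idea before a PR lands, this is roughly what "decode at the open step" can look like. It is a minimal sketch, not the actual patch: the function names read_export and write_output and the sample file are hypothetical. It uses io.open, which accepts an encoding argument on both Python 2 and Python 3.

```python
import io
import os
import tempfile


def read_export(path):
    """Read a file as text, decoding UTF-8 when the file is opened."""
    # Passing encoding here means the system default is never consulted.
    with io.open(path, "r", encoding="utf-8") as handle:
        return handle.read()


def write_output(path, text):
    """Write text back out as UTF-8, again with an explicit encoding."""
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


# Hypothetical sample standing in for a real export file.
sample = u"<note><title>Caf\u00e9 \u201cmenu\u201d</title></note>"
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "sample.enex")

write_output(src, sample)
round_tripped = read_export(src)
print(round_tripped == sample)
```

With the encoding stated on both the read and the write, the result no longer depends on the locale of the machine running the script.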