Encoding when printing utf-8 to windows console

Question

anlx-sw opened this issue 7 years ago

I have encoding problems when using xkcdpass to create passphrases from a wordlist containing UTF-8 characters and printing them to the Windows console.

This works without problems on Linux; the problem occurs only on Windows.

Maybe the output has to be prepared somehow to work on the Windows console:
https://neurocline.github.io/dev/2016/10/13/python-utf8-windows.html

Environment:

Python 3.6.4
xkcdpass installed via pip (xkcdpass-1.14.3)

I tested it with the compiled C:\Python36\Scripts\xkcdpass.exe that pip installs.
The problem is the same in the normal cmd.exe console and in the powershell.exe console.

Sample Output:
Herrscher Silber fÃ¶rdern PlÃ¤doyer verstehe AblÃ¶sung

I think that should read:
Herrscher Silber fördern Plädoyer verstehe Ablösung

Update:
If I print the umlauts directly from python.exe started in the Windows console, I get no errors:

> python
Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:54:40) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print("Umlaute ÄÖÜäöÜ")
Umlaute ÄÖÜäöÜ
>>>

If I try to use xkcdpass as a module, the same error occurs:

> python
Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:54:40) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import xkcdpass
>>> from xkcdpass import xkcd_password as xp
>>> wordfile = xp.locate_wordfile("ger-anlx-sorted.txt")
>>> mywords = xp.generate_wordlist(wordfile=wordfile)
>>> print(xp.generate_xkcdpassword(mywords))
Georgios verlangte TÃ¶ne holte teilten unbekannt
>>>

This should read:
Georgios verlangte Töne holte teilten unbekannt

florianjacob · Answer 1 · Tue Feb 20 2018 04:53:12 GMT+0800 (China Standard Time)

As your test with printing Unicode directly succeeded, I suspect the cause is the open() call that reads the word file. I assume the word file is stored as UTF-8 on Windows as well, but the open() call uses the platform-dependent default encoding. On Linux this is usually UTF-8; on Windows it is the ANSI code page of the system locale (cp1252 on Western European systems, I think, which is close to ISO-8859-1), which would explain your findings.

Can you try what happens when you change that line to this?

    with open(wordfile, encoding='utf-8') as wlf:
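To illustrate the suspected mismatch, here is a minimal sketch that does not use xkcdpass itself. The sample words and the temporary file are made up for the demonstration, and cp1252 is only an assumption about the Windows default, since the actual value depends on the system locale.

```python
import locale
import os
import tempfile

# Hypothetical sample words; the real word list is not reproduced here.
words = ["fördern", "Plädoyer", "Ablösung"]

# Write the words as UTF-8, the way the word file is assumed to be stored.
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as tmp:
    tmp.write("\n".join(words))
    path = tmp.name

try:
    # Decoding the UTF-8 bytes with a Western Windows code page
    # (assumed here to be cp1252) reproduces the garbled output.
    with open(path, encoding="cp1252") as wlf:
        garbled = wlf.read().splitlines()

    # Passing the encoding explicitly gives the same result on every platform.
    with open(path, encoding="utf-8") as wlf:
        correct = wlf.read().splitlines()
finally:
    os.remove(path)

# The encoding that open() falls back to when none is given.
print("platform default:", locale.getpreferredencoding(False))
print("read as cp1252:  ", " ".join(garbled))
print("read as utf-8:   ", " ".join(correct))
```

If the platform default printed on your machine is not a UTF-8 variant, an open() call without an encoding argument would decode the file the same way as the cp1252 read above.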

Steven Tobin · Answer 2 · Tue Feb 20 2018 07:48:49 GMT+0800 (China Standard Time)

A quick test on Windows 10 suggests that @florianjacob's fix above works. I've pushed the change; can you check if it works for you?

anlx-sw · Answer 3 · Thu Feb 22 2018 02:35:16 GMT+0800 (China Standard Time)

Yes, I can confirm that this fix works for me. Thanks.

Steven Tobin · Answer 4 · Sat Feb 24 2018 22:21:27 GMT+0800 (China Standard Time)

This fix is in the 1.16.1 release. Thanks again.