Encoding when printing utf-8 to windows console
anlx-sw opened this issue · comments
I have encoding problems when using xkcdpass to create passphrases from a wordlist containing UTF-8 characters and printing them to the Windows console.
This works without problems on Linux; the problem occurs only on Windows.
Maybe the output has to be prepared somehow to work on the windows console:
https://neurocline.github.io/dev/2016/10/13/python-utf8-windows.html
Environment:
Python 3.6.4
xkcdpass installed via pip (xkcdpass-1.14.3)
I tested it with the compiled C:\Python36\Scripts\xkcdpass.exe that pip installs.
The problem is the same in the normal cmd.exe console and in the powershell.exe console.
Sample Output:
Herrscher Silber fördern Plädoyer verstehe Ablösung
I think that should read:
Herrscher Silber fördern Plädoyer verstehe Ablösung
Update:
If I print the umlauts from python.exe started directly in the Windows console, I get no errors:
> python
Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:54:40) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> print("Umlaute ÄÖÜäöÜ")
Umlaute ÄÖÜäöÜ
>>>
If I try to use xkcdpass as a module, the same error occurs:
> python
Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:54:40) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import xkcdpass
>>> from xkcdpass import xkcd_password as xp
>>> wordfile = xp.locate_wordfile("ger-anlx-sorted.txt")
>>> mywords = xp.generate_wordlist(wordfile=wordfile)
>>> print(xp.generate_xkcdpassword(mywords))
Georgios verlangte Töne holte teilten unbekannt
>>>
This should read:
Georgios verlangte Töne holte teilten unbekannt
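Since printing a literal works but words read from the word file do not, it may help to compare the encoding Python uses for console output with the one it uses for files by default. This is a small diagnostic sketch, assuming Python 3.6 behaviour, where open() without an encoding argument falls back to locale.getpreferredencoding(False); nothing in it is specific to xkcdpass.

```python
import locale
import sys

# Encoding that open() falls back to when no encoding= argument is given
# (assumption: Python 3.6 behaviour, as used in this report).
file_default = locale.getpreferredencoding(False)

# Encoding that print() uses when writing to the console.
console_encoding = sys.stdout.encoding

print("open() default: ", file_default)
print("stdout encoding:", console_encoding)
```

If the two values differ, text that prints correctly from a literal can still come out wrong after being read from a file.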
As your test of printing Unicode directly succeeded, I suspect this is caused by the open() call that reads the word file. I assume the word file is stored on Windows as UTF-8 as well, but the open()
call uses the platform-dependent default encoding. On Linux this is usually UTF-8; on Windows it is the ANSI code page (typically cp1252 on Western European systems, which is close to ISO-8859-1, I think), which would explain your findings.
Can you try what happens when you change that line to this?
with open(wordfile, encoding='utf-8') as wlf:
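The effect of the mismatch can be reproduced on any platform by decoding a UTF-8 file with a Windows code page. The following is only a sketch: it assumes the Windows default was cp1252 (the thread does not confirm the actual code page), and the helper read_words and the file name words-demo.txt are hypothetical, not part of xkcdpass.

```python
import os
import tempfile


def read_words(path, encoding):
    """Read one word per line from `path`, decoding with `encoding`."""
    with open(path, encoding=encoding) as wlf:
        return [line.strip() for line in wlf if line.strip()]


with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical word list, written as UTF-8 like the reporter's file presumably was.
    path = os.path.join(tmpdir, "words-demo.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("Töne\nfördern\n")

    # Assumed Windows default: every UTF-8 umlaut is read as two characters.
    wrong = read_words(path, "cp1252")
    # Explicit encoding, as in the proposed fix: the words come back intact.
    right = read_words(path, "utf-8")

# ascii() keeps the output independent of the console encoding.
print(ascii(wrong))
print(ascii(right))
```

With cp1252, each "ö" (U+00F6) is read as the two characters U+00C3 and U+00B6. Passing encoding='utf-8' explicitly makes the result independent of the platform default.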
A quick test on Windows 10 suggests that @florianjacob's fix above works. I've pushed the change; can you check if it works for you?
Yes, I can confirm that this fix works for me. Thanks.
This fix is in the 1.16.1 release - thanks again