Outputs UTF-8 characters on LATIN1 terminals

Question

h3xx opened this issue 9 years ago

The script does not check the terminal's LC_CTYPE locale to determine whether the terminal can accept UTF-8 characters. As a result, it writes UTF-8 to Latin-1 terminals and the output is corrupted.

Suggestion: Transform strings into a character set the terminal accepts before outputting them.
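
The suggestion above could be sketched roughly as follows. This is only an illustration, not the script's actual code: `terminal_encoding` and `fit_to_encoding` are hypothetical helper names, and the sketch assumes that replacing unsupported characters with `?` is an acceptable fallback.

```python
import locale


def terminal_encoding(default="ascii"):
    """Guess the terminal's encoding from the LC_CTYPE locale (hypothetical helper)."""
    try:
        # Adopt the user's locale settings (LANG / LC_CTYPE) for this process.
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # Unknown or unsupported locale: keep whatever is already active.
        pass
    return locale.getpreferredencoding(False) or default


def fit_to_encoding(text, encoding):
    """Replace characters that `encoding` cannot represent with '?' (hypothetical helper)."""
    return text.encode(encoding, errors="replace").decode(encoding)


if __name__ == "__main__":
    # Characters outside the terminal's character set degrade to '?'
    # instead of being emitted as raw UTF-8 bytes.
    print(fit_to_encoding("wow ░ café", terminal_encoding()))
```

With a Latin-1 encoding, `é` survives because Latin-1 contains it, while the block character is replaced; with a UTF-8 encoding the string passes through unchanged.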

Steps to reproduce:

$ LANG=en_US LC_CTYPE=en_US rxvt -e bash -c 'doge;read'

OR

$ LANG=en_US LC_CTYPE=en_US xterm -e bash -c 'doge;read'

Results in:

Olivia 5000 · Answer 1 · Tue Jan 05 2016 19:39:40 GMT+0800 (China Standard Time)

Haha, that looks fun.

However, there is already some code handling non-UTF-8 cases. I am not sure why this isn't triggering for you, but it sure is for me when I run the snippets you specified.

Have you encountered this problem on a real-life machine or were you just mucking around with running non-UTF-8 stuff?

Cheers for finding inconsistent behavior anyway!

Dan Church · Answer 2 · Tue Jan 05 2016 23:41:06 GMT+0800 (China Standard Time)

> Have you encountered this problem on a real-life machine or were you just mucking around with running non-UTF-8 stuff?

I ran it on the system and terminal I always use, using the configuration I use for pretty much all my computing.

I ran it under Python 2.7.11. I know Python 2 handles string encoding differently from Python 3, so that may be the cause.
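
To illustrate the kind of difference meant here, the sketch below emulates both behaviours in Python 3. It is an assumption about the mechanism, not a confirmed diagnosis of the script: a Python 3 text stream refuses characters its encoding cannot represent, whereas a Python 2 byte string is written to the terminal unchanged, so a Latin-1 terminal would display UTF-8 bytes as unrelated characters. The sample string and variable names are made up for the example.

```python
import io

text = "wow ░"  # sample string containing a character outside Latin-1

# Python 3 style: a text stream bound to Latin-1 raises on unencodable characters.
latin1_stream = io.TextIOWrapper(io.BytesIO(), encoding="latin-1")
try:
    latin1_stream.write(text)
    latin1_stream.flush()
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# Python 2 style (emulated): the UTF-8 bytes go out as-is, and a Latin-1
# terminal interprets each byte as a separate character.
raw_bytes = text.encode("utf-8")
seen_on_latin1_terminal = raw_bytes.decode("latin-1")

print(strict_failed)
print(repr(seen_on_latin1_terminal))
```

The second result is one way the corrupt output could arise: the three UTF-8 bytes of the block character show up as three Latin-1 characters.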