tr: Illegal byte sequence

Question

sureshjoshi opened this issue 3 years ago

On almost every invocation of ./pwd.sh w someusername 30, I receive a "tr: Illegal byte sequence" response. On the very few invocations that work (one in every 20), my random safe/filename is only 1-2 characters long.

It appears to be a macOS problem (I'm on 11.6), and the workaround from this answer seems to work: https://unix.stackexchange.com/questions/45404/why-cant-tr-read-from-dev-urandom-on-osx

LC_CTYPE=C tr -dc "[:lower:]" < /dev/urandom | fold -w8 | head -n1

I don't have any locales in my environment by default, and my zshrc doesn't set any by default - so I think the data pulled from urandom is going haywire.

I'm also not sure if my workaround above is "the" solution, or just a workaround.
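For context, a minimal sketch of what the workaround changes. It assumes the usual explanation from the linked answer: in a UTF-8 locale, macOS tr tries to decode its input as multibyte characters, and random bytes are often not valid UTF-8. The fixed byte below (octal 377) is only an illustrative stand-in for data from /dev/urandom.

```shell
# A fixed invalid-UTF-8 byte (octal 377) stands in for random data.
# Under LC_ALL=C, tr treats every byte as a single character, so the
# stray byte is deleted along with everything else not in [:lower:].
filtered=$(printf '\377abc' | LC_ALL=C tr -dc '[:lower:]')
printf '%s\n' "$filtered"
```

This prints abc. Running the same pipeline without the LC_ALL=C prefix in a UTF-8 locale is the case that fails with "Illegal byte sequence" on macOS.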

drduh · Answer 1 · Mon Oct 25 2021 02:51:49 GMT+0800 (China Standard Time)

Thanks for reporting this issue.

Can you let me know if it works with LC_CTYPE=en_US.UTF-8? For now, I recommend setting it in your zshrc, as in https://github.com/drduh/config/blob/master/zshrc#L25-L29

SJ · Answer 2 · Mon Oct 25 2021 07:21:32 GMT+0800 (China Standard Time)

Actually, it turns out that no, setting LC_CTYPE to that value doesn't work, which has me all kinds of surprised. I wasn't expecting it to fail. I've tried setting both LC_CTYPE and LC_ALL to that locale, and I still get the illegal byte sequence error.

mrpudn · Answer 3 · Fri Jan 07 2022 09:54:29 GMT+0800 (China Standard Time)

I am also having this issue on macOS with zsh.

Here's the offending command pipeline and sample output:

$ tr -dc "foobar" < /dev/urandom | fold -w8 | head -n1
tr: Illegal byte sequence

Like @sureshjoshi, I was unable to get this line working by prefixing LC_CTYPE=en_US.UTF-8:

$ LC_CTYPE=en_US.UTF-8 tr -dc "foobar" < /dev/urandom | fold -w8 | head -n1
tr: Illegal byte sequence

LC_CTYPE=en_US.UTF8 seems to work though for some reason:

$ LC_CTYPE=en_US.UTF8 tr -dc "foobar" < /dev/urandom | fold -w8 | head -n1
orbrrrba

LC_CTYPE=C also works:

$ LC_CTYPE=C tr -dc "foobar" < /dev/urandom | fold -w8 | head -n1
ffrbrooo

As @drduh mentioned, this is probably something that should be configured in our .zshrc.

Edit: This probably explains the above:

$ LANG=en_US.utf8 locale
LANG="en_US.utf8"
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL="C"

en_US.utf8 is probably not a valid locale name here, so the locale just falls back to C, which is why the command above works. I can't seem to get en_US.utf-8 to work, though.
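One way to test that guess is to compare the name against the output of locale -a, which lists the installed locales. The helper below is a hypothetical sketch (locale_available is not part of pwd.sh), and the exact spelling of names differs between systems.

```shell
# Hypothetical helper: succeed only if the exact locale name is installed.
# locale -a prints one installed locale per line; grep -x matches the
# whole line and -F treats the name as a literal string.
locale_available() {
  locale -a 2>/dev/null | grep -qxF -- "$1"
}

if locale_available "en_US.utf8"; then
  echo "en_US.utf8 is installed"
else
  echo "en_US.utf8 is not installed"
fi
```

A name that is not listed would be consistent with every category falling back to "C" in the output above.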

By default, my locale looks like this (nothing in .zshrc):

$ locale
LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL="en_US.UTF-8"

So I'm kind of stumped now. The only thing I can think to do is to patch this line in my copy of pwd.sh by prefixing it with LC_ALL=C. This works for me for the time being.
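A rough sketch of what such a patch could look like, written as a standalone function. random_name and its length argument are hypothetical and not the actual pwd.sh code; the only point is that the LC_ALL=C prefix applies to tr alone, so the rest of the shell keeps its normal locale.

```shell
# Hypothetical helper, not the actual pwd.sh code.
# LC_ALL=C is scoped to the tr command only; fold and head run in the
# caller's locale and only ever see ASCII lowercase letters.
random_name() {
  length="${1:-8}"
  LC_ALL=C tr -dc '[:lower:]' < /dev/urandom | fold -w "$length" | head -n 1
}

random_name 12
```

Each call prints a different string of the requested length, made up of lowercase ASCII letters.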

drduh · Answer 4 · Mon Aug 22 2022 02:18:43 GMT+0800 (China Standard Time)

We can make the same fix as drduh/Purse#3

drduh · Answer 5 · Tue Dec 27 2022 07:43:48 GMT+0800 (China Standard Time)

Fix pushed, thanks for reporting and investigating!