Vis and Unvis break on UTF-8
cyphar opened this issue · comments
Sigh. Okay, so if we have an especially well-named file such as AC_Raíz_Certicámara_S.A..pem, go-mtree will not handle it correctly when you call .Path() on an entry which has its name set to the above.
Effectively what happens is that you have a multi-byte encoded character being passed to the lovely Vis and Unvis code, which obviously breaks in horrible ways. The string is then mutated in a very ugly way.
IMO the only way of handling this is to rewrite Vis and Unvis in Go...
Thanks @vbatts for using C code written by BSD folks ~20 years ago. This is gonna be fun. 😸
Wait. The vis/unvis code is in Go now. The C version is only used if you compile with the build tag cvis.
@vbatts Right. The problem still exists, and it's because the port of Vis/Unvis is still holding on to the notion of bytes when doing a bunch of the operations...
Specifically, byte(some_rune) will lose information, because a rune can encode to more than a single byte.
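For anyone following along, here is a minimal sketch of that truncation. It is not the go-mtree code; `truncate` is a name made up for illustration, and it only mimics what a byte(some_rune) conversion does, which is keep the low 8 bits of the code point.

```go
package main

import "fmt"

// truncate is a hypothetical helper that mimics byte(some_rune):
// the conversion keeps only the low 8 bits of the code point.
func truncate(r rune) byte {
	return byte(r)
}

func main() {
	// Ranging over a string yields runes (code points), not bytes.
	for _, r := range "Raíz€" {
		fmt.Printf("%q U+%04X -> byte(r)=0x%02X, UTF-8 bytes=% X\n",
			r, r, truncate(r), []byte(string(r)))
	}
}
```

For 'í' (U+00ED) the conversion yields the single byte 0xED, while the UTF-8 encoding is the two bytes C3 AD. For '€' (U+20AC) the high bits are dropped entirely and only 0xAC is left. Either way, what comes out is no longer the original name.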
@vbatts Don't worry, I've got it working now. But IMO Vis and Unvis should be moved to a library. I'm also adding test cases.
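Roughly the kind of round-trip check I have in mind, as a sketch only. `visBytes` and `unvisBytes` are hypothetical, heavily simplified stand-ins (octal escapes only), not the real Vis/Unvis from go-mtree. The point is that working on the string's bytes rather than converting runes to bytes keeps multi-byte characters intact.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// visBytes is a hypothetical, simplified encoder. It walks the string byte by
// byte and writes the backslash and anything outside printable, non-space
// ASCII as a three-digit octal escape.
func visBytes(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		c := s[i] // indexing a string yields a byte, never a rune
		if c < 0x21 || c > 0x7e || c == '\\' {
			fmt.Fprintf(&b, "\\%03o", c)
		} else {
			b.WriteByte(c)
		}
	}
	return b.String()
}

// unvisBytes reverses visBytes: every backslash must be followed by exactly
// three octal digits that fit in one byte.
func unvisBytes(s string) (string, error) {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] != '\\' {
			b.WriteByte(s[i])
			continue
		}
		if i+3 >= len(s) {
			return "", fmt.Errorf("truncated escape at offset %d", i)
		}
		v, err := strconv.ParseUint(s[i+1:i+4], 8, 8)
		if err != nil {
			return "", fmt.Errorf("bad escape at offset %d: %w", i, err)
		}
		b.WriteByte(byte(v))
		i += 3 // skip the three digits just consumed
	}
	return b.String(), nil
}

func main() {
	name := "AC_Raíz_Certicámara_S.A..pem"
	enc := visBytes(name)
	dec, err := unvisBytes(enc)
	fmt.Println(enc)
	fmt.Println(dec == name, err)
}
```

Each multi-byte character simply becomes one escape per byte, and decoding puts the same bytes back, so the name survives the round trip.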