Unicode not rendered correctly

Question

DrHyde opened this issue 5 years ago

David Cantrell commented 5 years ago

Originally reported on the old repo, see Ovid/Test-Differences#10 for earlier discussion.

David Cantrell · Answer 1 · Wed Feb 20 2019 08:04:12 GMT+0800 (China Standard Time)

If, as suggested, I change the Data::Dumper::Dumper invocation to this:

[ split /^/, Data::Dumper::AutoEncode::eDumper($_) ],

that doesn't appear to help. This:

#   Failed test at t/unicode.t line 16.
# +----+---------------+----------------+
# | Elt|Got            |Expected        |
# +----+---------------+----------------+
# |   0|[              |[               |
# *   1|  "\x{2603}",  |  "\x{1f4a9}",  *
# *   2|  "\x{1f4a9}"  |  "\x{2603}"    *
# |   3|]              |]               |
# +----+---------------+----------------+

just gets turned into the equally unhelpful:

#   Failed test at t/unicode.t line 16.
# +----+----------------------+-----------------------+
# | Elt|Got                   |Expected               |
# +----+----------------------+-----------------------+
# |   0|[                     |[                      |
# *   1|  '\xe2\x98\x83',     |  '\xf0\x9f\x92\xa9',  *
# *   2|  '\xf0\x9f\x92\xa9'  |  '\xe2\x98\x83'       *
# |   3|]                     |]                      |
# +----+----------------------+-----------------------+
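
For reference, the two tables show the same two characters in different notations: the first escapes each code point, the second escapes the UTF-8 bytes of that code point. A minimal sketch of that relationship, using only the core Encode module (the hex_bytes helper is hypothetical, written for this illustration and not part of Test::Differences):

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render a byte string as \xNN escapes.
sub hex_bytes {
    my ($bytes) = @_;
    return join '', map { sprintf '\\x%02x', ord($_) } split //, $bytes;
}

my @chars = ("\x{2603}", "\x{1f4a9}");

for my $char (@chars) {
    # encode() returns a new byte string; $char itself is left as characters.
    my $utf8 = encode('UTF-8', $char);
    printf "\\x{%x} => %s\n", ord($char), hex_bytes($utf8);
}
# prints:
# \x{2603} => \xe2\x98\x83
# \x{1f4a9} => \xf0\x9f\x92\xa9
```

Either notation identifies the characters unambiguously; neither shows the glyphs themselves.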

I don't understand how Unicode and encodings work, so I think the only way this is going to get fixed is if someone can provide a patch with tests. Such a patch needs to include a test to make sure that the data passed to eq_or_diff, which is passed by reference, isn't polluted by having its encoding changed.
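
One possible shape for such a pollution test is sketched below, using only core modules. It is an assumption about what the test could look like, not the project's actual test: render_for_diff is a hypothetical stand-in for whatever code path receives the caller's data, and a real test would call eq_or_diff instead.

```perl
use strict;
use warnings;
use Test::More;
use Data::Dumper ();

# Hypothetical stand-in for the code under test. It receives a reference,
# so any in-place re-encoding would leak back into the caller's data.
sub render_for_diff {
    my ($data) = @_;
    return [ split /^/, Data::Dumper::Dumper($data) ];
}

my @input = ("\x{2603}", "\x{1f4a9}");

# Snapshot the strings and their internal UTF-8 flags before the call.
my @before_chars = @input;
my @before_flags = map { utf8::is_utf8($_) ? 1 : 0 } @input;

render_for_diff(\@input);

my @after_flags = map { utf8::is_utf8($_) ? 1 : 0 } @input;

is_deeply(\@input, \@before_chars, 'contents unchanged after the call');
is_deeply(\@after_flags, \@before_flags, 'UTF-8 flags unchanged after the call');
is(length($input[0]), 1, 'first element is still one character, not three bytes');

done_testing;
```

The length check catches the case where a string has been encoded in place: a one-character snowman would become a three-byte string.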

David Cantrell · Answer 2 · Fri Jun 04 2021 01:26:11 GMT+0800 (China Standard Time)

It has been pointed out elsewhere that trying to spit out non-ASCII characters will fail badly if there are things like zero-width spaces or unprintable control characters in the data. Outputting hexadecimal numbers is the best we can do.
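
A small sketch of the zero-width space problem, assuming a hypothetical escape_non_ascii helper (not part of any module mentioned here): printed raw, the two strings below would look identical in most terminals, while the escaped form makes the difference visible.

```perl
use strict;
use warnings;

# Hypothetical helper: replace everything outside printable ASCII
# with a \x{...} escape of its code point.
sub escape_non_ascii {
    my ($string) = @_;
    (my $escaped = $string) =~ s/([^\x20-\x7e])/sprintf '\\x{%x}', ord($1)/ge;
    return $escaped;
}

my $plain  = "ab";
my $sneaky = "a\x{200b}b";    # contains U+200B ZERO WIDTH SPACE

print escape_non_ascii($plain),  "\n";    # prints: ab
print escape_non_ascii($sneaky), "\n";    # prints: a\x{200b}b
```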