Unicode not rendered correctly
DrHyde opened this issue
Originally reported on the old repo, see Ovid/Test-Differences#10 for earlier discussion.
If, as suggested there, I change the Data::Dumper::Dumper invocation to this:
[ split /^/, Data::Dumper::AutoEncode::eDumper($_) ],
it doesn't appear to help. This output:
# Failed test at t/unicode.t line 16.
# +----+---------------+----------------+
# | Elt|Got            |Expected        |
# +----+---------------+----------------+
# |   0|[              |[               |
# *   1|  "\x{2603}",  |  "\x{1f4a9}",  *
# *   2|  "\x{1f4a9}"  |  "\x{2603}"    *
# |   3|]              |]               |
# +----+---------------+----------------+
just gets turned into the equally unhelpful:
# Failed test at t/unicode.t line 16.
# +----+----------------------+-----------------------+
# | Elt|Got                   |Expected               |
# +----+----------------------+-----------------------+
# |   0|[                     |[                      |
# *   1|  '\xe2\x98\x83',     |  '\xf0\x9f\x92\xa9',  *
# *   2|  '\xf0\x9f\x92\xa9'  |  '\xe2\x98\x83'       *
# |   3|]                     |]                      |
# +----+----------------------+-----------------------+
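For anyone comparing the two outputs: the second one appears to be the UTF-8 byte sequences of the same two characters, so nothing has been gained in readability. A minimal sketch that shows the correspondence, using only the core Encode module (the helper name utf8_hex is made up for illustration):

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the UTF-8 bytes of a character string
# as space-separated two-digit hex values.
sub utf8_hex {
    my ($chars) = @_;
    my $bytes = encode('UTF-8', $chars);
    return join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
}

my $snowman_hex = utf8_hex("\x{2603}");    # U+2603
my $poo_hex     = utf8_hex("\x{1f4a9}");   # U+1F4A9

print "$snowman_hex\n";   # prints "e2 98 83"
print "$poo_hex\n";       # prints "f0 9f 92 a9"
```

These are the same byte values that show up as '\xe2\x98\x83' and '\xf0\x9f\x92\xa9' in the table above.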
I don't understand how Unicode and encodings work, so I think the only way this is going to get fixed is if someone can provide a patch with tests. Such a patch needs to include a test to make sure that the data passed to eq_or_diff isn't polluted through pass-by-reference, that is, that the caller's data doesn't have its encoding changed.
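A rough sketch of what such a pollution test might look like, assuming that "encoding changed" means either the string contents or Perl's internal UTF8 flag differ after the call. The names encoding_snapshot and compare_stand_in are hypothetical; compare_stand_in only stands in for eq_or_diff so that the sketch runs without Test::Differences installed:

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper: copy each string together with its internal
# UTF8 flag, so later changes to the original can be detected.
sub encoding_snapshot {
    my ($data) = @_;
    return [ map { [ utf8::is_utf8($_) ? 1 : 0, $_ ] } @$data ];
}

# Stand-in for the code under test (an assumption for this sketch).
# In a real test this would be a call to eq_or_diff.
sub compare_stand_in {
    my ($got, $expected) = @_;
    return "@$got" eq "@$expected";
}

my $got      = [ "\x{2603}", "\x{1f4a9}" ];
my $expected = [ "\x{2603}", "\x{1f4a9}" ];

my $before = encoding_snapshot($got);
compare_stand_in($got, $expected);
my $after = encoding_snapshot($got);

# Fails if the call encoded, decoded, or re-flagged the caller's strings.
is_deeply($after, $before, 'caller data keeps its encoding');

done_testing;
```

The flag is recorded separately because two strings with the same characters compare equal even when their UTF8 flags differ.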
It has been pointed out elsewhere that trying to spit out non-ASCII characters will fail badly if there are things like zero-width spaces or unprintable control characters in the data. Outputting hexadecimal numbers is the best we can do.
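To illustrate why hex output is at least unambiguous, here is a small sketch of an escaper that leaves printable ASCII alone and turns everything else into \x{...} notation. This is not what Test::Differences or Data::Dumper actually do internally; hex_escape is a made-up name:

```perl
use strict;
use warnings;

# Hypothetical helper: escape every character outside printable ASCII,
# plus the backslash itself, as \x{HEX} so that invisible characters
# show up in the output.
sub hex_escape {
    my ($s) = @_;
    $s =~ s/([^\x20-\x5b\x5d-\x7e])/sprintf('\\x{%x}', ord($1))/ge;
    return $s;
}

# "a", ZERO WIDTH SPACE (U+200B), "b" would look like "ab" if printed raw.
print hex_escape("a\x{200b}b"), "\n";   # prints a\x{200b}b
```

Printed raw, the zero-width space would be invisible and a diff of "ab" against "ab" would tell you nothing; escaped, the difference is visible.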