google / go-cmp

Package for comparing Go values in tests

Help with tests using cmp.Diff and Chinese characters?

bashbunni opened this issue · comments

Hey there,

I'm trying to reproduce an issue in one of the projects I maintain and am using cmp.Diff to show what went wrong when a test fails. The issue I'm facing now is that I can't read the output, so I'm not able to do much with the information given.

I guess my question is: what can I do with the byte output shown below, and is the // +|.[0m.[38;5;252m| the string value of what changed?

    glamour_test.go:279: got != want
        -want +got:
        diff:
          string{
          	... // 78204 identical bytes
          	0x6d, 0x1b, 0x5b, 0x33, 0x38, 0x3b, 0x35, 0x3b, 0x32, 0x35, 0x32, 0x6d, 0x1b, 0x5b, 0x30, 0x6d, //  |m.[38;5;252m.[0m|
          	0x20, 0x20, 0x1b, 0x5b, 0x33, 0x38, 0x3b, 0x35, 0x3b, 0x32, 0x35, 0x32, 0x6d, 0x31, 0x3a, 0x34, //  |  .[38;5;252m1:4|
        + 	0x1b, 0x5b, 0x30, 0x6d, 0x1b, 0x5b, 0x33, 0x38, 0x3b, 0x35, 0x3b, 0x32, 0x35, 0x32, 0x6d,       // +|.[0m.[38;5;252m|
          	0x3a, 0x39, 0x1b, 0x5b, 0x33, 0x38, 0x3b, 0x35, 0x3b, 0x32, 0x35, 0x32, 0x6d, 0x20, 0x1b, 0x5b, //  |:9.[38;5;252m .[|
          	0x30, 0x6d, 0x1b, 0x5b, 0x33, 0x38, 0x3b, 0x35, 0x3b, 0x32, 0x35, 0x32, 0x6d, 0x20, 0x1b, 0x5b, //  |0m.[38;5;252m .[|
          	... // 340063 identical bytes
          }
--- FAIL: TestWrapping (0.33s)
    --- PASS: TestWrapping/english_short (0.00s)
    --- PASS: TestWrapping/chinese_short (0.00s)
    --- FAIL: TestWrapping/chinese_long (0.33s)
FAIL

cmp version: github.com/google/go-cmp v0.5.9
go version: go 1.17

Here's a link to the pull request and an example output we're comparing:

charmbracelet/glamour#249
testdata/issues/long-chinese-text.test

Thank you very much for your great project and I appreciate any guidance you're able to give :)

This output seems to be working as intended. You're comparing two strings that cmp.Diff has detected contain non-printable characters. For that reason, it switched to a mode where it diffs the raw byte values.
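If the byte-level view is hard to work with, one option is to escape the non-printable bytes yourself before comparing. The sketch below uses only the standard library; `quoteLines` is a hypothetical helper name, and passing its result to cmp.Diff in place of the raw strings is an assumption about a possible workaround, not something suggested in this thread.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// quoteLines is a hypothetical helper: it splits s into lines and escapes
// each one with strconv.Quote, so bytes such as ESC (0x1b) show up as
// "\x1b" while printable characters, including Chinese text, are kept.
func quoteLines(s string) []string {
	lines := strings.Split(s, "\n")
	for i, line := range lines {
		lines[i] = strconv.Quote(line)
	}
	return lines
}

func main() {
	// Sample input made up for illustration; it is not the real test data.
	got := "\x1b[38;5;252m你好\x1b[0m\nsecond line"
	for _, line := range quoteLines(got) {
		fmt.Println(line)
	}
}
```

Each resulting line contains only printable characters, so a mismatch can be read directly as escaped text rather than as hex values.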

This particular output is saying that got has an additional 15-byte sequence inserted at some offset after 78204:

0x1b, 0x5b, 0x30, 0x6d, 0x1b, 0x5b, 0x33, 0x38, 0x3b, 0x35, 0x3b, 0x32, 0x35, 0x32, 0x6d,

The best ASCII representation of this string is:

.[0m.[38;5;252m

(BTW, the output you are seeing is inspired by the hexdump utility, which prints the raw hex values on the left, and the best ASCII representation on the right.)

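These bytes are two ANSI escape sequences of the kind terminals use for styling: ESC[0m resets the text attributes, and ESC[38;5;252m sets the foreground to color 252 of the 256-color palette. The "." in the ASCII column stands in for the non-printable ESC byte (0x1b).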

@dsnet ah amazing, thank you for your help!