lithammer / fuzzysearch

:pig: Tiny and fast fuzzy search in Go

Home Page: https://pkg.go.dev/github.com/lithammer/fuzzysearch


mishandling of utf8 replacement character

josharian opened this issue

Add this test case to var fuzzyTests, and run the tests:

	{"\xffinvalid UTF-8\xff", "", false, -1},

Result:

--- FAIL: TestFuzzyMatchFold (0.00s)
panic: runtime error: slice bounds out of range [19:15] [recovered]
	panic: runtime error: slice bounds out of range [19:15]

goroutine 35 [running]:
testing.tRunner.func1.2({0x1031423e0, 0x14000114018})
	/Users/josh/go/1.20/src/testing/testing.go:1526 +0x1c8
testing.tRunner.func1()
	/Users/josh/go/1.20/src/testing/testing.go:1529 +0x384
panic({0x1031423e0, 0x14000114018})
	/Users/josh/go/1.20/src/runtime/panic.go:884 +0x204
golang.org/x/text/transform.String({0x103153b28, 0x103297c60}, {0x1030cb139, 0xf})
	/Users/josh/pkg/mod/golang.org/x/text@v0.9.0/transform/transform.go:650 +0x9e4
github.com/lithammer/fuzzysearch/fuzzy.stringTransform({0x1030cb139, 0xf}, {0x103153b28?, 0x103297c60?})
	/Users/josh/x/fuzzysearch/fuzzy/fuzzy.go:242 +0x64
github.com/lithammer/fuzzysearch/fuzzy.match({0x1030cb139?, 0x7?}, {0x0, 0x0}, {0x103153b28, 0x103297c60})
	/Users/josh/x/fuzzysearch/fuzzy/fuzzy.go:55 +0x38
github.com/lithammer/fuzzysearch/fuzzy.MatchFold(...)
	/Users/josh/x/fuzzysearch/fuzzy/fuzzy.go:41
github.com/lithammer/fuzzysearch/fuzzy.TestFuzzyMatchFold(0x1400011cb60)
	/Users/josh/x/fuzzysearch/fuzzy/fuzzy_test.go:65 +0xbc
testing.tRunner(0x1400011cb60, 0x103152088)
	/Users/josh/go/1.20/src/testing/testing.go:1576 +0x10c
created by testing.(*T).Run
	/Users/josh/go/1.20/src/testing/testing.go:1629 +0x368
exit status 2
FAIL	github.com/lithammer/fuzzysearch/fuzzy	0.131s

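This existed prior to #53 (phew!). The root cause is that unicodeFoldTransformer.Transform returns n, n, err, but when utf8.RuneError is present, nSrc may differ from nDst: an invalid byte takes 1 byte of source, while the replacement character it decodes to takes 3 bytes once encoded. That would account for the [19:15] in the panic: the input is 15 bytes long, and replacing each of the two \xff bytes with a 3-byte U+FFFD gives 19. I'll try to put together a fix sometime soonish.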
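
The nSrc/nDst mismatch can be reproduced with the standard library alone. The following is a minimal sketch, not the library's code: foldCounts is a hypothetical helper, and lowercasing with unicode.ToLower is an assumption (the fold actually used by unicodeFoldTransformer may differ). It counts source bytes consumed and destination bytes written separately:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// foldCounts is a hypothetical helper, not part of fuzzysearch. It lowercases
// src rune by rune and reports how many source bytes were consumed (nSrc) and
// how many destination bytes were written (nDst).
func foldCounts(src string) (dst string, nSrc, nDst int) {
	var b strings.Builder
	for nSrc < len(src) {
		// An invalid byte decodes to utf8.RuneError with size 1.
		r, size := utf8.DecodeRuneInString(src[nSrc:])
		nSrc += size
		// WriteRune encodes utf8.RuneError as the 3-byte sequence for U+FFFD.
		b.WriteRune(unicode.ToLower(r))
	}
	return b.String(), nSrc, b.Len()
}

func main() {
	_, nSrc, nDst := foldCounts("\xffinvalid UTF-8\xff")
	fmt.Println(nSrc, nDst) // 15 19
}
```

A transformer that reports the destination count for both values would claim to have consumed 19 bytes of a 15-byte input here, which is consistent with the slice bounds in the panic.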

(Found by fuzzing. Once the fuzz tests make it out of the gate without stumbling, I'll PR them.)

Though package transform is elegant and composes well, moving away from it may end up making the code simpler, faster, and more robust. (Need to think about that a bit.)

	{"Ⱦ", "", false, -1},

is another interesting test case because its lowercase form has a different UTF-8 encoded length than its uppercase form.