schollz / closestmatch

Golang library for fuzzy matching within a set of strings :page_with_curl:

Results vary on repeated calls for same query string [ClosestN]

raghur opened this issue

commented

I'm trying out your library to build a fuzzy file matcher. The input is 1000 filenames. I built a small Python client that calls the server. On the first call, the server builds the closestmatch.ClosestMatch structure and reuses it for subsequent calls. Interestingly, when I send the same query repeatedly, I get 10 different results each time. Is this how it's supposed to work?

Here's the server log:

fuzzy-denite\scratch>go run gopickle.go
INFO[0000] starting
ERRO[0013] /search: Context 12345 does not exist and no data passed
INFO[0013] Creating closestmatch context 12345
INFO[0014] Created new context 12345 of size 1000
INFO[0014] Searching for com in context 12345
INFO[0014] 10 matches for com. Will return max: 10 results
INFO[0018] Searching for com in context 12345
INFO[0018] 10 matches for com. Will return max: 10 results

Here are the client logs:

fuzzy-denite\scratch>python sender.py send closestmatch p1000.dat
['closestmatch', 'p1000.dat']
com
resending with data
200 OK
10
d:\code\go\src\github.com\josharian\impl\LICENSE.txt
d:\code\go\src\github.com\fatih\motion\main.go
d:\code\go\src\golang.org\x\text\LICENSE
d:\code\go\src\github.com\golang\dep\analyzer.go
d:\code\go\src\github.com\kisielk\gotool\go13.go
d:\code\go\src\google.golang.org\api\google-api-go-generator\clients_test.go
d:\code\go\src\github.com\tpng\gopkgs\LICENSE.txt
d:\code\go\src\gopkg.in\urfave\cli.v1\appveyor.yml
d:\code\go\src\github.com\nsf\gocode\scope.go
d:\code\go\pkg\dep\sources\https---github.com-sirupsen-logrus.git\formatter.go
com
200 OK
10
d:\code\go\src\github.com\BurntSushi\toml\COMPATIBLE
d:\code\go\pkg\dep\sources\https---github.com-onsi-gomega\matchers\be_closed_matcher_test.go
d:\code\go\pkg\dep\sources\https---github.com-spf13-cobra\command_notwin.go
d:\code\go\src\google.golang.org\api\examples\gopher.png
d:\code\go\src\github.com\BurntSushi\toml\doc.go
d:\code\go\pkg\dep\sources\https---github.com-sergi-go--diff\APACHE-LICENSE-2.0
d:\code\go\pkg\dep\sources\https---gopkg.in-yaml.v2\yamlh.go
d:\code\go\src\github.com\peterh\liner\output.go
d:\code\go\src\github.com\nsf\gocode\config.go
d:\code\go\src\github.com\sirupsen\logrus\doc.go

Sources and data are here: https://github.com/raghur/fuzzy-denite/tree/closestmatch/scratch

commented

OK, I went through the code and it uses goroutines to match. However, this seems faulty: shouldn't the ranking be stable?
Also, after a few more iterations, I land on the following:

com
200 OK
10
d:\code\go\src\github.com\bytesparadise\libasciidoc\codecov.yml
d:\code\go\bin\megacheck.exe
d:\code\go\tags
...

Those aren't the 'best' ranked matches; look at the second and third results.

commented

@raghur Can you explain what your file logs are for? What is the search list and what is your query?

commented

I'm trying to build an interactive fuzzy file filtering routine to filter file names under a directory as the user types.

In the client logs above, 'com' is the query. The search list is a fixed list of 1000 files that I'm using as a test corpus.

I'm getting the same problem with Closest: a fixed list and a fixed search string return different results between invocations.
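A small harness can take the client and server out of the picture and reproduce this directly. The sketch below calls a search function several times with the same query and reports the first run whose output differs from the first. `searchFunc`, `firstDivergence`, and `equal` are hypothetical names, and the two functions in `main` are fakes used only to exercise the harness.

```go
package main

import "fmt"

// searchFunc is a hypothetical signature for a top-n lookup.
type searchFunc func(query string, n int) []string

// equal reports whether two result lists match element by element.
func equal(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// firstDivergence runs search `runs` times with identical arguments and
// returns the index of the first run whose output differs from run 0,
// or -1 if every run agrees.
func firstDivergence(search searchFunc, query string, n, runs int) int {
	if runs < 1 {
		return -1
	}
	base := search(query, n)
	for i := 1; i < runs; i++ {
		if !equal(base, search(query, n)) {
			return i
		}
	}
	return -1
}

func main() {
	// A fake search that always returns the same list.
	fixed := func(string, int) []string { return []string{"a", "b"} }
	fmt.Println(firstDivergence(fixed, "com", 10, 5))

	// A fake search that flips its ordering on every other call.
	calls := 0
	flaky := func(string, int) []string {
		calls++
		if calls%2 == 0 {
			return []string{"b", "a"}
		}
		return []string{"a", "b"}
	}
	fmt.Println(firstDivergence(flaky, "com", 10, 5))
}
```

To try it against the library, the method value `cm.ClosestN` could be passed as the `search` argument, assuming `cm` is the ClosestMatch built from the 1000 filenames and that ClosestN takes a query string and a count and returns a slice of strings.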