schollz / closestmatch

Golang library for fuzzy matching within a set of strings

What is bagsize and how to use it?

dzpt opened this issue

I don't understand what bagsize is, or why it's []int{2} or []int{2,3,4}.

@schollz: Any input here? I'm also a bit confused about how best to tweak this. The unit tests use []int{4} and []int{5} but don't explain what that variable controls.

I'll dig through the source, but an official explanation for dummies would be great. 😄

It looks like, for each "sentence", the bag sizes control the size of the substring chunks.

// "foo bar baz" @ []int{1,5}
[]string{"f", "o", "o", " ", "b", "a", "r", " ", "b", "a", "z"}
[]string{"foo b", "ar ba", "z"}

I think, but I'm also not sure how best to tweak it for different datasets.
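To make the question concrete, here is a minimal Go sketch (standard library only, not the closestmatch API) of two ways a bag size could split a string: consecutive non-overlapping chunks, as guessed above, and overlapping windows (n-grams). Which reading the library actually uses is an assumption to check against its source. The helpers `chunks`, `windows`, `bag` and `shared` are hypothetical, and `shared` is only a toy overlap count, not necessarily the score closestmatch computes.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical helpers for illustration only; not part of closestmatch.
// They index bytes, so they assume ASCII input.

// chunks splits s into consecutive, non-overlapping pieces of length n
// (the reading guessed in the comment above).
func chunks(s string, n int) []string {
	var out []string
	for i := 0; i < len(s); i += n {
		end := i + n
		if end > len(s) {
			end = len(s)
		}
		out = append(out, s[i:end])
	}
	return out
}

// windows returns every overlapping substring of length n (an n-gram),
// skipping pieces that are only whitespace (the alternative reading).
func windows(s string, n int) []string {
	var out []string
	for i := 0; i+n <= len(s); i++ {
		w := s[i : i+n]
		if strings.TrimSpace(w) != "" {
			out = append(out, w)
		}
	}
	return out
}

// bag pools the windows for every size in sizes into one set,
// e.g. sizes = []int{2, 3, 4}.
func bag(s string, sizes []int) map[string]bool {
	set := make(map[string]bool)
	for _, n := range sizes {
		for _, w := range windows(s, n) {
			set[w] = true
		}
	}
	return set
}

// shared counts the substrings two bags have in common (a toy score).
func shared(a, b map[string]bool) int {
	count := 0
	for k := range a {
		if b[k] {
			count++
		}
	}
	return count
}

func main() {
	fmt.Printf("%q\n", chunks("foo bar baz", 5))  // ["foo b" "ar ba" "z"]
	fmt.Printf("%q\n", windows("foo bar baz", 5)) // 7 overlapping windows

	for _, sizes := range [][]int{{2}, {2, 3, 4}} {
		score := shared(bag("foo bar", sizes), bag("foo baz", sizes))
		fmt.Println(sizes, score) // [2] 5, then [2 3 4] 12
	}
}
```

Under the overlapping reading, a slice like []int{2,3,4} just pools substrings of all three lengths into one bag, so the same pair of strings shares more entries than with []int{2} alone.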