Vivino / go-autocomplete-trie

go-autocomplete-trie is a data structure for text autocompletion that supports fuzzy matching and configurable Levenshtein distance limits.

Order of insertion of words with an identical word stem changes the result set.

SirGrandmasterr opened this issue

Hi,
While using this package to build a small Elasticsearch-like prefix suggester, I noticed that some words are omitted from the results in some cases. A specific example:
"suspenseful" and "suspense".

I've created two test cases using only those two words that should, as far as I understand it, yield the same result.
For some reason, one of the tests returns both inserted strings as expected, but the other returns only the longer one.
Is this intended behavior?
    {
        name:   "Word order small => big",
        dict:   []string{"suspense", "suspenseful"},
        trie:   New(),
        search: "susp",
        expected: []string{
            "suspense",
            "suspenseful",
        },
    },
    {
        name:   "Word order big => small",
        dict:   []string{"suspenseful", "suspense"},
        trie:   New(),
        search: "susp",
        expected: []string{
            "suspense",
            "suspenseful",
        },
    },

Best Regards,
Phillip

Hi Phillip,
That doesn't sound correct. Would you mind providing a PR with your tests?

Hi @glaslos,
I already tried to do so, but I don't seem to have the repo permissions to create an upstream branch. Hence the wonkily copied code fragments in the original comment. :D

Hi @SirGrandmasterr,
Yes, you would need to fork the project and then create a PR from the fork.
I have some time later today and I'll also try to reproduce the issue with your code snippet.

Hi @glaslos,
I've forked and made the PR.
Thanks for taking a look!

Just found your PR again, sorry for the silence. I had a look at the code and it seems this is intentional behavior (although questionable):

If you insert suspense and then suspenseful, the trie created looks like this:

suspense
        \ful

If you switch the order, you get

suspenseful

representing both provided values.
I think what should happen is that the suspenseful node gets split in two, as in the first example.

Would you be comfortable attempting to make that change?