Order of insertion of words with identical word stem changes result set.
SirGrandmasterr opened this issue
Hi,
while using this package to build a small Elasticsearch-like prefix suggester, I've noticed that some words are omitted from the results in some cases. A specific example:
"suspenseful" and "suspense".
I've created two test cases using only those two words that should, as far as I understand it, yield the same expected result.
For some reason, one of those tests will return the expected two strings that were inserted, but the other will only return the longer one.
Is this intended behavior?
    {
        name:   "Word order small => big",
        dict:   []string{"suspense", "suspenseful"},
        trie:   New(),
        search: "susp",
        expected: []string{
            "suspense",
            "suspenseful",
        },
    },
    {
        name:   "Word order big => small",
        dict:   []string{"suspenseful", "suspense"},
        trie:   New(),
        search: "susp",
        expected: []string{
            "suspense",
            "suspenseful",
        },
    },
Best Regards,
Phillip
Hi Phillip,
That doesn't sound correct. Would you mind providing a PR with your tests?
Hi @glaslos,
I already tried to do so, but I don't seem to have the repo permissions to create an upstream branch. Hence the wonkily copied code fragments in the original comment. :D
Hi @SirGrandmasterr ,
Yes, you would need to fork the project and then create a PR from the fork.
I have some time later today and I'll also try to reproduce the issue with your code snippet.
Hi @glaslos,
I've forked and made the PR.
Thanks for taking a look!
Just found your PR again, sorry for the silence. I had a look at the code and it seems this is intentional behavior (although questionable):
If you insert "suspense" and then "suspenseful", the trie created looks like this:
suspense
\ful
If you switch the order, you get a single node
suspenseful
representing both provided values.
I think what should happen is that the "suspenseful" node should be split in two, as in the previous example.
Would you be comfortable attempting to make that change?