forrestthewoods / lib_fts

single-file public domain libraries

fts_fuzzy_match: Recursion count doesn't get decremented anywhere

arch4672 opened this issue · comments

The recursionCount variable can reach the recursionLimit very quickly as it is not being calculated correctly. It means in some cases not all of the possible matches get exhaustively searched.

I think recursionCount is meant to count at what level of recursion you're currently at, but because it doesn't get decremented once a recursive call has exited it's not correct.

I think decrementing recursionCount after the call to fuzzy_match_recursive() fixes the problem.
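To make the difference concrete, here is a minimal sketch, not the library's code. `explore`, `run`, and `SearchStats` are hypothetical names, and the limit of 10 used in the usage note is an assumption. The sketch enumerates subsequence alignments and shows how the same counter behaves as a total-call budget (no decrement) versus a depth gauge (decrement on exit).

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical bookkeeping for this sketch; not part of fts_fuzzy_match.
struct SearchStats {
    int counter = 0;     // value that gets compared against the limit
    int peak = 0;        // highest value the counter reached
    int alignments = 0;  // complete alignments of the pattern that were found
};

static bool sameLetter(char a, char b) {
    return std::tolower(static_cast<unsigned char>(a)) ==
           std::tolower(static_cast<unsigned char>(b));
}

// Enumerates every way to match `pattern` as a case-insensitive subsequence
// of `text`, abandoning a branch once the counter reaches `limit`.
void explore(const std::string& pattern, const std::string& text,
             std::size_t pi, std::size_t ti,
             bool decrementOnExit, int limit, SearchStats& stats) {
    if (stats.counter >= limit) return;  // budget (or depth) exhausted
    ++stats.counter;
    stats.peak = std::max(stats.peak, stats.counter);

    if (pi == pattern.size()) {
        ++stats.alignments;  // every pattern character has been placed
    } else {
        for (std::size_t i = ti; i < text.size(); ++i) {
            if (sameLetter(pattern[pi], text[i])) {
                explore(pattern, text, pi + 1, i + 1,
                        decrementOnExit, limit, stats);
            }
        }
    }

    // With the decrement the counter tracks the current depth;
    // without it the counter is the total number of calls made so far.
    if (decrementOnExit) --stats.counter;
}

SearchStats run(const std::string& pattern, const std::string& text,
                bool decrementOnExit, int limit) {
    SearchStats stats;
    explore(pattern, text, 0, 0, decrementOnExit, limit, stats);
    return stats;
}
```

In this sketch, calling `run("decksupp", "CH CBN DECK SPKR SUPP", false, 10)` stops after a single complete alignment, while passing `true` lets the search visit all three alignments, because the depth never exceeds the pattern length plus one.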

Thanks for sharing this library by the way - it's really useful.

> It means in some cases not all of the possible matches get exhaustively searched.

That is my intent. My goal is not to limit recursion depth to prevent stack overflows. My goal is to keep fuzzy_match reliably performant. I don't want degenerate cases such as fuzzy_match("aaaaaaaaa", "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa") to freeze the application.

It's sacrificing a small amount of accuracy for more consistent performance. I think that's a fair trade-off.
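For a sense of scale, the following sketch counts how many distinct alignments an exhaustive search would face. `countAlignments` is a hypothetical helper written for this illustration, not something the library provides, and it compares characters exactly rather than case-insensitively.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Counts the distinct ways `pattern` can be laid over `text` as a
// subsequence (exact character comparison).
std::uint64_t countAlignments(const std::string& pattern,
                              const std::string& text) {
    // ways[j] = number of ways to match the first j pattern characters
    // using the text consumed so far.
    std::vector<std::uint64_t> ways(pattern.size() + 1, 0);
    ways[0] = 1;
    for (char c : text) {
        // Walk backwards so a text character is used at most once
        // per alignment.
        for (std::size_t j = pattern.size(); j >= 1; --j) {
            if (pattern[j - 1] == c) ways[j] += ways[j - 1];
        }
    }
    return ways[pattern.size()];
}
```

As an example, nine `a`s matched against a run of 46 `a`s gives 1,101,716,330 alignments (46 choose 9), which is why some cap on the total work is attractive for inputs like this.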

Do you have a particular use case where this is causing you a problem?

Fair enough. For my application, the chances of having an item with the text "aaaaaaaaaaaaaaaaaaa" are pretty slim, so I've made recursionCount in my version count the recursion depth.

The use case that pointed me to this was if you searched for "CH CBN DECK SPKR SUPP" with "decksupp".

It found the item, but didn't give it the highest score possible as it matched it like this (where matches are between { }'s):

"CH CDB {DECK} {S}PKR S{UPP}"

The highest scoring match is actually:

"CH CDB {DECK} SPKR {SUPP}"

Modifying your code so it decrements the recursion counter sorted this out for me. I thought I'd point it out, but it sounds like your code is doing what you intended.
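A rough sketch of why the second alignment can outscore the first. The two bonus weights below are illustrative assumptions, not the library's actual scoring table, and `scoreAlignment` is a hypothetical helper. The text is the search string as first quoted, "CH CBN DECK SPKR SUPP".

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Illustrative weights only: assumptions for this sketch.
constexpr int kSequentialBonus = 15;  // match directly follows the previous match
constexpr int kSeparatorBonus = 30;   // match directly follows a space

// Scores an alignment given the text positions of the matched characters.
int scoreAlignment(const std::string& text,
                   const std::vector<std::size_t>& matched) {
    int score = 0;
    for (std::size_t k = 0; k < matched.size(); ++k) {
        const std::size_t idx = matched[k];
        if (k > 0 && matched[k - 1] + 1 == idx) score += kSequentialBonus;
        if (idx > 0 && text[idx - 1] == ' ') score += kSeparatorBonus;
    }
    return score;
}
```

Under these assumed weights, the alignment that splits the match across "{S}PKR S{UPP}" (positions 7, 8, 9, 10, 12, 18, 19, 20) scores 135, while the one that keeps "{SUPP}" together (positions 7, 8, 9, 10, 17, 18, 19, 20) scores 150, since the contiguous word earns one more sequential bonus.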

Interesting test case. I'll try to take a closer look using it. Thanks.

Indeed :)

You can also notice it in your demo, when searching the Unreal Engine 4 Filenames using the Mock pattern and ordering by Best Match... The first several matches now look like:

21 - MockAI.h
20 - MacroCallback.h
19 - MockAI.cpp
18 - MacroCallback.cpp
18 - MockAI_BT.h
16 - MockAI_BT.cpp
13 - MouseClickProcessor.cs
13 - MovieSceneColorTrack.h
11 - MockGameplayTasks.h
11 - MovieSceneColorTrack.cpp

and I would say MockAI.cpp is a better match than MacroCallback.h :)

It looks like proximity-from-the-start should be overridden by something like proximity-of-range-matched???

It's true that the demo still uses the older, non-exhaustive JavaScript search, but when I ran the C++ version on similar inputs I got very similar results.

@arch4672 is right about his case. The way the current code works, the recursion count climbs past 10 very quickly, and in the case given by @arch4672 the maximum "recursion" is reached before the 2nd match.

I understand the intent, but the implementation is not quite right.

> I understand the intent, but the implementation is not quite right.

@bubnikv, Agreed.

When I did my original blog post in 2016 there was nothing public in the way of "sublime fuzzy matching". I took a bit of a stab in the dark.

Does anyone know what the latest and greatest in fuzzy matching algorithms is? There's been a definite increase in tools leveraging fuzzy search over the past few years, but I'm not sure what algorithms are out there.