Allow late filtering or disable query modifications

Question

danielbasso opened this issue 5 months ago · comments

Daniel Basso Ribas commented 5 months ago

Description

I'm working on a Typesense database and found a strange behavior. When filtering on an int64 field (that represents a date), I get wildly different numbers of hits:

5-term query:

No filter at all: 1 result;
Filtering for almost the entire dataset: 1 result;
Filtering the last decade: 3 results;
Filtering the last 4 years: 4 results;
Filtering for just the last year: 80+ results

I already understand what is happening: Typesense filters the documents first and then decides how to apply the query. If there is a great match (in this scenario, filtering on the last century or using no filter at all), it returns that document; otherwise, it loosens the query, performing "derived queries", so to speak. If I limit the document set too much, it ends up using only fragments of the original query as input and returns lots of results.
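To make the effect described above concrete, here is a toy model in Python. It is not Typesense's actual algorithm: the documents, the `year` field, and the rule of dropping the last token first are all assumptions made up for illustration. It only shows how "filter first, then relax the query when too few hits remain" can make a narrower filter return more hits.

```python
# Toy model only: NOT Typesense's implementation. Documents, field names and
# the "drop the last token" rule are invented for illustration.
DOCS = [
    {"text": "red fast car engine repair", "year": 1990},
    {"text": "red fast bike", "year": 2023},
    {"text": "red fast boat", "year": 2023},
]

def toy_search(docs, query, min_year=None, drop_tokens_threshold=1):
    # Step 1: apply the filter before looking at the query.
    candidates = [d for d in docs if min_year is None or d["year"] >= min_year]
    tokens = query.lower().split()
    while True:
        # Step 2: require every remaining token to appear in the document.
        hits = [d for d in candidates if all(t in d["text"].split() for t in tokens)]
        # Step 3: stop if there are enough hits or nothing is left to drop.
        if len(hits) >= drop_tokens_threshold or len(tokens) == 1:
            return hits
        tokens = tokens[:-1]  # relax the query by dropping a token

QUERY = "red fast car engine repair"
no_filter = toy_search(DOCS, QUERY)                      # full match on the 1990 document
narrow = toy_search(DOCS, QUERY, min_year=2020)          # relaxed down to "red fast"
strict = toy_search(DOCS, QUERY, min_year=2020, drop_tokens_threshold=0)
print(len(no_filter), len(narrow), len(strict))
```

In this toy model the unfiltered search finds the single exact match, the narrowly filtered search falls back to a shorter query and matches two documents, and a threshold of 0 disables the fallback entirely.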

Questions:

Is there a way to perform late filtering, meaning, fetch the documents first, then apply the filters? This would be the ideal scenario, but I couldn't find it in the documentation;
Is there a way to disable any query modifications? I tried max_candidates: 0 and num_typos: 0, but they don't affect the results. Only "exhaustive_search" has an effect, but it goes the opposite way, trying every possible query it generates. I want to disable this behavior.

If the responses are in the documentation, please link them to me.

Possibly related to #983

Jason Bosco · Answer 1 · Fri Mar 08 2024 03:45:55 GMT+0800 (China Standard Time)

@danielbasso Could you try setting max_candidates: 10000? That will fetch all possible candidate prefixes before filtering.

You also want to set drop_tokens_threshold: 0 and if needed typo_tokens_threshold: 0.
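As a rough sketch, the parameters suggested above could be assembled like this in Python. The collection name `articles`, the `title` and `published_at` fields, and the helper name are hypothetical; only the parameter names come from this thread and the Typesense search API. Nothing is sent to a server here, the snippet just builds the query string.

```python
from urllib.parse import urlencode

def build_strict_search_params(query, query_by, filter_by=None, max_candidates=None):
    """Build search parameters that turn off token dropping and typo fallback."""
    params = {
        "q": query,
        "query_by": query_by,
        "drop_tokens_threshold": 0,  # never drop query tokens to find more hits
        "typo_tokens_threshold": 0,  # never fall back to typo-tolerant matching
    }
    if filter_by is not None:
        params["filter_by"] = filter_by
    if max_candidates is not None:
        params["max_candidates"] = max_candidates
    return params

# Hypothetical collection and field names, used only for illustration.
params = build_strict_search_params(
    "red fast car engine repair",
    query_by="title",
    filter_by="published_at:>=1672531200",  # 2023-01-01 00:00:00 UTC
    max_candidates=10000,
)
search_path = "/collections/articles/documents/search?" + urlencode(params)
print(search_path)
```

The same dictionary could be passed to an official client's search call instead of being encoded by hand.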

Daniel Basso Ribas · Answer 2 · Fri Mar 08 2024 08:13:35 GMT+0800 (China Standard Time)

Hi @jasonbosco.

max_candidates: 10000 doesn't change the result set at all. exhaustive_search: true does, but I want to go the other way around. drop_tokens_threshold: 0 and typo_tokens_threshold:0 apparently did the trick, thanks!

Daniel Basso Ribas · Answer 3 · Fri Mar 08 2024 23:11:15 GMT+0800 (China Standard Time)

Although drop_tokens_threshold: 0 and typo_tokens_threshold: 0 from my last post kinda solve the problem, late filtering would still be preferred, meaning:

Search first, dropping tokens if there are no results, etc.
Then filter the result set by the filters applied.

Seems more flexible and consistent.
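Until something like this exists server-side, one possible workaround is to emulate late filtering on the client: run the search without filter_by, then filter the returned hits in application code. The sketch below assumes the usual response shape with a `hits` list whose items carry a `document`; the sample response, the `published_at` field, and the `late_filter` helper are made up for illustration.

```python
def late_filter(hits, field, min_value):
    """Keep only hits whose document has `field` with a value >= min_value."""
    return [
        h for h in hits
        if field in h["document"] and h["document"][field] >= min_value
    ]

# Hand-written sample response (not real data), shaped like a search result.
sample_response = {
    "hits": [
        {"document": {"id": "1", "title": "old", "published_at": 946684800}},   # 2000-01-01 UTC
        {"document": {"id": "2", "title": "new", "published_at": 1704067200}},  # 2024-01-01 UTC
    ]
}

# Filter after searching: keep documents dated 2023-01-01 UTC or later.
recent = late_filter(sample_response["hits"], "published_at", 1672531200)
print([h["document"]["id"] for h in recent])
```

Note that this only filters the hits that were actually fetched, so per_page has to be large enough to cover the candidates of interest, and the server-side result count will not reflect the filter.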