Allow late filtering or disable query modifications
danielbasso opened this issue
Description
I'm working on a Typesense database and found a strange behavior. When filtering an int64 field (that represents a date), I get wildly different numbers of hits.
With a 5-term query:
No filter at all: 1 result;
Filtering for almost the entire dataset: 1 result;
Filtering the last decade: 3 results;
Filtering the last 4 years: 4 results;
Filtering for just the last year: 80+ results.
I already figured out what is happening: Typesense filters the documents first and then decides how to apply the query. If there is a strong match within the filtered set (in this scenario, the last century or no filter at all), it returns that document; otherwise, it loosens the query, performing "derived queries", so to speak. If I restrict the document set too much, it uses only fragments of the original query as input and returns lots of results.
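To make the described behavior concrete, here is a toy model of "filter first, then drop query tokens until something matches". It is a hypothetical, heavily simplified sketch, not Typesense's actual implementation: the documents, the `year` field, and the function name are made up for illustration, and tokens are dropped only from the right.

```python
# Toy model only: a simplified, hypothetical sketch of "filter first, then
# drop query tokens", NOT Typesense's actual implementation.

DOCS = [  # hypothetical documents; "year" stands in for the int64 date field
    {"id": 1, "year": 1995, "tokens": {"solar", "panel", "efficiency", "study"}},
    {"id": 2, "year": 2023, "tokens": {"solar", "roof"}},
    {"id": 3, "year": 2023, "tokens": {"solar", "farm"}},
]


def filter_then_search(docs, query, predicate, drop_tokens_threshold=1):
    """Return ids of matching docs, applying the filter BEFORE searching."""
    # 1. The filter runs first and shrinks the candidate set.
    candidates = [d for d in docs if predicate(d)]
    tokens = query.split()
    # 2. Search with all tokens; while fewer than drop_tokens_threshold hits
    #    are found, drop the last token and retry.
    while tokens:
        hits = [d["id"] for d in candidates if all(t in d["tokens"] for t in tokens)]
        if len(hits) >= drop_tokens_threshold or len(tokens) == 1:
            return hits
        tokens = tokens[:-1]
    return []  # empty query


print(filter_then_search(DOCS, "solar panel efficiency", lambda d: True))  # [1]
print(filter_then_search(DOCS, "solar panel efficiency", lambda d: d["year"] >= 2020))  # [2, 3]
```

In this model the narrower filter removes the only full match, so the query degrades to just "solar" and returns more hits, mirroring the pattern in the numbers above. Passing `drop_tokens_threshold=0` makes the narrow-filter call return `[]` instead.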
Questions:
- Is there a way to perform late filtering, that is, fetch the documents first and then apply the filters? This would be the ideal scenario, but I couldn't find it in the documentation.
- Is there a way to disable all query modifications? I tried max_candidates: 0 and num_typos: 0, but they don't affect the results. Only exhaustive_search changes them, but it goes the opposite way, trying every query variant it generates. I want to disable this behavior.
If the responses are in the documentation, please link them to me.
Possibly related to #983.
@danielbasso Could you try setting max_candidates: 10000? That will fetch all possible candidate prefixes before filtering. You also want to set drop_tokens_threshold: 0 and, if needed, typo_tokens_threshold: 0.
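For reference, the suggested parameters could be sent to the documents search endpoint roughly as follows. This is a sketch under assumptions: the host, the collection name `articles`, the fields `title` and `published_at`, and the timestamp are hypothetical placeholders; only the parameter names come from this thread. Authentication (the `X-TYPESENSE-API-KEY` header) is left out.

```python
from urllib.parse import urlencode

# Hypothetical values: host, collection and field names are placeholders;
# only the search parameter names come from the thread above.
HOST = "http://localhost:8108"


def build_search_url(collection, q, query_by, filter_by):
    """Build a GET URL for the /collections/<name>/documents/search endpoint."""
    params = {
        "q": q,
        "query_by": query_by,
        "filter_by": filter_by,
        "max_candidates": 10000,     # suggested above: consider all candidate prefixes
        "drop_tokens_threshold": 0,  # don't drop query tokens when hits are few
        "typo_tokens_threshold": 0,  # don't search more typo variants when hits are few
    }
    # urlencode percent-escapes values such as ":>=" in filter_by.
    return f"{HOST}/collections/{collection}/documents/search?{urlencode(params)}"


url = build_search_url(
    "articles", "solar panel efficiency", "title", "published_at:>=1672531200"
)
print(url)
```

Building the URL in one place keeps the "no query modifications" settings together, so every filtered search in an application uses the same behavior.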
Hi @jasonbosco.
max_candidates: 10000 doesn't change the result set at all. exhaustive_search: true does, but I want to go the other way around. drop_tokens_threshold: 0 and typo_tokens_threshold: 0 apparently did the trick, thanks!
Although, as mentioned in my last post, drop_tokens_threshold: 0 and typo_tokens_threshold: 0 more or less solve the problem, late filtering would still be preferred, meaning:
- Search first, dropping tokens if there are no results, etc.
- Then filter the result set by the applied filters.
This seems more flexible and consistent.
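The proposed late-filtering order can be sketched with the same kind of toy model. This is hypothetical: the thread does not show Typesense supporting this mode, and the documents, the `year` field, and the function name are illustrative only. The search (including token dropping) runs over the whole collection, and the filter is applied to the result set afterwards.

```python
# Hypothetical sketch of late filtering: search the whole collection first,
# then filter the result set. Not an existing Typesense feature or API.

DOCS = [  # hypothetical documents; "year" stands in for the int64 date field
    {"id": 1, "year": 1995, "tokens": {"solar", "panel", "efficiency", "study"}},
    {"id": 2, "year": 2023, "tokens": {"solar", "roof"}},
    {"id": 3, "year": 2023, "tokens": {"solar", "farm"}},
]


def late_filter_search(docs, query, predicate, drop_tokens_threshold=1):
    """Return ids of matching docs, applying the filter AFTER searching."""
    tokens = query.split()
    hits = []
    # 1. Search the whole collection; while fewer than drop_tokens_threshold
    #    hits are found, drop the last token and retry.
    while tokens:
        hits = [d for d in docs if all(t in d["tokens"] for t in tokens)]
        if len(hits) >= drop_tokens_threshold or len(tokens) == 1:
            break
        tokens = tokens[:-1]
    # 2. Only then narrow the result set with the filter.
    return [d["id"] for d in hits if predicate(d)]


print(late_filter_search(DOCS, "solar panel efficiency", lambda d: True))  # [1]
print(late_filter_search(DOCS, "solar panel efficiency", lambda d: d["year"] >= 2020))  # []
```

With this order, a narrower filter can only return a subset of the unfiltered hits, which is the consistency asked for above. The trade-off, at least in this model, is that a narrow filter may return nothing when all strong matches fall outside it.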