typesense / typesense

Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences

Home Page: https://typesense.org

Allow late filtering or disable query modifications

danielbasso opened this issue

Description

I'm working on a Typesense database and found a strange behavior. When filtering on an int64 field (that represents a date), I get wildly different hit counts:

5 term query:

No filter at all: 1 result;
Filtering for almost the entire dataset: 1 result;
Filtering the last decade: 3 results;
Filtering the last 4 years: 4 results;
Filtering for just the last year: 80+ results

I already understand what is happening: Typesense filters the documents first and then decides how to apply the query. If there is a great match (in this scenario, with the last century or no filter at all), it returns that document; otherwise, it loosens the query, performing "derived queries", so to speak. If I limit the document set too much, it uses only fragments of the original query as input and returns lots of results.
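To make the behavior described above concrete, here is a toy model in Python. It is only a sketch of the reporter's explanation, not Typesense's actual algorithm: the documents, the `search_filter_first` and `search_filter_last` helpers, and the rule of dropping the rightmost token are all assumptions made for illustration.

```python
from typing import Callable, List

Doc = dict

# Made-up documents; only doc 1 contains every query token.
DOCS: List[Doc] = [
    {"id": 1, "text": "alpha beta gamma delta epsilon", "year": 1990},
    {"id": 2, "text": "alpha beta", "year": 2023},
    {"id": 3, "text": "alpha beta zeta", "year": 2023},
    {"id": 4, "text": "alpha", "year": 2020},
]


def match_all(tokens: List[str], docs: List[Doc]) -> List[Doc]:
    """Return the docs whose text contains every token."""
    return [d for d in docs if all(t in d["text"].split() for t in tokens)]


def search_filter_first(query: str, docs: List[Doc],
                        keep: Callable[[Doc], bool],
                        threshold: int = 1) -> List[Doc]:
    """Filter first, then loosen the query while hits < threshold."""
    candidates = [d for d in docs if keep(d)]
    tokens = query.split()
    hits = match_all(tokens, candidates)
    while len(hits) < threshold and len(tokens) > 1:
        tokens = tokens[:-1]  # assumption: drop the rightmost token
        hits = match_all(tokens, candidates)
    return hits


def search_filter_last(query: str, docs: List[Doc],
                       keep: Callable[[Doc], bool],
                       threshold: int = 1) -> List[Doc]:
    """Search the whole collection first, then apply the filter."""
    hits = search_filter_first(query, docs, lambda d: True, threshold)
    return [d for d in hits if keep(d)]


if __name__ == "__main__":
    query = "alpha beta gamma delta epsilon"
    recent = lambda d: d["year"] >= 2023
    print(len(search_filter_first(query, DOCS, lambda d: True)))
    print(len(search_filter_first(query, DOCS, recent)))
    print(len(search_filter_last(query, DOCS, recent)))
```

In this toy model, a narrow filter removes the only full match, so the query is loosened until shorter fragments match more documents. Searching first and filtering afterwards keeps the original query intact, and passing `threshold=0` disables the loosening entirely.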

Questions:

  1. Is there a way to perform late filtering, meaning, fetch the documents first, then apply the filters? This would be the ideal scenario, but I couldn't find it in the documentation;

  2. Is there a way to disable any query modifications? I tried max_candidates: 0 and num_typos: 0, but they don't affect the results. Only "exhaustive_search" has an effect, but it goes the opposite way, trying every variation of the query it generates. I want to disable this behavior.

If the answers are in the documentation, please link them to me.

Possibly related to #983

@danielbasso Could you try setting max_candidates: 10000? That will fetch all possible candidate prefixes before filtering.

You'll also want to set drop_tokens_threshold: 0 and, if needed, typo_tokens_threshold: 0.
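For reference, a minimal sketch of how these two thresholds could be passed as search parameters. It only builds the request URL for the documents search endpoint and sends nothing; the server address, the collection name `articles`, and the fields `title` and `published_at` are hypothetical placeholders.

```python
from urllib.parse import urlencode


def build_search_url(base_url: str, collection: str, params: dict) -> str:
    """Build a search URL for a collection (no request is sent)."""
    path = f"/collections/{collection}/documents/search"
    return f"{base_url}{path}?{urlencode(params)}"


# Hypothetical query, field names, and timestamp; adjust to your schema.
SEARCH_PARAMS = {
    "q": "five term example query here",
    "query_by": "title",
    "filter_by": "published_at:>=1672531200",
    "drop_tokens_threshold": 0,   # do not drop query tokens to find more hits
    "typo_tokens_threshold": 0,   # do not fall back to typo-corrected tokens
}

if __name__ == "__main__":
    print(build_search_url("http://localhost:8108", "articles", SEARCH_PARAMS))
```

The same dictionary can be handed to whichever client library you use; the API key header is left out here because nothing is actually sent.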

Hi @jasonbosco,

max_candidates: 10000 doesn't change the result set at all. exhaustive_search: true does, but I want to go the other way around. drop_tokens_threshold: 0 and typo_tokens_threshold: 0 apparently did the trick, thanks!

Although drop_tokens_threshold: 0 and typo_tokens_threshold: 0 from my last post kind of solve the problem, late filtering would still be preferred, meaning:

  1. Search first, dropping tokens if there are no results, etc.
  2. Then filter the result set by the filters applied.

Seems more flexible and consistent.
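Until something like this exists server-side, the two steps can be approximated on the client: run the search without filter_by, then filter the returned hits. The sketch below assumes the usual response shape with a `hits` list whose entries carry a `document`; the `late_filter` helper, the `published_at` field, and the sample response are made up for illustration.

```python
from typing import List, Optional


def late_filter(response: dict, field: str, lo: int,
                hi: Optional[int] = None) -> List[dict]:
    """Keep only hits whose numeric field lies in [lo, hi]."""
    kept = []
    for hit in response.get("hits", []):
        value = hit.get("document", {}).get(field)
        if value is None:
            continue  # documents missing the field are excluded
        if value >= lo and (hi is None or value <= hi):
            kept.append(hit)
    return kept


if __name__ == "__main__":
    # Made-up response standing in for an unfiltered search result.
    sample = {
        "found": 2,
        "hits": [
            {"document": {"id": "1", "published_at": 631152000}},
            {"document": {"id": "2", "published_at": 1700000000}},
        ],
    }
    recent = late_filter(sample, "published_at", lo=1672531200)
    print([h["document"]["id"] for h in recent])
```

One caveat of this workaround: it only sees the page of hits that was fetched, so counts and pagination no longer line up with what the server reports.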