typesense / typesense

Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences

Home Page: https://typesense.org

Feature request: disable typos for alphanumeric tokens (combination of numbers and other characters)

studiobovenkamer opened this issue

We would like an option to disable typo tolerance for tokens that combine letters, numbers, and (indexed) symbols.

An example search would be c-136/14
Our schema is set to 'symbols_to_index' => ['/']
Proposed new search parameter: enable_typos_for_alpha_numerical_tokens: false

Desired result:
136/14 would be treated as a single token, and no typos would be considered for it.

So it should NOT match 13/14, (136)214, or 536/14/EN

But it could match C-136/14, c136/14, or A-136/14
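To illustrate why typo tolerance produces the unwanted matches, here is a rough Python sketch using plain Levenshtein distance. It is not Typesense's actual matching code: the `edit_distance` and `matches` helpers are hypothetical, the one-edit tolerance is a simplification, and the normalized forms (lowercased, non-indexed symbols dropped, '/' kept) are assumed.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def matches(query: str, indexed: str, typos_enabled: bool) -> bool:
    """Hypothetical matcher: tolerate one edit when typos are enabled, else exact."""
    max_edits = 1 if typos_enabled else 0
    return edit_distance(query, indexed) <= max_edits


query = "136/14"
# Assumed normalized forms of 13/14, (136)214 and c136/14.
for indexed in ["13/14", "136214", "c136/14"]:
    print(indexed,
          matches(query, indexed, typos_enabled=True),
          matches(query, indexed, typos_enabled=False))
```

Under these assumptions, 13/14 and (136)214 are each one edit away from 136/14, so they match only while typos are enabled. Note that c136/14 is also exactly one edit away, so edit distance alone cannot separate the wanted matches from the unwanted ones.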

We are currently running Typesense v0.26.0.rc58

With 'symbols_to_index' => ['/'], the token c-136/14 will be indexed as c136/14. We will be able to prevent typo matching on such alphanumerical tokens with a flag, just as we did for numerical tokens.
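As a rough sketch of the behaviour described here (not the engine's real tokenizer), the following Python shows how c-136/14 could end up as c136/14 and how a flag might gate typos for such tokens. The helper names `normalize_token`, `is_alphanumeric_token` and `typos_allowed` are hypothetical, and the flag name is the one proposed in this issue.

```python
def normalize_token(raw: str, symbols_to_index=frozenset({"/"})) -> str:
    """Lowercase and drop characters that are neither alphanumeric nor indexed symbols."""
    return "".join(ch for ch in raw.lower()
                   if ch.isalnum() or ch in symbols_to_index)


def is_alphanumeric_token(token: str) -> bool:
    """True when the token mixes digits with at least one non-digit character."""
    has_digit = any(ch.isdigit() for ch in token)
    has_other = any(not ch.isdigit() for ch in token)
    return has_digit and has_other


def typos_allowed(token: str,
                  enable_typos_for_alpha_numerical_tokens: bool = True) -> bool:
    """Assumed semantics: when the flag is False, mixed tokens must match exactly."""
    return (enable_typos_for_alpha_numerical_tokens
            or not is_alphanumeric_token(token))


token = normalize_token("c-136/14")
print(token)                                                         # c136/14
print(typos_allowed(token, enable_typos_for_alpha_numerical_tokens=False))
print(typos_allowed("search", enable_typos_for_alpha_numerical_tokens=False))
```

With the flag off, c136/14 would only match exactly, while a purely alphabetic token such as "search" would keep its normal typo tolerance.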

However, your example of allowing 136/14 to match A-136/14 won't be possible (unless it's done via split_join_tokens mode), as it involves specific business logic that's not universally applicable.