askorama / orama

🌌 Fast, dependency-free, full-text and vector search engine with typo tolerance, filters, facets, stemming, and more. Works with any JavaScript runtime, browser, server, service!

Home Page: https://docs.orama.com

Feature Request: Customizable "weights" for Hybrid Search results

valstu opened this issue

Problem Description

In the current hybrid search implementation, there seems to be no way to balance full-text search results against vector search results. In some cases a precise text match is essential, but the current system does not allow emphasizing full-text results over vector results, or vice versa.

Proposed Solution

Introduce a feature allowing users to assign custom weights to different search strategies, such as full text and vector searches. This would enable users to prioritize one method over the other based on their specific needs. For example, in cases where exact text matches are more critical, users could assign a higher weight to full text search results.
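To make the proposal concrete, here is a minimal TypeScript sketch of weighted score fusion, assuming each strategy returns hits with an `id` and a raw `score`. The `Hit` and `HybridWeights` types and the `normalize` and `weightedFusion` functions are hypothetical and are not part of Orama's API; min-max normalization is one assumed way to put full-text and vector scores on the same scale before the weights are applied.

```typescript
// Hypothetical hit shape for illustration; not Orama's result type.
interface Hit {
  id: string;
  score: number;
}

// Hypothetical weights, one per search strategy.
interface HybridWeights {
  text: number; // applied to normalized full-text scores
  vector: number; // applied to normalized vector scores
}

// Min-max normalize scores to [0, 1] so full-text relevance scores and
// cosine similarities become comparable before weighting.
function normalize(hits: Hit[]): Map<string, number> {
  const out = new Map<string, number>();
  if (hits.length === 0) return out;
  const scores = hits.map((h) => h.score);
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  const range = max - min;
  for (const h of hits) {
    // If every score is identical, treat each hit as a full match.
    out.set(h.id, range === 0 ? 1 : (h.score - min) / range);
  }
  return out;
}

// Weighted sum of normalized scores; a document missing from one list
// contributes 0 for that strategy. Results are sorted best first.
function weightedFusion(textHits: Hit[], vectorHits: Hit[], weights: HybridWeights): Hit[] {
  const text = normalize(textHits);
  const vector = normalize(vectorHits);
  const ids = Array.from(new Set(textHits.map((h) => h.id).concat(vectorHits.map((h) => h.id))));
  const fused = ids.map((id) => ({
    id,
    score: weights.text * (text.get(id) ?? 0) + weights.vector * (vector.get(id) ?? 0),
  }));
  return fused.sort((a, b) => b.score - a.score);
}
```

For example, if an exact title match tops the full-text list but sits last among the vector hits, weights of `{ text: 0.7, vector: 0.3 }` can still rank it first, while reversing the weights pushes it down.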

Alternatives

I could run two separate searches, one full-text and one hybrid, then pass both result sets and the query to my reranker model and return the most relevant results. Alternatively, I could just do the "weighting" in my own code.
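The "weighting in my own code" alternative could look like the sketch below, which merges two separately retrieved result lists. It swaps in weighted reciprocal rank fusion (RRF), a common rank-based fusion method that neither the issue nor Orama specifies here; because it uses only each hit's position, it sidesteps the mismatch between full-text relevance scores and compressed cosine similarities. The `RankedHit` type, the `weightedRRF` function, and the constant `k = 60` are assumptions for illustration.

```typescript
// Hypothetical hit shape; only the order of each list matters for RRF.
interface RankedHit {
  id: string;
}

// Weighted reciprocal rank fusion (hypothetical helper): each list adds
// weight / (k + rank) to a document, with 1-based ranks. k = 60 is the
// constant commonly used with RRF.
function weightedRRF(
  lists: { hits: RankedHit[]; weight: number }[],
  k = 60,
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const { hits, weight } of lists) {
    hits.forEach((hit, index) => {
      const contribution = weight / (k + index + 1);
      scores.set(hit.id, (scores.get(hit.id) ?? 0) + contribution);
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

Raising the weight of the full-text list favors documents that rank well there. With `k = 60` the per-rank contributions are fairly flat, so the weights may need to differ noticeably before the order changes.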

Additional Context

To illustrate the need for this feature: my search query exactly matched the title of a document, but the hybrid search failed to return it. This was likely due to the embedding model used for vector search (multilingual-e5-large), which tends to produce disproportionately high relevance scores. Its cosine similarity scores always fall between 0.7 and 1, and 0.7 is actually a very poor score. By adjusting the similarity property of the search, I was able to limit the number of vector search results so that the document appeared in the results.
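The effect described above can be illustrated with a brute-force sketch: compute the cosine similarity between the query embedding and each document embedding, then drop hits below a threshold. The `threshold` argument only stands in for the idea behind the similarity property and is not Orama's implementation; `Doc`, `vectorSearch`, and `rescale` are hypothetical helpers, and the 0.7 floor in `rescale` comes from the observation above, not from any model documentation.

```typescript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical document shape for illustration.
interface Doc {
  id: string;
  embedding: number[];
}

// Hypothetical brute-force vector search: keep documents whose similarity
// to the query reaches the threshold, most similar first.
function vectorSearch(query: number[], docs: Doc[], threshold: number) {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.embedding) }))
    .filter((h) => h.score >= threshold)
    .sort((a, b) => b.score - a.score);
}

// Hypothetical rescaling for a model whose similarities sit in [floor, 1]:
// maps floor to 0 and 1 to 1, so a raw 0.7 reads as the bottom of the range.
function rescale(score: number, floor = 0.7): number {
  return Math.max(0, (score - floor) / (1 - floor));
}
```

Raising `threshold` trims weak vector matches, which is roughly what adjusting the similarity property achieved in this case; `rescale` makes it visible that a raw score of 0.7 sits at the very bottom of this model's observed range.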

Work in progress on this one. Thanks for the suggestion!

Wow, that was fast 🔥