kbrsh / wade

:ocean: Blazing fast 1kb search library

Consider n-grams

cmcaine opened this issue

At the risk of telling you things you're already well aware of, I'm going to forge ahead with this issue ;)

In the demo, if you search for e.g. "good work is no", the top results are entries that contain the individual words more often, rather than the entry that contains the full phrase "good work is no".

This is expected from the algorithm you describe in your blog post (also, did you realise that you are describing a form of TF-IDF scoring for each term?).
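
For anyone unfamiliar with the term, here is a minimal sketch of per-term TF-IDF scoring. It is not Wade's code, and it does not reproduce the blog post's exact formula. The tokenizer, the `tf = count / length` normalization and the `idf = log(N / df)` weighting are common textbook choices assumed here, and the three documents are made-up toy data. It reproduces the effect described above: a document that repeats the query words outranks the one that contains the exact phrase.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed preprocessing: lowercase, keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf_scores(docs, query):
    """Return one score per document: the sum of TF-IDF weights of the query terms."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if counts[term]:
                tf = counts[term] / len(tokens)    # length-normalized term frequency
                idf = math.log(n_docs / df[term])  # rarer terms weigh more
                score += tf * idf
        scores.append(score)
    return scores


# Hypothetical toy corpus, not the demo's data.
docs = [
    "good work is no accident",              # contains the exact phrase
    "work is work, good is good, no is no",  # repeats the words, no phrase
    "the cat sat on the mat",
]
scores = tf_idf_scores(docs, "good work is no")
best = max(range(len(docs)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])  # document 1 (the repetitive one) ranks first
```

Word order plays no part in this score, so the phrase match in document 0 earns nothing extra.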

You could improve relevance for these kinds of searches by searching and scoring bigrams or n-grams in addition to individual terms.
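
Here is a minimal sketch of that idea, again not Wade's API. Each document is indexed as its unigrams plus its adjacent word pairs (joined with a space). The query is expanded the same way, so a document that contains consecutive query words earns extra weight. The `max_n` parameter, the space-joined feature keys and the toy corpus are assumptions for illustration. With `max_n=1` it falls back to plain per-term TF-IDF.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed preprocessing: lowercase, keep runs of letters/digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def features(tokens, max_n):
    # All n-grams for n = 1..max_n, each joined into a single string key.
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def ngram_scores(docs, query, max_n=2):
    """Sum of TF-IDF weights over the query's n-grams for n = 1..max_n."""
    indexed = [Counter(features(tokenize(doc), max_n)) for doc in docs]
    n_docs = len(indexed)
    # Document frequency of each unigram/n-gram key.
    df = Counter(feat for counts in indexed for feat in counts)
    query_feats = features(tokenize(query), max_n)
    scores = []
    for counts in indexed:
        total = sum(counts.values())
        score = 0.0
        for feat in query_feats:
            if counts[feat]:
                score += (counts[feat] / total) * math.log(n_docs / df[feat])
        scores.append(score)
    return scores


# Hypothetical toy corpus, not the demo's data.
docs = [
    "good work is no accident",              # contains the exact phrase
    "work is work, good is good, no is no",  # repeats the words, no phrase
    "the cat sat on the mat",
]
query = "good work is no"
for max_n in (1, 2):
    scores = ngram_scores(docs, query, max_n)
    # max_n=1 ranks document 1 first; max_n=2 ranks document 0 (the exact phrase) first
    print(max_n, max(range(len(docs)), key=scores.__getitem__))
```

Putting n-grams into the same index as single terms keeps the scoring loop unchanged. The cost is a larger index, since every document contributes up to one extra key per adjacent word pair. That may matter for a library aiming at a small footprint.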

I'm not particularly invested in this (i.e. I don't intend to use this library), but I was browsing your projects and took a look. Maybe these comments are interesting, maybe not!