Plume-org / Plume

Federated blogging application, thanks to ActivityPub (now on https://git.joinplu.me/ — this is just a mirror)

Home page: https://joinplu.me

Support search splitting on characters

pullopen opened this issue · comments

Is your feature request related to a problem? Please describe.

Right now, when I search in Chinese, I have to enter the whole sentence or phrase, since Chinese text is not split on whitespace. We usually search with combinations of characters.

Describe the solution you'd like

I hope Plume will support a mode that searches by combinations of characters (in strict order) rather than splitting on whitespace. It would be very helpful for Chinese users, as well as for many other languages that do not use whitespace.

Additional context
version: 0.6.1dev

Thank you for finding and trying Plume!

If you are an administrator of your server, the environment variable SEARCH_CONTENT_TOKENIZER might be useful (see Useful Environment Variables for details). I think the ngram tokenizer is useful for Chinese blogs.

You need to recreate the search index when switching tokenizers:

(Stop the Plume process)
% rm -rfv search_index
% SEARCH_CONTENT_TOKENIZER=ngram plm search init
% SEARCH_CONTENT_TOKENIZER=ngram plume

You can try this at my instance, which uses this setting, by signing up, creating a blog, writing an article in Chinese, and searching for some words from the article.
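To illustrate why the ngram setting helps, here is a small shell sketch of character-bigram windowing, roughly what an ngram tokenizer indexes as search terms. This is only a demonstration, not Plume's actual tokenizer; it uses a plain ASCII word for portability, but the same windowing idea is what makes substring search work for Chinese text.

```shell
# Print every overlapping 2-character window of a string.
s="plume"
n=${#s}
i=1
while [ "$i" -lt "$n" ]; do
  printf '%s\n' "$s" | cut -c "$i-$((i + 1))"
  i=$((i + 1))
done
# Prints: pl lu um me (one per line)
```

Because every two-character window is indexed, a query for any in-order character combination inside a word or phrase can match, without relying on whitespace boundaries.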

Thank you so much! I had actually set the search tokenizer to ngram, but I didn't recreate the search index. Now the problem is solved! Thank you!

PS. For Docker users, stopping the containers can result in a failure; you can just:

  1. Stop nginx, then stop Plume (docker-compose down).
  2. Delete everything in search_index, but keep the empty folder itself.
  3. Change SEARCH_TAG_TOKENIZER and SEARCH_CONTENT_TOKENIZER to ngram, then bring Plume back up (docker-compose up -d).
  4. Run docker-compose run --rm plume plm search init.
  5. Restart Plume and nginx.

I think it will do the job.
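Assuming a typical docker-compose setup where the tokenizer variables live in the compose environment (e.g. a .env file) and nginx runs on the host as a systemd service (both are assumptions; adjust paths and service names to your setup), the steps above correspond roughly to:

```shell
# Stop nginx and the Plume containers.
sudo systemctl stop nginx
docker-compose down

# Clear the index contents but keep the directory itself.
rm -rf search_index/*

# Set both tokenizers in the compose environment, for example in .env:
#   SEARCH_TAG_TOKENIZER=ngram
#   SEARCH_CONTENT_TOKENIZER=ngram
# then bring Plume back up.
docker-compose up -d

# Rebuild the search index in a one-off container.
docker-compose run --rm plume plm search init

# Restart Plume and nginx.
docker-compose restart plume
sudo systemctl start nginx
```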

Sounds good! And thanks for the additional information. I will fix the documentation later.

I've modified the process a little bit to make sure it will work😉

Thanks!