Plume-org / Plume

Federated blogging application, thanks to ActivityPub (now on https://git.joinplu.me/ — this is just a mirror)

Home page: https://joinplu.me

Support search splitting on characters

pullopen opened this issue · comments

Is your feature request related to a problem? Please describe.

Right now, when I search in Chinese, I have to enter the whole sentence or phrase, since Chinese text is not split on whitespace. We usually search with combinations of characters.

Describe the solution you'd like

I hope Plume will support a mode that searches by combinations of characters (in strict order) rather than splitting on whitespace. It would be very helpful for Chinese users, as well as for many other languages that do not use whitespace.

Additional context
version: 0.6.1dev

Thank you for finding and trying Plume!

If you are an administrator of your server, the environment variable SEARCH_CONTENT_TOKENIZER might be useful (see Useful Environment Variables for details). I think the ngram tokenizer is useful for Chinese blogs.

You need to recreate the search index when switching tokenizers:

(Stop the Plume process)
% rm -rfv search_index
% SEARCH_CONTENT_TOKENIZER=ngram plm search init
% SEARCH_CONTENT_TOKENIZER=ngram plume

You can try this at my instance, which uses this setting, by signing up, creating a blog, writing an article in Chinese, and searching for some words from the article.
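To illustrate why the ngram setting helps, here is a small shell sketch of character-bigram windowing, roughly what an ngram tokenizer indexes as search terms. This is only a demonstration, not Plume's actual tokenizer; it uses a plain ASCII word for portability, but the same windowing idea is what makes substring search work for Chinese text.

```shell
# Print every overlapping 2-character window of a string.
s="plume"
n=${#s}
i=1
while [ "$i" -lt "$n" ]; do
  printf '%s\n' "$s" | cut -c "$i-$((i + 1))"
  i=$((i + 1))
done
# Prints: pl lu um me (one per line)
```

Because every two-character window is indexed, a query for any in-order character combination inside a word or phrase can match, without relying on whitespace boundaries.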

Thank you so much! I had actually set the search tokenizer to ngram, but I didn't recreate the search index. Now the problem is solved! Thank you!

PS. For Docker users, stopping the containers can result in a failure; you can just:

  1. Stop nginx, then stop Plume (docker-compose down).
  2. Delete everything in search_index, but keep the empty folder itself.
  3. Change SEARCH_TAG_TOKENIZER and SEARCH_CONTENT_TOKENIZER to ngram, then bring Plume back up (docker-compose up -d).
  4. Run docker-compose run --rm plume plm search init.
  5. Restart Plume and nginx.

I think it will do the job.
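Assuming a typical docker-compose setup where the tokenizer variables live in the compose environment (e.g. a .env file) and nginx runs on the host as a systemd service (both are assumptions; adjust paths and service names to your setup), the steps above correspond roughly to:

```shell
# Stop nginx and the Plume containers.
sudo systemctl stop nginx
docker-compose down

# Clear the index contents but keep the directory itself.
rm -rf search_index/*

# Set both tokenizers in the compose environment, for example in .env:
#   SEARCH_TAG_TOKENIZER=ngram
#   SEARCH_CONTENT_TOKENIZER=ngram
# then bring Plume back up.
docker-compose up -d

# Rebuild the search index in a one-off container.
docker-compose run --rm plume plm search init

# Restart Plume and nginx.
docker-compose restart plume
sudo systemctl start nginx
```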

Sounds good! And thanks for the additional information. I will fix the documentation later.

I've modified the process a little bit to make sure it will work😉

Thanks!