kanishka-linux / reminiscence

Self-Hosted Bookmark And Archive Manager

Support keyword extraction for other languages

stephane-martin opened this issue

Hello,

Currently, it seems that the keyword extraction process uses a hard-coded list of English stop words. As a result, when archiving content in another language, the selected keywords are very often stop words in that language (I mainly archive content in French).

Maybe the list of stop words could be selected dynamically, based on automatic language detection? (See https://github.com/Mimino666/langdetect for an example.)
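To make the suggestion concrete, here is a minimal sketch of what I have in mind. It is not based on how reminiscence currently extracts keywords: the names `STOP_WORDS`, `guess_language`, and `extract_keywords` are hypothetical, the stop-word lists are tiny illustrative samples, and `guess_language` is only a crude stand-in detector that counts stop-word hits. I assume a real detector such as langdetect's `detect` function, which returns language codes like `"fr"`, could be passed in through the `detect` parameter instead.

```python
import re
from collections import Counter

# Tiny illustrative stop-word lists (hypothetical); real lists would be much longer.
STOP_WORDS = {
    "en": {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
           "that", "for", "on", "with"},
    "fr": {"le", "la", "les", "un", "une", "des", "et", "ou", "de", "du",
           "en", "est", "que", "pour", "dans", "sur", "avec"},
}
DEFAULT_LANG = "en"


def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[^\W\d_]+", text.lower())


def guess_language(text):
    """Crude stand-in detector: pick the language whose stop words occur most often."""
    words = tokenize(text)
    scores = {lang: sum(w in stops for w in words)
              for lang, stops in STOP_WORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else DEFAULT_LANG


def extract_keywords(text, top_n=5, detect=guess_language):
    """Return the most frequent words after removing the detected language's stop words."""
    lang = detect(text)
    # Fall back to the default list when the detected language has no list.
    stop_words = STOP_WORDS.get(lang, STOP_WORDS[DEFAULT_LANG])
    counts = Counter(w for w in tokenize(text)
                     if w not in stop_words and len(w) > 2)
    return [word for word, _ in counts.most_common(top_n)]
```

With this approach, a call such as `extract_keywords(page_text, detect=some_detector)` would filter French pages with the French list, and unknown languages would fall back to the English list, which matches the current behaviour.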

Thanks for a great product :)

commented

Yes, currently only English is supported. I'll look into supporting other languages as well.