1e0ng / simhash

A Python Implementation of Simhash Algorithm

Word segmentation

huangheLee opened this issue

Hello, how do you handle word segmentation for different languages?

commented

It depends on each language. Do you mean multiple languages in one sentence?

I know that word segmentation depends on the specific language. However, I could not find any language-specific handling in your program, and without changing any parameters it works fine with English, Chinese, and Vietnamese. That is my question.

commented

This project is independent of word segmentation. You can choose any word segmentation algorithm you like. Once a sentence has been segmented into tokens, you can pass them to the Simhash function.
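
To illustrate the separation described above, here is a minimal, self-contained sketch. It does not use this project's code: `simhash`, `hamming_distance`, `whitespace_tokens`, and `char_ngrams` are hypothetical names written for this example, and MD5 is chosen as the per-token hash only for illustration. The point is that the fingerprint function sees only a list of tokens, so any tokenizer can be swapped in.

```python
import hashlib
import re


def simhash(tokens, bits=64):
    """Illustrative simhash over an iterable of string tokens (not the project's code)."""
    counts = [0] * bits
    for token in tokens:
        # Hash each token to a stable integer; MD5 is an assumption made for this sketch.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            # Vote +1 if bit i is set in the token hash, otherwise -1.
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, c in enumerate(counts):
        if c > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def whitespace_tokens(text):
    """Hypothetical tokenizer for space-delimited text such as English."""
    return re.findall(r"\w+", text.lower())


def char_ngrams(text, n=2):
    """Hypothetical tokenizer for text without word spacing such as Chinese."""
    chars = re.sub(r"\s+", "", text)
    return [chars[i:i + n] for i in range(max(len(chars) - n + 1, 1))]


# The same simhash function accepts tokens from either tokenizer.
en = simhash(whitespace_tokens("How do you separate phrases"))
zh = simhash(char_ngrams("请问如何分词"))
```

A dedicated segmenter for a given language could replace either tokenizer here without any change to `simhash` itself.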

Oh, I get it. Thanks for your explanation!

commented

You're welcome. I'll close this issue. 😄