1e0ng / simhash

A Python Implementation of Simhash Algorithm


Does not seem to work on Chinese characters?

MrRace opened this issue

from simhash import Simhash

text1 = '**'
text2 = '**人'
# Split each string into single-character features
words1 = list(text1)
words2 = list(text2)
print(Simhash(words1).distance(Simhash(words2)))

The result is 14. Does it not work on Chinese?

commented

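The simhash algorithm is more effective on longer texts such as articles. With only a handful of features per string, one extra character can flip many bits of the fingerprint, so a large distance between very short inputs is expected and is not specific to Chinese.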
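To see why input length matters, here is a minimal pure-Python sketch of the general simhash idea. It is not this library's implementation: the helper names `simhash` and `distance`, the use of MD5 as the per-feature hash, the tie-breaking rule, and the sample strings are all assumptions made for illustration. It compares two short strings that differ by one character with two long strings that differ by one character, using single characters as features as in the question.

```python
import hashlib


def simhash(features, bits=64):
    """Return a `bits`-wide fingerprint for an iterable of string features.

    Illustrative sketch only: every feature has weight 1 and is hashed
    with MD5 (an assumption, not necessarily what the library does).
    """
    counts = [0] * bits
    mask = (1 << bits) - 1
    for feature in features:
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        h = int.from_bytes(digest, "big") & mask
        for i in range(bits):
            # Each feature votes +1 or -1 on every bit position.
            counts[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, count in enumerate(counts):
        if count > 0:  # ties (count == 0) leave the bit unset
            fingerprint |= 1 << i
    return fingerprint


def distance(a, b):
    """Hamming distance between two fingerprints."""
    return bin(a ^ b).count("1")


# Hypothetical sample inputs, not the strings from the question.
short_a = "你好"            # "hello"
short_b = "你好吗"          # "how are you"
long_a = "今天天气很好，我们一起去公园散步吧。" * 5
long_b = long_a + "吗"

# Short inputs: one extra feature is a large share of all the votes.
print(distance(simhash(list(short_a)), simhash(list(short_b))))
# Long inputs: one extra feature is outvoted on most bit positions.
print(distance(simhash(list(long_a)), simhash(list(long_b))))
```

In a sketch like this, Chinese characters are handled the same way as any other text, because each feature is encoded to UTF-8 before hashing. What changes the distance is how many features vote on each bit, which is why very short strings tend to look far apart.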