jiangnanboy / text-de-duplication

Text de-duplication

text de-duplication

Text de-duplication using TF-IDF and SimHash. The program computes keyword weights for each text from term statistics (TF-IDF), then uses SimHash fingerprints to measure text similarity and remove near-duplicate texts.
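The general idea can be sketched in Python as follows. This is a minimal illustration, not the repository's actual code: the function names (`tf_idf_weights`, `simhash`, `hamming`, `deduplicate`), the use of MD5 as the per-term hash, the smoothed IDF formula, and the Hamming-distance threshold of 3 are all assumptions chosen for the example. Documents are assumed to be already tokenized; Chinese text would need a word segmenter first.

```python
import hashlib
import math
from collections import Counter

BITS = 64  # fingerprint width (assumption for this sketch)


def tf_idf_weights(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed IDF, ln((1 + n) / (1 + df)) + 1, so weights stay positive.
        weights.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return weights


def simhash(weights):
    """Fold weighted term hashes into a single 64-bit fingerprint."""
    acc = [0.0] * BITS
    for term, weight in weights.items():
        digest = hashlib.md5(term.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # first 64 bits of the MD5 digest
        for i in range(BITS):
            # Each term votes +weight or -weight on every bit position.
            acc[i] += weight if (h >> i) & 1 else -weight
    return sum(1 << i for i in range(BITS) if acc[i] > 0)


def hamming(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def deduplicate(docs, max_distance=3):
    """Return indices of documents kept after dropping near-duplicates."""
    kept, fingerprints = [], []
    for index, weights in enumerate(tf_idf_weights(docs)):
        fp = simhash(weights)
        # Keep the document only if it is far from every fingerprint kept so far.
        if all(hamming(fp, other) > max_distance for other in fingerprints):
            kept.append(index)
            fingerprints.append(fp)
    return kept
```

For example, `deduplicate([["text", "dedup"], ["text", "dedup"], ["other", "words"]])` drops the second document, because its fingerprint is identical to the first. The pairwise comparison in `deduplicate` is quadratic in the number of documents; larger collections usually index fingerprints by bit blocks instead.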
