teloon / signDDD

signDDD

signDDD is a sentence-level near-duplicate document detection project based on the idea of signature files.
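
To illustrate the general signature-file idea, here is a minimal sketch using classic superimposed coding: each word maps to a sparse bit mask, and a sentence signature is the bitwise OR of its word masks. This is not taken from the signDDD code. The names `SIG_BITS`, `BITS_PER_WORD`, `word_signature`, and `sentence_signature` are hypothetical, and the MD5-based word hash is only a stand-in for whatever hash codes the project actually uses.

```python
import hashlib

SIG_BITS = 64       # hypothetical signature width in bits
BITS_PER_WORD = 3   # hypothetical number of bit positions each word selects


def word_signature(word: str) -> int:
    """Map a word to a SIG_BITS-wide mask with at most BITS_PER_WORD bits set."""
    digest = hashlib.md5(word.lower().encode("utf-8")).digest()
    sig = 0
    for i in range(BITS_PER_WORD):
        # Each pair of digest bytes selects one bit position.
        pos = int.from_bytes(digest[2 * i:2 * i + 2], "big") % SIG_BITS
        sig |= 1 << pos
    return sig


def sentence_signature(sentence: str) -> int:
    """Superimpose (bitwise OR) the signatures of all words in the sentence."""
    sig = 0
    for word in sentence.split():
        sig |= word_signature(word)
    return sig
```

Because the word masks are combined with OR, the signature ignores word order and repeated words, so sentences that share most of their vocabulary end up with signatures that differ in only a few bits.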

Optimizing the hash codes of the words in the vocabulary lets the method achieve better precision. For details, refer to "Learning hash codes for efficient content reuse detection".

In addition, this project implements both GPU-based and CPU-based Hamming distance computation.
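
As a rough illustration of the CPU side, the Hamming distance between two bit signatures is the number of set bits in their XOR. The sketch below assumes signatures are stored as Python integers. The function names and the `max_distance` threshold are hypothetical and do not come from the project's code.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def near_duplicates(query: int, signatures: list, max_distance: int) -> list:
    """Return indices of stored signatures within max_distance bits of the query."""
    return [
        i for i, sig in enumerate(signatures)
        if hamming_distance(query, sig) <= max_distance
    ]


# Example: the first two stored signatures are within 1 bit of the query.
matches = near_duplicates(0b1111, [0b1111, 0b1110, 0b0000], max_distance=1)
print(matches)  # [0, 1]
```

A linear scan like this is the simplest baseline. A GPU version would typically evaluate the same XOR-and-popcount for many signature pairs in parallel.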

Languages

Python 52.8%, C++ 34.9%, Shell 12.3%