Wind-Gone / MinHash-DocSimilarity

Use Min-Hash to Compare the Similarity of Different Docs

MinHash-DocSimilarity

This project uses the Min-Hash algorithm to compare the similarity of different documents.

The word-splitting (tokenization) step is skipped.

It is a simple, crude implementation in Python with O(N^3) complexity.

You may find many redundant data structures (forgive them, the code is derived from a small homework assignment), but the whole process follows the original theory closely.
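
For readers unfamiliar with the technique, here is a minimal sketch of Min-Hash similarity estimation. It is not the code in this repository: the function names (`jaccard`, `make_hash_params`, `minhash_signature`, `estimate_similarity`), the hash family `(a*x + b) mod P`, and the use of `zlib.crc32` to map words to integers are all assumptions chosen for illustration. Like the project itself, it assumes each document has already been split into words.

```python
import random
import zlib

# Large prime used by the illustrative hash family h(x) = (a*x + b) mod P.
_PRIME = (1 << 61) - 1


def jaccard(tokens_a, tokens_b):
    """Exact Jaccard similarity of two token collections, for comparison."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) pairs defining num_hashes hash functions (hypothetical helper)."""
    rng = random.Random(seed)
    return [(rng.randrange(1, _PRIME), rng.randrange(0, _PRIME))
            for _ in range(num_hashes)]


def minhash_signature(tokens, params):
    """Return the Min-Hash signature: the minimum hash value per hash function."""
    # crc32 gives a stable integer per word (the built-in hash() of str
    # varies between Python processes).
    ids = {zlib.crc32(t.encode("utf-8")) for t in tokens}
    if not ids:
        raise ValueError("cannot build a signature for an empty document")
    return [min((a * x + b) % _PRIME for x in ids) for a, b in params]


def estimate_similarity(sig_a, sig_b):
    """Fraction of signature positions that agree; estimates Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


if __name__ == "__main__":
    doc1 = "the quick brown fox jumps over the lazy dog".split()
    doc2 = "the quick brown cat jumps over the lazy dog".split()
    params = make_hash_params(num_hashes=128, seed=42)
    sig1 = minhash_signature(doc1, params)
    sig2 = minhash_signature(doc2, params)
    print("exact Jaccard:   ", jaccard(doc1, doc2))
    print("Min-Hash estimate:", estimate_similarity(sig1, sig2))
```

Both documents must be hashed with the same `params`, otherwise their signatures are not comparable. Increasing `num_hashes` generally brings the estimate closer to the exact Jaccard value, at the cost of more work per document.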

About

License:Apache License 2.0


Languages

Language:Python 100.0%