averykhoo / minhash

minhash + fibonacci hashing -> deterministic minhash

fibonacci-minhash

minhash + fibonacci hashing -> deterministic minhash

notes

  • this isn't the usual algorithm that bands and shuffles bits/bytes (e.g. the one in the Stanford lecture on Jaccard similarity)
  • hashes can always be calculated for unseen strings with unseen n-grams
  • using a hash to make the algorithm fully deterministic has not been implemented
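
To make the notes above concrete, here is a minimal sketch of how minhash and Fibonacci hashing could be combined. It is not this repo's implementation: the function names (`ngrams`, `base_hash`, `fib_mix`, `signature`, `estimate_jaccard`), the character trigram size, the use of `hashlib.blake2b` as the base hash, and the seed scheme are all assumptions chosen for illustration. The hash family is simple rather than statistically vetted.

```python
import hashlib

MASK64 = (1 << 64) - 1
# Fibonacci hashing constant: 2**64 divided by the golden ratio, rounded to an odd integer.
FIB64 = 0x9E3779B97F4A7C15


def ngrams(text, n=3):
    """Return the set of character n-grams (the whole text if it is shorter than n)."""
    if len(text) < n:
        return {text}
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def base_hash(gram):
    """Stable 64-bit hash of an n-gram (assumption: blake2b, so results match across runs)."""
    digest = hashlib.blake2b(gram.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def fib_mix(x, seed):
    """Fibonacci (multiplicative) hash of x, varied by XOR-ing in a per-slot seed."""
    return ((x ^ seed) * FIB64) & MASK64


def signature(text, num_hashes=64, n=3):
    """Minhash signature: for each seed, keep the smallest mixed hash over all n-grams."""
    hashes = [base_hash(g) for g in ngrams(text, n)]
    # Hypothetical seed scheme: multiples of the Fibonacci constant, so no RNG is needed.
    seeds = [(i * FIB64) & MASK64 for i in range(1, num_hashes + 1)]
    return [min(fib_mix(h, s) for h in hashes) for s in seeds]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature slots, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


if __name__ == "__main__":
    a = signature("the quick brown fox")
    b = signature("the quick brown dog")
    print(estimate_jaccard(a, b))
```

Because every hash is computed directly from the n-gram's bytes, no vocabulary or fitted state is required, which is why a string containing n-grams never seen before can still be hashed.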

Languages

Language:Python 100.0%