liquidsunset / similarity_search

similarity_search

Build with make, then run:

similarity_search <input-file> <number of lines (sets) to search for common integers (tokens, words)> <jaccard-threshold (0..1)>

Demo with threshold 0.9 (nearly identical sets/lines):

./similarity_search dblp_first500.txt 50 0.9
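
For orientation, here is a minimal sketch of what a Jaccard threshold means for two lines treated as sets of integer tokens. It is not taken from this repository's sources: the helper names parse_line, jaccard, and is_similar are hypothetical, the input is assumed to be whitespace-separated integers, and the convention that two empty sets count as identical is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: parse one line of whitespace-separated integer
// tokens into a set (duplicates within a line collapse).
std::set<int> parse_line(const std::string& line) {
    std::istringstream in(line);
    std::set<int> tokens;
    int token;
    while (in >> token) {
        tokens.insert(token);
    }
    return tokens;
}

// Jaccard similarity: size of the intersection divided by size of the union.
// Assumption: two empty sets are treated as identical (similarity 1.0).
double jaccard(const std::set<int>& a, const std::set<int>& b) {
    if (a.empty() && b.empty()) {
        return 1.0;
    }
    std::vector<int> common;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(common));
    const std::size_t inter = common.size();
    const std::size_t uni = a.size() + b.size() - inter;
    return static_cast<double>(inter) / static_cast<double>(uni);
}

// A pair of sets is reported when its similarity reaches the threshold.
bool is_similar(const std::set<int>& a, const std::set<int>& b,
                double threshold) {
    assert(threshold >= 0.0 && threshold <= 1.0);
    return jaccard(a, b) >= threshold;
}
```

Under this definition, two 10-token lines that differ in a single token share 9 of 11 distinct tokens, so their similarity is 9/11 (about 0.82) and they would fall below a threshold of 0.9. This naive pairwise check only illustrates the threshold; it does not reflect how the tool prunes candidate pairs.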

Demo for the original implementation:

./set_sim_join --timings --statistics --whitespace '/home/liquid/similarity_search/enron.format' allpairs 0.9

Languages

C++ 62.3%, Java 29.9%, Shell 6.7%, CMake 1.0%