julialwang / docuSearch

A Python program that uses LSH (locality-sensitive hashing) to search a CSV file and retrieve filenames that contain words similar to the user's input.

docuSearch: An LSH project

This is a Python program that builds LSH (locality-sensitive hashing) from scratch to search for and retrieve filenames whose titles are similar to the user's input. Results are listed in order of decreasing similarity, so the closest matches appear first.
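The README does not say which LSH scheme docuSearch implements. One common choice for text is MinHash signatures with banding, and the block below is a minimal sketch under that assumption, not the repository's actual code. Every name in it (`shingles`, `minhash_signature`, `LSHIndex`, and the parameters `num_hashes` and `bands`) is hypothetical.

```python
import random
import zlib
from collections import defaultdict

PRIME = 2_147_483_647  # 2^31 - 1, modulus for the hash family (an assumed choice)


def shingles(title, k=3):
    """Return the set of character k-grams of a lowercased, whitespace-normalized title."""
    text = " ".join(title.lower().split())
    if len(text) <= k:
        return {text}
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity of two non-empty sets."""
    return len(a & b) / len(a | b)


def minhash_signature(shingle_set, params):
    """One minimum per (a, b) hash function; crc32 gives a stable base hash."""
    base = [zlib.crc32(s.encode("utf-8")) for s in shingle_set]
    return [min((a * x + b) % PRIME for x in base) for a, b in params]


def band_keys(signature, bands):
    """Split a signature into `bands` equal slices, each used as a bucket key."""
    rows = len(signature) // bands
    return [(i, tuple(signature[i * rows:(i + 1) * rows])) for i in range(bands)]


class LSHIndex:
    def __init__(self, num_hashes=64, bands=32, seed=0):
        rng = random.Random(seed)  # seeded so the index is reproducible
        self.params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
                       for _ in range(num_hashes)]
        self.bands = bands
        self.buckets = defaultdict(set)
        self.shingle_sets = {}

    def add(self, title):
        sh = shingles(title)
        self.shingle_sets[title] = sh
        for key in band_keys(minhash_signature(sh, self.params), self.bands):
            self.buckets[key].add(title)

    def query(self, text, top_n=5):
        """Collect titles sharing at least one band, then rank by exact Jaccard."""
        sh = shingles(text)
        candidates = set()
        for key in band_keys(minhash_signature(sh, self.params), self.bands):
            candidates |= self.buckets.get(key, set())
        scored = [(jaccard(sh, self.shingle_sets[t]), t) for t in candidates]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return scored[:top_n]
```

In this sketch, more bands with fewer rows each makes it more likely that loosely similar titles become candidates, at the cost of more exact comparisons per query. Candidates are re-ranked by exact Jaccard similarity so the highest-similarity titles come first.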

After starting the program, make sure that the default document, or whichever .csv file of titles you want to search, is imported properly (titles must be separated by a newline, \n). Then type a keyword, phrase, or complete title to browse similar titles within the .csv file.
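As a rough illustration of the expected input format, the sketch below reads a file with one title per line and skips blank lines. The function names `parse_titles` and `load_titles` are hypothetical, and treating each whole line as a single title is an assumption; a file with quoted or multi-column rows would need Python's `csv` module instead.

```python
def parse_titles(lines):
    """Return the non-empty lines as titles, with surrounding whitespace removed."""
    return [line.strip() for line in lines if line.strip()]


def load_titles(path):
    """Read a newline-separated titles file (assumed to be UTF-8 encoded)."""
    with open(path, encoding="utf-8") as handle:
        return parse_titles(handle)
```

A prompt loop could then pass `input("Search: ")` together with the loaded titles to whichever search function is in use.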

The repository also contains a non-randomized brute-force method that can be timed for comparison with the optimized LSH algorithm, as well as integrated matplotlib programs that plot the timing results.
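For comparison, a brute-force baseline can score the query against every title. The sketch below assumes word-level Jaccard similarity as the measure and uses `time.perf_counter` for timing; `word_set`, `brute_force_search`, and `time_call` are hypothetical names, not functions from the repository.

```python
import time


def word_set(text):
    """Lowercased set of whitespace-separated words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity; defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def brute_force_search(query, titles, top_n=5):
    """Compare the query with every title: O(n) similarity computations per query."""
    q = word_set(query)
    scored = [(jaccard(q, word_set(t)), t) for t in titles]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:top_n]


def time_call(func, *args, repeats=5):
    """Best wall-clock time in seconds over several runs of func(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best
```

Calling `time_call(brute_force_search, query, titles)` for growing title lists yields a series of timings that could be plotted with matplotlib next to the LSH timings.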

Languages

Language:Python 100.0%