There are 0 repositories under the minhash-lsh-algorithm topic.
End-to-end earthquake detection pipeline via efficient time series similarity search
A Clojure library for querying large data-sets on similarity
SetSketch: Filling the Gap between MinHash and HyperLogLog
A simple audio fingerprinting system
Python 2.7 code and learning notes for Spark 2.1.1
Text similarity computation using MinHash and Jaccard distance on the Reuters dataset
Insight Data Engineering fellow project
MinHash and LSH index written in Rust for Node.js
An improved method of locality-sensitive hashing for scalable instance matching. In this study, we propose a scalable approach for automatically identifying similar candidate instance pairs in very large datasets utilizing minhash-lsh-algorithm in C#.
Minhash clustering of text documents
Project 1: Similar document searching via MinHash and Locality Sensitive Hashing
An easy-to-use script for fast similarity search in textual data (and embedding space) with GPU and multi-core support.
Document similarity detection using hashing
Recommendation systems for Yelp (collaborative filtering & content-based)
Implementation of a B+ Tree for range and exact match queries and of the LSH algorithm for finding similar documents as measured by Jaccard Similarity.
Scalable Data Mining - Assignment submissions
A set of methods and model evaluation metrics for predicting links in an academic citation network using Apache Spark and Scala
Homework assignments for Advanced Data Mining and Language Technology (DMT) at La Sapienza University of Rome
Finding Similar Pairs using PySpark
Documents my master's-level thesis work on building a continuous, topical web crawler based on Mercator (1999)
Word/Image/Audio Embedding models, Tokenizer models, Ngram language models, MatrixModels, Corpus building, Vocabulary Building, Language modelling
Detecting correlated columns in DBMS systems using techniques like Pearson Correlation, LSH Minhashing and Random Sampling.
First homework for the Advanced Data Mining course
LSH from zero 🦾 native Map-Reduce in PySpark 🚀
Text similarity (Jaccard similarity, MinHash, LSH)
Implementing Locality Sensitive Hashing for DNA Sequences.
Textual data manipulation projects with applications of advanced data mining techniques: recommendation systems, information retrieval systems, search engines, latent sentiment analysis, pagerank, PCA.
SpellChecker: an application to check for spelling errors.
Homework_4 for Algorithmic Methods for Data Mining (ADM), MSc in Data Science at La Sapienza University of Rome