There is 1 repository under the jaccard-similarity topic.
MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW
Compare HTML similarity using structural and style metrics
A package to compute medical segmentation metrics.
Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on Metadata features.
A fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard similarity, Longest common subsequence, Hamming distance, and more.
A Clojure library for querying large data-sets on similarity
Spark functions to run popular phonetic and string matching algorithms
SetSketch: Filling the Gap between MinHash and HyperLogLog
Calculate various string metrics efficiently in Haskell
ProbMinHash – A Class of Locality-Sensitive Hash Algorithms for the (Probability) Jaccard Similarity
The aim is to build a job recommender system that takes skills from LinkedIn and jobs from Indeed and returns the best available jobs for your skills.
BagMinHash - Minwise Hashing Algorithm for Weighted Sets
Minhash and maxhash library in Python, combining flexibility, expressivity, and performance.
This is an implementation of the paper written by Yuhua Li, David McLean, Zuhair A. Bandar, James D. O’Shea, and Keeley Crockett
Easy-to-use Java library for similarity checking of strings or numeric-series
A text similarity computation using minhashing and Jaccard distance on the Reuters dataset
Locality Sensitive Hashing for semantic similarity (Python 3.x)
Insight Data Engineering fellow project
Text Matching Based on LCQMC: A Large-scale Chinese Question Matching Corpus
Exploring the performance of Jaccard and cosine similarities, then visualising their output using k-means and k-means with PCA. Additional material on time series analysis, web scraping, and Twitter scraping.
TreeMinHash: Fast Sketching for Weighted Jaccard Similarity Estimation
A collection of string comparisons algorithms
The evaluation of subjective answers has long been a challenge for educators, employers, and researchers. CheckMyAnswer, powered by machine learning algorithms, has emerged as a solution to this challenge.
A graph mining problem where the task is to predict a link between given nodes. Engineered features include Jaccard distance, cosine similarity, shortest path, PageRank, Adamic-Adar index, HITS score, and Katz centrality. Non-linear models were then built, reaching a final F1 score of 0.92.
Simple library for finding duplicate and near-duplicate text documents in massive sets/libraries/databases
Project 1: 🎬🍿 Movie-Recommendation-System, Project 2: 📰🔍 Fake News Detection System
Clustering similar tweets using K-means clustering algorithm and Jaccard distance metric
Find similar nodes in a graph using Jaccard similarity. Use this to recommend similar dishes and restaurants
Hybrid RecSys, CF-based RecSys, Model-based RecSys, Content-based RecSys, Finding similar items using Jaccard similarity
Script which creates clusters using K-Means Clustering Algorithm with different similarity metrics.
Using Jaccard-Similarity and Minhashing to determine similarity between two text documents
PPJoin and P4Join Python 3 implementation
A notebook for movie and TV show recommendations using Boolean and TF-IDF methods. Get personalized suggestions based on text descriptions and choose the method that suits your preferences.
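
Many of the repositories listed above combine exact Jaccard similarity with MinHash estimation. As a rough illustration of the idea, here is a minimal Python sketch that is not taken from any listed project. The function names `jaccard`, `minhash_signature`, and `estimate_jaccard`, the `num_perm` parameter, and the use of salted BLAKE2b as the hash family are all assumptions made for this example.

```python
import hashlib


def jaccard(a, b):
    """Exact Jaccard similarity |A & B| / |A | B| of two iterables, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(a | b)


def _hash(token, seed):
    """64-bit hash of a string token under one seeded hash function (hypothetical helper)."""
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(tokens, num_perm=128):
    """MinHash signature: the minimum hash value per seeded hash function.

    Assumes `tokens` is a non-empty iterable of strings.
    """
    tokens = set(tokens)
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_perm)]


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature positions."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


doc_a = "the quick brown fox jumps over the lazy dog".split()
doc_b = "the quick brown fox sleeps near the lazy dog".split()

exact = jaccard(doc_a, doc_b)  # 6 shared words out of 10 distinct words
approx = estimate_jaccard(minhash_signature(doc_a), minhash_signature(doc_b))
print(f"exact={exact:.2f} estimated={approx:.2f}")
```

The estimate approaches the exact value as `num_perm` grows. Libraries that implement MinHash at scale typically use cheaper hash families and add LSH indexing on top of the signatures.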