There are 18 repositories under the document-similarity topic.
Compute Sentence Embeddings Fast!
Web application for checking the similarity between a query and a document using cosine similarity.
Experiments with document similarity algorithms: Jaccard, TF-IDF, Doc2vec, USE, and BERT.
Document Search Engine Tool
A Clojure library for running similarity queries over large datasets
Document Search Engine project with TF-IDF and the Google Universal Sentence Encoder model
Repo for Aspire - A scientific document similarity model based on matching fine-grained aspects of scientific papers.
Contains interesting projects like cat face detection, cat face recognition, code generation, building a chatbot, finding similar documents, image segmentation, UCI credit card, anomaly detection, MNIST, etc.
Compilation of Natural Language Processing (NLP) code. BONUS: Link to a compilation of Information Retrieval (IR) code (check out the README).
A simple Django-based resume ranker website where recruiters post their jobs and candidates apply for their desired vacancies. The system computes the document similarity between the job description and the candidate resumes, generates similarity scores using a KNN model, and ranks or shortlists the candidate resumes.
A tool that can find any of your documents using semantic search
Document Similarity with Apache Spark using Locality Sensitive Hashing and Python
Using Jaccard similarity and MinHashing to determine the similarity between two text documents
Survey data and Python code for the ICADL 2021 paper "A Qualitative Evaluation of User Preference for Link-based vs. Text-based Recommendations of Wikipedia Articles"
Rust-based text search engine, built from scratch, supporting multiple document similarity metrics (TF-IDF, BM25, BM25VA)
Compare sentences from an input document with all sentences from reference documents to find very similar ones.
Document search from queries using an inverted index
A two-ended hiring web application built with Flask. The application uses document similarity techniques for recommendation.
Aims to provide a job-search strategy for new graduates who are interested in data-related positions.
The Bitnation Jurisdiction Public Notary DApp
Simple document similarity module implemented in NodeJS
DocxMatch is a Streamlit app that analyzes the similarity between Word files.
A simple MinHash implementation based on the explanation in the Mining of Massive Datasets course by Stanford
Code to train an LSI model using PubMed OA medical documents and to use pre-trained PubMed models on your own corpus for document similarity.
A comprehensive toolkit for analyzing text data using various AI and NLP techniques, including topic modeling, sentiment analysis, and text classification, demonstrated on the 20 Newsgroups dataset.
Natural language processing examples and automations
Individual group project in Python
This repository demonstrates how to explore the spiritual world using NLP techniques like sentiment analysis, topic modeling, information retrieval, and text summarization.
Classifying products into categories using NLP techniques