There are 5 repositories under tf-idf topic.
Starter code to solve real world text data problems. Includes: Gensim Word2Vec, phrase embeddings, Text Classification with Logistic Regression, word count with pyspark, simple text preprocessing, pre-trained embeddings and more.
Machine learning movie recommending system
Selected Machine Learning algorithms for natural language processing and semantic analysis in Golang
Machine Learning Lectures at the European Space Agency (ESA) in 2018
Python text mining system (Research of Text Mining System)
An extremely simple Python library to perform TF-IDF document comparison.
Several methods for text classification
This repository consists of a complete guide on natural language processing (NLP) in Python where we'll learn various techniques for implementing NLP including parsing & text processing and understand how to use NLP for text feature engineering.
IResearch is a cross-platform, high-performance search analytics library written entirely in C++, with a focus on pluggability of different ranking/similarity models
Implementation with some extensions of the paper "Snowball: Extracting Relations from Large Plain-Text Collections" (Agichtein and Gravano, 2000)
Stringlifier is an open-source ML library for detecting random strings in raw text. It can be used for sanitising logs, detecting accidentally exposed credentials, and as a pre-processing step in unsupervised ML-based analysis of application text data.
Arabic Open Domain Question Answering System using Neural Reading Comprehension
Chinese keyword extraction based on TF-IDF computed over a specific corpus.
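Chinese text classification in practice, based on the Sogou news corpus, using traditional machine learning methods as well as pre-trained models.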
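Product category prediction, built with the Spring Boot framework and the Spark MLlib machine learning framework. A product category prediction model is trained using TF-IDF and the Bayes algorithm; the model automatically predicts a product's category from its name. The project exposes a RESTful interface.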
Fast, efficient, in-memory Full Text Search for Kotlin
Term frequency–inverse document frequency for Chinese novels/documents, implemented in Python.
Implementation of keyword extraction algorithms, including TextRank, TF-IDF, and a combination of both
A structured collection of notes (mostly, on machine learning) and a Flask app for reading and searching them.
Document similarity algorithms experiment - Jaccard, TF-IDF, Doc2vec, USE, and BERT.
Emacs package that helps org-mode users (re)discover similar documents
A practical guide to topic mining and interactive visualizations
A content-based recommender system that uses TF-IDF and cosine similarity to find the N most similar items in a dataset