bysiber / text_similarity_tfidf

This project measures the similarity between text documents using the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm.

Project Objectives

The main objective of this project is to measure the similarity between text documents using the TF-IDF algorithm. It calculates a numeric similarity score between documents so that they can be compared directly.

Used Algorithm

The project utilizes the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm. TF-IDF weights each term by how often it appears in a document, scaled down by how many documents in the collection contain that term, so words common to every document count for less. Each document thus becomes a vector of term weights, and comparing two such vectors yields a similarity score between the documents.
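To make the idea concrete, here is a minimal sketch using scikit-learn. It assumes that documents are compared with cosine similarity, which is a common choice for TF-IDF vectors but is not confirmed by this README; the example strings are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical example documents; any strings would do.
docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock prices fell sharply today",
]

# Each row of the matrix is one document; each column holds the
# TF-IDF weight of one term from the shared vocabulary.
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(docs)

# Cosine similarity between every pair of rows (assumed comparison step).
scores = cosine_similarity(tfidf)

print(f"doc0 vs doc1: {scores[0, 1]:.6f}")
print(f"doc0 vs doc2: {scores[0, 2]:.6f}")
```

Documents that share terms score above zero, while documents with no terms in common score exactly zero.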

Requirements

  • Python 3
  • scikit-learn (imported as sklearn)

Usage

To measure TF-IDF similarity, follow the steps below:

  1. Ensure that the necessary dependencies are installed (see Requirements).
  2. Add the file names of the text documents to be compared to the text_files list.
  3. Run the main.py file to display the similarity results on the screen.
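The workflow in the steps above can be sketched in Python as follows. This is not the project's actual main.py: the function name similarity_lines, the use of cosine similarity, and the throwaway file contents are all assumptions made for illustration. Only the text_files name and the output format are taken from this README.

```python
import tempfile
from itertools import combinations
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def similarity_lines(text_files):
    """Return one report line per pair of files in text_files (hypothetical helper)."""
    # Read every document into memory as a string.
    texts = [Path(name).read_text(encoding="utf-8") for name in text_files]
    # Fit TF-IDF on all documents together so they share one vocabulary.
    tfidf = TfidfVectorizer().fit_transform(texts)
    # Assumed comparison step: cosine similarity between document vectors.
    scores = cosine_similarity(tfidf)
    lines = []
    for i, j in combinations(range(len(text_files)), 2):
        a, b = Path(text_files[i]).name, Path(text_files[j]).name
        lines.append(f"Similarity between {a} and {b} is -> {scores[i, j]:.6f}")
    return lines


# Demo with two temporary files (made-up contents) so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "test1.txt"
    second = Path(tmp) / "test2.txt"
    first.write_text("TF-IDF weights terms by how rare they are.", encoding="utf-8")
    second.write_text("Rare terms get higher TF-IDF weights.", encoding="utf-8")
    text_files = [str(first), str(second)]
    report = similarity_lines(text_files)
    for line in report:
        print(line)
```

With real documents, text_files would simply list the paths of existing files instead of the temporary ones created here.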

Sample Outputs

Below is an example of the project's output:

Similarity between test1.txt and test2.txt is -> 0.432891

About

License: MIT License


Languages

Language: Python 100.0%