toUpperCase78 / tfidf-python

A simple TF-IDF Python class; the original work is from http://code.google.com/p/tfidf/

tfidf-python

A simple yet useful TF-IDF Python class

Details

This TF-IDF implementation does the following:

  • Constructs an IDF corpus and stopword list either from documents specified by the user, or by reading from input files
  • Displays the current status of TF-IDF
  • Splits text into tokens using a regular expression
  • Computes IDF for a specified term based on the corpus
  • Generates keywords ordered by TF-IDF for a specified document
  • Saves the corpus and non-stopword files to storage
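
To illustrate the tokenizing, IDF and keyword-ranking steps listed above, here is a minimal sketch in plain Python. It is not this repository's code: the function names (`tokenize`, `build_corpus`, `idf`, `keywords`), the token pattern and the smoothed IDF formula `log((1 + N) / (1 + df))` are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Assumed token pattern: runs of lowercase letters, digits and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lowercase the text and split it into tokens with the regular expression."""
    return TOKEN_RE.findall(text.lower())


def build_corpus(documents):
    """Return (number of documents, Counter of term -> documents containing it)."""
    doc_freq = Counter()
    for doc in documents:
        # set() so that each term counts at most once per document
        doc_freq.update(set(tokenize(doc)))
    return len(documents), doc_freq


def idf(term, num_docs, doc_freq):
    """Smoothed inverse document frequency (assumed formula, see lead-in)."""
    return math.log((1 + num_docs) / (1 + doc_freq.get(term, 0)))


def keywords(text, num_docs, doc_freq, stopwords=frozenset()):
    """Return (term, score) pairs for the text, highest TF-IDF first."""
    tokens = tokenize(text)
    if not tokens:
        return []
    counts = Counter(t for t in tokens if t not in stopwords)
    scores = {
        term: (count / len(tokens)) * idf(term, num_docs, doc_freq)
        for term, count in counts.items()
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = ["the car won the race", "the game is new", "the car is fast"]
    num_docs, doc_freq = build_corpus(docs)
    for term, score in keywords("the race car", num_docs, doc_freq, stopwords={"the"}):
        print(term, round(score, 4))
```

In this sketch a term that appears in every document gets an IDF of zero, so it sinks to the bottom of the ranking even when it is not listed as a stopword.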

Original work can be found here: http://code.google.com/p/tfidf/

This repo was forked from https://github.com/gearmonkey/tfidf-python, which had been left untouched by its owner for more than 10 years. The main purpose of this fork is therefore to overhaul the code for usability and for compatibility with recent versions of Python (e.g. version 3.10).

In the EnglishText directory, you can find 12 long English texts for exercising the TF-IDF implementation on realistic input. All of them are news articles from 2022 about Formula 1, technology and gaming.


Languages

Language:Python 100.0%