dudarev / movie-words

Sort words in a subtitles file based on TFIDF

Sort the words in a subtitles file by TF-IDF or by raw count.
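
Sorting by count only needs word counts from the subtitle text. The following is a minimal sketch that assumes an SRT-style file (cue numbers, `-->` timestamp lines, occasional `<i>` tags). The names `count_words` and `sort_by_count` are hypothetical and not taken from words.py.

```python
import re
from collections import Counter

# SRT timestamp lines look like "00:01:02,345 --> 00:01:04,000".
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")
# Formatting tags such as <i>...</i> that often appear in subtitles.
TAG = re.compile(r"<[^>]+>")
# A crude word pattern: letters, optionally with one inner apostrophe (e.g. "don't").
WORD = re.compile(r"[a-z]+(?:'[a-z]+)?")


def count_words(srt_text: str) -> Counter:
    """Count lowercase words in SRT text, skipping cue numbers and timestamps."""
    counts = Counter()
    for line in srt_text.splitlines():
        line = line.strip()
        if not line or line.isdigit() or TIMESTAMP.match(line):
            continue
        line = TAG.sub("", line)
        counts.update(WORD.findall(line.lower()))
    return counts


def sort_by_count(counts: Counter) -> list:
    """Most frequent words first; ties are broken alphabetically."""
    return sorted(counts, key=lambda w: (-counts[w], w))
```

For two cues reading "Hello there!" and "<i>Hello</i> again.", `sort_by_count(count_words(text))` returns `["hello", "again", "there"]`.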

When sorting by TF-IDF, words that are frequent in the specified file but uncommon in other movies' subtitles come first.
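
A frequency list provides total token counts rather than per-document counts, so a true document frequency is not available. One plausible approximation, sketched below, replaces the document frequency with each word's count in the reference list. This is an assumption for illustration and may differ from what words.py actually computes; `tfidf_scores` and `sort_by_tfidf` are hypothetical names.

```python
import math
from collections import Counter


def tfidf_scores(file_counts: Counter, reference_counts: dict) -> dict:
    """Score each word in the subtitle file with a TF-IDF-style weight.

    tf:  the word's share of all tokens in this file.
    idf: log(total reference tokens / (reference count + 1)), so words that are
         rare in other movies' subtitles receive a larger weight.
    The +1 keeps the score finite for words absent from the reference list.
    """
    total_file = sum(file_counts.values())
    total_ref = max(sum(reference_counts.values()), 1)  # avoid log(0) on an empty list
    scores = {}
    for word, n in file_counts.items():
        tf = n / total_file
        idf = math.log(total_ref / (reference_counts.get(word, 0) + 1))
        scores[word] = tf * idf
    return scores


def sort_by_tfidf(file_counts: Counter, reference_counts: dict) -> list:
    """Highest score first; ties are broken alphabetically."""
    scores = tfidf_scores(file_counts, reference_counts)
    return sorted(scores, key=lambda w: (-scores[w], w))
```

With file counts `{"the": 10, "droid": 3}` and reference counts `{"the": 1000, "droid": 1}`, "the" scores 0 because it accounts for almost the entire reference list. "droid" scores higher, so it sorts first even though it appears less often in the file.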

Word frequencies in other movies' subtitles are taken from

https://github.com/hermitdave/FrequencyWords/
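
The FrequencyWords lists appear to use one `word count` pair per line. The loader below assumes that format and skips any line that does not match it. `load_frequency_list` is a hypothetical helper, not part of the repository.

```python
from typing import Dict, Iterable


def load_frequency_list(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'word count' lines into a dict, skipping malformed or blank lines."""
    freqs: Dict[str, int] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # not a "word count" pair
        word = parts[0].lower()
        # Merge entries that differ only by case.
        freqs[word] = freqs.get(word, 0) + int(parts[1])
    return freqs
```

Because the function accepts any iterable of lines, an open file can be passed directly: `with open("en_full.txt", encoding="utf-8") as f: ref = load_frequency_list(f)`.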

Usage:

pip install -r requirements.txt
make en_full.txt
python words.py [-h] [-s {t,tfidf,c,count}] -i INPUT
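
The usage line matches what Python's `argparse` prints for a `-s` option with choices and a required `-i` option. The parser below is a hypothetical reconstruction based only on that line. The default sort method (`tfidf`) is an assumption, and the real words.py may define its options differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the words.py CLI from its usage line.
    parser = argparse.ArgumentParser(prog="words.py")
    parser.add_argument(
        "-s",
        choices=["t", "tfidf", "c", "count"],
        default="tfidf",  # assumed default; not stated in the README
        help="sort method: t/tfidf or c/count",
    )
    parser.add_argument(
        "-i", required=True, metavar="INPUT", help="path to the subtitles file"
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # Normalise the short aliases to a single name for each method.
    method = "tfidf" if args.s in ("t", "tfidf") else "count"
    print(method, args.i)
```

Accepting both `t` and `tfidf` (and both `c` and `count`) through `choices` lets argparse reject invalid values itself. The short-to-long mapping is then done once after parsing.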



Languages

Python 93.5%, Makefile 6.5%