deekshithmarla / doc-count

This repository consists of a set of Python scripts that count the occurrences of words in the *.txt files of a given directory (all other files are ignored). One count is calculated for each unique word across all the documents taken together.
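The counting described above can be sketched with the Python standard library. This is a minimal sketch, not the repository's actual code. The function name count_words is hypothetical. The tokenization rule (lowercased runs of letters, digits and apostrophes) is also an assumption, and only files directly inside the directory are read.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed tokenizer: lowercased runs of letters, digits and apostrophes.
# The original scripts may split words differently.
WORD_RE = re.compile(r"[a-z0-9']+")


def count_words(directory):
    """Return a Counter mapping each word to its total number of
    occurrences over all *.txt files directly inside `directory`.
    Files with any other extension are skipped."""
    counts = Counter()
    for path in sorted(Path(directory).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        # Updating one shared Counter sums the counts across documents.
        counts.update(WORD_RE.findall(text.lower()))
    return counts
```

For example, count_words("docs").most_common(10) would list the ten most frequent words in docs/*.txt. A single shared Counter is used because it turns "one count per unique word across all documents" into a plain update per file.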

RUNNING THE SCRIPTS
-------------------
gen_doc_class_input.py is the main script. Run it as follows:

$python gen_doc_class_input.py <path to a directory with *.txt files>

To write the final word count vectors to a file, run it as follows:

$python gen_doc_class_input.py <path to a directory with *.txt files> -f <output_file_name>

Note: the first argument must always be the directory path, and -f must always be followed by the name of the file to write to.
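The command line described above could be handled with the standard library's argparse module, as in the sketch below. This is not the script's actual argument handling. The helper names parse_args and write_counts are hypothetical, and so is the tab-separated "word, count" output format. Note also that argparse is more permissive than the documented rule: it accepts -f before the path as well as after it.

```python
import argparse


def parse_args(argv):
    """Parse `<directory> [-f <output_file_name>]` (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="gen_doc_class_input.py")
    parser.add_argument("directory",
                        help="path to a directory with *.txt files")
    parser.add_argument("-f", dest="output_file", metavar="output_file_name",
                        help="write the word counts to this file")
    return parser.parse_args(argv)


def write_counts(counts, out):
    """Write one 'word<TAB>count' line per word, most frequent first.
    `counts` is a collections.Counter; this format is an assumption."""
    for word, n in counts.most_common():
        out.write(f"{word}\t{n}\n")
```

A caller would pass sys.argv[1:] to parse_args. It would then open args.output_file for writing when -f was given and fall back to sys.stdout otherwise.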