iceiony / word_count

Count the most common words and the sentences they occur in.

Instructions

The code was tested on R v3.2.2.

  1. To install the needed dependencies, run

Rscript install_dependencies.R

If the install fails for the rJava package, reconfigure the R Java binding

sudo R CMD javareconf

  2. Place input documents in ./in/

  3. Output is generated in ./out/ after running

Rscript count_words.R

Two CSV files are produced:

  • sentences.csv -> contains the sentence IDs and the unique sentences from the documents
  • top_words.csv -> contains the words, their counts, the list of documents, and the list of sentence IDs for each word
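
To illustrate how the two outputs relate, here is a minimal base R sketch. It is not the repository's count_words.R. It assumes a simple regex sentence splitter in place of openNLP, in-memory sample text in place of files from ./in/, and ";" as the separator inside list columns. The names `docs`, `split_sentences`, `split_words`, and `collapse_unique` are hypothetical.

```r
# Hypothetical sample input; the real script reads documents from ./in/.
docs <- c(doc1.txt = "The cat sat. The dog ran.",
          doc2.txt = "The cat sat. A bird flew.")

# Naive sentence splitter (assumption: stands in for openNLP).
split_sentences <- function(text) {
  parts <- trimws(regmatches(text, gregexpr("[^.!?]+[.!?]*", text))[[1]])
  parts[nzchar(parts)]
}

# Lowercase a sentence and split it on anything that is not a letter or apostrophe.
split_words <- function(sentence) {
  words <- strsplit(tolower(sentence), "[^a-z']+")[[1]]
  words[nzchar(words)]
}

# Table 1: one id per unique sentence across all documents.
sentence_list <- lapply(docs, split_sentences)
unique_sentences <- unique(unlist(sentence_list, use.names = FALSE))
sentences <- data.frame(id = seq_along(unique_sentences),
                        sentence = unique_sentences,
                        stringsAsFactors = FALSE)

# Long table: one row per word occurrence, tagged with document and sentence id.
rows <- do.call(rbind, lapply(names(sentence_list), function(doc) {
  do.call(rbind, lapply(sentence_list[[doc]], function(s) {
    words <- split_words(s)
    data.frame(word = words,
               document = rep(doc, length(words)),
               sentence_id = rep(match(s, sentences$sentence), length(words)),
               stringsAsFactors = FALSE)
  }))
}))

# Table 2: per-word count plus the documents and sentence ids it appears in.
count_table <- sort(table(rows$word), decreasing = TRUE)
collapse_unique <- function(values) paste(sort(unique(values)), collapse = ";")
top_words <- data.frame(
  word = names(count_table),
  count = as.integer(count_table),
  documents = vapply(names(count_table), function(w) {
    collapse_unique(rows$document[rows$word == w])
  }, character(1), USE.NAMES = FALSE),
  sentence_ids = vapply(names(count_table), function(w) {
    collapse_unique(rows$sentence_id[rows$word == w])
  }, character(1), USE.NAMES = FALSE),
  stringsAsFactors = FALSE)

print(sentences)
print(top_words)
```

Each sentence ID in the second table refers back to a row of the first, so a word's sentences can be looked up without repeating their text.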

Notes

The openNLP package uses a Java implementation. In case of any Java errors when running the script, reconfigure R's Java binding with the above command.
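
A quick way to check whether the Java binding works before running the script is to start a JVM through rJava. The sketch below is an assumption about how such a check could look, not part of the repository. `check_java` is a hypothetical helper, while `.jinit` and `.jcall` are rJava functions.

```r
# Hypothetical helper: reports whether rJava can start a JVM.
check_java <- function() {
  # requireNamespace returns FALSE if rJava is missing or cannot be loaded.
  if (!requireNamespace("rJava", quietly = TRUE)) {
    return("rJava is not installed or failed to load")
  }
  tryCatch({
    rJava::.jinit()  # start the JVM
    version <- rJava::.jcall("java/lang/System", "S",
                             "getProperty", "java.version")
    paste("rJava started Java", version)
  }, error = function(e) paste("Java error:", conditionMessage(e)))
}

message(check_java())
```

If this reports an error, reconfiguring the binding as described above is the first thing to try.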
