ggerganov / ggwords

Generate language n-gram statistics

ggwords

Generate n-gram statistics by processing the contents of English books/texts.
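
As a rough illustration of what "n-gram statistics" means here, the following sketch counts word n-grams in a piece of text. It is not taken from the ggwords sources: the function names `tokenize` and `countNgrams`, the lowercase ASCII-letter tokenization, and the choice of word-level (rather than character-level) n-grams are all assumptions made for this example.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not part of ggwords): split text into lowercase words.
// Any character that is not an ASCII letter is treated as a separator.
std::vector<std::string> tokenize(const std::string & text) {
    std::vector<std::string> words;
    std::string cur;
    for (unsigned char ch : text) {
        if (std::isalpha(ch)) {
            cur += static_cast<char>(std::tolower(ch));
        } else if (!cur.empty()) {
            words.push_back(cur);
            cur.clear();
        }
    }
    if (!cur.empty()) {
        words.push_back(cur);
    }
    return words;
}

// Hypothetical helper (not part of ggwords): count every run of n consecutive
// words. Each key is the n words joined by single spaces.
std::map<std::string, std::size_t> countNgrams(const std::vector<std::string> & words, std::size_t n) {
    std::map<std::string, std::size_t> counts;
    if (n == 0 || words.size() < n) {
        return counts;
    }
    for (std::size_t i = 0; i + n <= words.size(); ++i) {
        std::string key = words[i];
        for (std::size_t j = 1; j < n; ++j) {
            key += ' ' + words[i + j];
        }
        ++counts[key];
    }
    return counts;
}
```

Calling `countNgrams(tokenize(text), 2)` on the contents of a book would give a map from each two-word sequence to the number of times it occurs. Statistics over a whole corpus could be built by summing such maps across files.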

Usage

git clone https://github.com/ggerganov/ggwords
cd ggwords
mkdir build
cd build
cmake ..
make -j4

./bin/analyze /path/to/metadata/books.txt /path/to/books/text

Sample data

The data in ./data was generated using https://github.com/pgcorpus/gutenberg

License: GNU General Public License v3.0


Languages

C++ 63.0%, CMake 35.0%, Python 1.7%, C 0.4%