fjcero / python-text-classification

N-gram text classification for large corpora

python-text-classification

Included

  • Process large files
  • Test cases
  • Docker support
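The README does not show how ngrams.py handles large files. The following is a minimal sketch, assuming the script counts word-level n-grams. It streams the input line by line, so memory grows with the number of distinct n-grams rather than with file size. The regex tokenizer and the default n=3 are illustrative assumptions, not the repository's documented behavior.

```python
from collections import Counter, deque
from collections.abc import Iterable, Iterator
import re

# Assumed tokenizer (not from the repo): lowercase runs of letters and apostrophes.
WORD_RE = re.compile(r"[a-z']+")


def words(lines: Iterable[str]) -> Iterator[str]:
    """Yield tokens one line at a time so the whole file is never held in memory."""
    for line in lines:
        yield from WORD_RE.findall(line.lower())


def ngrams(tokens: Iterable[str], n: int) -> Iterator[tuple[str, ...]]:
    """Slide a fixed-size window over the token stream; windows span line breaks."""
    window: deque[str] = deque(maxlen=n)
    for token in tokens:
        window.append(token)
        if len(window) == n:
            yield tuple(window)


def count_ngrams(lines: Iterable[str], n: int = 3) -> Counter[tuple[str, ...]]:
    """Count n-grams over any iterable of lines, such as an open file object."""
    return Counter(ngrams(words(lines), n))
```

Because an open file object is an iterable of lines, passing open("texts/mobydick.txt", encoding="utf-8") to count_ngrams reads it lazily, and calling .most_common(10) on the result returns the ten most frequent trigrams. Windowing over a single token stream, rather than resetting per line, keeps n-grams that cross line breaks.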

Running locally

Set up the local Python version (3.12) with pyenv (the pyenv activate command is provided by the pyenv-virtualenv plugin):

pyenv local
pyenv activate

Executing the script (pass a file path as an argument, or pipe text through standard input)

python ngrams.py some-large-text.txt

# OR

cat some-large-text.txt | python ngrams.py

Using with Docker

docker build . -t ngrams
docker run -i --rm ngrams < ./texts/mobydick.txt
cat ./texts/mobydick.txt | docker run -i --rm ngrams

Running Tests

python tests.py
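The contents of tests.py are not shown here. The following is a sketch of what an n-gram test case could look like, assuming the standard unittest module. The ngram_counts helper is hypothetical and stands in for whatever function ngrams.py exposes. It counts n-grams by zipping n staggered slices of the word list.

```python
import unittest
from collections import Counter


def ngram_counts(words: list[str], n: int) -> Counter[tuple[str, ...]]:
    """Hypothetical helper: zip n staggered slices of the list to form n-grams."""
    return Counter(zip(*(words[i:] for i in range(n))))


class NgramTests(unittest.TestCase):
    def test_repeated_bigram_is_counted_twice(self) -> None:
        counts = ngram_counts("to be or not to be".split(), 2)
        self.assertEqual(counts[("to", "be")], 2)
        self.assertEqual(counts[("not", "to")], 1)

    def test_input_shorter_than_n_yields_nothing(self) -> None:
        # zip stops at the shortest slice, so too-short input gives no n-grams.
        self.assertEqual(ngram_counts(["whale"], 2), Counter())
```

To run it as python tests.py, as the README does, the file would also need an if __name__ == "__main__": unittest.main() guard. Without that guard, python -m unittest discovers and runs the test case.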

Languages

Python 98.0%, Dockerfile 2.0%