akutuzov / nlp_lm

Examples of language modeling approaches

NLP through language modeling

Python version >= 3.5 is required

Cleaning the corpus

python3 filter.py -c CORPUS_FILE -o CLEANED_CORPUS_FILE

Training models

usage: train_lm.py [-h] --train TRAIN --model {random,freq,trigram,rnn} [--save SAVE]

--train TRAIN, -t TRAIN Path to training file (plain text)

--model {random,freq,trigram,rnn}, -m {random,freq,trigram,rnn}

optional arguments:

-h, --help show this help message and exit

--save SAVE, -s SAVE Save model to...

Example

python3 train_lm.py -t cleaned_corpus.txt.gz -m rnn -s model.h5
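
The command-line interface described above could be declared with argparse roughly as follows. This is a minimal sketch reconstructed only from the usage text, not the actual contents of train_lm.py; the function name build_train_parser and the constant MODEL_CHOICES are hypothetical.

```python
import argparse

# Model names taken from the usage string above.
MODEL_CHOICES = ["random", "freq", "trigram", "rnn"]


def build_train_parser():
    """Build a parser mirroring the documented train_lm.py options (assumed layout)."""
    parser = argparse.ArgumentParser(prog="train_lm.py")
    parser.add_argument("--train", "-t", required=True,
                        help="Path to training file (plain text)")
    parser.add_argument("--model", "-m", required=True, choices=MODEL_CHOICES)
    # --save is optional; argparse leaves it as None when omitted.
    parser.add_argument("--save", "-s", help="Save model to...")
    return parser


# Parse the same arguments as the example command above.
args = build_train_parser().parse_args(
    ["-t", "cleaned_corpus.txt.gz", "-m", "rnn", "-s", "model.h5"]
)
print(args.train, args.model, args.save)
```

Passing a model name outside the listed choices makes argparse exit with an error, which is one simple way to keep the set of supported models explicit.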

Testing models

usage: test_lm.py [-h] --test TEST --model {random,freq,trigram,rnn} --modelfile MODELFILE

--test TEST, -t TEST Path to testing file (plain text)

--model {random,freq,trigram,rnn}, -m {random,freq,trigram,rnn}

--modelfile MODELFILE, -mf MODELFILE File name

optional arguments:

-h, --help show this help message and exit

More corpora and a non-lemmatized word embedding model for Russian can be found at:

http://ls.hpc.uio.no/~andreku/lm/

P.S. To save RAM when training a trigram model, consider using bounter instead of Counter(). However, bounter doesn't play well with TensorFlow.
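
To illustrate where the memory goes, here is a minimal sketch of a Counter-based trigram model. It is an assumption about the general technique, not the code in this repository; the names train_trigram, predict_next and toy_corpus are hypothetical. Every distinct word pair and continuation gets its own entry, so the tables grow with the corpus.

```python
from collections import Counter, defaultdict


def train_trigram(sentences):
    """Count which word follows each pair of preceding words."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        # Slide a window of three tokens over the sentence.
        for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
            counts[(w1, w2)][w3] += 1
    return counts


def predict_next(counts, w1, w2):
    """Return the most frequent continuation of (w1, w2), or None if unseen."""
    following = counts.get((w1, w2))
    if not following:
        return None
    return following.most_common(1)[0][0]


# Tiny hypothetical corpus, already tokenized.
toy_corpus = [
    "the cat sat on the mat".split(),
    "the cat sat on the sofa".split(),
    "the cat sat by the door".split(),
]
model = train_trigram(toy_corpus)
print(predict_next(model, "cat", "sat"))  # prints "on" (seen twice vs. "by" once)
```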
