nytud / emLam

Preprocessing scripts for Hungarian Language Modeling

emLam

Preprocessing and modeling scripts for Hungarian Language Modeling

Installation

The package can be installed with either of the following commands:

pip install .
python setup.py install

The former is preferred. Either command installs all packages required by the preprocessing scripts. To use the RNN models, TensorFlow and NumPy must be installed separately:

# For NVIDIA GPUs (strongly recommended)
pip install -r requirements_gpu.txt
# In every other case
pip install -r requirements.txt
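
After installing one of the requirements files, it can be useful to confirm that the two packages the RNN models depend on are visible to the interpreter. The following is a minimal sketch, not part of the repository; the helper name `missing_modules` is hypothetical, and it only checks that the modules can be found, not that their versions match the requirements files.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that cannot be found by this interpreter.

    Hypothetical helper for illustration; it is not part of emLam.
    Only top-level module names are expected here.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # tensorflow and numpy are the packages named above as required by the RNN models.
    missing = missing_modules(["tensorflow", "numpy"])
    if missing:
        print("Missing packages for the RNN models:", ", ".join(missing))
    else:
        print("tensorflow and numpy are both importable.")
```

If the script reports a missing package, rerunning the appropriate `pip install -r ...` command in the same environment is the first thing to try.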

Further resources

The emLam corpus, a specially prepared version of the Hungarian Webcorpus, is available from http://hlt.bme.hu/en/resources/emLam.

If you use the repository or the corpus in your project, please cite the following paper:

Dávid Márk Nemeskey. 2017. emLam – a Hungarian Language Modeling baseline. In Proceedings of the 13th Conference on Hungarian Computational Linguistics (MSZNY 2017).

License: MIT License