emalgorithm / char-rnn_vs_word-rnn

A comparison of char-rnn with word-rnn for language modelling and text generation.

A Comparison between Word-level and Char-level Recurrent Neural Networks for Language Modeling

Language modelling is the task of assigning a probability to a sequence of words in a given language. With the rise of deep learning, neural language models that obtain state-of-the-art accuracy in language modelling have been developed. However, it remains an open question whether a language should be modelled at the character level or at the word level. We show that word-based Recurrent Neural Network models outperform char-based ones when modelling English corpora of various sizes, while also being faster to train. However, we also show that char-level Recurrent Neural Networks are better at modelling a morphologically complex language like Finnish. Finally, we look at text generated by both kinds of model, and argue that neural language models are not yet able to generate meaningful language.
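To make the word-level versus char-level distinction concrete, here is a minimal sketch of how the same sentence is scored under the two tokenisations. It is not the code from this repository: a toy add-one-smoothed bigram counter stands in for the RNN, and the helper names (`tokenize`, `train_bigram`, `log_prob`), the tiny corpus, and the bits-per-character normalisation are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text, level):
    """Split text into word tokens or character tokens (hypothetical helper)."""
    return text.split() if level == "word" else list(text)


def train_bigram(tokens):
    """Count contexts and bigrams; a toy stand-in for an RNN language model."""
    vocab = set(tokens)
    contexts = Counter(tokens[:-1])  # how often each token is followed by another
    bigrams = Counter(zip(tokens[:-1], tokens[1:]))
    return vocab, contexts, bigrams


def log_prob(tokens, model):
    """Chain rule: sum of log P(t_i | t_{i-1}), conditioned on the first token."""
    vocab, contexts, bigrams = model
    v = len(vocab) + 1  # one extra slot for unseen tokens (add-one smoothing)
    total = 0.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + v))
    return total


corpus = "the cat sat on the mat and the dog sat on the rug"
sentence = "the dog sat on the mat"

results = {}
for level in ("word", "char"):
    model = train_bigram(tokenize(corpus, level))
    tokens = tokenize(sentence, level)
    lp = log_prob(tokens, model)
    # Normalise by the number of characters so both levels share one scale.
    bits_per_char = -lp / math.log(2) / len(sentence)
    results[level] = (len(tokens), lp, bits_per_char)
    print(level, len(tokens), round(lp, 3), round(bits_per_char, 3))
```

The word-level model sees 6 tokens and the char-level model sees 22 for the same sentence, so raw per-token scores are not directly comparable; dividing by the character count is one common way to put them on the same footing, and is an assumption here rather than the metric used in the report.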

The full report can be found at https://docdro.id/6QRxmPJ.



Languages

Jupyter Notebook 69.4%, Python 30.6%