deep-learning nlp phoneme-prediction rnn text-generation

Arpagen: A Corpus and Baseline for Phoneme-Level Text Generation

We explore the performance of a phoneme-based text generation model. Character-based models have a small input vocabulary and therefore incur high computational costs to model long-term dependencies. Word-based models are accurate and computationally cheaper but, in contrast to character-based models, have a very large input vocabulary, with tens of thousands of possible unique words. A phoneme-based model attempts to bridge this gap by offering more unique inputs than a character-based model but substantially fewer than a word-based model. We evaluate the phoneme-based model against character-based and word-based models using BLEU, ROUGE, and human evaluation.
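To make the trade-off between the three input granularities concrete, here is a minimal Python sketch that tokenizes one sentence at the character, phoneme, and word level and compares sequence length with vocabulary size. It is an illustration only, not the project's preprocessing code: the `TOY_PRONUNCIATIONS` table, the `<w>` word-boundary token, and all function names are assumptions made for this example, and the ARPAbet entries omit stress markers. A real pipeline would presumably draw pronunciations from a full dictionary.

```python
# Hypothetical toy ARPAbet table (stress markers omitted for brevity).
# A real system would look words up in a full pronunciation dictionary.
TOY_PRONUNCIATIONS = {
    "the": ["DH", "AH"],
    "cat": ["K", "AE", "T"],
    "sat": ["S", "AE", "T"],
    "on": ["AA", "N"],
    "mat": ["M", "AE", "T"],
}

WORD_SEP = "<w>"  # assumed word-boundary token, so words can be recovered


def tokenize_chars(text):
    """Character-level tokens, including spaces."""
    return list(text)


def tokenize_words(text):
    """Word-level tokens, split on whitespace."""
    return text.split()


def tokenize_phonemes(text):
    """Phoneme-level tokens with a separator between consecutive words."""
    tokens = []
    for i, word in enumerate(text.split()):
        if i > 0:
            tokens.append(WORD_SEP)
        # Raises KeyError for words missing from the toy table.
        tokens.extend(TOY_PRONUNCIATIONS[word.lower()])
    return tokens


def summarize(tokens):
    """Return (sequence length, number of distinct tokens)."""
    return len(tokens), len(set(tokens))


sentence = "the cat sat on the mat"
char_stats = summarize(tokenize_chars(sentence))
phoneme_stats = summarize(tokenize_phonemes(sentence))
word_stats = summarize(tokenize_words(sentence))

for name, stats in [("char", char_stats), ("phoneme", phoneme_stats), ("word", word_stats)]:
    print(f"{name:8s} length={stats[0]:3d} unique={stats[1]:3d}")
```

On a single short sentence the vocabulary sizes are too small to be meaningful; the gap between granularities described above only shows up on a large corpus. The sequence lengths, however, already show that word-level input is the shortest and character-level the longest, with phonemes in between.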

Final project for LIGN 167 Deep Learning for Natural Language Understanding, UCSD.

About

Codebase for Arpagen: A Corpus and Baseline for Phoneme-Level Text Generation.


MIT License

Languages

Python 56.4%, Jupyter Notebook 43.6%