ruyimarone / char2rhyme

πŸ‡¦πŸ‡§πŸ‡¨ ➑️ πŸ’¬ Experiments with NLP + Rhymes

char2rhyme

I wanted to experiment with NLP and rhyming text, and to implement a sequence-to-sequence model from scratch.

What is it now?

This is a simple sequence-to-sequence model that translates a word, expressed as a sequence of English characters, into a sequence of ARPABET pronunciation symbols. Since the model operates at the character level, it is (hopefully) capable of generalizing to unseen or entirely new words. The model is implemented in PyTorch and uses The CMU Pronouncing Dictionary as its data source.
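For reference, a CMU dict entry pairs a spelling with its ARPABET pronunciation (e.g. `RHYME  R AY1 M`). Below is a minimal sketch of how such entries could be turned into character-level training pairs; the file path, encoding, and helper names are illustrative assumptions, not the repo's actual loader:

```python
# Sketch only: build (spelling -> ARPABET) pairs from a CMU dict file.
def load_pairs(path="cmudict.dict"):
    pairs = []
    with open(path, encoding="latin-1") as f:  # classic cmudict releases are latin-1
        for line in f:
            if line.startswith(";;;"):         # skip comment lines
                continue
            word, *phones = line.strip().split()
            word = word.split("(")[0]          # drop alternate-pronunciation markers like WORD(1)
            if not word.isalpha():             # keep simple alphabetic entries only
                continue
            # input: sequence of English characters, output: sequence of ARPABET symbols
            pairs.append((list(word.lower()), phones))
    return pairs

# e.g. "rhyme" -> (['r', 'h', 'y', 'm', 'e'], ['R', 'AY1', 'M'])
```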

What will it be?

  • A testbed for various ideas and models involving pronunciation and rhymes
  • A place to practice implementing various sequence tasks from scratch in PyTorch
  • A general rhyme aware character encoder (think word2vec but for rhyming words)
  • A model to generate rhymes or puns (use the encoder in a language model?)

Model Details

This is a standard sequence-to-sequence model, currently without attention. The encoder is a BiLSTM operating over character embeddings. Its final hidden state is passed to a decoder that predicts a sequence of ARPABET symbols. I don't use teacher forcing, and batches are constructed so that all inputs and outputs within a batch are the same length. This works surprisingly well, since we operate over fairly short character sequences with a high correspondence between input and output length.
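For concreteness, here is a minimal PyTorch sketch of that architecture. The embedding and hidden sizes, the start-of-sequence index, and the greedy feed-back loop in the decoder are assumptions for illustration, not the repo's exact code:

```python
import torch
import torch.nn as nn

class CharEncoder(nn.Module):
    def __init__(self, n_chars, emb_dim=64, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(n_chars, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, chars):                      # chars: (batch, src_len)
        _, (h, c) = self.lstm(self.emb(chars))     # h, c: (2, batch, hidden)
        # concatenate forward and backward final states -> (batch, 2 * hidden)
        return torch.cat([h[0], h[1]], dim=-1), torch.cat([c[0], c[1]], dim=-1)

class ArpabetDecoder(nn.Module):
    def __init__(self, n_phones, emb_dim=64, hidden=256, sos_idx=0):
        super().__init__()
        self.emb = nn.Embedding(n_phones, emb_dim)
        self.cell = nn.LSTMCell(emb_dim, hidden)
        self.out = nn.Linear(hidden, n_phones)
        self.sos_idx = sos_idx

    def forward(self, h, c, tgt_len):
        # no teacher forcing: feed the decoder's own greedy predictions back in
        batch = h.size(0)
        tok = torch.full((batch,), self.sos_idx, dtype=torch.long, device=h.device)
        logits = []
        for _ in range(tgt_len):
            h, c = self.cell(self.emb(tok), (h, c))
            step = self.out(h)
            logits.append(step)
            tok = step.argmax(dim=-1)
        return torch.stack(logits, dim=1)          # (batch, tgt_len, n_phones)
```

With these assumed sizes, the concatenated bidirectional state (2 × 128) matches the decoder's hidden size (256), so the encoder state can initialize the decoder directly.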

Immediate TODOs

  • Implement attention and make cool charts showing which characters correspond to which syllables
  • Get some examples with ARPABET <-> IPA translations (Is it gΙͺf or dΚ’Ιͺf???)
  • Add real logging, command line args, evaluation and do some hyperparameter tuning.
  • Check sanity/correctness: Is my perplexity measurement right (see the sketch after this list)? Improve mini-batching?
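On the perplexity question, one common sanity check is that perplexity should equal the exponential of the mean per-token cross-entropy (in nats) over all predicted ARPABET tokens. A minimal sketch, with placeholder variable names rather than the repo's own:

```python
import torch
import torch.nn.functional as F

def perplexity(logits, targets):
    # logits: (batch, tgt_len, n_phones), targets: (batch, tgt_len)
    # Because batches share a single target length here, no padding mask is needed.
    nll = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                          targets.reshape(-1), reduction="mean")
    return torch.exp(nll).item()
```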
