khanrc / pt.seq2seq

Seq2seq

PyTorch implementations of seq2seq models for the Neural Machine Translation (NMT) task:

  • seq2seq (RNN)
  • seq2seq with attention (RNN + attention)
  • ConvS2S
  • Transformer
  • DynamicConv (+ LightConv)
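
As a rough illustration of the first two models in the list (this is a sketch, not the repo's actual code; `Encoder`, `AttnDecoderStep`, and all parameter names are made up here), a GRU encoder plus one decoder step with additive (Bahdanau-style) attention looks like:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src):                    # src: [B, S]
        enc_out, h = self.rnn(self.emb(src))   # enc_out: [B, S, H], h: [1, B, H]
        return enc_out, h

class AttnDecoderStep(nn.Module):
    def __init__(self, vocab_size, emb_dim, hidden_dim):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim + hidden_dim, hidden_dim, batch_first=True)
        # Additive attention: score = v^T tanh(W_q q + W_k k)
        self.W_q = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.W_k = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.v = nn.Linear(hidden_dim, 1, bias=False)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, y_prev, h, enc_out):     # y_prev: [B], h: [1, B, H]
        q = h[-1].unsqueeze(1)                 # decoder state as query: [B, 1, H]
        score = self.v(torch.tanh(self.W_q(q) + self.W_k(enc_out)))  # [B, S, 1]
        attn = F.softmax(score, dim=1)         # weights over source positions
        ctx = (attn * enc_out).sum(dim=1, keepdim=True)              # [B, 1, H]
        rnn_in = torch.cat([self.emb(y_prev).unsqueeze(1), ctx], dim=-1)
        dec_out, h = self.rnn(rnn_in, h)
        return self.out(dec_out.squeeze(1)), h  # next-token logits: [B, V]
```

Dropping the attention terms (feed only the embedded previous token into the decoder RNN) recovers the plain seq2seq model.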

No-torchtext version

Please refer to the no-torchtext tag. In that version, dataset.py, lang.py, and data_prepare.py structure the raw text so it is easier to use in the training code.
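
For context, a minimal sketch of the kind of structure lang.py provides (the actual class in the repo may differ; `Lang` and its methods here are illustrative): a vocabulary that maps words to indices, with a frequency cutoff like the min_freq=2 used in the results below.

```python
from collections import Counter

class Lang:
    def __init__(self, name, min_freq=2):
        self.name = name
        self.min_freq = min_freq
        self.counter = Counter()
        self.word2index = {"<pad>": 0, "<sos>": 1, "<eos>": 2, "<unk>": 3}
        self.index2word = {i: w for w, i in self.word2index.items()}

    def add_sentence(self, sentence):
        self.counter.update(sentence.split())

    def build(self):
        # Assign indices only to words meeting the frequency cutoff.
        for word, freq in self.counter.items():
            if freq >= self.min_freq and word not in self.word2index:
                idx = len(self.word2index)
                self.word2index[word] = idx
                self.index2word[idx] = word

    def encode(self, sentence):
        unk = self.word2index["<unk>"]
        return [self.word2index.get(w, unk) for w in sentence.split()]
```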

Supported datasets

The PyTorch tutorial ENG-to-FRA translation dataset and the torchtext NMT datasets are supported.

Dependencies

  • python3
  • pyyaml
  • pytorch >= 1.10
  • tensorboard >= 1.14
  • torchtext
  • spacy
    • python -m spacy download en
    • python -m spacy download de

Results

Hparams:

  • Task & data: ENG to FRA translation task, max_len=14, min_freq=2.

Models:

Model                           Loss (sum)   PPL     BLEU*   Note
Seq2Seq                         15.11        6.320
Seq2Seq + KV attn               13.57        5.244   64.10
Seq2Seq + Additive attn         13.28        5.054   64.48
Seq2Seq + Multiplicative attn   14.01        5.526
ConvS2S                         13.06        4.931   61.62
ConvS2S + out-caching           12.44        4.572   60.90
Transformer-init                12.73        4.675   66.38
LightConv                       12.29        4.493           K=[3,3,5,5,7,7]
DynamicConv                     11.81        4.237   68.35   K=[3,3,5,5,7,7]
  • *BLEU was recorded in a separate run, so the PPL and BLEU values do not come from the same run and may not correspond.
  • About the Transformer:
    • After-norm (post-norm) did not work; before-norm (pre-norm) should be used (see the sketch below).
    • LR warmup and Xavier initialization are important for performance.
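
The before/after-norm distinction refers to where LayerNorm sits relative to the residual connection in each Transformer sublayer (self-attention or FFN). A minimal sketch of the two orderings (function names are illustrative, not the repo's code):

```python
# "After-norm" (post-LN), as in the original Transformer paper:
def post_norm_block(x, sublayer, norm):
    return norm(x + sublayer(x))

# "Before-norm" (pre-LN): normalize first, then add the residual.
# This is the variant the note above found necessary for training to work.
def pre_norm_block(x, sublayer, norm):
    return x + sublayer(norm(x))
```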

ToDo

  • Beam search (see the sketch below)
  • Subword tokenization
    • BPE
    • WordPiece
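
Since beam search is still on the ToDo list, here is a hedged sketch of the standard algorithm. `step(tokens) -> log_probs` is a hypothetical decoder interface, not the repo's API; max_len=14 matches the hparams above.

```python
import torch

def beam_search(step, sos, eos, beam_size=4, max_len=14):
    # Each hypothesis: (accumulated log-prob, token list, finished flag).
    beams = [(0.0, [sos], False)]
    for _ in range(max_len):
        candidates = []
        for score, tokens, done in beams:
            if done:
                candidates.append((score, tokens, True))
                continue
            log_probs = step(tokens)             # [vocab_size] log-probs
            topv, topi = log_probs.topk(beam_size)
            for v, i in zip(topv.tolist(), topi.tolist()):
                candidates.append((score + v, tokens + [i], i == eos))
        # Keep the best `beam_size` hypotheses by accumulated log-prob.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_size]
        if all(done for _, _, done in beams):
            break
    return beams[0][1]
```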
