mit-han-lab / lite-transformer

[ICLR 2020] Lite Transformer with Long-Short Range Attention

Home Page: https://arxiv.org/abs/2004.11886

CNN/DM dataset preprocessing (BPE 30K)

Wangt-CN opened this issue

Hi, thanks a lot for the great work.
I am new to NLP and have run into problems preprocessing the CNN/DM dataset (generating the BPE files for train and val).
Could you please provide the shell scripts for CNN/DM preprocessing (BPE) that match the test set you provided on Google Drive?

Thanks a lot. Much appreciated.

Hi Wang,
Thank you for asking! In our experiments, we downloaded the CNN/DM dataset using tensorflow/tensor2tensor and preprocessed it with fairseq-preprocess. We have just uploaded the preprocessed dataset to Google Drive. Please feel free to try it.
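
For anyone who wants to redo the binarization step themselves, here is a minimal sketch of how the fairseq-preprocess call could be assembled. It is not the exact script used in the experiments. The file prefixes (`train.bpe`, `valid.bpe`), the `source`/`target` suffixes, the worker count, and the use of a joined dictionary are all assumptions for illustration; it also assumes BPE has already been applied to the text files. Passing an existing dictionary through `--srcdict` is one way to keep the vocabulary consistent with an already-binarized test set.

```python
import shlex
from typing import List, Optional


def build_preprocess_cmd(
    data_dir: str,
    dest_dir: str,
    src: str = "source",   # assumed suffix of the article files
    tgt: str = "target",   # assumed suffix of the summary files
    workers: int = 8,      # assumed worker count
    srcdict: Optional[str] = None,
) -> List[str]:
    """Assemble a fairseq-preprocess command for BPE-encoded text files.

    Expects files such as <data_dir>/train.bpe.source and
    <data_dir>/train.bpe.target (hypothetical naming).
    """
    cmd = [
        "fairseq-preprocess",
        "--source-lang", src,
        "--target-lang", tgt,
        "--trainpref", f"{data_dir}/train.bpe",
        "--validpref", f"{data_dir}/valid.bpe",
        "--destdir", dest_dir,
        "--workers", str(workers),
        # Assumption: articles and summaries share one vocabulary.
        "--joined-dictionary",
    ]
    if srcdict is not None:
        # Reuse an existing dictionary instead of building a new one.
        cmd += ["--srcdict", srcdict]
    return cmd


if __name__ == "__main__":
    # Print the command so it can be inspected before running it.
    print(shlex.join(build_preprocess_cmd("cnndm", "data-bin/cnndm")))
```

The function only builds the argument list; to actually run it, pass the list to `subprocess.run(cmd, check=True)` in an environment where fairseq is installed.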