CNN/DM dataset preprocessing (BPE 30K)
Wangt-CN opened this issue
Wang Tan commented
Hi, thanks a lot for the great work.
I am new to NLP and have run into some problems preprocessing the CNN/DM dataset (generating the BPE files for train and val).
Could you please provide the shell scripts for the CNN/DM preprocessing (BPE) that match the test set you provided on Google Drive?
Thanks a lot. Much appreciated.
Zhanghao Wu commented
Hi Wang,
Thank you for asking! In our experiments, we downloaded the CNN/DM dataset using tensorflow/tensor2tensor and preprocessed it with fairseq-preprocess. We have just uploaded the preprocessed dataset to Google Drive. Please feel free to give it a try.
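For anyone who wants to rerun the binarization step themselves, here is a minimal sketch of how a fairseq-preprocess call could be assembled from Python. It is not the exact script used for the uploaded data: the file layout (`train.bpe.source`, `train.bpe.target`, and so on), the `source`/`target` suffixes, the joined dictionary, and the helper name `build_preprocess_cmd` are all assumptions made for illustration. Only the fairseq-preprocess flags themselves are taken from the fairseq CLI.

```python
import shlex
from pathlib import Path


def build_preprocess_cmd(data_dir, dest_dir, src="source", tgt="target",
                         src_dict=None, workers=8):
    """Build the argument list for a fairseq-preprocess run (hypothetical helper).

    Assumes the BPE-encoded files are named <split>.bpe.<src> and
    <split>.bpe.<tgt> inside data_dir; this layout is an assumption.
    """
    data = Path(data_dir)
    cmd = [
        "fairseq-preprocess",
        "--source-lang", src,
        "--target-lang", tgt,
        "--trainpref", str(data / "train.bpe"),
        "--validpref", str(data / "valid.bpe"),
        "--testpref", str(data / "test.bpe"),
        "--destdir", str(dest_dir),
        "--joined-dictionary",  # assumption: one vocabulary shared by both sides
        "--workers", str(workers),
    ]
    if src_dict is not None:
        # Reuse an existing dictionary file so token ids match already binarized data.
        cmd += ["--srcdict", str(src_dict)]
    return cmd


cmd = build_preprocess_cmd("cnn_dm", "data-bin/cnn_dm")
print(shlex.join(cmd))  # the equivalent shell command line
```

With fairseq installed, the list can be passed to `subprocess.run(cmd, check=True)`. If the goal is to match an already binarized test set, passing that dataset's dictionary file as `src_dict` is likely the safer route, since a freshly built dictionary may assign different token ids.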