Preprocessing scripts for TED dataset
seokhwankim opened this issue
Could you possibly add the preprocessing scripts for the TED dataset?
It would help reproduce the results in your Interspeech paper.
I want the preprocessing scripts too, because I trained a model on training data I split myself and got worse results than the authors'.
The TED dataset was preprocessed by the authors of http://www.lrec-conf.org/proceedings/lrec2016/pdf/103_Paper.pdf and the resulting dataset is shared at: https://drive.google.com/file/d/0B13Cc1a7ebTuMElFWGlYcUlVZ0k/view
I used this simple script to convert the format of the files: https://drive.google.com/open?id=1sW23C4kqRJ6rDSBurco8_0lJ3VZJIkta
Thank you very much.