ottokart / punctuator2

A bidirectional recurrent neural network model with attention mechanism for restoring missing punctuation in unsegmented text

Home Page: http://bark.phon.ioc.ee/punctuator


Preprocessing scripts for TED dataset

seokhwankim opened this issue

Could you possibly add the preprocessing scripts for the TED dataset?
It would help to reproduce the results in your Interspeech paper.

commented

I want the preprocessing scripts too. I trained a model on a training split I made myself and got worse results than the authors'.

The TED dataset was preprocessed by the authors of http://www.lrec-conf.org/proceedings/lrec2016/pdf/103_Paper.pdf and the resulting dataset is shared at: https://drive.google.com/file/d/0B13Cc1a7ebTuMElFWGlYcUlVZ0k/view
I used this simple script to convert the format of the files: https://drive.google.com/open?id=1sW23C4kqRJ6rDSBurco8_0lJ3VZJIkta
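The linked script is not reproduced here. The following is only a rough sketch of such a conversion, built on two assumptions that do not come from the linked script. First, the shared files are assumed to list one token per line as `word<TAB>LABEL`, with the labels `O`, `COMMA`, `PERIOD` and `QUESTION`. Second, punctuator2 is assumed to expect running text with punctuation tokens such as `,COMMA`, `.PERIOD` and `?QUESTIONMARK` placed between the words. The script name `convert.py` and the function `convert_lines` are hypothetical.

```python
import sys

# ASSUMPTION: label names in the shared TED files and the punctuator2
# punctuation tokens they map to. Verify against the actual data first.
LABEL_TO_TOKEN = {
    "COMMA": ",COMMA",
    "PERIOD": ".PERIOD",
    "QUESTION": "?QUESTIONMARK",
}


def convert_lines(lines):
    """Turn assumed 'word<TAB>LABEL' lines into one space-separated string,
    inserting a punctuation token after each word whose label is not 'O'."""
    tokens = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        word, label = fields  # raises ValueError if a line is malformed
        tokens.append(word)
        if label == "O":
            continue  # no punctuation follows this word
        if label not in LABEL_TO_TOKEN:
            raise ValueError(f"unexpected label: {label!r}")
        tokens.append(LABEL_TO_TOKEN[label])
    return " ".join(tokens)


def main(in_path, out_path):
    # Read the labelled file and write the converted text as one line.
    with open(in_path, encoding="utf-8") as src:
        text = convert_lines(src)
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.write(text + "\n")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        print("usage: convert.py INPUT OUTPUT", file=sys.stderr)
```

For example, the lines `hello O`, `world COMMA`, `how O`, `are O`, `you QUESTION` would become `hello world ,COMMA how are you ?QUESTIONMARK`. An unknown label raises an error instead of being silently dropped, so a mismatch between the assumed label set and the real files shows up immediately.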

commented


Thank you very much!