preprocessing the training data

Question

marymirzaei opened this issue 6 years ago

Thank you very much for your nice work.
I have a problem with preprocessing the training data. The transcript file for Blizzard2013 segmented data is a file named prompts.gui which can be found here:
https://www.dropbox.com/s/6ugwnbqgwlfvxvl/prompts.gui?dl=0
I was wondering what the metadata.train file should look like. It seems that I need to clean up the file above so that it matches the format expected for training. Could you upload your cleaned-up 'metadata-train' file, the converter from prompts.gui to metadata-train, or a description of the expected format of the metadata.train file?

Shan Yang · Answer 1 · Wed May 09 2018 14:22:43 GMT+0800 (China Standard Time)

Hi, I simply extract the text from prompts.gui and ignore the other information, such as prosody.

You can get the file format from the attachment.
metadata.zip

CruelPaw · Answer 2 · Wed Mar 18 2020 19:36:24 GMT+0800 (China Standard Time)

> Hi, I simply extract the text from prompts.gui and ignore the other information, such as prosody.
>
> You can get the file format from the attachment.
> metadata.zip

Do you know what the other information is? I can't understand what the 3rd line of each entry in prompts.gui means. Here is an example:

CA-BB-01-01
Black Beauty @ : # the Autobiography @ of a Horse . #
B L 62iHfN KcF _ B y13iHfW ^ T Y2iLfN @ : || _ DH Y2iLfN cYa _ 33iHfN ^ T N42iLfN ^ B 6y2iLfN cY ^ 42iHfN ^ GcS R N41iLfN ^ F Y1iLfN cYa @ _ N41iLfN VcD _ N41iLfN _ H 32iHfW R ScT . ||
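
For readers looking for the converter requested in the question, here is a minimal sketch of the text extraction described in Answer 1. It is not the author's script. It assumes that every entry in prompts.gui spans three non-blank lines as in the example above (utterance ID, marked-up text, prosody annotation) and that the `@` and `#` symbols in the text line are markers that can simply be dropped; the thread confirms neither. The `id|text` output layout and the function names `clean_text`, `parse_prompts` and `format_line` are hypothetical, so check the real layout against the files in metadata.zip.

```python
import re


def clean_text(marked):
    """Strip the assumed '@' and '#' markers and tidy the spacing."""
    text = re.sub(r"[@#]", " ", marked)            # drop marker symbols
    text = re.sub(r"\s+", " ", text).strip()       # collapse whitespace
    text = re.sub(r"\s+([.,:;!?])", r"\1", text)   # reattach punctuation
    return text


def parse_prompts(lines):
    """Yield (utterance_id, text) pairs.

    Assumption: after removing blank lines, records come in groups of
    three lines (ID, marked-up text, prosody annotation). The prosody
    line is ignored.
    """
    rows = [ln.strip() for ln in lines if ln.strip()]
    for i in range(0, len(rows) - 2, 3):
        yield rows[i], clean_text(rows[i + 1])


def format_line(utt_id, text):
    """Hypothetical output layout; the real one is in metadata.zip."""
    return f"{utt_id}|{text}"


if __name__ == "__main__":
    sample = [
        "CA-BB-01-01",
        "Black Beauty @ : # the Autobiography @ of a Horse . #",
        "<prosody annotation line, ignored>",
    ]
    for utt_id, text in parse_prompts(sample):
        print(format_line(utt_id, text))
        # prints: CA-BB-01-01|Black Beauty: the Autobiography of a Horse.
```

To process the real file, pass an open file handle instead of the sample list, for example `parse_prompts(open("prompts.gui", encoding="utf-8"))`; the encoding is also an assumption.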