Differences from the paper
LSimon95 opened this issue
Simon commented
The implementation differs from the paper in some modules because details were missing from the early version of the paper. I will modify the code as new information becomes available and train on a larger dataset. Some differences are listed below.
- In the paper, the mel encoder in MRTE is a conv stack like the prosody encoder, but my code uses attention modules instead.
- The conv stack in the prosody encoder has a different structure and lacks the max pooling layer, but I think this has no significant impact on results when training on a small, high-quality dataset (see the sketch after this list).
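For reference, here is a minimal sketch of a paper-style conv stack mel encoder with max pooling. The channel sizes, kernel sizes, and number of blocks are assumptions for illustration, not values from the MegaTTS 2 paper or this repo.

```python
import torch
import torch.nn as nn

class ConvStackMelEncoder(nn.Module):
    """Conv stack with max pooling, roughly as the paper describes.
    All hyperparameters here are assumed, not taken from the paper."""
    def __init__(self, n_mels=80, hidden=256, n_blocks=3):
        super().__init__()
        layers = []
        in_ch = n_mels
        for _ in range(n_blocks):
            layers += [
                nn.Conv1d(in_ch, hidden, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.MaxPool1d(kernel_size=2),  # the pooling layer this repo's prosody encoder omits
            ]
            in_ch = hidden
        self.net = nn.Sequential(*layers)

    def forward(self, mel):  # mel: (batch, n_mels, frames)
        return self.net(mel)

# Each block halves the frame axis, so 3 blocks downsample 8x.
enc = ConvStackMelEncoder()
out = enc(torch.randn(2, 80, 256))  # -> (2, 256, 32)
```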
Liujingxiu23 commented
@LSimon95 Do you know the value of n_heads in the MHA of the MRTE module used by ByteDance? I read the paper but did not find it stated. I see you use 2?
Simon commented
@Liujingxiu23 No, I couldn't find the exact value. I use the same number of heads as the content encoder for convenience.
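A minimal sketch of that choice, assuming the content dimension and sequence lengths shown here (the n_heads=2 is from this repo, not confirmed by the paper):

```python
import torch
import torch.nn as nn

n_heads = 2    # value used in this repo; the paper does not state it
d_model = 512  # assumed model dimension for illustration

# Cross-attention: content-encoder output attends to reference mel features.
mha = nn.MultiheadAttention(d_model, num_heads=n_heads, batch_first=True)
content = torch.randn(2, 100, d_model)  # query
mel_ref = torch.randn(2, 400, d_model)  # key/value
out, attn = mha(content, mel_ref, mel_ref)
print(out.shape)  # torch.Size([2, 100, 512])
```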