May I ask: how did you eliminate the need for phoneme-audio alignment by predicting semantic latents?

rainbowjack opened this issue 5 months ago

Could you point out which file implements this feature?
Also, the README gives the metadata format as `\t<speaker_id>\t\t<script>\t<phonemized_transcript>`. If these fields cannot be replaced with placeholders, will their presence or absence affect the performance of the final trained model?
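To make the quoted format concrete, here is a minimal Python sketch (not code from the repo) that splits one metadata line into named columns. It assumes each line is tab-separated with the five columns implied by the quoted format. The names of the first and third columns are missing from the quote, so `unnamed_0` and `unnamed_2` are hypothetical placeholders, as are the function name `parse_metadata_line` and the sample row.

```python
# Column order as implied by the quoted README format. Columns 0 and 2 have no
# surviving name, so "unnamed_0" and "unnamed_2" are hypothetical placeholders.
COLUMNS: tuple[str, ...] = (
    "unnamed_0",
    "speaker_id",
    "unnamed_2",
    "script",
    "phonemized_transcript",
)


def parse_metadata_line(line: str, columns: tuple[str, ...] = COLUMNS) -> dict[str, str]:
    """Split one tab-separated metadata row into a {column: value} dict."""
    values = line.rstrip("\n").split("\t")
    # Reject rows whose field count does not match the expected layout.
    if len(values) != len(columns):
        raise ValueError(
            f"expected {len(columns)} tab-separated fields, got {len(values)}"
        )
    return dict(zip(columns, values))


# Hypothetical sample row, for illustration only.
row = parse_metadata_line("clip_0001\tspk01\tx\tHello world.\thəˈloʊ wɝld\n")
print(row["speaker_id"], row["script"])
```

A row with the wrong number of tab-separated fields raises `ValueError` instead of silently shifting values into the wrong columns.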

Songting · Answer 1 · Thu Jun 20 2024 15:49:08 GMT+0800 (China Standard Time)

This repo will soon be updated with a new training pipeline that eliminates the need for phone alignment and speaker labels.
The new pipeline does not cause any performance degradation.