Plachtaa / FAcodec

Training code for FAcodec presented in NaturalSpeech3

May I ask how you eliminated the need for phoneme-audio alignment by predicting the semantic latent?

rainbowjack opened this issue · comments

Can you point out which file implements this feature?
Also, the README gives the format: \t<speaker_id>\t\t<script>\t<phonemized_transcript>. If these fields cannot be replaced with placeholders, does their presence or absence affect the performance of the final trained model?
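
To make the question concrete, here is a minimal sketch of how one line in that tab-separated format could be split, with placeholder values in the fields I am asking about. This is not code from the repo: `parse_manifest_line`, `FIELDS`, and the names `field_0` and `field_2` are hypothetical, since the first and third columns are not named in the format string above.

```python
# Hypothetical helper, not part of FAcodec: split one tab-separated
# manifest line into named fields.
# "field_0" and "field_2" are placeholder names, because those two
# columns are not named in the format string quoted above.
FIELDS = ["field_0", "speaker_id", "field_2", "script", "phonemized_transcript"]


def parse_manifest_line(line: str) -> dict:
    """Return a dict mapping each field name to its raw string value."""
    # Strip only the trailing newline so that empty fields are preserved.
    parts = line.rstrip("\n").split("\t")
    if len(parts) != len(FIELDS):
        raise ValueError(
            f"expected {len(FIELDS)} tab-separated fields, got {len(parts)}"
        )
    return dict(zip(FIELDS, parts))


# Example line with dummy values standing in for the text fields.
example = "a.wav\t0\t\tplaceholder\tplaceholder\n"
record = parse_manifest_line(example)
```

In this sketch an empty column stays an empty string, so a line with placeholder or blank fields still parses as long as all five tab-separated columns are present.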

This repo will be updated soon with a new training pipeline eliminating the need for phone alignment and speaker labels.
It does not cause any performance degradation.