adelacvg / NS2VC

Unofficial implementation of NaturalSpeech2 for Voice Conversion and Text to Speech

dataset.py: assertion error, codes.size(-1) = 48960 but sum(duration) = 38678

lileishitou opened this issue

commented

dataset.py contains these checks:

assert abs(codes.size(-1) - sum(duration)) < 3, (codes.size(-1), sum(duration), filename)
assert abs(audio.shape[1]-lmin * self.hop_length) < 3 * self.hop_length

Why does it check the encoded codes length against the total duration?

The error is probably caused by a bad alignment. Please check the TextGrid file and make sure the silence phones ("sp", "spn", "sil") are not left empty or written as "". The summed durations and the spec length have to match so that the model can converge.
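To see how empty labels could produce a gap like the one in the title, here is a minimal sketch. It assumes intervals are plain (start, end, label) tuples in seconds and that unlabeled intervals get skipped when durations are built. The sample rate, hop length, and all function names are hypothetical and not taken from dataset.py.

```python
# Assumed values for illustration only; the repository's real settings may differ.
SAMPLE_RATE = 24000
HOP_LENGTH = 320


def to_frame(seconds, sample_rate=SAMPLE_RATE, hop_length=HOP_LENGTH):
    """Convert a time in seconds to a frame index."""
    return int(round(seconds * sample_rate / hop_length))


def durations(intervals, skip_empty):
    """Per-phone durations in frames; optionally drop intervals with an empty label."""
    result = []
    for start, end, label in intervals:
        if skip_empty and not label.strip():
            continue
        result.append(to_frame(end) - to_frame(start))
    return result


def empty_intervals(intervals):
    """Return the (start, end) spans whose label is empty or whitespace."""
    return [(start, end) for start, end, label in intervals if not label.strip()]


# Hypothetical alignment: leading and trailing silence were written as "".
example = [
    (0.0, 0.4, ""),
    (0.4, 0.8, "HH"),
    (0.8, 1.2, "AY1"),
    (1.2, 1.6, ""),
]

n_frames = to_frame(example[-1][1])                    # stands in for codes.size(-1)
full = sum(durations(example, skip_empty=False))       # every interval counted
short = sum(durations(example, skip_empty=True))       # empty intervals dropped

print(n_frames, full, short, empty_intervals(example))
```

When the empty intervals are dropped, the summed duration falls short of the frame count by exactly the length of the unlabeled silence, which is the kind of mismatch the assert reports.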

commented

I found that some TextGrid files do not contain any silence phones (sil, sp, spn), while other files do. I ran the MFA tool with "english_us_arpa english_us_arpa" as the model. Why do the generated TextGrid files differ?

I have added a check for empty silent phones. Please update to the latest code and reprocess the dataset to see whether any issues remain. Hope this helps.
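For readers who want to patch their own preprocessing, one possible way to handle empty silent phones is to map them to an explicit silence token before durations are computed. This is only a sketch and not the code that was committed to the repository: the function names and the choice of "sp" as the replacement token are assumptions.

```python
SILENCE_LABELS = {"sil", "sp", "spn"}


def fill_empty_phones(labels, silence="sp"):
    """Return a copy of labels where empty or whitespace-only entries become `silence`."""
    return [label if label.strip() else silence for label in labels]


def count_silence(labels):
    """Count labels that are recognized silence phones."""
    return sum(1 for label in labels if label in SILENCE_LABELS)


# Hypothetical phone sequence where silence was exported as "".
raw = ["", "HH", "AH0", "", "L", "OW1", ""]
fixed = fill_empty_phones(raw)

print(fixed, count_silence(raw), count_silence(fixed))
```

Because every interval keeps a label, none are dropped later, so the number of phones (and therefore the number of duration entries) stays the same as in the TextGrid.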

commented

Thanks, that helps a lot.