Facing dimension mismatch
nayanjha16 opened this issue · comments
I am facing a dimension mismatch when the pitch embedding is added to the encoder output while training the FastSpeech2 model on Hindi data. A screenshot of the error is attached below.
Note: I have made the necessary changes to the script to adapt it for the Hindi dataset.
The same code runs for the LJSpeech dataset but fails for the Hindi dataset.
Kindly help me resolve the issue!
@nayanjha16 Were you able to get it running? I am using the Indic TTS Hindi dataset and facing the same issue.
@hadarishav I was able to fix this issue by correcting the alignments and running the preprocessing step again. It works for me when the group size is set to 1.
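For anyone hitting the same error, a quick sanity check on the preprocessed features can show which samples are misaligned before training starts. The following is only a minimal sketch, assuming pitch is stored at the phoneme level (one value per phoneme) and that durations are given in mel frames. The function name `check_sample` and its arguments are hypothetical and not part of the repository.

```python
def check_sample(phonemes, pitch, duration, n_mel_frames):
    """Hypothetical helper: list the length mismatches found in one sample."""
    problems = []
    n_phonemes = len(phonemes)
    # Assumption: one duration value per phoneme.
    if len(duration) != n_phonemes:
        problems.append(f"duration has {len(duration)} entries, expected {n_phonemes}")
    # Assumption: phoneme-level pitch, so one pitch value per phoneme.
    if len(pitch) != n_phonemes:
        problems.append(f"pitch has {len(pitch)} entries, expected {n_phonemes}")
    # Assumption: durations are frame counts that add up to the mel length.
    if sum(duration) != n_mel_frames:
        problems.append(f"durations sum to {sum(duration)}, expected {n_mel_frames}")
    return problems


# Made-up example values, for illustration only.
phonemes = ["n", "a", "m", "a", "s", "t", "e"]
duration = [3, 5, 4, 6, 5, 2, 7]
good_pitch = [110.0, 120.5, 118.2, 121.0, 0.0, 0.0, 115.3]
bad_pitch = good_pitch[:-1]  # one value short, as a bad alignment might produce

good_report = check_sample(phonemes, good_pitch, duration, n_mel_frames=32)
bad_report = check_sample(phonemes, bad_pitch, duration, n_mel_frames=32)
print(good_report)
print(bad_report)
```

If a sample reports a mismatch, its alignment is the likely culprit, and re-running preprocessing after fixing it should make the lengths agree.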
@nayanjha16 How did you fix the alignments? By group size, do you mean the batch size for training?
@hadarishav My apologies for the late reply. There are two parameters: one is the batch size and the other is the group size. I had set group_size to 1.
Attaching the code snippet from main.py for your reference:
batch_size = train_config["optimizer"]["batch_size"]
group_size = 1  # Set this larger than 1 to enable sorting in Dataset
assert batch_size * group_size < len(dataset)
loader = DataLoader(
    dataset,
    batch_size=batch_size * group_size,
    shuffle=True,
    collate_fn=dataset.collate_fn,
)
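To make the difference between the two parameters concrete, here is a rough sketch of what group_size appears to control. It assumes the dataset's collate function sorts the pooled samples by text length and slices them into training batches of batch_size each. The helper `split_into_groups` is hypothetical and only imitates that idea; it is not the repository's actual code.

```python
def split_into_groups(text_lengths, batch_size, sort=True):
    """Hypothetical helper: split one pooled loader batch into training batches."""
    indices = list(range(len(text_lengths)))
    if sort:
        # Longest first, so each training batch holds samples of similar length.
        indices.sort(key=lambda i: -text_lengths[i])
    # Slice the ordered indices into consecutive chunks of batch_size.
    return [indices[i:i + batch_size] for i in range(0, len(indices), batch_size)]


batch_size, group_size = 2, 3
# The loader hands over batch_size * group_size samples at once.
pooled_lengths = [5, 12, 7, 30, 9, 11]
groups = split_into_groups(pooled_lengths, batch_size)
print(groups)  # [[3, 1], [5, 4], [2, 0]]

# With group_size = 1 the pool is a single training batch.
single = split_into_groups([5, 12], batch_size)
print(single)  # [[1, 0]]
```

Under this assumption, group_size = 1 means every loader batch is exactly one training batch of batch_size samples, while batch_size itself stays whatever is set in the training config.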