Facing dimension mismatch

Question

nayanjha16 opened this issue 2 years ago

I am facing a dimension mismatch when the pitch embedding is added to the encoder output while training the FastSpeech2 model on Hindi data. A screenshot of the error is attached below.

Note: I have made the necessary changes to the script to adapt it for the Hindi dataset.

The same code runs on the LJSpeech dataset but fails on the Hindi dataset.
Kindly help me resolve the issue!

hadarishav · Answer 1 · Tue Dec 20 2022 16:39:57 GMT+0800 (China Standard Time)

@nayanjha16 Were you able to get it running? I am using the Indic TTS Hindi dataset and facing the same issue.

Nayan Anand · Answer 2 · Fri Jan 06 2023 13:26:20 GMT+0800 (China Standard Time)

@hadarishav I was able to fix this issue by fixing the alignments and running the preprocessing step again. It works for me when the group size is set to 1.
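
For anyone hitting the same error, one way to check whether stale alignments are the cause is to compare the lengths of the preprocessed features for each utterance. The following is only a minimal sketch: it assumes phoneme-level pitch (one value per phoneme) and durations measured in mel frames, and the function `check_utterance` and its arguments are hypothetical, not part of the repository.

```python
from typing import List, Sequence


def check_utterance(
    n_phonemes: int,
    durations: Sequence[int],
    pitch: Sequence[float],
    n_mel_frames: int,
    phoneme_level_pitch: bool = True,
) -> List[str]:
    """Return a list of length mismatches; an empty list means consistent."""
    problems = []

    # There should be exactly one duration per phoneme.
    if len(durations) != n_phonemes:
        problems.append(
            f"durations has {len(durations)} entries for {n_phonemes} phonemes"
        )

    # The durations should add up to the number of mel frames.
    total = sum(durations)
    if total != n_mel_frames:
        problems.append(f"durations sum to {total}, mel has {n_mel_frames} frames")

    # Assumption: pitch is per phoneme, otherwise per mel frame.
    expected = n_phonemes if phoneme_level_pitch else n_mel_frames
    if len(pitch) != expected:
        problems.append(f"pitch has {len(pitch)} values, expected {expected}")

    return problems


# Hypothetical utterance whose alignment lost one phoneme.
print(check_utterance(3, [2, 3], [0.1, 0.2], 6))
```

If a check like this reports mismatches for some utterances, regenerating the alignments and rerunning preprocessing, as described above, is the likely remedy.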

hadarishav · Answer 3 · Thu Jan 12 2023 22:21:54 GMT+0800 (China Standard Time)

@nayanjha16 How did you fix the alignments? By group size, do you mean the batch size for training?

Nayan Anand · Answer 4 · Thu Feb 23 2023 18:14:14 GMT+0800 (China Standard Time)

> @nayanjha16 How did you fix the alignments? By group size, do you mean the batch size for training?

@hadarishav My apologies for the late reply. There are two parameters: the batch size and the group size. I had set group_size to 1.
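
To illustrate how the two parameters might interact, here is a rough sketch. It assumes the loader hands `batch_size * group_size` samples to the collate function, which may then sort them by length and split them into batches of `batch_size`. The helper `split_into_batches` is hypothetical and is not the repository's implementation.

```python
from typing import List


def split_into_batches(
    lengths: List[int], batch_size: int, sort: bool
) -> List[List[int]]:
    """Split sample indices into batches, optionally longest-first."""
    indices = list(range(len(lengths)))
    if sort:
        # Group samples of similar length to reduce padding.
        indices.sort(key=lambda i: -lengths[i])
    return [
        indices[start:start + batch_size]
        for start in range(0, len(indices), batch_size)
    ]


lengths = [5, 12, 7, 3]  # hypothetical text lengths of four samples

# group_size = 2: four samples arrive together and are sorted before splitting.
print(split_into_batches(lengths, batch_size=2, sort=True))

# group_size = 1: only batch_size samples arrive, so there is a single batch.
print(split_into_batches(lengths[:2], batch_size=2, sort=False))
```

Under these assumptions, setting group_size to 1 means each loader step yields a single unsorted batch.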

Attaching the relevant code snippet from main.py for your reference:

batch_size = train_config["optimizer"]["batch_size"]
group_size = 1  # Set this larger than 1 to enable sorting in Dataset
assert batch_size * group_size < len(dataset)
loader = DataLoader(
    dataset,
    batch_size=batch_size * group_size,
    shuffle=True,
    collate_fn=dataset.collate_fn,