ming024 / FastSpeech2

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"

Facing dimension mismatch

nayanjha16 opened this issue

I am facing a dimension mismatch when the pitch embedding is added to the encoder output while training the FastSpeech2 model on Hindi data. A screenshot of the error is attached below.

Note: I have made the necessary changes in the script to adapt it for the Hindi dataset.
[Screenshot: Error_screen]

The same code runs on the LJSpeech dataset but fails on the Hindi dataset.
Kindly help me resolve the issue!

@nayanjha16 were you able to get it running? I am using the Indic TTS Hindi dataset and facing the same issue.

@hadarishav I was able to fix this issue by fixing the alignments and running the preprocessing step again. It works for me when the group size is set to 1.
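
For anyone hitting the same error, a quick consistency check on the preprocessed features can help locate bad alignments before training. The following is only a sketch under assumptions that the thread does not confirm: it assumes phoneme-level pitch features (one pitch value per phoneme) and that the durations come from the forced alignment. The function name `check_alignment` and its arguments are hypothetical and not part of the repository.

```python
def check_alignment(phonemes, durations, pitch, n_mel_frames):
    """Return a list of inconsistencies for one utterance (empty if none).

    Assumptions (hypothetical, for illustration only):
      - phonemes: list of phoneme symbols for the utterance
      - durations: number of mel frames assigned to each phoneme
      - pitch: phoneme-level pitch values, one per phoneme
      - n_mel_frames: number of frames in the mel-spectrogram
    """
    problems = []
    if len(durations) != len(phonemes):
        problems.append(
            f"durations has {len(durations)} entries, expected {len(phonemes)}"
        )
    if len(pitch) != len(phonemes):
        problems.append(
            f"pitch has {len(pitch)} entries, expected {len(phonemes)}"
        )
    if sum(durations) != n_mel_frames:
        problems.append(
            f"durations sum to {sum(durations)}, expected {n_mel_frames} frames"
        )
    return problems


# Consistent utterance: three phonemes, three durations, three pitch values.
ok = check_alignment(["a", "b", "c"], [2, 3, 1], [110.0, 120.0, 115.0], 6)

# Inconsistent utterance: the text has four phonemes but the alignment has three.
bad = check_alignment(["a", "b", "c", "d"], [2, 3, 1], [110.0, 120.0, 115.0], 6)
```

Running a check like this over every utterance and re-aligning the ones that report problems is one way to narrow down where a mismatch between the pitch embedding and the encoder output could come from.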

@nayanjha16 how did you fix the alignments? By group size, do you mean the batch size for training?

@hadarishav My apologies for the late reply. There are two parameters: the batch size and the group size. I had set group_size to 1.
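
To illustrate how the two parameters differ, here is a minimal sketch of the grouping idea as I understand it: the loader fetches `batch_size * group_size` samples at once, and the collate step sorts them by length and splits them into `group_size` mini-batches of `batch_size` each. This is an assumption about the behaviour, not the repository's actual `collate_fn`; the helper `split_into_batches` is hypothetical.

```python
def split_into_batches(lengths, batch_size, sort=True):
    """Split one loaded chunk into mini-batches of sample indices.

    lengths: text length of each sample in the chunk
             (the chunk holds batch_size * group_size samples).
    sort:    if True, order samples from longest to shortest first,
             so each mini-batch holds samples of similar length.
    """
    indices = list(range(len(lengths)))
    if sort:
        indices.sort(key=lambda i: lengths[i], reverse=True)
    return [
        indices[start:start + batch_size]
        for start in range(0, len(indices), batch_size)
    ]


# batch_size = 2 and group_size = 4 give a chunk of 8 samples.
lengths = [5, 12, 7, 3, 9, 11, 4, 8]
batches = split_into_batches(lengths, batch_size=2)
# batches == [[1, 5], [4, 7], [2, 0], [6, 3]]

# With group_size = 1 the chunk is a single mini-batch, so sorting
# never regroups samples across mini-batches.
single = split_into_batches([5, 12], batch_size=2, sort=False)
# single == [[0, 1]]
```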

Attaching the relevant code snippet from main.py for your reference:

batch_size = train_config["optimizer"]["batch_size"]
group_size = 1  # Set this larger than 1 to enable sorting in Dataset
assert batch_size * group_size < len(dataset)
loader = DataLoader(
    dataset,
    batch_size=batch_size * group_size,
    shuffle=True,
    collate_fn=dataset.collate_fn,