yangdongchao / Text-to-sound-Synthesis

The source code of our paper "Diffsound: discrete diffusion model for text-to-sound generation"

Home Page: http://dongchaoyang.top/text-to-sound-synthesis-demo/


Issue when sampling with the newest pretrained model

yizhidamiaomiao opened this issue · comments

Dear authors,

I tried to use your pretrained model listed in README.md:
"2022/08/09 We upload trained diffsound model on audiocaps dataset, and the baseline AR model, and the codebook trained on audioset with the size of 512. (https://disk.pku.edu.cn/link/DA2EAC5BBBF43C9CAB37E0872E50A0E4)"

When I run the command "python evaluation/generate_samples_batch.py" to sample some audio, the code raises an error:
"RuntimeError: Error(s) in loading state_dict for VQModel:
size mismatch for quantize.embedding.weight: copying a param with shape torch.Size([512, 256]) from checkpoint, the shape in current model is torch.Size([256, 256])
"

I have already tried many revised versions of your 'caps_text.yaml' (changing several 256 values to 512), but none of them works. Could you please share a way for me to sample from your newest trained model? Thanks a lot.
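The error means the checkpoint's codebook has 512 entries of dimension 256, while the model built from the config expects only 256 entries. A minimal sketch of how one could diff parameter shapes between a checkpoint and a freshly built model to locate such config mismatches (the key name comes from the error above; the helper and the example shape dicts are hypothetical, and in practice the shapes would come from `torch.load(...)` and `model.state_dict()`):

```python
# Hypothetical helper: compare parameter shapes between a checkpoint
# state_dict and a freshly built model to spot config mismatches.
def shape_mismatches(ckpt_shapes, model_shapes):
    """Return {param_name: (checkpoint_shape, model_shape)} for mismatches."""
    return {
        name: (ckpt_shapes[name], model_shapes[name])
        for name in ckpt_shapes
        if name in model_shapes and ckpt_shapes[name] != model_shapes[name]
    }

# Shapes taken from the error message above (illustrative values only).
ckpt = {"quantize.embedding.weight": (512, 256)}   # codebook trained with size 512
model = {"quantize.embedding.weight": (256, 256)}  # model configured for size 256

print(shape_mismatches(ckpt, model))
# → {'quantize.embedding.weight': ((512, 256), (256, 256))}
```

The first dimension of `quantize.embedding.weight` is the codebook size, so the config's codebook-size field would need to match the checkpoint (512 here) for loading to succeed.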

Hi, if you use the codebook trained with size 512, you should also use the Diffsound model trained with size 512. However, the 512 Diffsound model has not been released yet, so for now you can only use the codebook with size 256. I will upload the Diffsound model trained with 512 as soon as possible.