yangdongchao / Text-to-sound-Synthesis

The source code of our paper "Diffsound: discrete diffusion model for text-to-sound generation"

Home Page: http://dongchaoyang.top/text-to-sound-synthesis-demo/



Missing libraries "ftfy" "regex" "einops"

dto opened this issue


Thank you for your help. In a fresh environment, I get errors about the missing libraries "ftfy", "regex", and "einops". After installing them through Conda, I still get the same KeyError:
Perhaps my environment (created with "conda env create" from your config in Codebook) is not really clean? I would greatly appreciate any help you can offer. Thank you!

(specvqgan) dto@thexder:/apdcephfs/share_1316500/donchaoyang/code3/Text-to-sound-Synthesis/Diffsound$ python3 evaluation/generate_samples_batch.py
Restored from /apdcephfs/share_1316500/donchaoyang/code3/SpecVQGAN/logs/2022-04-24T23-17-27_audioset_codebook256/checkpoints/last.ckpt
Traceback (most recent call last):
  File "evaluation/generate_samples_batch.py", line 204, in <module>
    Diffsound = Diffsound(config=config_path, path=pretrained_model_path, ckpt_vocoder=ckpt_vocoder)
  File "evaluation/generate_samples_batch.py", line 44, in __init__
    self.info = self.get_model(ema=True, model_path=path, config_path=config)
  File "evaluation/generate_samples_batch.py", line 64, in get_model
    model = build_model(config) # load the dalle model
  File "evaluation/../sound_synthesis/modeling/build.py", line 5, in build_model
    return instantiate_from_config(config['model'])
  File "evaluation/../sound_synthesis/utils/misc.py", line 132, in instantiate_from_config
    return cls(**config.get("params", dict()))
  File "evaluation/../sound_synthesis/modeling/models/dalle_spec.py", line 40, in __init__
    self.transformer = instantiate_from_config(diffusion_config)
  File "evaluation/../sound_synthesis/utils/misc.py", line 132, in instantiate_from_config
    return cls(**config.get("params", dict()))
  File "evaluation/../sound_synthesis/modeling/transformers/diffusion_transformer.py", line 172, in __init__
    self.condition_emb = instantiate_from_config(condition_emb_config) # load the model that produces the condition embedding
  File "evaluation/../sound_synthesis/utils/misc.py", line 132, in instantiate_from_config
    return cls(**config.get("params", dict()))
  File "evaluation/../sound_synthesis/modeling/embeddings/clip_text_embedding.py", line 25, in __init__
    model, _ = clip.load(clip_name, device='cpu',jit=False)
  File "evaluation/../sound_synthesis/modeling/modules/clip/clip.py", line 114, in load
    model = build_model(state_dict or model.state_dict()).to(device)
  File "evaluation/../sound_synthesis/modeling/modules/clip/model.py", line 409, in build_model
    vision_width = state_dict["visual.layer1.0.conv1.weight"].shape[0]
KeyError: 'visual.layer1.0.conv1.weight'
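The traceback ends in `build_model` of the bundled `clip/model.py`. Assuming that file follows OpenAI CLIP's `build_model`, it takes the ViT branch when the checkpoint contains `visual.proj` and the ModifiedResNet branch otherwise, so a KeyError on `visual.layer1.0.conv1.weight` would mean the loaded checkpoint has neither key. That suggests the file is not a complete CLIP checkpoint (for example, a truncated or wrong download). The sketch below is a diagnostic under that assumption; `missing_clip_vision_keys` and `load_state_dict_keys` are hypothetical helpers, and the loading logic only imitates `clip.load(..., jit=False)`.

```python
def missing_clip_vision_keys(keys):
    """Return the vision key that CLIP's build_model would read first but cannot find.

    Assumption: the bundled clip/model.py branches like OpenAI CLIP's build_model,
    i.e. ViT if "visual.proj" is present, ModifiedResNet otherwise.
    """
    keys = set(keys)
    if "visual.proj" in keys:
        required = "visual.conv1.weight"  # first key read in the ViT branch
    else:
        required = "visual.layer1.0.conv1.weight"  # first key read in the ResNet branch
    return [] if required in keys else [required]


def load_state_dict_keys(path):
    """Load checkpoint keys roughly the way clip.load(..., jit=False) is assumed to."""
    import torch  # imported lazily so the helper above does not need torch

    try:
        # Official CLIP .pt files are TorchScript archives.
        return list(torch.jit.load(path, map_location="cpu").state_dict().keys())
    except RuntimeError:
        # Otherwise fall back to a plain pickled state dict.
        return list(torch.load(path, map_location="cpu").keys())
```

Usage, with the path left as a placeholder for whatever file `clip_name` resolves to: `print(missing_clip_vision_keys(load_state_dict_keys("path/to/clip_checkpoint.pt")))`. An empty list means the vision weights are present; if `torch.load` itself fails, the file is likely corrupted.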

Please refer to the installation section of the Diffsound README to install the required packages.
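After installing, a quick way to confirm that the libraries named in the issue title are importable is to ask the interpreter itself. This matters when packages are installed through both Conda and pip, since they can end up in different environments. A minimal sketch, assuming the three import names match the package names in the title; `missing_modules` is a hypothetical helper built on the standard `importlib.util.find_spec`.

```python
import importlib.util

# Import names taken from the issue title; the README may require more (assumption).
REQUIRED_MODULES = ["ftfy", "regex", "einops"]


def missing_modules(names):
    """Return the top-level modules in `names` that the running interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Running `print(missing_modules(REQUIRED_MODULES))` with the same `python3` that runs `evaluation/generate_samples_batch.py` should print an empty list once all three are installed in that environment.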


Hello. I followed the "pip install" steps listed in your Diffsound README. They produced one error along the way: "pytorch-lightning 1.7.0 requires tensorboard>=2.9.1, but you have tensorboard 1.15.0 which is incompatible."

I notice your recipe does not specify the required version of pytorch-lightning. Which version should be used?

Aside from this, the KeyError is still unchanged :(


I found the following additional errors during the pip install steps in your README:
pytorch-lightning 1.2.10 requires PyYAML!=5.4.*,>=5.1, but you have pyyaml 5.4.1 which is incompatible.
pytorch-lightning 1.2.10 requires torchmetrics==0.2.0, but you have torchmetrics 0.3.1 which is incompatible.
Successfully installed torch-1.9.0 torchvision-0.10.0
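pip reports these conflicts only at install time; `pip check` re-reports broken requirements for the whole environment afterwards. As a minimal stdlib-only sketch, the two pytorch-lightning 1.2.10 requirements quoted above can also be checked from Python. `satisfies` and `conflicts` are hypothetical helpers that handle only plain numeric versions and the `==`, `!=`, `>=`, `<=`, `>`, `<` operators (with `.*` for `==`/`!=`); `packaging.specifiers.SpecifierSet` is the usual tool for full PEP 440 semantics.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Requirements of pytorch-lightning 1.2.10 as reported by pip in this thread.
PL_1_2_10_REQUIREMENTS = {"PyYAML": ["!=5.4.*", ">=5.1"], "torchmetrics": ["==0.2.0"]}

_OPS = {"==": lambda c: c == 0, "!=": lambda c: c != 0, ">=": lambda c: c >= 0,
        "<=": lambda c: c <= 0, ">": lambda c: c > 0, "<": lambda c: c < 0}


def _release(v):
    """Leading numeric release segment, e.g. '2.0.0+cu118' -> (2, 0, 0)."""
    m = re.match(r"\d+(?:\.\d+)*", v)
    if m is None:
        raise ValueError(f"unsupported version: {v!r}")
    return tuple(int(part) for part in m.group(0).split("."))


def satisfies(installed, spec):
    """Check one specifier such as '>=5.1' or '!=5.4.*' (small subset of PEP 440)."""
    op = next((o for o in ("==", "!=", ">=", "<=", ">", "<") if spec.startswith(o)), None)
    if op is None:
        raise ValueError(f"unsupported specifier: {spec!r}")
    target, have = spec[len(op):], _release(installed)
    if target.endswith(".*"):
        if op not in ("==", "!="):
            raise ValueError(f"wildcard not allowed with {op}")
        prefix = _release(target[:-2])
        matched = have[:len(prefix)] == prefix  # '5.4.*' matches 5.4, 5.4.1, ...
        return matched if op == "==" else not matched
    want = _release(target)
    n = max(len(have), len(want))  # pad with zeros so 0.2 == 0.2.0
    have, want = have + (0,) * (n - len(have)), want + (0,) * (n - len(want))
    return _OPS[op]((have > want) - (have < want))


def conflicts(requirements):
    """Yield (distribution, installed version or None, failing specifier)."""
    for dist, specs in requirements.items():
        try:
            installed = version(dist)
        except PackageNotFoundError:
            yield dist, None, ", ".join(specs)
            continue
        for spec in specs:
            if not satisfies(installed, spec):
                yield dist, installed, spec
```

With the versions reported above (pyyaml 5.4.1, torchmetrics 0.3.1), `list(conflicts(PL_1_2_10_REQUIREMENTS))` would list the `!=5.4.*` and `==0.2.0` specifiers as failing.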