yangdongchao / Text-to-sound-Synthesis

The source code of our paper "Diffsound: discrete diffusion model for text-to-sound generation"

Home Page: http://dongchaoyang.top/text-to-sound-synthesis-demo/

KeyError: 'visual.layer1.0.conv1.weight'

dto opened this issue

commented

(specvqgan) dto@thexder:/apdcephfs/share_1316500/donchaoyang/code3/Text-to-sound-Synthesis/Diffsound$ python3 evaluation/generate_samples_batch.py
/home/dto/miniconda3/envs/specvqgan/lib/python3.8/site-packages/torch/cuda/__init__.py:83: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 10010). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
/home/dto/miniconda3/envs/specvqgan/lib/python3.8/site-packages/torch/amp/autocast_mode.py:198: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
  warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')
Restored from /apdcephfs/share_1316500/donchaoyang/code3/SpecVQGAN/logs/2022-04-24T23-17-27_audioset_codebook256/checkpoints/last.ckpt
Traceback (most recent call last):
  File "evaluation/generate_samples_batch.py", line 204, in <module>
    Diffsound = Diffsound(config=config_path, path=pretrained_model_path, ckpt_vocoder=ckpt_vocoder)
  File "evaluation/generate_samples_batch.py", line 44, in __init__
    self.info = self.get_model(ema=True, model_path=path, config_path=config)
  File "evaluation/generate_samples_batch.py", line 64, in get_model
    model = build_model(config) # load dalle model
  File "evaluation/../sound_synthesis/modeling/build.py", line 5, in build_model
    return instantiate_from_config(config['model'])
  File "evaluation/../sound_synthesis/utils/misc.py", line 132, in instantiate_from_config
    return cls(**config.get("params", dict()))
  File "evaluation/../sound_synthesis/modeling/models/dalle_spec.py", line 40, in __init__
    self.transformer = instantiate_from_config(diffusion_config)
  File "evaluation/../sound_synthesis/utils/misc.py", line 132, in instantiate_from_config
    return cls(**config.get("params", dict()))
  File "evaluation/../sound_synthesis/modeling/transformers/diffusion_transformer.py", line 172, in __init__
    self.condition_emb = instantiate_from_config(condition_emb_config) # load the model that produces the condition embedding
  File "evaluation/../sound_synthesis/utils/misc.py", line 132, in instantiate_from_config
    return cls(**config.get("params", dict()))
  File "evaluation/../sound_synthesis/modeling/embeddings/clip_text_embedding.py", line 25, in __init__
    model, _ = clip.load(clip_name, device='cpu',jit=False)
  File "evaluation/../sound_synthesis/modeling/modules/clip/clip.py", line 114, in load
    model = build_model(state_dict or model.state_dict()).to(device)
  File "evaluation/../sound_synthesis/modeling/modules/clip/model.py", line 409, in build_model
    vision_width = state_dict["visual.layer1.0.conv1.weight"].shape[0]
KeyError: 'visual.layer1.0.conv1.weight'

It seems like your environment does not match.
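The KeyError is raised inside the vendored CLIP `build_model`. In OpenAI's CLIP code, which this module appears to copy, `build_model` treats a state dict that contains `visual.proj` as a ViT and otherwise assumes a ResNet, reading `visual.layer1.0.conv1.weight`. The error therefore means the loaded weights matched neither layout. A minimal sketch for telling an environment problem apart from a wrapped, prefixed, or wrong checkpoint; the function name `describe_clip_state_dict` is hypothetical and not part of Diffsound or CLIP:

```python
from collections import abc
from typing import Mapping


def describe_clip_state_dict(state_dict: Mapping[str, object]) -> str:
    """Report which CLIP visual-encoder layout a state dict's keys suggest.

    Hypothetical helper: mirrors the ViT-vs-ResNet key check that the
    traceback's build_model() performs, plus two common mismatch causes.
    """
    # build_model() treats the presence of "visual.proj" as a ViT checkpoint.
    if "visual.proj" in state_dict:
        return "vit"
    # Otherwise it assumes a ResNet and reads this key (the one that failed).
    if "visual.layer1.0.conv1.weight" in state_dict:
        return "resnet"
    # The expected keys exist but carry an extra prefix such as "module.".
    for key in state_dict:
        if key.endswith(".visual.proj") or key.endswith(".visual.layer1.0.conv1.weight"):
            return "prefixed:" + key
    # The real weights are nested one level down, e.g. {"state_dict": {...}}.
    for key, value in state_dict.items():
        if isinstance(value, abc.Mapping):
            return "nested:" + key
    # Neither layout: possibly a different model or an incomplete file.
    return "unknown"
```

Assuming PyTorch is installed and `clip_path` (a placeholder) points at the CLIP weights, the keys can be obtained with `torch.jit.load(clip_path, map_location="cpu").state_dict()` for a TorchScript archive, or `torch.load(clip_path, map_location="cpu")` for a plain state dict, and passed to the function.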

commented

Ok, thank you. I will try making a fresh environment.

commented

I am still getting this KeyError, even with a freshly created environment (built according to your instructions with "conda env create -f"). I did have to install the "einops" package from conda-forge first, but the error persists after that. Can you help?

(specvqgan) dto@thexder:/apdcephfs/share_1316500/donchaoyang/code3/Text-to-sound-Synthesis/Diffsound$ conda install einops -c conda-forge
Collecting package metadata (current_repodata.json): done
Solving environment: done

## Package Plan ##

  environment location: /home/dto/miniconda3/envs/specvqgan

  added / updated specs:
    - einops

The following NEW packages will be INSTALLED:

  einops             conda-forge/noarch::einops-0.4.1-pyhd8ed1ab_0

The following packages will be SUPERSEDED by a higher-priority channel:

  ca-certificates    pkgs/main::ca-certificates-2022.07.19~ --> conda-forge::ca-certificates-2022.6.15-ha878542_0
  certifi            pkgs/main::certifi-2022.6.15-py38h06a~ --> conda-forge::certifi-2022.6.15-py38h578d9bd_0

Proceed ([y]/n)?

Preparing transaction: done
Verifying transaction: done
Executing transaction: done
(specvqgan) dto@thexder:/apdcephfs/share_1316500/donchaoyang/code3/Text-to-sound-Synthesis/Diffsound$ USE_CUDA=0 python3 evaluation/generate_samples_batch.py
Restored from /apdcephfs/share_1316500/donchaoyang/code3/SpecVQGAN/logs/2022-04-24T23-17-27_audioset_codebook256/checkpoints/last.ckpt
Traceback (most recent call last):
  File "evaluation/generate_samples_batch.py", line 204, in <module>
    Diffsound = Diffsound(config=config_path, path=pretrained_model_path, ckpt_vocoder=ckpt_vocoder)
  File "evaluation/generate_samples_batch.py", line 44, in __init__
    self.info = self.get_model(ema=True, model_path=path, config_path=config)
  File "evaluation/generate_samples_batch.py", line 64, in get_model
    model = build_model(config) # load dalle model
  File "evaluation/../sound_synthesis/modeling/build.py", line 5, in build_model
    return instantiate_from_config(config['model'])
  File "evaluation/../sound_synthesis/utils/misc.py", line 132, in instantiate_from_config
    return cls(**config.get("params", dict()))
  File "evaluation/../sound_synthesis/modeling/models/dalle_spec.py", line 40, in __init__
    self.transformer = instantiate_from_config(diffusion_config)
  File "evaluation/../sound_synthesis/utils/misc.py", line 132, in instantiate_from_config
    return cls(**config.get("params", dict()))
  File "evaluation/../sound_synthesis/modeling/transformers/diffusion_transformer.py", line 172, in __init__
    self.condition_emb = instantiate_from_config(condition_emb_config) # load the model that produces the condition embedding
  File "evaluation/../sound_synthesis/utils/misc.py", line 132, in instantiate_from_config
    return cls(**config.get("params", dict()))
  File "evaluation/../sound_synthesis/modeling/embeddings/clip_text_embedding.py", line 25, in __init__
    model, _ = clip.load(clip_name, device='cpu',jit=False)
  File "evaluation/../sound_synthesis/modeling/modules/clip/clip.py", line 114, in load
    model = build_model(state_dict or model.state_dict()).to(device)
  File "evaluation/../sound_synthesis/modeling/modules/clip/model.py", line 409, in build_model
    vision_width = state_dict["visual.layer1.0.conv1.weight"].shape[0]
KeyError: 'visual.layer1.0.conv1.weight'
(specvqgan) dto@thexder:/apdcephfs/share_1316500/donchaoyang/code3/Text-to-sound-Synthesis/Diffsound$
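Since the error was attributed to an environment mismatch, one way to check a rebuilt environment is to compare installed package versions against the pins in the environment file. A minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+); the function name `find_version_mismatches` and the pins in the example are hypothetical placeholders, not the repository's actual environment file:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict


def find_version_mismatches(expected: Dict[str, str]) -> Dict[str, str]:
    """Map each package whose installed version differs from `expected`
    to a short description ("missing" or "installed X, expected Y").

    Versions are compared as exact strings, so local suffixes such as
    "+cu113" are reported as differences.
    """
    problems: Dict[str, str] = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)  # reads the installed distribution's metadata
        except PackageNotFoundError:
            problems[name] = "missing"
            continue
        if installed != wanted:
            problems[name] = f"installed {installed}, expected {wanted}"
    return problems


if __name__ == "__main__":
    # Hypothetical pins; replace them with the versions from the environment file.
    pins = {"torch": "<pinned version>", "einops": "0.4.1"}
    for pkg, issue in find_version_mismatches(pins).items():
        print(f"{pkg}: {issue}")
```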