collabora / WhisperSpeech

An Open Source text-to-speech system built by inverting Whisper.

Home Page: https://collabora.github.io/WhisperSpeech/

Outdated RQBottleneckTransformer model

Subuday opened this issue

It looks like the architecture of RQBottleneckTransformer has changed, but the model has not been retrained or re-uploaded.
So trying to load RQBottleneckTransformer using
vq_model = vq_stoks.RQBottleneckTransformer.load_model(ref="collabora/whisperspeech:whisper-vq-stoks-medium-en+pl.model").cuda() leads to this error:

Error(s) in loading state_dict for RQBottleneckTransformer:
	Missing key(s) in state_dict: "rq.project_in.weight", "rq.project_in.bias", "rq.project_out.weight", "rq.project_out.bias". 
	Unexpected key(s) in state_dict: "rq.layers.0.project_in.weight", "rq.layers.0.project_in.bias", "rq.layers.0.project_out.weight", "rq.layers.0.project_out.bias".

Okay, I was using the wrong version of vector_quantize_pytorch.
The correct one is specified in the settings.ini file.
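
For anyone hitting the same error, a quick way to see which version is actually installed is to ask the package metadata. This is only a sketch: the distribution name "vector-quantize-pytorch" is my assumption about how the package is registered with pip, and the helper names are made up for illustration. The expected version is left as a parameter, to be copied from settings.ini.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def matches_pin(dist_name, expected):
    """True only if the package is installed at exactly the expected version."""
    return installed_version(dist_name) == expected


if __name__ == "__main__":
    # Assumed distribution name; compare the output with the pin in settings.ini.
    print(installed_version("vector-quantize-pytorch"))
```

If the printed version differs from the one in settings.ini, reinstalling the pinned version should make the checkpoint load again.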

Yeah, that's unfortunate. It would probably make sense to update the checkpoint and use the newest version of vector_quantize_pytorch since AFAIR the math did not change at all, just the layer names.
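
If it really is just the layer names, updating the checkpoint could be as simple as renaming keys in the state dict. Below is a rough sketch based only on the four keys from the error message above. The function name and defaults are hypothetical, and it assumes no other keys differ between the two versions, which I have not verified.

```python
def remap_rq_keys(state_dict,
                  old_prefix="rq.layers.0.",
                  new_prefix="rq.",
                  names=("project_in", "project_out")):
    """Return a copy of state_dict with the projection keys renamed.

    Only keys like "rq.layers.0.project_in.weight" are touched; they become
    "rq.project_in.weight". Every other key is copied unchanged.
    """
    remapped = {}
    for key, value in state_dict.items():
        new_key = key
        for name in names:
            old = f"{old_prefix}{name}."
            if key.startswith(old):
                # Swap the prefix, keep the trailing ".weight" / ".bias" part.
                new_key = f"{new_prefix}{name}." + key[len(old):]
                break
        remapped[new_key] = value
    return remapped
```

The function works on any mapping from key names to values, so it can be tried on a plain dict before running it on the state dict of a real checkpoint.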

Maybe we could do it when we start working on new languages @zoq ?