chavinlo / musicgen_trainer

simple trainer for musicgen/audiocraft

codes concatenation

Beinabih opened this issue · comments

Hi,

Thank you for your code. It has been very helpful to me in writing my own trainer.

I think the line
codes = torch.cat([audio, audio], dim=0) (here)
should be
codes = torch.cat([codes, codes], dim=0)
otherwise you won't use your encoded codebooks, right? :)

kind regards,
Jonas

commented

um... no...

The codebook is obtained via the preprocess_audio function, which takes both the model and the audio waveform as inputs. It encodes the waveform with EnCodec (here called compression_model) and returns the codes.

Here is where we call that function, with the model and the audio waveform (a tensor) as arguments, and we get the codebook in return:

audio = preprocess_audio(audio, model)  # returns the codebook tensor

Note that the codebook here is just one batch; we then concatenate it with itself to match the shape of the condition_tensors.
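
Roughly, the flow is something like this (a simplified sketch rather than the exact trainer code; the signature, the 30-second duration, the "song.wav" path, and model being an already-loaded MusicGen instance are just placeholders for illustration):

import torch
import torchaudio

def preprocess_audio(audio_path, model, duration: int = 30):
    # Load the waveform and match the compression model's sample rate.
    wav, sr = torchaudio.load(audio_path)
    wav = torchaudio.functional.resample(wav, sr, model.sample_rate)
    wav = wav.mean(dim=0, keepdim=True)                # mono, [1, T]
    wav = wav[:, : int(model.sample_rate * duration)]  # clip to `duration` seconds
    wav = wav.unsqueeze(0).cuda()                      # [B=1, C=1, T]

    # EnCodec (model.compression_model) turns the waveform into
    # discrete codebook indices of shape [B, K, T_codes].
    with torch.no_grad():
        codes, scale = model.compression_model.encode(wav)
    return codes

# The trainer overwrites `audio` with the returned codebook, so
# torch.cat([audio, audio], dim=0) duplicates codes, not a raw waveform.
audio = preprocess_audio("song.wav", model)            # [1, K, T_codes]
codes = torch.cat([audio, audio], dim=0)               # [2, K, T_codes], batch now matches condition_tensors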

commented

Feel free to correct me if I'm wrong, because the trainer does not work for anything outside overfitting at the moment.

Ah, sorry for the confusion, you're totally right...

I moved the model.compression_model.encode(wav) call out of the
preprocess_audio function since I am using the Audiocraft dataloader. I think I hallucinated some of my own code into your trainer.py.
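
What I mean is roughly the following (just a sketch, not my actual code; I'm assuming the Audiocraft loader yields (wav, info) pairs with waveforms already shaped [B, C, T], and the variable names are placeholders):

import torch

for wav, infos in dataloader:   # batches from the Audiocraft dataset, assumed [B, C, T]
    wav = wav.cuda()
    with torch.no_grad():
        # encode inside the training loop instead of inside preprocess_audio
        codes, scale = model.compression_model.encode(wav)
    # codes: [B, K, T_codes], passed on to the language model with the conditioning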