chavinlo / musicgen_trainer

simple trainer for musicgen/audiocraft

codes concatenation

Beinabih opened this issue · comments

Hi,

Thank you for your code. It has been very helpful to me in writing my own trainer.

I think the line
codes = torch.cat([audio, audio], dim=0) (here)
should be
codes = torch.cat([codes, codes], dim=0)
otherwise you won't use your encoded codebooks, right? :)

kind regards,
Jonas

commented

um... no...

The codebook is obtained via the preprocess_audio function, which takes both the model and the audio waveform as inputs. It encodes the waveform with EnCodec (here called compression_model) and returns the codes.

Here is where we call that function, with the model and the audio waveform (a tensor) as arguments, and we get the codebook in return:

audio = preprocess_audio(audio, model)  # returns the codebook tensor

Note that the codebook here is just one batch; we then concatenate it with itself to match the shape of the condition_tensors.
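
Roughly, the flow is something like this (a simplified sketch rather than the exact trainer code; the signature, the 30-second duration, the "song.wav" path, and model being an already-loaded MusicGen instance are just placeholders for illustration):

import torch
import torchaudio

def preprocess_audio(audio_path, model, duration: int = 30):
    # Load the waveform and match the compression model's sample rate.
    wav, sr = torchaudio.load(audio_path)
    wav = torchaudio.functional.resample(wav, sr, model.sample_rate)
    wav = wav.mean(dim=0, keepdim=True)                # mono, [1, T]
    wav = wav[:, : int(model.sample_rate * duration)]  # clip to `duration` seconds
    wav = wav.unsqueeze(0).cuda()                      # [B=1, C=1, T]

    # EnCodec (model.compression_model) turns the waveform into
    # discrete codebook indices of shape [B, K, T_codes].
    with torch.no_grad():
        codes, scale = model.compression_model.encode(wav)
    return codes

# The trainer overwrites `audio` with the returned codebook, so
# torch.cat([audio, audio], dim=0) duplicates codes, not a raw waveform.
audio = preprocess_audio("song.wav", model)            # [1, K, T_codes]
codes = torch.cat([audio, audio], dim=0)               # [2, K, T_codes], batch now matches condition_tensors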

commented

Feel free to correct me if I'm wrong, because the trainer does not work for anything outside overfitting at the moment.

Ah, sorry for the confusion, you're totally right...

I moved the model.compression_model.encode(wav) call out of the
preprocess_audio function since I am using the Audiocraft dataloader. I think I hallucinated some of my own code into your trainer.py.
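
What I mean is roughly the following (just a sketch, not my actual code; I'm assuming the Audiocraft loader yields (wav, info) pairs with waveforms already shaped [B, C, T], and the variable names are placeholders):

import torch

for wav, infos in dataloader:   # batches from the Audiocraft dataset, assumed [B, C, T]
    wav = wav.cuda()
    with torch.no_grad():
        # encode inside the training loop instead of inside preprocess_audio
        codes, scale = model.compression_model.encode(wav)
    # codes: [B, K, T_codes], passed on to the language model with the conditioning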