Encoding Long Audio Clips

Question

aviaefrat opened this issue a year ago

I need the EnCodec tokens of long audio clips (hours long).
Inputting such files as-is results in a CUDA OOM error.
I've seen that you "do not try to be smart about long files".
Does naively chunking the long audio files (and concatenating the EnCodec tokens afterwards) produce the same results as passing the entire file to the model?
If not, how should I chunk my audio files?
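
For concreteness, here is a minimal sketch of the naive chunking described above. It is not EnCodec code: `encode_in_chunks` and `toy_encode` are hypothetical names, and `toy_encode` is a context-free stand-in that emits one token per 4 samples. It only shows the split, encode, concatenate pattern being asked about.

```python
from typing import Callable, List, Sequence


def encode_in_chunks(
    samples: Sequence[int],
    encode_fn: Callable[[Sequence[int]], List[int]],
    chunk_samples: int,
) -> List[int]:
    """Encode fixed-size, non-overlapping chunks and concatenate the tokens."""
    if chunk_samples <= 0:
        raise ValueError("chunk_samples must be positive")
    tokens: List[int] = []
    for start in range(0, len(samples), chunk_samples):
        # The last chunk may be shorter than chunk_samples.
        chunk = samples[start:start + chunk_samples]
        tokens.extend(encode_fn(chunk))
    return tokens


def toy_encode(chunk: Sequence[int]) -> List[int]:
    """Hypothetical stand-in encoder (NOT EnCodec): one token per 4 samples."""
    hop = 4
    return [sum(chunk[i:i + hop]) for i in range(0, len(chunk), hop)]


samples = list(range(10))
chunked = encode_in_chunks(samples, toy_encode, chunk_samples=8)
whole = toy_encode(samples)
```

With this toy encoder the chunked and whole-file tokens match, but only because it has no context across samples and the chunk size is a multiple of its hop. A real model whose layers see context across chunk boundaries may produce different tokens near the seams, which is exactly what the question is asking.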