Reduce RAM Usage of the Audio Operator for large audio files

Question


aatmanvaidya opened this issue 2 months ago

Operators are core components of Feluda: they are modules that help us analyse media items.

One such operator is Audio Vec Embedding. It takes an audio file as input and converts it into a 2048-dimensional vector.
The operator uses a pretrained CNN model to produce this vector from the audio file. All the code for the operator can be found in the src/core/operators/audio-cnn-model/ folder.

When processing large audio files, the operator's RAM usage is significantly high. Profiling results for the operator show that a 5-minute audio clip uses roughly 2 GB of RAM.
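As a rough sanity check on the 2 GB figure, the size of the decoded waveform itself can be estimated. This is a back-of-envelope sketch that assumes mono, 32-bit float samples; the sample rates are illustrative, since the rate the operator actually decodes or resamples to is not stated here, and `waveform_bytes` is a hypothetical helper.

```python
def waveform_bytes(duration_s: float, sample_rate: int, channels: int = 1,
                   bytes_per_sample: int = 4) -> int:
    """Bytes needed to hold a decoded PCM waveform in memory.

    Assumes float32 samples (4 bytes) by default; the sample rates used
    below are illustrative, not the operator's confirmed settings.
    """
    return int(duration_s * sample_rate * channels * bytes_per_sample)


if __name__ == "__main__":
    five_minutes = 5 * 60
    for sr in (16_000, 32_000, 44_100):
        mb = waveform_bytes(five_minutes, sr) / 1e6
        print(f"{sr} Hz mono float32: {mb:.1f} MB")
```

For a 5-minute clip this gives 19.2 MB, 38.4 MB and 52.9 MB respectively, all far below 2 GB. That suggests, without proving it, that most of the memory goes to intermediate data (spectrograms, model activations, or extra copies made while decoding and resampling) rather than to the raw audio.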

The task is to reduce the operator's RAM usage so that large audio files are processed efficiently.
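One general way to bound peak memory is to embed the audio in fixed-size windows and combine the per-window vectors, so the model only ever sees one window at a time. A minimal sketch, assuming mean pooling as the combination step; `embed_fn` is a hypothetical stand-in for the CNN call, and a mean of window embeddings is not guaranteed to match the embedding of the whole file.

```python
from typing import Callable, Iterable, Iterator, List, Sequence

Vector = List[float]


def iter_windows(samples: Sequence[float], window: int, hop: int) -> Iterator[Sequence[float]]:
    """Yield windows of `window` samples, advancing by `hop`; the last window may be shorter."""
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    start = 0
    while start < len(samples):
        yield samples[start:start + window]
        if start + window >= len(samples):
            break  # this window already reached the end of the audio
        start += hop


def embed_in_chunks(chunks: Iterable[Sequence[float]],
                    embed_fn: Callable[[Sequence[float]], Vector]) -> Vector:
    """Embed each chunk separately and return the mean of the chunk embeddings.

    Only one chunk is passed to embed_fn at a time, so the model's working
    memory depends on the chunk size rather than the total audio length.
    """
    total: List[float] = []
    count = 0
    for chunk in chunks:
        vec = embed_fn(chunk)
        total = list(vec) if count == 0 else [t + v for t, v in zip(total, vec)]
        count += 1
    if count == 0:
        raise ValueError("no audio chunks to embed")
    return [t / count for t in total]
```

In practice the chunks would be read from disk as a stream (for example with `soundfile.blocks`) rather than sliced from an in-memory list; otherwise the full waveform is still loaded. The short final window is weighted the same as full windows here; weighting by window length is an alternative.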

This is an open-ended issue: the audio-cnn-model will have to be analysed to figure out what is causing the high memory usage.
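To narrow down the cause, peak memory can be measured around each stage separately (decoding, feature extraction, model forward pass). A sketch using the standard-library `tracemalloc`; `measure_peak` is a hypothetical helper name.

```python
import tracemalloc
from typing import Any, Callable, Tuple


def measure_peak(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, int]:
    """Run fn(*args, **kwargs) and return (result, peak traced bytes during the call)."""
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak) in bytes
    finally:
        tracemalloc.stop()
    return result, peak
```

`tracemalloc` only sees allocations made through Python's memory allocators. NumPy arrays are reported to it, but native libraries such as PyTorch typically manage their own memory, so for the model step an OS-level measure such as peak RSS (`resource.getrusage(resource.RUSAGE_SELF).ru_maxrss` on Unix) is likely to be more reliable.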
Another solution to consider is smart sampling of the audio: instead of loading the entire audio file, can we select key frames and let the operator process only those?
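The sampling idea can be sketched with the standard-library `wave` module, which can seek to a frame position and read only the requested frames. This is a minimal sketch under several assumptions: the input is an uncompressed PCM WAV file, and segments are spaced evenly across the file as a simple baseline. A real "key frame" selection (for example by signal energy) would replace the computation of `starts`. The function name and parameters are hypothetical.

```python
import wave
from typing import List


def read_sampled_segments(path: str, num_segments: int, segment_seconds: float) -> List[bytes]:
    """Read `num_segments` evenly spaced segments of raw PCM frames from a WAV file.

    Only the selected frames are read, so memory scales with
    num_segments * segment_seconds rather than with the file length.
    """
    if num_segments <= 0 or segment_seconds <= 0:
        raise ValueError("num_segments and segment_seconds must be positive")
    with wave.open(path, "rb") as wav:
        total = wav.getnframes()
        seg_frames = min(total, int(segment_seconds * wav.getframerate()))
        last_start = total - seg_frames  # latest start that still fits a full segment
        if num_segments == 1:
            starts = [last_start // 2]
        else:
            # Uniform spacing from the start of the file to the last valid start.
            starts = [round(i * last_start / (num_segments - 1)) for i in range(num_segments)]
        segments = []
        for start in starts:
            wav.setpos(start)  # seek without reading the preceding frames
            segments.append(wav.readframes(seg_frames))
    return segments
```

Each returned segment is raw PCM bytes, which would still need to be converted to samples (and possibly resampled) before being passed to the model. Other formats would need a decoder that supports seeking.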
One thing to keep in mind: reducing RAM usage must not degrade the quality of the search results.
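One way to check that search quality is preserved is to run the same queries through the current operator and a lower-memory variant, then compare the ranked result lists. A sketch with hypothetical helpers: cosine similarity for ranking, and overlap@k between two ranked lists of result IDs. The choice of metric and of k is an assumption, not part of the issue.

```python
import math
from typing import Hashable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], index: Sequence[Sequence[float]], k: int) -> List[int]:
    """Indices of the k index vectors most similar to the query, best first."""
    scores = [cosine(query, vec) for vec in index]
    return sorted(range(len(index)), key=lambda i: scores[i], reverse=True)[:k]


def overlap_at_k(baseline: Sequence[Hashable], candidate: Sequence[Hashable], k: int) -> float:
    """Fraction of the baseline's top-k results that also appear in the candidate's top k."""
    if k <= 0:
        raise ValueError("k must be positive")
    return len(set(baseline[:k]) & set(candidate[:k])) / k
```

An overlap close to 1.0 across a representative set of queries would indicate that the lower-memory variant returns essentially the same results as the current operator.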