SonyCSLParis / audio-representations

JEPAs for audio representation learning

Audio representation learning with JEPAs

This repository contains the PyTorch code associated with the paper Investigating Design Choices in Joint-Embedding Predictive Architectures for General Audio Representation Learning, presented at the SASB workshop at ICASSP 2024.

Usage

  • Clone the repository and install the requirements using the provided requirements.txt or environment.yml.
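
    For example, with a pip-based setup (the provided environment.yml can be used with conda instead; the clone URL below simply follows the repository name):

    git clone https://github.com/SonyCSLParis/audio-representations.git
    cd audio-representations
    pip install -r requirements.txt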

  • Then, preprocess your dataset to convert audio files into mel-spectrograms:

    python wav_to_lms.py /your/local/audioset /your/local/audioset_lms
  • Write the list of files to use as training data to a CSV file:

    cd data
    echo file_name > files_audioset.csv
    find /your/local/audioset_lms -name "*.npy" >> files_audioset.csv
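
    You can quickly sanity-check the resulting list, e.g. by counting entries and inspecting the first few paths (these commands are purely illustrative):

    wc -l files_audioset.csv
    head -n 3 files_audioset.csv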
  • You can now start training! We rely on Dora for experiment scheduling. To start an experiment locally, just type:

    dora run

    Under the hood, Hydra is used to handle configurations, so you can override them via the CLI or build your own YAML config files. For example, type:

    dora run data=my_dataset model.encoder.embed_dim=1024

    to train our model with a larger encoder on your custom dataset.

    Moreover, you can seamlessly launch SLURM jobs on a cluster thanks to Dora:

    dora launch -p partition-a100 -g 4 data=my_dataset

    We refer to the respective documentation of Hydra and Dora for more advanced usage; a combined example is shown below.
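
    Note that configuration overrides and launch options compose, so a single command can select your dataset, enlarge the encoder and request cluster resources at once (the partition name and GPU count are the same illustrative values as above):

    dora launch -p partition-a100 -g 4 data=my_dataset model.encoder.embed_dim=1024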

Performance

Our model is evaluated on 8 diverse downstream tasks, including environmental sound, speech, and music classification. Please refer to our paper for additional details.

Checkpoints

Checkpoints will be made available soon.

Credits
