SonyCSLParis / audio-representations

JEPAs for audio representation learning

Audio representation learning with JEPAs

This repository contains the PyTorch code associated with the paper Investigating Design Choices in Joint-Embedding Predictive Architectures for General Audio Representation Learning, presented at the SASB workshop at ICASSP 2024.

Usage

  • Clone the repository and install the requirements using the provided requirements.txt or environment.yml.
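
    For example, with a pip-based setup (the provided environment.yml can be used with conda instead; the clone URL below simply follows the repository name):

    git clone https://github.com/SonyCSLParis/audio-representations.git
    cd audio-representations
    pip install -r requirements.txt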

  • Then, preprocess your dataset to convert audio files into mel-spectrograms:

    python wav_to_lms.py /your/local/audioset /your/local/audioset_lms
  • Write the list of files to use as training data to a CSV file:

    cd data
    echo file_name > files_audioset.csv
    find /your/local/audioset_lms -name "*.npy" >> files_audioset.csv
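
    You can quickly sanity-check the resulting list, e.g. by counting entries and inspecting the first few paths (these commands are purely illustrative):

    wc -l files_audioset.csv
    head -n 3 files_audioset.csv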
  • You can now start training! We rely on Dora for experiment scheduling. To start an experiment locally, just type:

    dora run

    Under the hood, Hydra is used to handle configurations, so you can override them via the CLI or build your own YAML config files. For example, type:

    dora run data=my_dataset model.encoder.embed_dim=1024

    to train our model with a larger encoder on your custom dataset.

    Moreover, you can seamlessly launch SLURM jobs on a cluster thanks to Dora:

    dora launch -p partition-a100 -g 4 data=my_dataset

    We refer to the respective documentation of Hydra and Dora for more advanced usage; a combined example is shown below.
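
    Note that configuration overrides and launch options compose, so a single command can select your dataset, enlarge the encoder and request cluster resources at once (the partition name and GPU count are the same illustrative values as above):

    dora launch -p partition-a100 -g 4 data=my_dataset model.encoder.embed_dim=1024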

Performance

Our model is evaluated on 8 diverse downstream tasks, including environmental sound, speech, and music classification. Please refer to our paper for additional details.

Checkpoints

Checkpoints will be made available soon.

Credits
