RonFrancesca / Text-to-Audio-ESC


Synthesizing Soundscapes: Leveraging Text-to-Audio Models for Environmental Sound Classification

Francesca Ronchini1, Luca Comanducci1, and Fabio Antonacci1

1 Dipartimento di Elettronica, Informazione e Bioingegneria - Politecnico di Milano

arXiv

Abstract

In the past few years, text-to-audio models have emerged as a significant advancement in automatic audio generation. Although they represent impressive technological progress, the effectiveness of their use in the development of audio applications remains uncertain. This paper aims to investigate these aspects, specifically focusing on the task of classification of environmental sounds. This study analyzes the performance of two different environmental classification systems when data generated from text-to-audio models is used for training. Two cases are considered: a) when the training dataset is augmented by data coming from two different text-to-audio models; and b) when the training dataset consists solely of synthetic audio generated. In both cases, the performance of the classification task is tested on real data. Results indicate that text-to-audio models are effective for dataset augmentation, whereas the performance of the models drops when relying on only generated audio.

Install & Usage

For generating the data, we used AudioLDM2 and AudioGen.

Installing AudioLDM2

Please refer to the AudioLDM2 GitHub repo and follow its installation instructions. For this study, we used the official checkpoints available through Hugging Face 🧨 Diffusers, specifically the audioldm checkpoint.

Once AudioLDM2 is installed, you can generate the audio files by running the script audio_generation/class_generation_audioldm.py. Before running it, specify the path to the output folder, the audio class to generate, the prompt used to generate the files, and the number of files to generate, directly in audio_generation/class_generation_audioldm.py.

After that, you can run the script with the command:

cd audio_generation
python class_generation_audioldm.py
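As a rough, hypothetical sketch of what such a generation script could look like (the helper names, prompts, folder layout, and generation parameters below are ours, not the repo's), using the AudioLDM2 pipeline from Hugging Face Diffusers:

```python
# Hypothetical sketch of a class-generation script for AudioLDM2;
# the actual repo script may differ.
from pathlib import Path


def build_jobs(out_dir, audio_class, prompt, n_files):
    """Pair each output .wav path with the prompt used to generate it."""
    out = Path(out_dir) / audio_class
    return [(out / f"{audio_class}_{i:04d}.wav", prompt) for i in range(n_files)]


def generate(jobs):
    # Imports kept local: loading the pipeline downloads a large model.
    import scipy.io.wavfile
    from diffusers import AudioLDM2Pipeline

    pipe = AudioLDM2Pipeline.from_pretrained("cvssp/audioldm2")
    for path, prompt in jobs:
        path.parent.mkdir(parents=True, exist_ok=True)
        audio = pipe(prompt, num_inference_steps=200,
                     audio_length_in_s=10.0).audios[0]
        # AudioLDM2 outputs 16 kHz audio
        scipy.io.wavfile.write(path, rate=16000, data=audio)


if __name__ == "__main__":
    jobs = build_jobs("generated", "dog_bark", "a dog barking in a park", 100)
    # generate(jobs)  # uncomment to actually synthesize the audio
    print(f"{len(jobs)} files queued, first: {jobs[0][0]}")
```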

Installing AudioGen

Please refer to the AudioGen GitHub repo and follow the installation instructions.

Once AudioGen is installed, you can generate the audio files by running the script audio_generation/class_generation_audiogen.py. Before running it, specify the path to the output folder, the audio class to generate, the prompt to use, and the number of files to generate, directly in the script. After that, run:

cd audio_generation
python class_generation_audiogen.py
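Again as a hypothetical sketch only (helper names and parameters are ours), a generation loop built on the AudioGen API from the audiocraft library could look like:

```python
# Hypothetical sketch of a class-generation script built on audiocraft's
# AudioGen; the actual repo script may differ.
from pathlib import Path


def output_paths(out_dir, audio_class, n_files):
    """Enumerate the .wav paths to be written for one class."""
    out = Path(out_dir) / audio_class
    return [out / f"{audio_class}_{i:04d}.wav" for i in range(n_files)]


def generate(paths, prompt, duration_s=5.0):
    # Imports kept local: loading AudioGen downloads a large model.
    from audiocraft.models import AudioGen
    from audiocraft.data.audio import audio_write

    model = AudioGen.get_pretrained("facebook/audiogen-medium")
    model.set_generation_params(duration=duration_s)  # seconds per file
    for path in paths:
        path.parent.mkdir(parents=True, exist_ok=True)
        wav = model.generate([prompt])[0]  # one waveform per prompt
        # audio_write appends the .wav extension itself
        audio_write(str(path.with_suffix("")), wav.cpu(), model.sample_rate,
                    strategy="loudness")
```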

Run the code

When all the data have been generated, you can reproduce the experiments.

First, install the required packages by running the following command in your terminal:

pip install -r requirements.txt

When all packages have been installed, you need to specify which dataset to use, following the instructions in the config/default.yaml file.
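The actual keys are documented in config/default.yaml itself; purely as an illustration (these key names are ours, not the repo's), a dataset-selection entry might look like:

```yaml
# Illustrative only -- see config/default.yaml for the real keys
dataset:
  train_source: real_augmented   # e.g. real, real_augmented, synthetic_only
  generated_audio_dir: audio_generation/output
```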

After all the parameters have been defined, you can run the code with the following command:

python main.py

Link to additional material

Additional material and audio samples are available on the companion website.

Additional information

For more details: "Synthesizing Soundscapes: Leveraging Text-to-Audio Models for Environmental Sound Classification", Francesca Ronchini, Luca Comanducci, and Fabio Antonacci - arXiv, 2024.

If you use code or comments from this work, please cite our paper:

@article{ronchini2024synthesizing,
  title={Synthesizing Soundscapes: Leveraging Text-to-Audio Models for Environmental Sound Classification},
  author={Ronchini, Francesca and Comanducci, Luca and Antonacci, Fabio},
  journal={arXiv preprint arXiv:2403.17864},
  year={2024}
}

Languages

Jupyter Notebook 97.6%, Python 2.4%