LCAV / pyroomacoustics

Pyroomacoustics is a package for audio signal processing in indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.

Home Page: https://pyroomacoustics.readthedocs.io


Testing CNN model using sound generated from pyroomacoustics room simulation

kehinde-elelu opened this issue · comments

I have generated a large set of audio using Pyroomacoustics for room simulation, employing a circular microphone array and a single sound source.

I have successfully trained and tested a CRNN (Convolutional Recurrent Neural Network) model on this dataset to predict sound events and estimate the direction of arrival (DOA).

However, when I use the trained model on audio from a Respeaker v4 mic array, the results are unsatisfactory, even though the simulated setup has a similar microphone arrangement.

Can I accurately estimate the direction of arrival (DOA) with a circular microphone array of small radius, especially given that the Respeaker's microphones are spaced less than 0.05 m (5 cm) apart?

I've also observed differences between the spectrograms of the WAV files generated by the Pyroomacoustics room simulation and those of the Respeaker recordings. Is it possible to adjust the room simulation parameters so that the generated audio's spectrogram more closely resembles the Respeaker's?
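For context, a minimal sketch of this kind of simulation. The room dimensions, absorption coefficient, array radius, source position, and file name are placeholders chosen only for illustration, not the setup actually used in the issue:

```python
import numpy as np
import pyroomacoustics as pra
from scipy.io import wavfile

# Load a mono source signal (placeholder path)
fs, signal = wavfile.read("event.wav")

# Shoebox room with frequency-independent absorption (illustrative values)
room = pra.ShoeBox([6, 5, 3], fs=fs, materials=pra.Material(0.3), max_order=10)

# Circular array of 4 microphones with a 3.2 cm radius at 1.2 m height,
# roughly the scale of a small commercial mic array
R = pra.circular_2D_array(center=[3.0, 2.5], M=4, phi0=0, radius=0.032)
R = np.concatenate([R, 1.2 * np.ones((1, R.shape[1]))])  # append z coordinate
room.add_microphone_array(pra.MicrophoneArray(R, room.fs))

# Single sound source somewhere in the room
room.add_source([1.0, 4.0, 1.5], signal=signal)

# After this call, room.mic_array.signals holds the multichannel audio
room.simulate()
```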


Have you solved this problem?

Hello, first of all, apologies to @kehinde-elelu, as I never replied 🙇

There is not yet a perfect way to match the simulation to specific hardware, nor to make a model trained on simulated data generalize universally.
Enabling the randomized image source model (by setting use_rand_ism=True, see the docs) should help the model generalize in practice.
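For example, a sketch of how the option is passed to the room constructor (geometry, sampling rate, and absorption are placeholder values):

```python
import pyroomacoustics as pra

# The randomized image source model slightly perturbs the image source
# positions, which reduces the sweeping-echo artifacts of the exact image
# method and tends to make learned models less overfit to the simulator.
room = pra.ShoeBox(
    [6, 5, 3],
    fs=16000,
    materials=pra.Material(0.3),
    max_order=10,
    use_rand_ism=True,   # enable the randomized image source model
    max_rand_disp=0.05,  # max image source displacement, in meters
)
```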

However, the simulation will still be missing the response of the microphone array you are using. If you have a way to measure it, you could try to include it in the simulation.
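One way to fold in a measured response, assuming you can record one impulse response per channel (the file names, the 4-channel count, and the measurement itself are hypothetical, and `room` is the simulated room from the sketch above), is to filter the simulated signals after room.simulate():

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import fftconvolve

# Hypothetical: one measured impulse response per microphone channel,
# e.g. obtained with a sine sweep played from a known position.
mic_irs = [wavfile.read(f"mic_ir_{m}.wav")[1] for m in range(4)]

# room.mic_array.signals has shape (n_mics, n_samples) after room.simulate()
sim = room.mic_array.signals

# Convolve each simulated channel with its measured response,
# trimming back to the original length
out = np.stack(
    [fftconvolve(sim[m], mic_irs[m], mode="full")[: sim.shape[1]] for m in range(4)]
)
```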