LCAV / pyroomacoustics

Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.

Home Page: https://pyroomacoustics.readthedocs.io

How to simulate energy decrease?

coreeey opened this issue · comments

In theory, as the speech signal travels further into the far-field, we expect to observe a significant decrease in energy, leading to a noticeable attenuation in the spectrum. This attenuation typically manifests as a shift from dense resonance peaks to gradually sparse ones. However, in my simulation experiments, I noticed a curious anomaly: regardless of the distance simulated, the spectrum only exhibited aliasing effects without any observable attenuation. So why does this phenomenon occur?
(The code I used is room_L_shape_3d_rt.py from the examples.)
The last line is the original signal; the others are the reverberated signals.
[pic1: spectrograms of the reverberated signals and the original signal]

Hello @coreeey , this looks pretty good to me.
I suppose the absence of attenuation may be due to a global rescaling of the signal before saving to file. Please check that.
Also, I don't see any aliasing occurring in these spectrograms (aliasing would be copies of high frequencies folded into low frequencies).
The further the source is from the microphone, the longer the reverberation time will be.
This causes the longer tail that is observed in your simulated signals.
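
If it helps to double-check, here is a minimal sketch of my own (using a plain shoebox room instead of the L-shaped example, with made-up dimensions and absorption) that prints the peak and RMS of the raw simulated output for a near and a far source, before any rescaling. The levels should clearly drop with distance:

import numpy as np
import pyroomacoustics as pra

fs = 16000
signal = np.random.randn(fs)  # 1 s of noise standing in for speech

for src_pos in ([2.0, 2.0, 1.5], [5.5, 4.5, 1.5]):  # near and far source
    room = pra.ShoeBox([6.0, 5.0, 3.0], fs=fs, materials=pra.Material(0.2), max_order=10)
    room.add_source(src_pos, signal=signal)
    mic = pra.MicrophoneArray(np.array([[1.0], [1.0], [1.5]]), fs)  # single microphone
    room.add_microphone_array(mic)
    room.simulate()
    out = room.mic_array.signals[0]
    print(src_pos, "peak:", np.max(np.abs(out)), "rms:", np.sqrt(np.mean(out ** 2)))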

Thanks @fakufaku for your very prompt reply, I did indeed perform a global rescaling of the signal before saving it to file, which might explain the absence of attenuation. I will attempt another approach to address this issue. Additionally, I realize now that I misunderstood the 'aliasing occurring in these spectrograms.' I initially thought it referred to aliasing of spectra over time.

Thanks @fakufaku, I managed to address this issue by multiplying the normalized signal by 1000, resulting in a spectrum that closely resembles the actual microphone audio.

import numpy as np
from scipy.io import wavfile
from scipy.signal import convolve

# Convolve the anechoic signal with the simulated RIR, then drop the leading axis
s = convolve(audio_anechoic, rir)
s = np.squeeze(s, axis=0)
# Peak-normalize, then scale to a fixed amplitude before writing 16-bit PCM
s_norm = s / np.max(np.abs(s))
# s_norm_ = np.int16(s_norm * 32767)
s_norm_ = np.int16(s_norm * 1000)
wavfile.write("tmp_out.wav", 16000, s_norm_)

However, I have a minor question: what is the relationship between the 1000 in 's_norm_ = np.int16(s_norm * 1000)' and the 32767 in 's_norm_ = np.int16(s_norm * 32767)'? Will the volume increase when the signal is multiplied by a larger value? And if I want the simulated microphone's sound pressure to reach 65 dB, can I achieve this by modifying this constant?

Hi @coreeey, can you clarify whether the signals used to create the spectrogram plots in the initial comment were

  1. loaded from disk, or
  2. the direct room-processed output, without writing to disk

Hi @DanTremonti, I processed the output with a max-based normalization and plotted the spectrogram with Audacity.

@coreeey Thanks for the clarification :)

@coreeey The normalization of audio before saving to a format like WAV is one of the finer and more confusing points of audio processing. The problem is that WAV (when saved with integer-valued samples) has finite precision.
Many files are saved in 16 bits, and you want to make the most of those 16 bits to represent the amplitude of the sound.
If the maximum amplitude is too small, only a few bits will be used to encode all the values. For this reason, we often rescale the maximum to a value close to 2^15, the top of the 16-bit range, to maximize the precision used.
In practice, this rescaling only changes the volume of the audio.
This seemingly innocuous operation loses the relative amplitude differences between files, as you noticed in your original issue.

The trick, if you want to preserve the relative differences, is to rescale all files by the same value so that none of them goes outside the 16-bit range. This is usually done by taking the maximum absolute amplitude across all the signals you want to compare and mapping it to just under 2^15.

Here is an example for two signals.

import numpy as np

# One common scale: the louder signal maps to full scale (32767), relative levels are kept
scale = max(np.abs(signal1).max(), np.abs(signal2).max())
signal1 = (signal1 * 32767 / scale).astype(np.int16)
signal2 = (signal2 * 32767 / scale).astype(np.int16)
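
On the question about 1000 vs 32767: 32767 is simply the largest value a signed 16-bit sample can take, so multiplying the peak-normalized signal by a larger constant raises the digital level (and the playback volume) until it clips at full scale. A target like 65 dB SPL cannot be fixed in the file alone, because the actual sound pressure also depends on the playback gain and calibration; what you can set is the digital level in dBFS. A rough sketch (the target level here is made up, and s_norm is the peak-normalized float signal from the earlier snippet):

import numpy as np

FULL_SCALE = 32767                      # largest positive signed 16-bit value
target_dbfs = -25.0                     # hypothetical target RMS level re full scale

rms = np.sqrt(np.mean(s_norm ** 2))
gain = 10 ** (target_dbfs / 20) / rms   # linear gain that reaches the target RMS
y = np.clip(s_norm * gain, -1.0, 1.0)   # keep samples inside the 16-bit range
y_int16 = np.int16(y * FULL_SCALE)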

@fakufaku thank you for the detailed and kind reply, and for developing such a great project.