breizhn / DTLN-aec

This repository contains the pretrained DTLN-aec model for real-time acoustic echo cancellation.

Speech is totally removed.

dttlgotv opened this issue

I tested with a WAV file containing clean speech. After running your code, I found the speech is totally removed. Is that reasonable?

Hi,
in general there are three scenarios for AEC:

  • single talk near-end: The near-end talker is returned.
  • single talk far-end: The far-end talker is removed from the signal. In the optimal case the model returns silence.
  • double talk: The near-end talker is returned and the far-end talker is removed.

You probably gave the same signal to the input and to the loopback. This results in the talker being removed from the signal and is equivalent to "single talk far-end". If you want to know more about the concepts and conditions, please read the paper on arXiv linked in the README.
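
As a rough illustration of these conditions (the variable names and the simple scaled echo are my assumptions, not the repository's code), the scenarios differ only in how the microphone and loopback signals are composed:

```python
import numpy as np

# Hypothetical 1-second signals at 16 kHz; in practice these come from WAV
# files, and the echo is the far-end signal filtered by a room impulse
# response, not just a scaled copy.
fs = 16000
near = np.random.randn(fs)   # near-end talker (should be kept)
far = np.random.randn(fs)    # far-end talker (should be removed)
echo = 0.5 * far             # far-end signal as played back and re-captured

# single talk near-end: mic contains only the near-end talker
mic, lpb = near, np.zeros(fs)

# single talk far-end: mic contains only the echo -> ideal output is silence
mic, lpb = echo, far

# double talk: mic contains both talkers; the model keeps `near`,
# removes the echo
mic, lpb = near + echo, far

# Feeding the SAME signal as mic and loopback is effectively
# "single talk far-end": the talker is treated as echo and removed.
mic, lpb = far, far
```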

No, the near-end microphone file must be in the in-folder with an "_mic.wav" suffix, and in the same folder there must be a matching "_lpb.wav" file, which is the loopback/far-end signal. For examples, please check out the dev and blind test sets of the AEC-Challenge repository.

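A minimal sketch of that naming convention and how the pairs can be checked (the folder name and this exact pairing code are my assumptions, not the repository's implementation):

```python
from pathlib import Path

in_folder = Path("in")  # hypothetical input folder

# Expected layout: for every near-end file `<name>_mic.wav` there must be a
# matching loopback file `<name>_lpb.wav` in the same folder.
for mic_path in sorted(in_folder.glob("*_mic.wav")):
    lpb_path = mic_path.with_name(mic_path.name.replace("_mic.wav", "_lpb.wav"))
    if not lpb_path.exists():
        raise FileNotFoundError(f"missing loopback file for {mic_path.name}")
    print(mic_path.name, "<->", lpb_path.name)
```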

I placed two files in the in-folder: one is g_mic.wav (music + speech) and the other is g_lpb.wav (the far-end speech). After running your program, the output WAV is totally silent. According to your explanation, only the music should remain.

Is the speech in both files the same? If yes, the removal is correct. I will add some example files from the AEC-Challenge test set to this repository.

I added some sample audio files to the repository.
Please try to process them and compare the results with the *_processed.wav files in the folder. If the files are not the same, your setup is probably not working.
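
One way to do that comparison programmatically (the file names below are placeholders, not the actual sample names):

```python
import numpy as np
import soundfile as sf

# Substitute the actual sample and output file names from your setup.
ref, fs_ref = sf.read("sample_processed.wav")
out, fs_out = sf.read("out/sample_mic.wav")

assert fs_ref == fs_out, "sample rates differ"
n = min(len(ref), len(out))
# Small numerical differences across platforms are possible, so compare
# with a tolerance instead of exact equality.
max_dev = np.max(np.abs(ref[:n] - out[:n]))
print(f"max absolute deviation: {max_dev:.2e}")
```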

Thanks a lot.

Maybe this line, `lpb, fs_2 = sf.read(audio_file_name.replace("mic.wav", "lpb.wav"))`, has a problem: it does not read the lpb.wav file correctly. I had to hardcode the path instead; then everything works.

Another important question: when will you provide training code or the model in other formats, such as a .pb file?

Sounds like the str.replace() command is not working properly. I tested it on Mac, Windows, and Ubuntu with Python 3.8, everything installed inside a conda environment, and it worked perfectly.
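
If str.replace() misbehaves in a particular setup, for example because the substring "mic.wav" also occurs somewhere in the directory path, one stricter variant is to rewrite only the file name. A sketch, not the repository's code:

```python
import os

import soundfile as sf

def read_pair(audio_file_name):
    """Read a *mic.wav file and its matching *lpb.wav file."""
    folder, name = os.path.split(audio_file_name)
    if not name.endswith("mic.wav"):
        raise ValueError(f"expected a *mic.wav file, got {name}")
    # Replace only the suffix of the file name, never parts of the folder path.
    lpb_name = name[: -len("mic.wav")] + "lpb.wav"
    mic, fs_1 = sf.read(audio_file_name)
    lpb, fs_2 = sf.read(os.path.join(folder, lpb_name))
    return mic, fs_1, lpb, fs_2
```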

At the moment I do not plan to provide a training setup or other model formats. It should be possible to recreate the training and the training data from the description in the paper in a reasonable amount of time. This repository is intended more as a baseline or for integration into projects.

Because I cannot reproduce your issue, and because the model itself works, I am closing this issue.

@breizhn hi,

Is it possible to subtract music, or any other audio output played close to the microphone array, from the input audio stream (human speech)? Similar to what smart speakers like Alexa do: hear me while music is playing in the background. Say I have two streams (music + speech vs. music only); I need to subtract the music and keep only the speech. Does your AEC implementation solve this problem?