breizhn / DTLN-aec

This repository contains the pretrained DTLN-aec model for real-time acoustic echo cancellation.

Speech is totally removed.

dttlgotv opened this issue

I tested with a WAV file containing clean speech. After running your code, I found the speech is totally removed. Is that reasonable?

Hi,
in general there are three scenarios for AEC:

  • single talk near-end: The near-end talker is returned.
  • single talk far-end: The far-end talker is removed from the signal. In the optimal case the model returns silence.
  • double talk: The near-end talker is returned and the far-end talker is removed.

You probably gave the same signal to the input and to the loopback. This results in the talker being removed from the signal and is equivalent to "single talk far-end". If you want to know more about the concepts and conditions, please read the paper on arXiv linked in the README.
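
As a rough illustration of these conditions (the variable names and the simple scaled echo are my assumptions, not the repository's code), the scenarios differ only in how the microphone and loopback signals are composed:

```python
import numpy as np

# Hypothetical 1-second signals at 16 kHz; in practice these come from WAV
# files, and the echo is the far-end signal filtered by a room impulse
# response, not just a scaled copy.
fs = 16000
near = np.random.randn(fs)   # near-end talker (should be kept)
far = np.random.randn(fs)    # far-end talker (should be removed)
echo = 0.5 * far             # far-end signal as played back and re-captured

# single talk near-end: mic contains only the near-end talker
mic, lpb = near, np.zeros(fs)

# single talk far-end: mic contains only the echo -> ideal output is silence
mic, lpb = echo, far

# double talk: mic contains both talkers; the model keeps `near`,
# removes the echo
mic, lpb = near + echo, far

# Feeding the SAME signal as mic and loopback is effectively
# "single talk far-end": the talker is treated as echo and removed.
mic, lpb = far, far
```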

No, the near-end microphone file must be in the in-folder with an "_mic.wav" suffix, and in the same folder there must be a matching "_lpb.wav" file, which is the loopback/far-end signal. For examples, please check out the dev and blind test sets of the AEC-Challenge repository.

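A minimal sketch of that naming convention and how the pairs can be checked (the folder name and this exact pairing code are my assumptions, not the repository's implementation):

```python
from pathlib import Path

in_folder = Path("in")  # hypothetical input folder

# Expected layout: for every near-end file `<name>_mic.wav` there must be a
# matching loopback file `<name>_lpb.wav` in the same folder.
for mic_path in sorted(in_folder.glob("*_mic.wav")):
    lpb_path = mic_path.with_name(mic_path.name.replace("_mic.wav", "_lpb.wav"))
    if not lpb_path.exists():
        raise FileNotFoundError(f"missing loopback file for {mic_path.name}")
    print(mic_path.name, "<->", lpb_path.name)
```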

I placed two files in the in-folder: one is g_mic.wav (music + speech) and the other is g_lpb.wav (the far-end speech). After running your program, the output WAV is totally silent. According to your explanation, only the music should remain.

Is the speech in both files the same? If yes, the removal is correct. I will add some example files from the AEC-Challenge test set to this repository.

I added some sample audio files to the repository.
Please try to process them and compare the results with the *_processed.wav files in the folder. If the files are not the same, your setup is probably not working.
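
One way to do that comparison programmatically (the file names below are placeholders, not the actual sample names):

```python
import numpy as np
import soundfile as sf

# Substitute the actual sample and output file names from your setup.
ref, fs_ref = sf.read("sample_processed.wav")
out, fs_out = sf.read("out/sample_mic.wav")

assert fs_ref == fs_out, "sample rates differ"
n = min(len(ref), len(out))
# Small numerical differences across platforms are possible, so compare
# with a tolerance instead of exact equality.
max_dev = np.max(np.abs(ref[:n] - out[:n]))
print(f"max absolute deviation: {max_dev:.2e}")
```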

Thanks a lot.

Maybe this line, `lpb, fs_2 = sf.read(audio_file_name.replace("mic.wav", "lpb.wav"))`, has a problem: it does not read the lpb.wav file correctly. I had to hardcode the path instead; then everything works.

Another important question: when will you provide training code or the model in other formats, such as a .pb file?

Sounds like the str.replace() command is not working properly. I tested it on Mac, Windows, and Ubuntu with Python 3.8, everything installed inside a conda environment, and it worked perfectly.
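
If str.replace() misbehaves in a particular setup, for example because the substring "mic.wav" also occurs somewhere in the directory path, one stricter variant is to rewrite only the file name. A sketch, not the repository's code:

```python
import os

import soundfile as sf

def read_pair(audio_file_name):
    """Read a *mic.wav file and its matching *lpb.wav file."""
    folder, name = os.path.split(audio_file_name)
    if not name.endswith("mic.wav"):
        raise ValueError(f"expected a *mic.wav file, got {name}")
    # Replace only the suffix of the file name, never parts of the folder path.
    lpb_name = name[: -len("mic.wav")] + "lpb.wav"
    mic, fs_1 = sf.read(audio_file_name)
    lpb, fs_2 = sf.read(os.path.join(folder, lpb_name))
    return mic, fs_1, lpb, fs_2
```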

At the moment I do not plan to provide a training setup or other model formats. It should be possible to recreate the training and the training data from the description in the paper in a reasonable amount of time. This repository is intended more as a baseline or for integration into projects.

Because I cannot reproduce your issue, and because the model itself works, I am closing this issue.

@breizhn hi,

Is it possible to subtract music, or any other audio output played close to the microphone array, from the input audio stream (human speech)? Similar to what smart speakers like Alexa do: hear me while music is playing in the background. Say I have two streams (music + speech vs. music only); I need to subtract the music and keep only the speech. Does your AEC implementation solve this problem?