seanwood / gcc-nmf

Real-time GCC-NMF Blind Speech Separation and Enhancement

two small problems?

opened this issue · comments

First problem: the low-latency algorithm runs without errors, but the output for both the symmetric and asymmetric variants is silence, no sound at all. I copied the code exactly from the Python notebooks.

Second problem, with the low-latency and online speech enhancement algorithms: compared to the first two algorithms, which output the correct WAV format, these last two output everything fine except that the bit depth is doubled for some reason. Instead of the signed 16-bit WAV I put in, I get a 32-bit float WAV out. How do I fix this?

thanks!

I've fixed the second problem, but can't reproduce the first. Could you please check and see if by chance these latest changes fix it too? Thanks again!
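For reference, the essence of that fix is casting the float32 processing buffer back to signed 16-bit PCM before writing, since WAV writers keep whatever dtype they are given. A minimal sketch using scipy.io.wavfile (the actual wavfile.py code may differ):

import numpy as np
from scipy.io import wavfile

sampleRate = 16000
samples = np.zeros(sampleRate, dtype=np.float32)  # one second of float silence
wavfile.write('out32.wav', sampleRate, samples)   # written as a 32-bit float WAV
wavfile.write('out16.wav', sampleRate,
              (samples * 32767).astype(np.int16)) # written as 16-bit PCM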

The first problem still happens. Do you want me to send you the Python demo script I am using, so you can tell me what is wrong?

For the second problem, the bit depth is fixed, but now the output is clipping like crazy! :(
Maybe wavfile.py should be used not only for the wavwrite but also for the wavread somehow? Otherwise I don't know what else the issue could be. Is there a way to try it with the wavread?

thanks!

The silent output bug was due to an accidental integer division; I was able to reproduce it with Python 2.7. The gainFactor variable was getting set to 0 before I added the cast to float here:
gainFactor = hopSize / float(len(synthesisWindow)) * 2
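
To illustrate the pitfall: under Python 2, / between two integers performs floor division, so the expression above evaluates to 0 whenever hopSize is smaller than the window length:

# Python 2.7 behaviour (Python 3 yields 0.5 in both cases)
hopSize = 512
windowSize = 1024
print(hopSize / windowSize)         # 0   -> gainFactor of 0 -> silent output
print(hopSize / float(windowSize))  # 0.5 -> the fix above
# Alternatively, put 'from __future__ import division' at the top of the module.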

I'm not getting the clipping issue, even with Python 2.7, but I just added a quick fix that might work. Tomorrow I will add the call to wavread too.

Yes, the low latency is finally fixed and all good, thank you! The only problem left is the clipping from the online algorithm, though I also get clipping from the two offline algorithms; I just forgot to mention that. Maybe there's a hint in this: the low-latency algorithm has no clipping while the other three do, so what is going on?

The notebooks should all be reading / writing WAV files the same way now, via wavread and wavwrite. I wasn't able to reproduce the clipping; does this change fix it on your end?
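
Concretely, reading and writing "the same way" means a symmetric scaling convention between the two functions, roughly like this (a sketch of the idea, not necessarily the exact wavfile.py code):

import numpy as np
from scipy.io import wavfile

def wavread(fileName):
    # int16 samples in, floats in [-1, 1) out.
    sampleRate, samples = wavfile.read(fileName)
    return samples.astype(np.float32) / 32768.0, sampleRate

def wavwrite(samples, fileName, sampleRate):
    # Floats in [-1, 1) in, int16 samples out: the inverse of wavread.
    wavfile.write(fileName, sampleRate, (samples * 32767).astype(np.int16))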

Thanks for the wavread, but sadly the clipping is not fixed. Why does it work with one script and not with the other three? I am attaching my four demo scripts so you can check them; maybe there is a difference somewhere between mine and yours, but I am pretty sure I copied from the notebooks exactly as they were. demo3 is the online real-time script that outputs correctly; the other three clip.
gccNMF.zip

Ah! The clipping problem is probably because of an unintended amplification in the STFT. I had added a correcting gain factor to the low-latency notebook, but it was missing from the others. Since the test files in the notebooks are pretty quiet to begin with, the STFT amplification wasn't clipping the output for me. My guess is that your test_mix.wav file is louder than the ones from the notebooks? In any case, it should be fixed now :)
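
For the curious: in weighted overlap-add resynthesis, each sample is windowed twice (once at analysis, once at synthesis), so the overlap-added output is scaled relative to the input by the sum of the window product per hop. Dividing by that factor restores unit gain. A sketch of the idea with illustrative names (not the notebooks' exact code):

import numpy as np

hopSize, windowSize = 512, 1024
analysisWindow = np.hanning(windowSize)
synthesisWindow = np.hanning(windowSize)

# Net gain of analysis windowing plus overlap-add with synthesis windowing:
olaGain = np.sum(analysisWindow * synthesisWindow) / hopSize  # ~0.75 here
stftGainFactor = 1.0 / olaGain
# Multiply the overlap-added output by stftGainFactor before wavwrite.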

Everything seems to be good, aside from a new error now in the online speech enhancement (not the low latency, just the normal online one). The error appears only at the wavwrite step:
Traceback (most recent call last):
  File "demo3.py", line 108, in <module>
    wavwrite( targetEstimateSamplesOLA, targetEstimateFileName, sampleRate )
  File "C:\gccNMF\wavfile.py", line 38, in wavwrite
    raise ValueError('wavwrite: max abs signal value exceeds 1')
ValueError: wavwrite: max abs signal value exceeds 1

How do I fix this? Thanks!

Your output signal for the "online" notebook still has absolute values greater than 1... that shouldn't happen, so there's still a bug there somewhere. Just to be sure, did you update your demo3.py with the stftGainFactor code from the notebook? In the meantime, I've added clipping protection: when the max abs sample value is >= 1, the signal gets normalized before the wavwrite.
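
The clipping protection amounts to something like this (an illustrative sketch, not necessarily the committed code):

import numpy as np

def clipProtect(samples):
    # If the signal would clip, rescale its peak to just under 1.0 so that
    # wavwrite no longer raises 'max abs signal value exceeds 1'.
    peak = np.max(np.abs(samples))
    if peak >= 1.0:
        samples = samples / peak * 0.999
    return samples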

Would it be possible to share your "test_mix.wav" file to help isolate the bug?

Finally, everything works correctly now! Thanks a lot! One last question though: why does the offline speech enhancement (demo2.py) take so much longer to process than the other three algorithms? Or is it just on my end? Is that normal? Thank you!