hangzhaomit / Sound-of-Pixels

Codebase for ECCV18 "The Sound of Pixels"

Home Page: http://sound-of-pixels.csail.mit.edu


Why does the model not train?

avis-ma opened this issue · comments

Hello, I am a Chinese student.
I have pre-processed the dataset and used train_MUSIC.sh to train the default model.
But the result is not what I expected: the metrics are all 0.
Even when I run eval_MUSIC.sh directly (with the downloaded pretrained model), I still get 0 for all metrics (SDR, SIR, etc.).
I did not change the code you published on GitHub.
How can I find out what the problem is?

I am getting the same result - all metrics are 0. @avis-ma Did you solve this?

EDIT: This bug (if it is indeed a bug; it would be great if the authors could confirm) may come from line 129 of dataset/base.py:

audio_raw *= (2.0**-31)

According to the authors' comment, this line is supposed to normalize the output of torchaudio.load() to the range [-1, 1]. However, torchaudio.load() already performs this normalization itself (see https://pytorch.org/audio/#torchaudio.load). As a result, line 129 effectively turns every value in audio_raw into zero.
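To see why the extra rescale is destructive, here is a minimal illustration (not the repo's actual code; the sample values are made up): once samples are already floats in [-1, 1], multiplying by 2.0**-31 (the factor meant to convert raw int32 samples) collapses them to values indistinguishable from silence.

```python
# audio_raw stands in for the already-normalized output of torchaudio.load():
# float samples in [-1, 1].
audio_raw = [0.5, -0.25, 0.9]

# The rescale from dataset/base.py line 129, applied a second time.
rescaled = [x * (2.0 ** -31) for x in audio_raw]

# Every sample is now on the order of 1e-10 -- effectively all zeros.
print(all(abs(x) < 1e-9 for x in rescaled))  # True
```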

This all-zero audio_raw makes all the metrics zero as well: later, in calc_metrics() in main.py (line 150), there is a check for whether the ground-truth audio is all zeros, and if it is, no metric calculation is carried out. Since the audio loaded from the dataset is always all zeros, this check always fires and the metrics are never computed.
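The guard described above can be sketched like this (a simplification, not the repo's exact code; the function name and return values are assumptions for illustration):

```python
def calc_metrics_sketch(gts_wav):
    """Skip SDR/SIR/SAR computation when the ground truth is silent."""
    valid = any(abs(s) > 0 for s in gts_wav)  # the all-zero check
    if not valid:
        return None  # metrics are never computed for silent ground truth
    return "compute SDR/SIR/SAR here"

# With the buggy rescale, every loaded waveform is all zeros,
# so this branch is taken for every sample:
print(calc_metrics_sketch([0.0, 0.0, 0.0]))  # None
```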

The fix is simple: comment out line 129 in dataset/base.py. After doing so, I got something like this:

[Eval Summary] Epoch: 0, Loss: 0.2974, SDR_mixture: 1.4887, SDR: 3.9951, SIR: 9.2085, SAR: 10.6352
Plotting html for visualization...


Thanks @ngmq, the data scale really matters. However, the training process still does not converge: after 2 training epochs the loss hovers around some value, say 0.20, and the two predicted masks are also similar to each other. Did you encounter this problem?

@zjsong That did not happen to me; my training went fine. Maybe checking the input data would help?

@ngmq Thanks for your reply. I just found that if training runs long enough (e.g., >25 epochs), it does show promising results.