PIT Loss for multichannel audio for speech separation

Question

SutirthaChakraborty opened this issue 5 months ago · comments

Sutirtha Chakraborty commented 5 months ago

I have 4-channel audio generated by my model (left, right, side, mid).
How can I apply a PIT loss to it?
The shapes of the tensors are:
Speaker one: [batch, channel, time]
Speaker two: [batch, channel, time]

If I need to apply PIT, how should I arrange the tensor: [batch, channel, speaker, time]?

If I convert it to mono, or take the mean over the channels, the model is unable to learn the 4 channels properly.

Pariente Manuel · Answer 1 · Sat Mar 02 2024 17:02:08 GMT+0800 (China Standard Time)

I think the channel should be first, in order to build the permutation matrix of dimension (batch, speaker, speaker) with broadcasting.
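
For illustration, here is a minimal NumPy sketch of one way to read this suggestion. It assumes the two speakers are stacked on a new axis right after the batch axis, giving [batch, speaker, channel, time], and that the per-pair loss is a plain MSE averaged over channel and time. The function names `pairwise_mse` and `pit_loss` are made up for this example and are not taken from any library; the same broadcasting pattern should carry over to PyTorch tensors.

```python
import itertools
import numpy as np

def pairwise_mse(est, tgt):
    """est, tgt: [batch, speaker, channel, time] -> [batch, speaker, speaker]."""
    # est[:, :, None] -> [batch, n_est, 1, channel, time]
    # tgt[:, None]    -> [batch, 1, n_tgt, channel, time]
    # Broadcasting compares every estimate with every target.
    diff = est[:, :, None] - tgt[:, None]
    # Reduce over channel and time, so all 4 channels contribute to the loss.
    return (diff ** 2).mean(axis=(-2, -1))

def pit_loss(est, tgt):
    """Return the batch-mean loss under the best permutation, and that permutation per example."""
    pw = pairwise_mse(est, tgt)  # pw[b, i, j]: loss of estimate i against target j
    n_src = pw.shape[1]
    perms = list(itertools.permutations(range(n_src)))
    idx = np.arange(n_src)
    # For each permutation p, estimate i is matched with target p[i].
    losses = np.stack(
        [pw[:, idx, np.array(p)].mean(axis=1) for p in perms], axis=1
    )  # [batch, n_perms]
    best = losses.argmin(axis=1)
    return losses.min(axis=1).mean(), [perms[b] for b in best]

# Toy data with the shapes from the question: [batch, channel, time] per speaker.
rng = np.random.default_rng(0)
spk1 = rng.standard_normal((3, 4, 100))
spk2 = rng.standard_normal((3, 4, 100))

tgt = np.stack([spk1, spk2], axis=1)  # [batch, speaker, channel, time]
est = np.stack([spk2, spk1], axis=1)  # same signals, speakers swapped

loss, perms = pit_loss(est, tgt)
print(loss, perms)
```

Because the estimates here are just the targets with the speaker order swapped, the best permutation undoes the swap. Keeping the channel axis inside each source (rather than averaging to mono) means the loss still penalizes errors in every channel.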