asteroid-team / asteroid

The PyTorch-based audio source separation toolkit for researchers

Home Page: https://asteroid-team.github.io/

Window length does not equal the segment size !?!

markusMM opened this issue · comments

frame = frame * self.window

I have loaded an mp3 into a 1x2x147250253 tensor and I get an error back about the segment size being only half as long as the window length (see below).

IDK if I did something wrong, or how I should interpret this?

nnet = ConvTasNet(n_src=2)
continuous_nnet = LambdaOverlapAdd(
        nnet=nnet,  # function to apply to each segment.
        n_src=2,  # number of sources in the output of nnet
        window_size=64000,  # Size of segmenting window
        hop_size=None,  # segmentation hop size
        window="hanning",  # Type of the window (see scipy.signal.get_window)
        reorder_chunks=True,  # Whether to reorder each consecutive segment.
        enable_grad=False,  # Set gradient calculation on or off (see torch.set_grad_enabled)
)
new_mp3 = continuous_nnet.forward(mp3)
    142         # Here we can do the reshaping
    143         with torch.autograd.set_grad_enabled(self.enable_grad):
--> 144             olad = self.ola_forward(x)
    145             return olad
    146 

~\Anaconda3\envs\Conda3\lib\site-packages\asteroid\dsp\overlap_add.py in ola_forward(self, x)
    114 
    115             if self.use_window:
--> 116                 frame = frame * self.window
    117             else:
    118                 frame = frame / (self.window_size / self.hop_size)

RuntimeError: The size of tensor a (128000) must match the size of tensor b (64000) at non-singleton dimension 1

It's because of the two input channels. With a single input channel, it does work as expected.
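A minimal sketch of the mismatch behind the traceback above (shapes are illustrative, not taken from the actual `ola_forward` internals): with a stereo input, the channel dimension effectively gets folded into the frame, so each frame ends up twice as long as the window and the elementwise multiply fails.

```python
import torch

window_size = 4
window = torch.hann_window(window_size)

# Mono frame: length matches the window, broadcasting works.
mono_frame = torch.randn(1, window_size)
_ = mono_frame * window

# "Stereo" frame: twice the window length, same RuntimeError as above.
stereo_frame = torch.randn(1, 2 * window_size)
try:
    _ = stereo_frame * window
except RuntimeError as e:
    print(type(e).__name__)  # RuntimeError
```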

import torch
from asteroid.models import ConvTasNet
from asteroid.dsp import LambdaOverlapAdd

nnet = ConvTasNet(n_src=2)
continuous_nnet = LambdaOverlapAdd(
        nnet=nnet,  # function to apply to each segment.
        n_src=2,  # number of sources in the output of nnet
        window_size=64000,  # Size of segmenting window
        hop_size=None,  # segmentation hop size
        window="hanning",  # Type of the window (see scipy.signal.get_window)
        reorder_chunks=True,  # Whether to reorder each consecutive segment.
        enable_grad=False,  # Set gradient calculation on or off (see torch.set_grad_enabled)
)

# This does not work
# u = torch.randn(1, 2, 128000* 8)
# continuous_nnet.forward(u)

# This works
u = torch.randn(1, 1, 128000*8)
continuous_nnet.forward(u)

Does that solve the issue?

Unexpectedly, no! 😆

Generally, I wonder whether this network, or source separation networks in general, are made for only a single input channel or for multiple input channels as well!

Maybe I should have asked / researched this first ^^!

Is there a way it could be adapted to also work for 2 -> K channels?

Cheers

Some models work on multichannel data, but they need to be designed for it.

You can always use a mono-channel model on multichannel data; the simplest way here is to apply the model to each channel independently. Just move the channels into the batch dimension and it'll work.

Heya,

The model indeed seems to work mono-channel only; moving the channels into the batch dimension does the trick...

n_src = 2
nnet = ConvTasNet(
  n_src=n_src, in_channels=1
)
continuous_nnet = LambdaOverlapAdd(
        nnet=nnet,  # function to apply to each segment.
        n_src=n_src,  # number of sources in the output of nnet
        window_size=64000,  # Size of segmenting window
        hop_size=None,  # segmentation hop size
        window="hanning",  # Type of the window (see scipy.signal.get_window)
        reorder_chunks=True,  # Whether to reorder each consecutive segment.
        enable_grad=False,  # Set gradient calculation on or off (see torch.set_grad_enabled)
)
new_mp3 = continuous_nnet.forward(mp3.permute(1,0,2))
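For completeness, here is a shape-only sketch of the permute trick. `fake_separator` is a hypothetical stand-in for the wrapped model, assumed to map `(batch, 1, time)` to `(batch, n_src, time)`:

```python
import torch

# Hypothetical stand-in for the separation model:
# (batch, 1, time) -> (batch, n_src=2, time)
def fake_separator(x):
    return torch.stack([x.squeeze(1), -x.squeeze(1)], dim=1)

stereo = torch.randn(1, 2, 16000)       # (batch=1, channels=2, time)
as_batch = stereo.permute(1, 0, 2)      # (2, 1, time): channels act as the batch
separated = fake_separator(as_batch)    # (2, n_src, time)
per_source = separated.permute(1, 0, 2) # (n_src, channels, time), if you want sources first
print(per_source.shape)  # torch.Size([2, 2, 16000])
```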

Cheers