Window length does not equal the segment size !?!
markusMM opened this issue · comments
asteroid/asteroid/dsp/overlap_add.py
Line 116 in d07a907
I have loaded an mp3 into a 1x2x147250253 tensor and get an error back about the segment size being only half as long as the window length (see below).
I don't know if I did something wrong or how I should interpret this?
import torch
from asteroid.models import ConvTasNet
from asteroid.dsp import LambdaOverlapAdd

nnet = ConvTasNet(n_src=2)
continuous_nnet = LambdaOverlapAdd(
    nnet=nnet,            # function to apply to each segment
    n_src=2,              # number of sources in the output of nnet
    window_size=64000,    # size of the segmenting window
    hop_size=None,        # segmentation hop size
    window="hanning",     # window type (see scipy.signal.get_window)
    reorder_chunks=True,  # whether to reorder each consecutive segment
    enable_grad=False,    # gradient calculation on or off (see torch.set_grad_enabled)
)
new_mp3 = continuous_nnet.forward(mp3)
142 # Here we can do the reshaping
143 with torch.autograd.set_grad_enabled(self.enable_grad):
--> 144 olad = self.ola_forward(x)
145 return olad
146
~\Anaconda3\envs\Conda3\lib\site-packages\asteroid\dsp\overlap_add.py in ola_forward(self, x)
114
115 if self.use_window:
--> 116 frame = frame * self.window
117 else:
118 frame = frame / (self.window_size / self.hop_size)
RuntimeError: The size of tensor a (128000) must match the size of tensor b (64000) at non-singleton dimension 1
It's because of the two input channels. With a single input channel, it does work as expected.
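A minimal sketch of the mismatch (hypothetical small shapes; assuming the framing step ends up flattening the channel axis into the time axis, so a 2-channel frame carries `2 * window_size` samples while the window only has `window_size`):

```python
import torch

window_size = 4
window = torch.hann_window(window_size)  # shape (4,)
frame = torch.randn(1, 2 * window_size)  # two channels flattened together: shape (1, 8)

try:
    frame * window  # broadcasting fails: 8 vs 4 at dimension 1
except RuntimeError as e:
    print(e)
```

With a single channel, the frame is exactly `window_size` samples long and the elementwise multiply broadcasts cleanly.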
import torch
from asteroid.models import ConvTasNet
from asteroid.dsp import LambdaOverlapAdd

nnet = ConvTasNet(n_src=2)
continuous_nnet = LambdaOverlapAdd(
    nnet=nnet,            # function to apply to each segment
    n_src=2,              # number of sources in the output of nnet
    window_size=64000,    # size of the segmenting window
    hop_size=None,        # segmentation hop size
    window="hanning",     # window type (see scipy.signal.get_window)
    reorder_chunks=True,  # whether to reorder each consecutive segment
    enable_grad=False,    # gradient calculation on or off (see torch.set_grad_enabled)
)

# This does not work
# u = torch.randn(1, 2, 128000 * 8)
# continuous_nnet.forward(u)

# This works
u = torch.randn(1, 1, 128000 * 8)
continuous_nnet.forward(u)
Does that solve the issue?
Unexpectedly, no! 😆
More generally, I wonder whether this network (or all of the source separation networks) is designed for a single input channel only, or for multiple channels as well!
Maybe I should have asked / researched this first ^^!
Is there a way it could be adapted to also work for 2 -> K channels?
Cheers
Some models work on multichannel data, but they need to be designed for it.
You can always use a mono-channel model on multichannel data; the simplest way here is to apply the model to each channel independently. Just put the channels in the batch dimension instead, and it'll work.
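A minimal sketch of that trick (shapes are illustrative; a mono model would then see a batch of two one-channel signals):

```python
import torch

stereo = torch.randn(1, 2, 16000)     # (batch=1, channels=2, time)
mono_batch = stereo.permute(1, 0, 2)  # (batch=2, channels=1, time)
print(mono_batch.shape)               # torch.Size([2, 1, 16000])

# Each batch entry is now a mono signal, so a mono model can process
# both channels in one pass, e.g.:
# separated = continuous_nnet.forward(mono_batch)  # -> (2, n_src, time)
```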
Heya,
The model indeed seems to only work on mono-channel input, but with the channels moved to the batch dimension it runs:
n_src = 2
nnet = ConvTasNet(n_src=n_src, in_channels=1)
continuous_nnet = LambdaOverlapAdd(
    nnet=nnet,            # function to apply to each segment
    n_src=n_src,          # number of sources in the output of nnet
    window_size=64000,    # size of the segmenting window
    hop_size=None,        # segmentation hop size
    window="hanning",     # window type (see scipy.signal.get_window)
    reorder_chunks=True,  # whether to reorder each consecutive segment
    enable_grad=False,    # gradient calculation on or off (see torch.set_grad_enabled)
)
new_mp3 = continuous_nnet.forward(mp3.permute(1, 0, 2))
Cheers