real_time

Question

shenbuguanni opened this issue 2 years ago

Firstly, thank you very much for your work. I have been studying the TFA module recently and have some questions about the script.
1. The shape of ZF is [1, D], whereas in the script it is [T, D].
2. time_seq is the number added before the input x. Why is this done for real-time operation, how should the value of time_seq (32 in the script) be set, and what does it relate to?
Thank you again!

Seorim Hwang · Answer 1 · Mon Apr 25 2022 19:39:43 GMT+0800 (China Standard Time)

Hello! Thank you for your interest in our research.

Yes, that's right. The causal TFA (CTFA) used in this repo is obtained by modifying TFA (the reference is mentioned in the README). One major shortcoming of the original TFA is that causality is not guaranteed, since it averages over a whole block of frames (which is why the shape is [1, D], as you mentioned). To satisfy the causality required for real-time implementation, we modified the equation for time-axis averaging so that it includes only the causal frames.

In this case, time_seq (to be precise, time_seq - 1) is the number of look-back frames. The reason we add time_seq before the input x is to eliminate latency, and the value 32 was determined experimentally.
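
As a rough illustration of the causal averaging described above (a minimal sketch, not the code from this repo): each output frame is the mean of the current frame and its time_seq - 1 predecessors, which is what left-padding by time_seq - 1 followed by a stride-1 average with window time_seq computes. The function name `causal_time_average`, the zero padding value, and the use of one scalar per frame (instead of a D-dimensional vector) are assumptions made for the example.

```python
def causal_time_average(frames, time_seq=32, pad_value=0.0):
    """Average each frame with its (time_seq - 1) look-back frames.

    `frames` is a list with one value per time step. Padding is applied
    only on the left (the past), so output t never depends on frames > t.
    The padding value 0.0 is an assumption, not taken from the repo.
    """
    if time_seq < 1:
        raise ValueError("time_seq must be at least 1")
    padded = [pad_value] * (time_seq - 1) + list(frames)
    # Window t covers padded[t : t + time_seq], i.e. frames[t-time_seq+1 .. t].
    return [sum(padded[t:t + time_seq]) / time_seq for t in range(len(frames))]


x = [1.0, 2.0, 3.0, 4.0]
y = causal_time_average(x, time_seq=2)
print(y)  # [0.5, 1.5, 2.5, 3.5]

# Changing a future frame leaves all earlier outputs untouched (causality).
x_changed = [1.0, 2.0, 3.0, 100.0]
y_changed = causal_time_average(x_changed, time_seq=2)
print(y[:3] == y_changed[:3])  # True
```

The output has one value per input frame (T outputs for T inputs), and a smaller time_seq only shortens the look-back window; the result stays causal for any time_seq of at least 1.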

shenbuguanni · Answer 2 · Tue Apr 26 2022 15:22:01 GMT+0800 (China Standard Time)

So time_seq is the number of history frames used by nn.AvgPool1d()? I see that the causal conv2d script has the same operation (padding one frame before the input), so setting a smaller time_seq will not affect causality; it will only use less historical information.
In addition, I would like to ask about the 1x1 conv. To my knowledge the kernel size should be 1x1, but it is 3x1 in the script. I have read some papers that add a 1x1 conv to the skip connection (DCCRN+), and a 1x1 conv cannot change the input shape. Have you run the corresponding experiments, and could you give me some advice?

Seorim Hwang · Answer 3 · Tue Apr 26 2022 15:58:39 GMT+0800 (China Standard Time)

so the time_seq is the number of history frames used by nn.AvgPool1d()? --> Yes!
so I set smaller time_seq will not affect causality --> what does this mean?

Could you please explain the question below in more detail? Sorry, I didn't understand the question.

shenbuguanni · Answer 4 · Tue Apr 26 2022 16:41:05 GMT+0800 (China Standard Time)

so I set smaller time_seq will not affect causality --> I mean that if time_seq of any size (> 0) is padded before the input x, the system is still causal.
1x1 conv used in skip connection --> Is the kernel size really 1x1? I think the output's channel and frequency dimensions should be the same as the input's, because the output is added to the decoder. If a 1x1 kernel is used, what is the point of this operation? Or is there another operation after the 1x1 conv, such as BN and an activation function?

Seorim Hwang · Answer 5 · Tue Apr 26 2022 16:58:47 GMT+0800 (China Standard Time)

First of all, I didn't use any convolutional filter for the skip connection.
Which part of the code are you referring to?

shenbuguanni · Answer 6 · Wed Apr 27 2022 09:28:32 GMT+0800 (China Standard Time)

The "1x1 conv down_sample" part, where the kernel size is not 1x1.

Seorim Hwang · Answer 7 · Wed Apr 27 2022 09:58:22 GMT+0800 (China Standard Time)

down_sampling is not used as a filter for the skip connections. It is the module used for encoding. It has the same number of input and output channels and simply compresses the frequency axis by half for down-sampling.
If this is unclear, I recommend taking a look at the Nested U-Net structure. The reference for it is also in the README.

And there is no particular reason why the kernel size is (3x1).
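
The halving of the frequency axis can be checked with the standard convolution output-size formula, out = floor((in + 2 * padding - kernel) / stride) + 1 (dilation 1). The sketch below is only illustrative: the thread confirms the kernel size of 3 along frequency, while the stride of 2, the padding of 1, and the 256 frequency bins are assumptions chosen for the example, and `conv_out_len` is a hypothetical helper.

```python
def conv_out_len(n_in, kernel, stride=1, padding=0):
    """Output length of a convolution along one axis (dilation = 1)."""
    return (n_in + 2 * padding - kernel) // stride + 1


freq_bins = 256  # hypothetical number of frequency bins

# Kernel 3 along frequency with assumed stride 2 and padding 1: axis is halved.
print(conv_out_len(freq_bins, kernel=3, stride=2, padding=1))  # 128

# A 1x1 kernel with stride 1 and no padding leaves the axis unchanged.
print(conv_out_len(freq_bins, kernel=1, stride=1, padding=0))  # 256
```

Under these assumptions a 1x1 kernel with stride 2 would also halve the axis, so the down-sampling comes from the stride rather than from the kernel size, which is consistent with there being no particular reason for choosing (3x1).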