andrewowens / multisensory

Code for the paper: Audio-Visual Scene Analysis with Self-Supervised Multisensory Features

Home Page: http://andrewowens.com/multisensory/



Question about the original audio waveform input

luhuijun666 opened this issue

Hi Owens,
Thanks for your contributions!
In your paper, you said you applied a series of strided 1D convolutions to the input waveform.
So the input waveform you refer to here (before fusion) is the raw audio waveform, without an STFT, right?
Why do you process the 1D signal directly, and how? Could you kindly explain this point for me?
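To make the question concrete, here is a minimal sketch of what "a series of strided 1D convolutions on the raw waveform" means mechanically. It is not the paper's network. The single-channel signal, the sample rate, the `(kernel_size, stride)` values in `layers`, and the fixed averaging kernels (standing in for learned weights) are all hypothetical, chosen only to show how the time axis shrinks. The helper names `strided_conv1d`, `relu`, and `output_length` are illustrative.

```python
import math


def strided_conv1d(signal, kernel, stride):
    # "Valid" 1D cross-correlation (no padding): slide the kernel over the raw
    # samples, jumping `stride` samples per step, as a single-channel conv layer does.
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, signal[i:i + k]))
        for i in range(0, len(signal) - k + 1, stride)
    ]


def relu(xs):
    # Elementwise nonlinearity applied after each conv layer.
    return [max(0.0, x) for x in xs]


def output_length(n, kernel_size, stride):
    # Number of output frames produced by a valid strided convolution.
    return (n - kernel_size) // stride + 1


# Hypothetical input: 1 s of a 440 Hz sine sampled at 16 kHz (raw samples, no STFT).
SAMPLE_RATE = 16000
wave = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]

# Hypothetical layer stack of (kernel_size, stride) pairs; averaging kernels
# replace learned weights purely to demonstrate the shapes.
layers = [(64, 4), (16, 4), (16, 4)]
x = wave
for kernel_size, stride in layers:
    kernel = [1.0 / kernel_size] * kernel_size
    x = relu(strided_conv1d(x, kernel, stride))
    print(len(x))
```

Each strided layer shrinks the time axis by roughly its stride, so a short stack turns thousands of raw samples into a much shorter feature sequence. This plays a role loosely similar to an STFT's frame hop, except that the filters would be learned rather than fixed.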