andrewowens / multisensory

Code for the paper: Audio-Visual Scene Analysis with Self-Supervised Multisensory Features

Home page: http://andrewowens.com/multisensory/

Questions about the entry point of the training function

yxixi opened this issue · comments

Great job! I tried to train this model myself, but I ran into a few problems and would appreciate your help:
a. Could you describe in detail how to call the training function?
b. How should the 'Kinetics-Sounds' dataset be fed into the model for training?
c. I noticed that you mentioned 'rewriting the read_data(pr, gpus) function'. What does the variable 'pr' stand for?
Looking forward to your reply!
Thanks!
@andrewowens

Sorry for the slow reply!

a) I usually run it like this:

python -c "import sep_params, sourcesep; sourcesep.train(sep_params.full(num_gpus=3), [0, 1, 2], restore = False)"

This uses the "full" parameter set defined in sep_params.py.

b) I kept only these categories from the Kinetics dataset:
blowing nose
bowling
chopping wood
ripping paper
shuffling cards
singing
tapping pen
using computer
blowing out candles
dribbling basketball
laughing
mowing lawn
shoveling snow
stomping grapes
tap dancing
tapping guitar
tickling
strumming guitar
playing accordion
playing bagpipes
playing bass guitar
playing clarinet
playing drums
playing guitar
playing harmonica
playing keyboard
playing organ
playing piano
playing saxophone
playing trombone
playing trumpet
playing violin
playing xylophone

This selection follows the paper by Arandjelovic and Zisserman (note, though, that the list of categories in their paper is slightly out of date, since it used a pre-release version of the Kinetics dataset).
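If it helps, here is a minimal sketch of how you might filter the Kinetics annotation CSV down to the categories listed above. The column name "label" follows the standard Kinetics CSV layout; adjust it if your copy of the annotations differs — this is not code from the repo itself.

```python
import csv
import io

# The 33 sound-relevant Kinetics categories listed above.
KEPT_CATEGORIES = {
    "blowing nose", "bowling", "chopping wood", "ripping paper",
    "shuffling cards", "singing", "tapping pen", "using computer",
    "blowing out candles", "dribbling basketball", "laughing",
    "mowing lawn", "shoveling snow", "stomping grapes", "tap dancing",
    "tapping guitar", "tickling", "strumming guitar",
    "playing accordion", "playing bagpipes", "playing bass guitar",
    "playing clarinet", "playing drums", "playing guitar",
    "playing harmonica", "playing keyboard", "playing organ",
    "playing piano", "playing saxophone", "playing trombone",
    "playing trumpet", "playing violin", "playing xylophone",
}

def filter_kinetics(csv_text):
    """Keep only the annotation rows whose 'label' is a kept category."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["label"] in KEPT_CATEGORIES]
```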

c) The "pr" variable is the parameter set. You can find examples of these in sep_params.py, such as "full" (the full model) and "unet_pit" (sound only).
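For orientation, here is a hypothetical sketch of what a rewritten read_data(pr, gpus) could look like. The real function reads training examples from disk; below, random arrays stand in for decoded video and audio, and the parameter-set fields used (batch_size, num_frames, im_dim, num_audio_samples) are illustrative assumptions, not the repo's actual attribute names.

```python
import numpy as np

class ExampleParams:
    """Stand-in for a sep_params parameter set (field names are assumed)."""
    batch_size = 2
    num_frames = 64           # video frames per clip
    im_dim = 224              # frame height/width in pixels
    num_audio_samples = 21000 # audio samples per clip

def read_data(pr, gpus):
    """Return one (video, audio) batch per GPU.

    Shapes: video (batch, frames, H, W, 3), audio (batch, samples, 2).
    A real implementation would decode files instead of sampling noise.
    """
    batches = []
    for _ in gpus:
        video = np.random.rand(
            pr.batch_size, pr.num_frames, pr.im_dim, pr.im_dim, 3
        ).astype(np.float32)
        audio = np.random.rand(
            pr.batch_size, pr.num_audio_samples, 2
        ).astype(np.float32)
        batches.append((video, audio))
    return batches
```

The key point is only the shape of the contract: the function takes the parameter set and the GPU list, and yields one batch per GPU.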

Hope that helps!