SoundNet model in PyTorch
This code converts the pretrained TensorFlow SoundNet model to a PyTorch model; there is no training code for SoundNet. The pretrained PyTorch SoundNet model is sound8.pth.
Requirements:
- TensorFlow (only if sound8.pth does not exist yet)
- Python 3.6 with NumPy
- PyTorch 0.4+
If the file sound8.pth has not been generated yet, follow the original instructions: model
If audio preprocessing is required (e.g. the sample rate is not 22,050 Hz), utils.py has a method for converting all files in a given folder.
To convert a single file:

```shell
sox input.wav -r 22050 -c 1 output.wav
```
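A whole folder can be converted the same way; below is a minimal Python sketch (the function name and layout are assumptions for illustration, not the actual utils.py API) that builds one sox command per .wav file:

```python
import os
import subprocess

def build_sox_commands(src_dir, dst_dir, rate=22050, channels=1):
    """Build one sox command per .wav file in src_dir.

    Hypothetical helper; the real folder conversion lives in utils.py.
    """
    os.makedirs(dst_dir, exist_ok=True)
    commands = []
    for name in sorted(os.listdir(src_dir)):
        if name.lower().endswith(".wav"):
            src = os.path.join(src_dir, name)
            dst = os.path.join(dst_dir, name)
            commands.append(["sox", src, "-r", str(rate), "-c", str(channels), dst])
    return commands

# Each command can then be run with, e.g.:
# for cmd in build_sox_commands("raw_audio", "converted"):
#     subprocess.run(cmd, check=True)
```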
To extract feature vectors:

```python
audio, sr = load_audio(filepath)
features = ex.extract_pytorch_feature(audio, './soundnet/sound8.pth')
print([x.shape for x in features])

# extract the feature vector of a given layer
conv = ex.extract_vector(features, idlayer)  # feature vector
```
High-level features:
- conv5, idlayer = 4
- conv7, idlayer = 6
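The activations at a given layer still have a time axis, so their length varies with the clip duration. One common way to get a fixed-size clip embedding is to mean-pool over time; this is a sketch with made-up shapes, not necessarily what extract_vector does internally:

```python
import numpy as np

# Hypothetical conv7 activation map with shape (channels, time_frames);
# the real number of frames depends on the clip length.
conv7 = np.random.rand(1024, 12)

# Mean-pool over the time axis to obtain a fixed-size embedding,
# independent of how long the clip was.
clip_embedding = conv7.mean(axis=1)
print(clip_embedding.shape)  # (1024,)
```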
To find the temporal resolution 1/m of each layer, a slope and an intercept are computed; they describe the linear relationship between the time in seconds and the number of frames returned by the extract_feature_vector method.
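That slope and intercept can be estimated with an ordinary least-squares fit; the clip durations and frame counts below are made-up numbers for illustration only:

```python
import numpy as np

# Hypothetical measurements: clip durations (s) and the number of frames
# a given layer produced for each clip; real values depend on the model.
seconds = np.array([5.0, 10.0, 20.0, 40.0])
frames = np.array([6.0, 12.0, 25.0, 51.0])

# Least-squares line frames ~= slope * seconds + intercept.
# The temporal resolution of the layer is then roughly 1/slope
# seconds per frame.
slope, intercept = np.polyfit(seconds, frames, 1)
resolution = 1.0 / slope
```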
The code for the SoundNet TensorFlow model is ported from soundnet_tensorflow. Thanks for their work!
- Yusuf Aytar, Carl Vondrick, and Antonio Torralba. "Soundnet: Learning sound representations from unlabeled video." Advances in Neural Information Processing Systems. 2016.