SoundNet model in PyTorch
This code converts the pretrained TensorFlow SoundNet model to a PyTorch model; there is no training code for SoundNet. The pretrained PyTorch SoundNet model is sound8.pth.
Requirements:
- TensorFlow (only if sound8.pth does not exist yet)
- Python 3.6 with NumPy
- PyTorch 0.4+
If the file sound8.pth has not been generated yet, follow the original instructions: model
If audio preprocessing is required (e.g. the sample rate is not 22,050 Hz), utils.py has a method for converting all files in a given folder.
To convert a single file:

```shell
sox input.wav -r 22050 -c 1 output.wav
```
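A whole folder can be converted the same way; below is a minimal Python sketch (the function name and layout are assumptions for illustration, not the actual utils.py API) that builds one sox command per .wav file:

```python
import os
import subprocess

def build_sox_commands(src_dir, dst_dir, rate=22050, channels=1):
    """Build one sox command per .wav file in src_dir.

    Hypothetical helper; the real folder conversion lives in utils.py.
    """
    os.makedirs(dst_dir, exist_ok=True)
    commands = []
    for name in sorted(os.listdir(src_dir)):
        if name.lower().endswith(".wav"):
            src = os.path.join(src_dir, name)
            dst = os.path.join(dst_dir, name)
            commands.append(["sox", src, "-r", str(rate), "-c", str(channels), dst])
    return commands

# Each command can then be run with, e.g.:
# for cmd in build_sox_commands("raw_audio", "converted"):
#     subprocess.run(cmd, check=True)
```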
To extract feature vectors:

```python
audio, sr = load_audio(filepath)
features = ex.extract_pytorch_feature(audio, './soundnet/sound8.pth')
print([x.shape for x in features])

# extract the feature vector of a given layer
conv = ex.extract_vector(features, idlayer)  # feature vector
```
High-level features:
- conv5, idlayer = 4
- conv7, idlayer = 6
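The activations at a given layer still have a time axis, so their length varies with the clip duration. One common way to get a fixed-size clip embedding is to mean-pool over time; this is a sketch with made-up shapes, not necessarily what extract_vector does internally:

```python
import numpy as np

# Hypothetical conv7 activation map with shape (channels, time_frames);
# the real number of frames depends on the clip length.
conv7 = np.random.rand(1024, 12)

# Mean-pool over the time axis to obtain a fixed-size embedding,
# independent of how long the clip was.
clip_embedding = conv7.mean(axis=1)
print(clip_embedding.shape)  # (1024,)
```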
To find the temporal resolution 1/m of each layer, a slope and an intercept are computed; they describe the linear relationship between the time in seconds and the number of frames returned by the extract_feature_vector method.
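That slope and intercept can be estimated with an ordinary least-squares fit; the clip durations and frame counts below are made-up numbers for illustration only:

```python
import numpy as np

# Hypothetical measurements: clip durations (s) and the number of frames
# a given layer produced for each clip; real values depend on the model.
seconds = np.array([5.0, 10.0, 20.0, 40.0])
frames = np.array([6.0, 12.0, 25.0, 51.0])

# Least-squares line frames ~= slope * seconds + intercept.
# The temporal resolution of the layer is then roughly 1/slope
# seconds per frame.
slope, intercept = np.polyfit(seconds, frames, 1)
resolution = 1.0 / slope
```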
The code for the SoundNet TensorFlow model is ported from soundnet_tensorflow. Thanks for their work!
- Yusuf Aytar, Carl Vondrick, and Antonio Torralba. "Soundnet: Learning sound representations from unlabeled video." Advances in Neural Information Processing Systems. 2016.