How to train model with audio feature?

Question

tlatlbtle opened this issue 5 years ago

Thanks for this great repo!

As for audio2face, I found that the model file does not contain an audio embedding part:
https://github.com/oawiles/X2Face/blob/2d0a3a620c8ebf57c6df75c79fb82052eceb89ba/UnwrapMosaic/NoSkipNet_X2Face_pose.py
The default value of "audio" on line 197 is false, and there is no code for the audio model. How can I reimplement your method for audio2face?

Thanks.

Olivia · Answer 1 · Wed Aug 28 2019 11:39:53 GMT+0800 (China Standard Time)

The audio code is at https://github.com/oawiles/X2Face/blob/2d0a3a620c8ebf57c6df75c79fb82052eceb89ba/UnwrapMosaic/Audio2Face.ipynb.

We don't have the training code, as it was rather a pain to implement. We had to use MATLAB to obtain Joon et al.'s audio features, which were then saved out to numpy and used to train the new model.

whyma · Answer 2 · Mon Sep 16 2019 16:05:16 GMT+0800 (China Standard Time)

Hi, I used the code here to extract audio features: http://www.robots.ox.ac.uk/~vgg/research/unsup_learn_watch_faces/extract_audio_code.zip
I downloaded the dataset here: http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_test_wav.zip
And the frames extracted at 1 fps: http://www.robots.ox.ac.uk/~vgg/research/CMBiometrics/data/zippedFaces.tar.gz

I cannot find the "audio.wav" referenced on line 71 of "extract_audio_voxceleb.m" anywhere in the test set.
It seems every wav file in the test set has been split into several clips: 00001.wav, 00002.wav, etc.

Do I need to merge these clips to get "audio.wav"?

whyma · Answer 3 · Wed Oct 16 2019 18:49:53 GMT+0800 (China Standard Time)

Hi, would you mind telling us how to obtain the "audio.wav" referenced on line 71 of "extract_audio_voxceleb.m" (http://www.robots.ox.ac.uk/~vgg/research/unsup_learn_watch_faces/extract_audio_code.zip)?
Line 71 reads:
audio_file = ['/datasets/voxceleb1/wav/' track_name '/' track_id '/audio.wav'];

Olivia · Answer 4 · Thu Oct 17 2019 00:45:04 GMT+0800 (China Standard Time)

Hi! We didn't do the preprocessing ourselves, but yes, I believe it gave us a single audio file corresponding to the entire video. I believe this was because the frames we used were also preprocessed from the entire video. I don't think merging the clips will work, as that won't reproduce the original video. The easiest approach is probably to modify the script, making sure that the frames and the audio correspond (lines 82-84), using whatever preprocessed version of the data you have.
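To make the "frames and audio correspond" requirement concrete, here is a minimal Python sketch of one way to map a frame index to a range of audio samples. It is not what lines 82-84 of the MATLAB script do (those lines are not shown in this thread). The function name, the window length, and the assumption that frame 0 sits at time 0 of the same full-length video are all hypothetical.

```python
def audio_window_for_frame(frame_index, fps, sample_rate, window_seconds, num_samples):
    """Return (start, end) sample indices of the audio window centred on a frame.

    Assumes frame 0 is at time 0 and that the frames and the audio were both
    taken from the same, complete video (both are assumptions).
    """
    if fps <= 0 or sample_rate <= 0 or window_seconds <= 0:
        raise ValueError("fps, sample_rate and window_seconds must be positive")

    # Timestamp of the frame, converted to an audio sample index.
    center = round(frame_index / fps * sample_rate)
    # Half of the window, in samples.
    half = round(window_seconds * sample_rate / 2)

    # Clip the window to the available audio.
    start = max(0, center - half)
    end = min(num_samples, center + half)
    if start >= end:
        raise ValueError("frame lies outside the audio track")
    return start, end
```

For frames extracted at 1 fps and 16 kHz audio, `audio_window_for_frame(3, 1.0, 16000, 0.2, 160000)` selects the 0.2 s of audio around the 3-second mark. If the audio comes from a clip rather than the whole video, the timestamps no longer line up, which is why merging or reusing the split clips is unlikely to work.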

whyma · Answer 5 · Wed Oct 30 2019 21:58:31 GMT+0800 (China Standard Time)

I just downloaded the original complete video in mp4 format from YouTube and converted it to wav audio. It works for me. Thanks.
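For anyone taking the same route, the conversion step can be sketched in Python by calling ffmpeg. This is only an illustration under assumptions: ffmpeg must be installed and on the PATH, the helper names are hypothetical, and the 16 kHz mono output format is a guess that should be checked against what the feature extraction script expects.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, wav_path, sample_rate=16000):
    """Return an ffmpeg argument list that extracts mono PCM audio from a video."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", str(video_path),     # input video (e.g. the downloaded mp4)
        "-vn",                     # drop the video stream
        "-acodec", "pcm_s16le",    # 16-bit PCM, a common WAV encoding
        "-ac", "1",                # mono
        "-ar", str(sample_rate),   # output sample rate (16 kHz is an assumption)
        str(wav_path),
    ]


def extract_wav(video_path, wav_path, sample_rate=16000):
    """Run ffmpeg; raises subprocess.CalledProcessError if the conversion fails."""
    Path(wav_path).parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_command(video_path, wav_path, sample_rate), check=True)
```

Calling `extract_wav("video.mp4", "wav/track_name/track_id/audio.wav")` (placeholder paths) would write one audio.wav covering the whole video, in the directory layout that line 71 of "extract_audio_voxceleb.m" expects.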