oawiles / X2Face

PyTorch code for ECCV 2018 paper

How to train model with audio feature?

tlatlbtle opened this issue

commented

Thanks for this great repo!

As for audio2face, I found that the model file does not have an audio embedding part:
https://github.com/oawiles/X2Face/blob/2d0a3a620c8ebf57c6df75c79fb82052eceb89ba/UnwrapMosaic/NoSkipNet_X2Face_pose.py
The default value for "audio" on line 197 is false, and there is no code for the audio model. How can I reimplement your method for audio2face?

Thanks.

The audio code is at https://github.com/oawiles/X2Face/blob/2d0a3a620c8ebf57c6df75c79fb82052eceb89ba/UnwrapMosaic/Audio2Face.ipynb.

We don't have the code for training as it was rather a pain to implement. We had to use MATLAB to extract Joon et al.'s audio features, which were then saved out to numpy and used to train the new model.

commented

Hi, I used the code here to extract audio features: http://www.robots.ox.ac.uk/~vgg/research/unsup_learn_watch_faces/extract_audio_code.zip.
I downloaded the dataset here: http://www.robots.ox.ac.uk/~vgg/data/voxceleb/vox1a/vox1_test_wav.zip
And the frames extracted at 1 fps here: http://www.robots.ox.ac.uk/~vgg/research/CMBiometrics/data/zippedFaces.tar.gz

I cannot find the "audio.wav" referenced on line 71 of "extract_audio_voxceleb.m" anywhere in the test set.
It seems every wav file in the test set has been split into several clips: 00001.wav, 00002.wav, etc.

Do I need to merge these clips to get "audio.wav"?

commented

Hi, would you mind telling us how to obtain the "audio.wav" mentioned in "extract_audio_voxceleb.m" (
http://www.robots.ox.ac.uk/~vgg/research/unsup_learn_watch_faces/extract_audio_code.zip)
on line 71:
audio_file = ['/datasets/voxceleb1/wav/' track_name '/' track_id '/audio.wav'];

Hi! We didn't do the preprocessing ourselves, but yes, I believe it gave us a single audio file corresponding to the entire video. I believe this was because the frames we used were also preprocessed from the entire video. I don't think merging the clips will work, as this won't reproduce the original video. Probably the easiest approach is to modify the script while ensuring that the frames and the audio correspond (lines 82-84), using whatever preprocessed version of the data you have.
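
For anyone adapting the script, here is a minimal Python sketch of one way to keep frames and audio in correspondence. It is not the logic of lines 82-84 of "extract_audio_voxceleb.m"; it only illustrates the idea of mapping a frame index to a span of audio samples. The function name, the 16 kHz sample rate, the 0.2 s window length, and the convention that frame 0 sits at time 0 are all assumptions; the 1 fps default matches the frame archive mentioned above.

```python
def audio_window_for_frame(frame_index, fps=1.0, sample_rate=16000,
                           window_seconds=0.2, total_samples=None):
    """Return (start, end) sample indices of an audio window centred on a frame.

    Hypothetical helper: assumes frame 0 corresponds to time 0 of the full
    video and that the audio was extracted from that same full video.
    """
    if fps <= 0 or sample_rate <= 0 or window_seconds <= 0:
        raise ValueError("fps, sample_rate and window_seconds must be positive")

    # Timestamp of the frame in seconds, then its position in samples.
    timestamp = frame_index / fps
    centre = int(round(timestamp * sample_rate))

    # Half of the window length, expressed in samples.
    half = int(round(window_seconds * sample_rate / 2))

    # Clamp the window so it never starts before the beginning of the audio.
    start = max(0, centre - half)
    end = centre + half

    # Optionally clamp to the end of the audio as well.
    if total_samples is not None:
        end = min(end, total_samples)
    return start, end


if __name__ == "__main__":
    # Frame 10 of a 1 fps sequence, with 16 kHz audio.
    print(audio_window_for_frame(10))
```

If the frames come from a differently preprocessed copy of the video (trimmed or re-encoded), the offset between frame 0 and the start of the audio would need to be added to the timestamp.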

commented

I just downloaded the original complete video in mp4 format from YouTube and converted it to wav audio. It works for me. Thanks.
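
For reference, a minimal sketch of the mp4-to-wav step in Python, assuming ffmpeg is installed and on the PATH. The helper names, the file names, and the choice of mono 16 kHz 16-bit PCM are assumptions, not something confirmed in this thread; adjust them to whatever the feature extraction script expects.

```python
import subprocess


def build_ffmpeg_command(video_path, wav_path, sample_rate=16000):
    """Build an ffmpeg command that extracts the audio track as a wav file.

    Hypothetical helper; the mono / 16 kHz / 16-bit PCM settings are assumed.
    """
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", video_path,         # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # mono
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-acodec", "pcm_s16le",   # 16-bit PCM, the usual wav encoding
        wav_path,
    ]


def extract_audio(video_path, wav_path, sample_rate=16000):
    """Run ffmpeg; raises CalledProcessError if the conversion fails."""
    subprocess.run(build_ffmpeg_command(video_path, wav_path, sample_rate),
                   check=True)


if __name__ == "__main__":
    # Example: write the result where line 71 of the script looks for it.
    print(" ".join(build_ffmpeg_command("video.mp4", "audio.wav")))
```

Calling `extract_audio("video.mp4", "audio.wav")` would then produce one wav file covering the entire video, which is what the "audio.wav" path on line 71 appears to expect.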