samim23 / faceswap-GAN

A denoising autoencoder + adversarial loss for face swapping.


faceswap-GAN

Adding Adversarial loss and perceptual loss (VGGface) to deepfakes' auto-encoder architecture.

Updates

Date    Update
2018-03-17     Training: V2 model now provides a 40000-iter training schedule which automatically switches to proper loss functions at predefined iterations. (Cage/Trump dataset results)
2018-03-13     Model architecture: V2.1 model now provides 3 base architectures: (i) XGAN, (ii) VAE-GAN, and (iii) a variant of v2 GAN. See "4. Training Phase Configuration" in v2.1 notebook for detail.
2018-03-03 Model architecture: Add a new notebook which contains an improved GAN architecture. The architecture is greatly inspired by XGAN and MS-D neural network.
2018-02-13 Video conversion: Add a new video processing script using MTCNN for face detection. Faster detection with a configurable threshold value. No need for CUDA-supported dlib. (New notebook: v2_test_video_MTCNN)

Descriptions

GAN-v2

  • FaceSwap_GAN_v2_train.ipynb (recommended for training)

    • Script for training the version 2 GAN model.
    • Video conversion functions are also included.
  • FaceSwap_GAN_v2_test_video.ipynb

    • Script for generating videos.
    • Using face_recognition module for face detection.
  • FaceSwap_GAN_v2_test_video_MTCNN.ipynb (recommended for video conversion)

    • Script for generating videos.
    • Using MTCNN for face detection. Does not require CUDA-supported dlib.
  • faceswap_WGAN-GP_keras_github.ipynb

    • This notebook contains a class of GAN model using WGAN-GP.
    • Perceptual loss is discarded for simplicity.
    • The WGAN-GP model gave results similar to the LSGAN model after a comparable number of generator updates (~18k). A minimal sketch of the gradient-penalty term is given after this list.
    gan = FaceSwapGAN() # instantiate the class
    gan.train(max_iters=10e4, save_interval=500) # start training
  • FaceSwap_GAN_v2_sz128_train.ipynb

    • Input and output images have larger shape (128, 128, 3).
    • Minor updates on the architectures:
      1. Add instance normalization to generators and discriminators.
      2. Add an additional regression loss (MAE loss) on the 64x64 branch output.
    • Not compatible with _test_video and _test_video_MTCNN notebooks above.
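
For reference, the gradient-penalty term that gives WGAN-GP its name can be written as a small standalone function. The sketch below is illustrative only: it assumes a discriminator model mapping an image batch to a scalar score and is written against modern TensorFlow, not the notebook's actual TF 1.x / Keras code.

    # Illustrative sketch of the WGAN-GP gradient penalty (not the notebook's exact code).
    import tensorflow as tf

    def gradient_penalty(discriminator, real_images, fake_images, weight=10.0):
        batch_size = tf.shape(real_images)[0]
        # Sample points on straight lines between real and generated images.
        alpha = tf.random.uniform([batch_size, 1, 1, 1], 0.0, 1.0)
        interpolated = alpha * real_images + (1.0 - alpha) * fake_images
        with tf.GradientTape() as tape:
            tape.watch(interpolated)
            scores = discriminator(interpolated)
        grads = tape.gradient(scores, interpolated)
        # Penalize deviation of the per-sample gradient norm from 1.
        norms = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]) + 1e-8)
        return weight * tf.reduce_mean(tf.square(norms - 1.0))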

Miscellaneous

  • dlib_video_face_detection.ipynb

    1. Detect/Crop faces in a video using dlib's cnn model.
    2. Pack cropped face images into a zip file.
  • Training data: Face images should be placed in the ./faceA/ and ./faceB/ folders, one folder per target identity. Face images can be of any size. (A minimal loading sketch is given below.)
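
As a rough illustration of that folder layout, the sketch below gathers and resizes images from the two folders. The 64x64 resolution and the [-1, 1] pixel scaling are assumptions for a default-size model, not code taken from the notebooks.

    from glob import glob
    import cv2
    import numpy as np

    RESOLUTION = 64  # assumed default input size; the sz128 notebook uses 128

    def load_faces(folder):
        images = []
        for path in glob(folder + "/*.*"):
            img = cv2.imread(path)  # BGR uint8, any original size
            if img is None:
                continue            # skip unreadable / non-image files
            images.append(cv2.resize(img, (RESOLUTION, RESOLUTION)))
        # scale pixel values to [-1, 1] (a common choice for tanh outputs)
        return np.array(images, dtype=np.float32) / 255.0 * 2.0 - 1.0

    faces_A = load_faces("./faceA")
    faces_B = load_faces("./faceB")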

Results

Generative Adversarial Network, GAN (version 2)

  • Improved output quality: Adversarial loss improves reconstruction quality of generated images. trump_cage

  • VGGFace perceptual loss: Perceptual loss makes the direction of the eyeballs more realistic and consistent with the input face.

  • Smoothed bounding box (Smoothed bbox): Exponential moving average of bounding box position over frames is introduced to eliminate jitter on the swapped face.

  • Unsupervised segmentation mask: The model learns a proper mask that helps with handling occlusion, eliminating artifacts on bbox edges, and producing a natural skin tone. Below are results transforming Hinako Sano (佐野ひなこ) to Emi Takei (武井咲).

    mask1  mask2

    • From left to right: source face, swapped face (before masking), swapped face (after masking).

    mask_vis

    • From left to right: source face, swapped face (after masking), mask heatmap.
  • Optional 128x128 input/output resolution: Increase input and output size from 64x64 to 128x128.

  • Face detection/tracking using MTCNN and Kalman filter: More stable detection and smooth tracking.

    dlib_vs_MTCNN

  • Training schedule: V2 model provides a predefined training schedule. The Trump/Cage results above were generated by a model trained for 21k iterations using the TOTAL_ITERS = 30000 predefined training schedule.

  • V2.1 update: An improved architecture has been added in order to stabilize training. The architecture is greatly inspired by XGAN and the MS-D neural network.

    • V2.1 model provides three base architectures: (i) XGAN, (ii) VAE-GAN, and (iii) a variant of v2 GAN. (default base_model="GAN")
    • Add more discriminators/losses to the GAN. To be specific, they are:
      1. GAN loss for non-masked outputs (common): Add two more discriminators to non-masked outputs.
      2. Perceptual adversarial loss (common): Feature-level L1 loss which improves semantic detail.
      3. Domain-adversarial loss (XGAN): "It encourages the embeddings learned by the encoder to lie in the same subspace"
      4. Semantic consistency loss (XGAN): Cosine-distance loss on embeddings to preserve the semantics of the input.
      5. KL loss (VAE-GAN): KL divergence between N(0,1) and the latent vector. (A minimal sketch of losses 4 and 5 is given after this list.)
    • One res_block in the decoder is replaced by MS-D network (default depth = 16) for output refinement.
      • This was a very inefficient implementation of the MS-D network; it is not included for now.
    • Preview images are saved in ./previews folder.
    • (WIP) Random motion blur as data augmentation, reducing ghost effect in output video.
    • FCN8s for face segmentation is introduced to improve masking in video conversion (default use_FCN_mask = True).
      • To enable this feature, the keras weights file should be generated through the Jupyter notebook provided in this repo.
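
As a rough illustration of the last two V2.1 loss terms above (semantic consistency and KL), the Keras-backend sketch below shows one common way to write them. The tensor names are assumptions for illustration, not code from the V2.1 notebook, which may weight and combine these terms differently.

    import keras.backend as K

    def semantic_consistency_loss(emb_input, emb_cycled):
        # Cosine distance between the embedding of an input face and the
        # embedding of its swapped-then-reconstructed counterpart.
        a = K.l2_normalize(K.batch_flatten(emb_input), axis=-1)
        b = K.l2_normalize(K.batch_flatten(emb_cycled), axis=-1)
        return K.mean(1.0 - K.sum(a * b, axis=-1))

    def kl_loss(z_mean, z_log_var):
        # KL divergence between N(mean, exp(log_var)) and N(0, 1), as in VAE-GAN.
        return -0.5 * K.mean(
            K.sum(1.0 + z_log_var - K.square(z_mean) - K.exp(z_log_var), axis=-1))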

Frequently asked questions

1. Slow video processing / OOM error?

  • This is most likely caused by a high input-video resolution; adjusting the parameters in step 13 or 14 should solve it.
    • First, increase video_scaling_offset = 0 to 1 or higher.
    • If that doesn't help, set manually_downscale = True.
    • If the above still does not help, disable the CNN model for face detection:
      def process_video(...):
        ...
        #faces = get_faces_bbox(image, model="cnn") # Use the CNN model
        faces = get_faces_bbox(image, model="hog") # Use the default HOG-based detector

2. How does it work?

  • The following illustration shows a high-level and abstract (but not exactly faithful) flowchart of the denoising autoencoder algorithm; a minimal architectural sketch is also given below. The objective functions look like this. flow_chart
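
For readers without the illustration at hand, here is a minimal sketch of the underlying deepfakes-style idea: one shared encoder plus one decoder per identity, with the swap performed by routing a face through the other identity's decoder. Layer shapes and sizes are illustrative only, not the repository's actual architecture, which additionally trains with adversarial and VGGFace perceptual losses.

    # Illustrative shared-encoder / per-identity-decoder sketch (not the repo's model).
    from keras.layers import Input, Conv2D, Dense, Flatten, Reshape, UpSampling2D
    from keras.models import Model

    def build_encoder():
        inp = Input(shape=(64, 64, 3))
        x = Conv2D(128, 5, strides=2, padding="same", activation="relu")(inp)
        x = Conv2D(256, 5, strides=2, padding="same", activation="relu")(x)
        x = Flatten()(x)
        x = Dense(512, activation="relu")(x)
        x = Dense(8 * 8 * 256, activation="relu")(x)
        x = Reshape((8, 8, 256))(x)
        return Model(inp, x)

    def build_decoder():
        inp = Input(shape=(8, 8, 256))
        x = UpSampling2D()(inp)
        x = Conv2D(128, 3, padding="same", activation="relu")(x)
        x = UpSampling2D()(x)
        x = Conv2D(64, 3, padding="same", activation="relu")(x)
        x = UpSampling2D()(x)
        out = Conv2D(3, 3, padding="same", activation="tanh")(x)
        return Model(inp, out)

    encoder = build_encoder()
    decoder_A, decoder_B = build_decoder(), build_decoder()

    # Each identity is reconstructed through the shared encoder and its own decoder;
    # swapping routes a face of A through decoder_B (or vice versa).
    x = Input(shape=(64, 64, 3))
    autoencoder_A = Model(x, decoder_A(encoder(x)))
    autoencoder_B = Model(x, decoder_B(encoder(x)))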

3. No audio in output clips?

  • Set audio=True in the video making cell.
    output = 'OUTPUT_VIDEO.mp4'
    clip1 = VideoFileClip("INPUT_VIDEO.mp4")
    clip = clip1.fl_image(process_video)
    %time clip.write_videofile(output, audio=True) # Set audio=True

4. Previews look good, but video result does not seem to transform the face?

  • The default setting transforms face B to face A.
  • To transform face A to face B, modify the following parameters depending on your current running notebook:
    • Change path_abgr_A to path_abgr_B in process_video() (step 13/14 of v2_train.ipynb and v2_sz128_train.ipynb).
    • Change whom2whom = "BtoA" to whom2whom = "AtoB" (step 12 of v2_test_video.ipynb).
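
For example, in v2_test_video.ipynb (step 12) the change above amounts to editing a single assignment:

    whom2whom = "AtoB"  # default is "BtoA" (face B to face A); "AtoB" transforms face A into face B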

Requirements

Acknowledgments

Code borrows from tjwei, eriklindernoren, fchollet, keras-contrib and deepfakes. The generative network is adopted from CycleGAN. Weights and scripts of MTCNN are from FaceNet. Illustrations are from irasutoya.
