action-units facial-expression-recognition emotion-recognition facial-action-units facial-expressions pytorch

AUNets

This repo provides a PyTorch implementation for AUNets. AUNets relies on the power of having independent and binaries CNNs for each facial expression. It works with the hollistical facial image i.e., no keypoints or facial aligment needed.

Project page: https://biomedicalcomputervision.uniandes.edu.co/index.php/research?id=30

Citation

@article{romero2018multi,
  title={Multi-view dynamic facial action unit detection},
  author={Romero, Andr{\'e}s and Le{\'o}n, Juan and Arbel{\'a}ez, Pablo},
  journal={Image and Vision Computing},
  year={2018},
  publisher={Elsevier}
}

Usage (TRAIN)

$./main.sh -GPU 0 -OF None #It will train AUNets (12 models and 3 folds) from emotionnet weights.
$./main.sh -GPU 0 -OF None -HYDRA true #It will train HydraNets (12 models and 3 folds) from emotionnet weights. 

$./main.sh -GPU 0 -OF None -finetuning imagenet #It will train AUNets (12 models and 3 folds) from imagenet weights. 
$./main.sh -GPU 0 -OF None -HYDRA true -finetuning imagenet #It will train HydraNets (12 models and 3 folds) from imagenet weights. 

# -OF options: None, Alone, Horizontal, Channels, Conv, FC6, FC7. [default=None].
# -finetuning options: emotionnet, imagenet, random. [default=emotionnet].
# -au options: 1,2,4,...,24. [default=all].
# -fold options: 0,1,2. [default=all].
# -mode_data options: normal, aligned. [default=normal].

Usage (DEMO)

$./main.sh -AU 12 -gpu 0 -fold 0 -OF Horizontal -DEMO Demo

# -DEMO: folder or image location (absolute or relative). 
#        When OF!=None, DEMO must be a folder that contains *ONLY* RGB images. 
#        data_loader.py will assume $DEMO_OF is the folder where OF images are stored. 

# It will output the confidence score for each image in the folder, or for one single image if DEMO is a file.

# Example
$./main.sh -AU 12 -gpu 0 -fold 0 -OF None -DEMO Demo
./main.py -- --AU=12 --fold=0 --GPU=0 --OF None --DEMO Demo --finetuning=emotionnet --mode_data=normal
 [!!] loaded trained model: ./snapshot/models/BP4D/normal/fold_0/AU12/OF_None/emotionnet/02_1800.pth!
AU12 - OF None | Forward  | Demo/000.jpg : 0.00438882643357
AU12 - OF None | Forward  | Demo/012.jpg : 0.00548902712762
AU12 - OF None | Forward  | Demo/024.jpg : 0.00295104249381
AU12 - OF None | Forward  | Demo/036.jpg : 0.00390655593947
AU12 - OF None | Forward  | Demo/048.jpg : 0.00493786809966

VGG16 Architecture

Network (VGG - Fig (a))

Other Optical Flow stream architectures are found in misc folder

Requirements

Python 2.7
PyTorch 0.3.1
Tensorflow (only if tensorboard flag)
Other modules in requirements.txt

Weights (.pth models)

Each model can be as heavy as 1GB for a total of 36GB per network variant (12 AUs - 3 folds). You can find them here.

Results using misc/VGG16-OF_Horizontal.png

These results are reported using three-fold cross validation over the BP4D dataset.

AU	1	2	4	6	7	10	12	14	15	17	23	24	*Av.*
F1	53.4	44.7	55.8	79.2	78.1	83.1	88.4	66.6	47.5	62.0	47.3	49.7	63.0

Three fold:

Based on DRML paper, we use their exact subject-exclusive three fold testing (These subjects are exclusively for testing on each fold, the remaining subjects are for train/val):

Fold	*Subjects*
1	['F001', 'F002', 'F008', 'F009', 'F010', 'F016', 'F018', 'F023', 'M001', 'M004', 'M007', 'M008', 'M012', 'M014']
2	['F003', 'F005', 'F011', 'F013', 'F020', 'F022', 'M002', 'M005', 'M010', 'M011', 'M013', 'M016', 'M017', 'M018']
3	['F004', 'F006', 'F007', 'F012', 'F014', 'F015', 'F017', 'F019', 'F021', 'M003', 'M006', 'M009', 'M015']

About

Pytorch implementation of Multi-View Dynamic Facial Action Unit Detection, Image and Vision Computing (2018)

action-units facial-expression-recognition emotion-recognition facial-action-units facial-expressions pytorch

MIT License

Languages

Language:Python 77.7%Language:C++ 6.9%Language:C++ 6.9%Language:C++ 6.9%Language:Shell 1.7%