Light (and Fast) Anime Face Detector

This repository is based on a light and fast face detector (LFFD) (paper). The model is trained on data from Anime-Face-Detector repository by qhgz2013 plus additional ~600 pseudo semi-supervised images sourced from danbooru2019. This detector can theoretically run at >100 FPS while retaining an acceptable level of accuracy.

This repository is tested under the following settings:

Ubuntu 18.04.4 LTS
Python 3.6.8
GeForce GTX 1060 6GB (Mobile)
NVIDIA Driver 440.64
CUDA 10.0.130 (installed via conda as cudatoolkit)
CUDNN 7.6.5 (installed via conda)
mxnet 1.6.0

Core Dependencies

Python 3.6+
opencv-contrib-python
mxnet
numpy

You might need other libraries (e.g. tqdm, imutils, etc.) in order to run the demo. If you want to utilize your GPU, please consult MXNet's official installation guide.

Usage

Run demo on a video from command line:

    python run_video.py --input-path INPUT_PATH --output-path OUTPUT_PATH --detector-path configs/anime.json

Run demo on images in a directory from command line:

    python run_directory.py --input-directory INPUT_DIR --detector-path configs/anime.json

As a Python library:

    import os
    from core.detector import LFFDDetector
    
    """
        !!! IMPORTANT !!!
        Disable auto-tuning
        You might experience a major slow-down if you run the model on images with varied resolution / aspect ratio.
        This is because MXNet is attempting to find the best convolution algorithm for each input size, 
        so we should disable this behavior if it is not desirable.
    """ 
    os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = "0"

    with open(CONFIG_PATH, "r") as f:
        config = json.load(f)
    detector = LFFDDetector(config, use_gpu=True)
    image = cv2.imread(IMAGE_PATH)
    boxes = detector.detect(image)

Examples

Credit: Promotional art from Touhou Cannonball

(Size 640) Credit: 素敵な墓場で暮しましょ！ by syuri22@例大祭た14a

Credit: Me

Inference Time

The inference speed for this detector is varied depending on the size (or resize_scale) parameter in the config. By lowering the parameters, there can be a significant speed gain with potentially worse model performance.

Input Resolution in the following table is after the image resizing is done. Inference Time is calculated by timing LFFDDetector.detect() method.

Base	Input Resolution (WxH)	Inference Time
Intel® Core™ i7-8750H CPU @ 2.20GHz × 12 (CPU)	384x259	115ms
Intel® Core™ i7-8750H CPU @ 2.20GHz × 12 (CPU)	384x216	96ms
Intel® Core™ i7-8750H CPU @ 2.20GHz × 12 (CPU)	276x384	125ms
GeForce GTX 1060 6GB (Mobile)	384x259	13.7ms
GeForce GTX 1060 6GB (Mobile)	384x216	10.6ms
GeForce GTX 1060 6GB (Mobile)	276x384	17.7ms

MXNet auto-tuning is disabled for GPU benchmarking (by setting MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 in environment variable). There is a potential gain by enabling it in case that the detector runs on images with the same resolution / aspect ratio.

Training

Please refer to the original LFFD implementation for details.

Known Issues / Improvements

Currently, the detector cannot handle these cases well.

Extreme size (the whole image is a face or extremely small face)
"Non-standard" (e.g. chibi) stylistic choices
Rotation
Side views
Ornaments / headdresses (e.g. helmet)
Faces that are very close to others

All of these except the last one could be solvable by using more data and / or image augmentation. We could let the detector pre-label the data for us, which should reduce the workload when we manually correct them afterwards.

A consistent rule for bounding box might also be needed to minimize the bounding box loss (right now, it's a rough estimation of how much padding we will need for a face).

size (and resize_scale) also needs to be carefully chosen since the trained dataset is not big enough for the model to learn continuous face scaling.

Speed-wise, we could implement a batch processing. Non-maximum suppression function could also be written as cython or to be compatiable with numba to gain more speed.

cheese-roll / light-anime-face-detector