emilianavt / OpenSeeFace

Robust realtime face and facial landmark tracking on CPU with Unity integration


Rescale the network

xuhao1 opened this issue

commented

Thanks for your contribution! I will include your license and your repo link correctly.
I also have a question: estimating facial landmarks from a 224x224 image is sometimes unnecessary, since my input images are close to 100x100. Is it possible to rescale the network?
I will also try this on my own.

Thank you!

I have experimented with this before. When I tried just halving the resolution, it stopped producing useful output. If you need more speed, you can try using the new 30 point model lm_modelT_opt.onnx, which runs on 56x56 inputs and, for me, runs at about five times the frame rate of the slowest model. The output is noisier, but I find that with higher smoothing it can still give acceptable results, and it is still very robust against bad lighting and head rotation. One thing to note is that it requires a higher cutoff threshold, as it may otherwise start to hallucinate faces where there are none. You can find the changes necessary for decoding landmarks by looking for the model_type < 0 parts here:

OpenSeeFace/tracker.py

Lines 666 to 680 in 46e26f5

self.res = 224.
self.mean_res = self.mean_224
self.std_res = self.std_224
if model_type < 0:
    self.res = 56.
    self.mean_res = np.tile(self.mean, [56, 56, 1])
    self.std_res = np.tile(self.std, [56, 56, 1])
self.res_i = int(self.res)
self.out_res = 27.
if model_type < 0:
    self.out_res = 6.
self.out_res_i = int(self.out_res) + 1
self.logit_factor = 16.
if model_type < 0:
    self.logit_factor = 8.
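For illustration, here is a minimal NumPy sketch of how these parameters are used to decode landmarks from the raw output. It follows the layout described above (n_points confidence maps, followed by two blocks of offset maps that are refined through a logit transform with logit_factor) and is not a verbatim copy of tracker.py; mapping the decoded crop coordinates back into the original image (crop offset and scaling) is omitted.

import numpy as np

def logit(p, factor=16.0):
    # Inverse sigmoid; factor matches logit_factor (16 for 224x224, 8 for 56x56).
    p = np.clip(p, 1e-7, 1.0 - 1e-7)
    return np.log(p / (1.0 - p)) / factor

def decode_landmarks(heatmaps, n_points=66, res=224.0, out_res=27.0, logit_factor=16.0):
    # heatmaps: array with 3 * n_points channels of size out_res_i x out_res_i,
    # i.e. n_points confidence maps followed by two blocks of sub-cell offsets.
    out_res_i = int(out_res) + 1
    maps = heatmaps.reshape(3 * n_points, out_res_i * out_res_i)
    idx = maps[:n_points].argmax(axis=1)                                    # peak cell per landmark
    conf = np.take_along_axis(maps[:n_points], idx[:, None], 1)[:, 0]
    off_a = np.take_along_axis(maps[n_points:2 * n_points], idx[:, None], 1)[:, 0]
    off_b = np.take_along_axis(maps[2 * n_points:], idx[:, None], 1)[:, 0]
    # Coarse cell position plus logit-decoded sub-cell offset, in crop pixels.
    coord_a = (res - 1.0) * (idx // out_res_i / out_res + logit(off_a, logit_factor))
    coord_b = (res - 1.0) * (idx % out_res_i / out_res + logit(off_b, logit_factor))
    return np.stack([coord_a, coord_b, conf], axis=1)

For the 30 point model this would be called with n_points=30, res=56.0, out_res=6.0 and logit_factor=8.0, matching the values in the snippet above.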

commented

@emilianavt Thanks for your reply, I will also try the 56x56 network.
I am also trying to load your weights in PyTorch; however, it throws an error:

import torch
from model import *

PATH = "./weights/lm_model0.pth"
model = OpenSeeFaceLandmarks("small", 0.5, True)
model.load_state_dict(torch.load(PATH))
model.eval()

Throws

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-d07b4f955829> in <module>
      1 PATH = "./weights/lm_model0.pth"
----> 2 model = OpenSeeFaceLandmarks("small", 0.5, True)
      3 model.load_state_dict(torch.load(PATH))
      4 model.eval()

~\Develop\OpenSeeFace\model.py in __init__(self, size, channel_multiplier, inference)
    135     def __init__(self, size="large", channel_multiplier=1.0, inference=False):
    136         kwargs = geffnet.mobilenetv3._gen_mobilenet_v3([size], channel_multiplier=channel_multiplier)
--> 137         super(OpenSeeFaceLandmarks, self).__init__(**kwargs)
    138         if size == "large":
    139             self.up1 = UNetUp(round_channels(960, channel_multiplier), round_channels(112, channel_multiplier), 256, (14,14))

TypeError: __init__() argument after ** must be a mapping, not MobileNetV3

My environment is Python 3.7, PyTorch 1.6.

commented

Alright. The issue in model.py is caused by an update to geffnet. I fixed the problem by adding

from geffnet.efficientnet_builder import *
from geffnet.config import layer_config_kwargs
from geffnet.activations import get_act_fn, get_act_layer

def _gen_mobilenet_v3(variant, channel_multiplier=1.0, pretrained=False, **kwargs):
    """Creates a MobileNet-V3 large/small/minimal models.
    Ref impl: https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet_v3.py
    Paper: https://arxiv.org/abs/1905.02244
    Args:
      channel_multiplier: multiplier to number of channels per layer.
    """
    if 'small' in variant:
        num_features = 1024
        if 'minimal' in variant:
            act_layer = 'relu'
            arch_def = [
                # stage 0, 112x112 in
                ['ds_r1_k3_s2_e1_c16'],
                # stage 1, 56x56 in
                ['ir_r1_k3_s2_e4.5_c24', 'ir_r1_k3_s1_e3.67_c24'],
                # stage 2, 28x28 in
                ['ir_r1_k3_s2_e4_c40', 'ir_r2_k3_s1_e6_c40'],
                # stage 3, 14x14 in
                ['ir_r2_k3_s1_e3_c48'],
                # stage 4, 14x14in
                ['ir_r3_k3_s2_e6_c96'],
                # stage 6, 7x7 in
                ['cn_r1_k1_s1_c576'],
            ]
        else:
            act_layer = 'hard_swish'
            arch_def = [
                # stage 0, 112x112 in
                ['ds_r1_k3_s2_e1_c16_se0.25_nre'],  # relu
                # stage 1, 56x56 in
                ['ir_r1_k3_s2_e4.5_c24_nre', 'ir_r1_k3_s1_e3.67_c24_nre'],  # relu
                # stage 2, 28x28 in
                ['ir_r1_k5_s2_e4_c40_se0.25', 'ir_r2_k5_s1_e6_c40_se0.25'],  # hard-swish
                # stage 3, 14x14 in
                ['ir_r2_k5_s1_e3_c48_se0.25'],  # hard-swish
                # stage 4, 14x14in
                ['ir_r3_k5_s2_e6_c96_se0.25'],  # hard-swish
                # stage 6, 7x7 in
                ['cn_r1_k1_s1_c576'],  # hard-swish
            ]
    else:
        num_features = 1280
        if 'minimal' in variant:
            act_layer = 'relu'
            arch_def = [
                # stage 0, 112x112 in
                ['ds_r1_k3_s1_e1_c16'],
                # stage 1, 112x112 in
                ['ir_r1_k3_s2_e4_c24', 'ir_r1_k3_s1_e3_c24'],
                # stage 2, 56x56 in
                ['ir_r3_k3_s2_e3_c40'],
                # stage 3, 28x28 in
                ['ir_r1_k3_s2_e6_c80', 'ir_r1_k3_s1_e2.5_c80', 'ir_r2_k3_s1_e2.3_c80'],
                # stage 4, 14x14in
                ['ir_r2_k3_s1_e6_c112'],
                # stage 5, 14x14in
                ['ir_r3_k3_s2_e6_c160'],
                # stage 6, 7x7 in
                ['cn_r1_k1_s1_c960'],
            ]
        else:
            act_layer = 'hard_swish'
            arch_def = [
                # stage 0, 112x112 in
                ['ds_r1_k3_s1_e1_c16_nre'],  # relu
                # stage 1, 112x112 in
                ['ir_r1_k3_s2_e4_c24_nre', 'ir_r1_k3_s1_e3_c24_nre'],  # relu
                # stage 2, 56x56 in
                ['ir_r3_k5_s2_e3_c40_se0.25_nre'],  # relu
                # stage 3, 28x28 in
                ['ir_r1_k3_s2_e6_c80', 'ir_r1_k3_s1_e2.5_c80', 'ir_r2_k3_s1_e2.3_c80'],  # hard-swish
                # stage 4, 14x14in
                ['ir_r2_k3_s1_e6_c112_se0.25'],  # hard-swish
                # stage 5, 14x14in
                ['ir_r3_k5_s2_e6_c160_se0.25'],  # hard-swish
                # stage 6, 7x7 in
                ['cn_r1_k1_s1_c960'],  # hard-swish
            ]
    with layer_config_kwargs(kwargs):
        model_kwargs = dict(
            block_args=decode_arch_def(arch_def),
            num_features=num_features,
            stem_size=16,
            channel_multiplier=channel_multiplier,
            act_layer=resolve_act_layer(kwargs, act_layer),
            se_kwargs=dict(
                act_layer=get_act_layer('relu'), gate_fn=get_act_fn('hard_sigmoid'), reduce_mid=True, divisor=8),
            norm_kwargs=resolve_bn_args(kwargs),
            **kwargs,
        )
    return model_kwargs

to model.py and replacing

kwargs = geffnet.mobilenetv3._gen_mobilenet_v3([size], channel_multiplier=channel_multiplier)

with

kwargs = _gen_mobilenet_v3([size], channel_multiplier=channel_multiplier)

I should probably bundle the necessary geffnet code.

commented

I am trying to export the PyTorch model to an ONNX model on my own and ran into this error. With some hardcoding I got a 112x112 model working. However, when exporting like this:

dummy_input = torch.randn(1, 3, 112, 112, device='cpu')
torch.onnx.export(model, dummy_input, "lm_model0.onnx", verbose=True, input_names=["input"], output_names=["output"], opset_version=11)

It throws

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-4-0bb05b5a39e2> in <module>
      1 dummy_input = torch.randn(1, 3, 107, 107, device='cpu')
----> 2 torch.onnx.export(model, dummy_input, "lm_model0.onnx", verbose=True, input_names=["input"], output_names=["output"])

~\Anaconda3\envs\torch\lib\site-packages\torch\onnx\__init__.py in export(model, args, f, export_params, verbose, training, input_names, output_names, aten, export_raw_ir, operator_export_type, opset_version, _retain_param_name, do_constant_folding, example_outputs, strip_doc_string, dynamic_axes, keep_initializers_as_inputs, custom_opsets, enable_onnx_checker, use_external_data_format)
    206                         do_constant_folding, example_outputs,
    207                         strip_doc_string, dynamic_axes, keep_initializers_as_inputs,
--> 208                         custom_opsets, enable_onnx_checker, use_external_data_format)
    209 
    210 

~\Anaconda3\envs\torch\lib\site-packages\torch\onnx\utils.py in export(model, args, f, export_params, verbose, training, input_names, output_names, aten, export_raw_ir, operator_export_type, opset_version, _retain_param_name, do_constant_folding, example_outputs, strip_doc_string, dynamic_axes, keep_initializers_as_inputs, custom_opsets, enable_onnx_checker, use_external_data_format)
     90             dynamic_axes=dynamic_axes, keep_initializers_as_inputs=keep_initializers_as_inputs,
     91             custom_opsets=custom_opsets, enable_onnx_checker=enable_onnx_checker,
---> 92             use_external_data_format=use_external_data_format)
     93 
     94 

~\Anaconda3\envs\torch\lib\site-packages\torch\onnx\utils.py in _export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, export_type, example_outputs, propagate, opset_version, _retain_param_name, do_constant_folding, strip_doc_string, dynamic_axes, keep_initializers_as_inputs, fixed_batch_size, custom_opsets, add_node_names, enable_onnx_checker, use_external_data_format)
    543                     params_dict, opset_version, dynamic_axes, defer_weight_export,
    544                     operator_export_type, strip_doc_string, val_keep_init_as_ip, custom_opsets,
--> 545                     val_add_node_names, val_use_external_data_format, model_file_location)
    546             else:
    547                 proto, export_map = graph._export_onnx(

RuntimeError: ONNX export failed: Couldn't export Python operator HardSwishJitAutoFn

Defined at:
C:\Users\xuhao\Anaconda3\envs\torch\lib\site-packages\geffnet-1.0.0-py3.7.egg\geffnet\activations\activations_me.py(174): forward
C:\Users\xuhao\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py(704): _slow_forward
C:\Users\xuhao\Anaconda3\envs\torch\lib\site-packages\torch\nn\modules\module.py(720): _call_impl
C:\Users\xuhao\Develop\OpenSeeFace\model.py(251): _forward_impl
C:\Users\xuhao\Develop\OpenSeeFace\model.py(281): forward
....

============

I reset geffnet to commit 8795d3298d to solve this issue.
The API of geffnet is really not stable...

I have not encountered this issue before. I'm on c450c12ae6ffb1757f62dde3c2765da3c10f6def of geffnet.

commented

I modified the UNetUp class and also the input and output sizes to make the original model work on 112x112 -> 14x14 and exported it to ONNX.

import torch
from model import *

PATH = "weights/lm_model0.pth"
model = OpenSeeFaceLandmarks("small", 0.5, True)
model.load_state_dict(torch.load(PATH))
dummy_input = torch.randn(1, 3, 112, 112, device='cpu')
torch.onnx.export(model, dummy_input, "lm_model0_small.onnx", verbose=True, input_names=["input"], output_names=["output"], opset_version=11)

The network is 4x faster, which is suitable for my application. However, it doesn't seem to produce any result that matches the face. Do I need to retrain the model, or is there maybe an error in my heatmap processing?
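One way to narrow this down is to compare the raw ONNX output against the PyTorch output on the same input, before any decoding. The sketch below is illustrative only; it reuses the names from the export snippet above and assumes the model's forward pass returns a single tensor matching the exported "output".

import numpy as np
import onnxruntime as ort
import torch

from model import *

# Reload the model as in the snippet above.
model = OpenSeeFaceLandmarks("small", 0.5, True)
model.load_state_dict(torch.load("weights/lm_model0.pth"))
model.eval()

dummy_input = torch.randn(1, 3, 112, 112, device='cpu')
with torch.no_grad():
    torch_out = model(dummy_input)   # assumed to be a single tensor

sess = ort.InferenceSession("lm_model0_small.onnx")
onnx_out = sess.run(["output"], {"input": dummy_input.numpy()})[0]

# A large difference here points at the export itself rather than the decoding.
print(np.abs(torch_out.numpy() - onnx_out).max())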

My heatmap processing code looks like this:

float logit(float p)
{
    // Inverse sigmoid; the divisor is the logit factor (16 for the 224x224 models).
    if (p >= 1.0)
        p = 0.99999;
    else if (p <= 0.0)
        p = 0.0000001;

    p = p / (1 - p);
    return log(p) / 16;
}

CvPts proc_heatmaps(float* heatmaps, int x0, int y0, float scale_x, float scale_y)
{
    CvPts facial_landmarks;
    int heatmap_size = EMI_NN_OUTPUT_SIZE * EMI_NN_OUTPUT_SIZE;
    for (int landmark = 0; landmark < 66; landmark++)
    {
        // Find the peak cell in this landmark's confidence heatmap.
        int offset = heatmap_size * landmark;
        int argmax = -100;
        float maxval = -100;
        for (int i = 0; i < heatmap_size; i++)
        {
            if (heatmaps[offset + i] > maxval)
            {
                argmax = i;
                maxval = heatmaps[offset + i];
            }
        }

        int x = argmax / EMI_NN_OUTPUT_SIZE;
        int y = argmax % EMI_NN_OUTPUT_SIZE;

        float conf = heatmaps[offset + argmax];
        float res = EMI_NN_SIZE - 1;

        // Sub-cell refinement from the offset channels at the peak location.
        int off_x = floor(res * (logit(heatmaps[66 * heatmap_size + offset + argmax])) + 0.1);
        int off_y = floor(res * (logit(heatmaps[2 * 66 * heatmap_size + offset + argmax])) + 0.1);

        // Map back into the original image using the crop origin (x0, y0) and scale.
        float lm_y = (float)y0 + (float)(scale_x * (res * (float(x) / (EMI_NN_OUTPUT_SIZE - 1)) + off_x));
        float lm_x = (float)x0 + (float)(scale_y * (res * (float(y) / (EMI_NN_OUTPUT_SIZE - 1)) + off_y));

        facial_landmarks.push_back(cv::Point2f(lm_x, lm_y));
    }
    return facial_landmarks;
}

The landmark decoding looks correct to me, but this matches my experience with just reducing the resolution. I'm not sure why it doesn't work. I suspect the offset layers might give bad results if the resolution is different.

That's the reason I trained the special lower resolution model with fewer points. I also tried training at 112x112 before, but found that the gain in performance was smaller compared to the reduction in accuracy, so I settled on 56x56 to make the performance gain worthwhile.

commented

> The landmark decoding looks correct to me, but this matches my experience with just reducing the resolution. I'm not sure why it doesn't work. I suspect the offset layers might give bad results if the resolution is different.
>
> That's the reason I trained the special lower resolution model with fewer points. I also tried training at 112x112 before, but found that the gain in performance was smaller compared to the reduction in accuracy, so I settled on 56x56 to make the performance gain worthwhile.

Thanks for your patience again.
Can you still find your trained 112x112 model? I wonder if I could give it a try.
I am also trying your 56x56 model. Do the 3D positions of the 56x56 model follow the first 30 of the 66 points or not?

The 30 points are the green points here and correspond to the blue ones of the 66 point set. I fill the 66 point array with the 30 points data like this.

commented

Ok! Thanks again!
If you are interested in flight simulation, you may take a look at my own project, into which I am integrating your awesome network.

commented

I have tested the model. The 30 point model is too noisy for me =.=. It looks like a 112x112 model with 66 points may be a good balance, since I need more than 75 FPS on CPU.

I have looked a bit more at the results of the downscaled models and they just look completely broken. I don't think I have my old results, but I might try training another 112x112 model some time soon.

commented

@emilianavt Thanks a lot! Waiting for your update.

Training will probably take a few more days as I mainly train overnight.

commented

> Training will probably take a few more days as I mainly train overnight.

Thanks a lot!
My project is also a side project that I work on at night, so I have enough time to wait hhhhh

Validation loss doesn't seem to be improving anymore, so you can give this one a try:
lm_modelV_opt.zip

The logit factor is 16, input resolution 112x112, output resolution 14x14.
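As a hypothetical aside (not actual OpenSeeFace code), those values would slot into the tracker.py-style parameters quoted earlier in this thread roughly as follows:

# Hypothetical parameters for this 112x112 model, following the pattern of the
# tracker.py snippet quoted earlier in the thread.
res = 112.
out_res = 13.              # 14x14 output grid, since out_res_i = int(out_res) + 1
out_res_i = int(out_res) + 1
logit_factor = 16.

# With the decoding sketch from earlier in this thread, this would be roughly:
# decode_landmarks(heatmaps, n_points=66, res=res, out_res=out_res, logit_factor=logit_factor)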

commented

> Validation loss doesn't seem to be improving anymore, so you can give this one a try:
> lm_modelV_opt.zip
>
> The logit factor is 16, input resolution 112x112, output resolution 14x14.

Thanks a lot!!! I have tried it in my own code and this model works pretty well; its running speed looks similar to model 0, while its accuracy is closer to that of models 1 or 2. I will do more analysis later.

commented

> Validation loss doesn't seem to be improving anymore, so you can give this one a try:
> lm_modelV_opt.zip
>
> The logit factor is 16, input resolution 112x112, output resolution 14x14.

BTW, how is your quantization progress on these models? I see you are trying to quantize them in onnxruntime's repo.

I have encountered the same issue as you. Models get smaller but slower when successfully quantized. I haven't bothered evaluating accuracy due to this. Hopefully something can be fixed on the onnxruntime side. I believe in theory quantization should be able to give a good speedup, which would help a lot.

I also trained a faster version. I'm still trying to figure out where the two 112x112 models fit among the other different models quality-wise.

lm_modelU_opt.zip

commented

> I also trained a faster version. I'm still trying to figure out where the two 112x112 models fit among the other different models quality-wise.
>
> lm_modelU_opt.zip

In my tests, lm_modelV_opt.zip is more accurate than lm_model0 at the same inference time, but not as good as lm_model1.
I will test lm_modelU_opt later.

commented

Btw, how did you solve the clip issue while quantizing the model?
I wonder if my quantized model gives bad results because I fixed the clip min and max.

I only tried dynamic quantization on that model, which worked without solving that issue, but it caused things to run slower.
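For reference, dynamic quantization of an exported ONNX model can be sketched with onnxruntime's quantization tooling as below; the file names are placeholders, and as noted above the result may end up smaller but slower, so speed and accuracy should be re-checked.

from onnxruntime.quantization import QuantType, quantize_dynamic

# Dynamically quantize the weights of an exported ONNX model to 8 bit.
quantize_dynamic(
    model_input="lm_modelV_opt.onnx",        # placeholder input path
    model_output="lm_modelV_opt_quant.onnx", # placeholder output path
    weight_type=QuantType.QUInt8,
)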

commented

> I only tried dynamic quantization on that model, which worked without solving that issue, but it caused things to run slower.

It looks like modelU has similar accuracy compared to model0 but is much faster in my application.

Thank you for your feedback!

commented

> Thank you for your feedback!

Thank you for your excellent work again!