emilianavt / OpenSeeFace

As the title says, are you interested in supporting the OAK-D line of spatial 3D cameras? With the Kickstarter that is currently running, they cost about the same as the Leap Motion controller, though that price will rise. The products use an open Python SDK called DepthAI.

Those cameras look interesting, but OpenSeeFace is mainly concerned with inferring landmarks from RGB images, and putting together the final training dataset for the models was a lot of work. I'm unlikely to find the time or resources to put together anything remotely similar for depth cameras.

@silverhikari Theoretically, you can take the ONNX model and convert it to OpenVINO so it runs on the OAK-D.

I got an OAK-D Lite and will try to make it work with VSeeFace. If I don't forget, I'll let you know whether my experiment works out.

@emilianavt The OAK-Ds have RGB cameras too, so technically you only need to convert the model (there is a Python script for that) and interface with the camera instead of running the CNN yourself.

Using the position data from the depth camera is more of a bonus (or, in my case, useful for hand tracking instead of a Leap Motion).

I see, if they have RGB too, it should work!

Yep, there is a 4K RGB camera (usually the middle one), and the stereo cameras are also accessible individually as black-and-white cameras (480p, IIRC).
The more interesting part, though, is running the NN on the camera itself, since it has an onboard AI chip, and then just grabbing the output data; hence the conversion to OpenVINO.

BTW, do you by any chance have a description of the OSC/VMC protocol that VSeeFace uses (the message names, so to speak)?
I'm not very good with Japanese, and not having to create landmarks (if they aren't specifically needed) would help a lot.

The VMC protocol only transmits blendshapes and bones. OpenSeeFace's face tracking data is transmitted using custom UDP packets. It's probably easiest to understand from the parser: https://github.com/emilianavt/OpenSeeFace/blob/master/Unity/OpenSee.cs#L137

There is also some English language documentation on the VMC protocol here: https://protocol.vmc.info/english.html
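Since VMC messages are plain OSC over UDP, a minimal sketch of what a sender does may help when porting to C#. It hand-encodes OSC messages using only the Python standard library and assumes the VMC addresses `/VMC/Ext/Blend/Val` (string name, float value) and `/VMC/Ext/Blend/Apply` from the documentation linked above. `send_blendshapes` and its default port 39539 are illustrative assumptions, so match the port to whatever the receiving application is configured to listen on.

```python
import socket
import struct


def _osc_pad(data: bytes) -> bytes:
    """Null-terminate and pad to a multiple of 4 bytes, as OSC requires."""
    data += b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def encode_osc(address: str, *args) -> bytes:
    """Encode one OSC message; only string and float32 arguments are supported."""
    tags = ","
    payload = b""
    for arg in args:
        if isinstance(arg, str):
            tags += "s"
            payload += _osc_pad(arg.encode("utf-8"))
        elif isinstance(arg, float):
            tags += "f"
            payload += struct.pack(">f", arg)  # OSC numbers are big-endian
        else:
            raise TypeError(f"unsupported OSC argument type: {type(arg)!r}")
    return _osc_pad(address.encode("ascii")) + _osc_pad(tags.encode("ascii")) + payload


def send_blendshapes(values: dict, host: str = "127.0.0.1", port: int = 39539) -> None:
    """Send one frame of blendshape values, then ask the receiver to apply them.

    Assumes the VMC addresses /VMC/Ext/Blend/Val and /VMC/Ext/Blend/Apply;
    the default port is an assumption, set it to match your receiver.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for name, value in values.items():
            sock.sendto(encode_osc("/VMC/Ext/Blend/Val", name, float(value)), (host, port))
        sock.sendto(encode_osc("/VMC/Ext/Blend/Apply"), (host, port))
```

For example, `send_blendshapes({"A": 0.5})` would send one value for a blendshape named `A` (a VRM preset name, used here only as an example), followed by the apply message. Bone messages such as `/VMC/Ext/Bone/Pos` (a name plus position and rotation floats) should fit the same encoder.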

Oh, thank you for that link! I couldn't find it on the site, probably because Google sends you to the Japanese version. Yep, I've found that parser and am currently trying to reverse-engineer where the values come from and what they mean. I'm not at all familiar with Python, so my small tool will be C# with some C++ (sadly, DepthAI only has Python and C++ APIs). But I found some simple examples that include gaze and head tracking, so if a converted model of yours doesn't work out of the box, I'll try going with one of those and matching the OpenSee protocol.

From: Emiliana. Sent: Sunday, January 16, 2022 6:55:44 PM. Subject: Re: [emilianavt/OpenSeeFace] any interest in supporting oak-d/oak-d-lite camera? (#32)

If you are not familiar with Python, the trickier part might be figuring out how to decode the model's output. The current code for that is a bit dense and optimized:

OpenSeeFace/model.py

Lines 168 to 178 in baff2c0

    
    t_main = x[:, 0:66].reshape((-1, 66, 28*28))
    t_m = t_main.argmax(dim=2)
    indices = t_m.unsqueeze(2)
    t_conf = t_main.gather(2, indices).squeeze(2)
    t_off_x = x[:, 66:132].reshape((-1, 66, 28*28)).gather(2, indices).squeeze(2)
    t_off_y = x[:, 132:198].reshape((-1, 66, 28*28)).gather(2, indices).squeeze(2)
    t_off_x = (223. * logit_arr(t_off_x) + 0.5).floor()
    t_off_y = (223. * logit_arr(t_off_y) + 0.5).floor()
    t_x = 223. * (t_m / 28.).floor() / 27. + t_off_x
    t_y = 223. * t_m.remainder(28.).float() / 27. + t_off_y
    x = (t_conf.mean(1), torch.stack([t_x, t_y, t_conf], 2))
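For readers porting this, here is a hedged NumPy restatement of the vectorized decoding above that runs without PyTorch. `logit_arr` is not part of the quoted lines, so it is re-implemented here under the assumption that it behaves like the scalar `logit` helper from tracker.py (same clamping, inverse sigmoid, divide by 16); the function name `decode` is made up for this example. Coordinates come out in the 224 x 224 model-input space, before any mapping back to the camera frame.

```python
import numpy as np

GRID = 28           # each landmark has a 28x28 heatmap, as in the snippet
N_LMS = 66          # number of landmarks
RES_MINUS_1 = 223.  # 224 px model input minus one, the constant in the snippet


def logit_arr(p, factor=16.0):
    # Assumed array counterpart of tracker.py's scalar logit():
    # clamp p >= 1 and p <= 0 the same way, then inverse sigmoid divided by factor.
    p = np.where(p >= 1.0, 0.9999999, np.where(p <= 0.0, 0.0000001, p))
    return np.log(p / (1 - p)) / factor


def decode(x):
    """Decode raw output of shape (batch, 198, 28, 28).

    Returns (mean confidence per batch item, array (batch, 66, 3) of
    [row-derived coord, column-derived coord, confidence]) in model-input pixels.
    """
    b = x.shape[0]
    heat = x[:, 0:N_LMS].reshape(b, N_LMS, GRID * GRID)
    m = heat.argmax(axis=2)   # index of the strongest cell per landmark
    idx = m[:, :, None]       # shape (b, 66, 1) for take_along_axis

    def pick(channels):
        # Read each landmark's value at its own argmax cell (torch.gather equivalent).
        return np.take_along_axis(channels.reshape(b, N_LMS, -1), idx, axis=2)[:, :, 0]

    conf = pick(x[:, 0:N_LMS])
    off_x = np.floor(RES_MINUS_1 * logit_arr(pick(x[:, N_LMS:2 * N_LMS])) + 0.5)
    off_y = np.floor(RES_MINUS_1 * logit_arr(pick(x[:, 2 * N_LMS:3 * N_LMS])) + 0.5)
    t_x = RES_MINUS_1 * (m // GRID) / (GRID - 1) + off_x
    t_y = RES_MINUS_1 * (m % GRID) / (GRID - 1) + off_y
    return conf.mean(axis=1), np.stack([t_x, t_y, conf], axis=2)
```

`np.take_along_axis` with the argmax indices plays the role of `torch.gather` in the original, so each landmark's confidence and both offsets are read from the same winning heatmap cell.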

Some very early versions of tracker.py should contain a more readable function for decoding landmarks, though.

Edit: I found it:

OpenSeeFace/tracker.py

Lines 105 to 111 in 0690bdd

    
    def logit(p, factor=16.0):
        if p >= 1.0:
            p = 0.9999999
        if p <= 0.0:
            p = 0.0000001
        p = p/(1-p)
        return float(np.log(p)) / float(factor)
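Restated as formulas, read directly off the model.py snippet and the `logit` helper (a paraphrase of the code, not an official specification), landmark i is decoded from its argmax cell m_i in the 28 x 28 heatmap grid, where h_i, o^x_i and o^y_i are output channels i, 66 + i and 132 + i, and 223 = 224 - 1:

```latex
\operatorname{logit}_{16}(p) = \frac{1}{16}\ln\frac{\tilde p}{1-\tilde p},\qquad
\tilde p = \begin{cases} 0.9999999 & p \ge 1\\ 0.0000001 & p \le 0\\ p & \text{otherwise}\end{cases}

r_i = \lfloor m_i / 28 \rfloor,\qquad c_i = m_i \bmod 28

x_i = \frac{223\, r_i}{27} + \left\lfloor 223\,\operatorname{logit}_{16}\!\big(o^{x}_{i}[r_i, c_i]\big) + 0.5 \right\rfloor,\qquad
y_i = \frac{223\, c_i}{27} + \left\lfloor 223\,\operatorname{logit}_{16}\!\big(o^{y}_{i}[r_i, c_i]\big) + 0.5 \right\rfloor

\text{conf}_i = h_i[r_i, c_i],\qquad \overline{\text{conf}} = \frac{1}{66}\sum_{i=0}^{65} \text{conf}_i
```

Because logit16(sigmoid(16v)) = v, an offset channel value of sigmoid(16v) shifts the landmark by roughly 223v pixels. Note that x_i comes from the heatmap row, so in image terms it is the vertical coordinate, which is why tracker.py adds `crop_y1` and `scale_y` to it.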

OpenSeeFace/tracker.py

Lines 641 to 660 in 0690bdd

    
    def landmarks(self, tensor, crop_info):
        crop_x1, crop_y1, scale_x, scale_y, _ = crop_info
        avg_conf = 0
        lms = []
        res = self.res - 1
        for i in range(0, 66):
            m = int(tensor[i].argmax())
            x = m // 28
            y = m % 28
            conf = float(tensor[i][x,y])
            avg_conf = avg_conf + conf
            off_x = res * ((1. * logit(tensor[66 + i][x, y])) - 0.0)
            off_y = res * ((1. * logit(tensor[66 * 2 + i][x, y])) - 0.0)
            off_x = math.floor(off_x + 0.5)
            off_y = math.floor(off_y + 0.5)
            lm_x = crop_y1 + scale_y * (res * (float(x) / 27.) + off_x)
            lm_y = crop_x1 + scale_x * (res * (float(y) / 27.) + off_y)
            lms.append((lm_x,lm_y,conf))
        avg_conf = avg_conf / 66.
        return (avg_conf, np.array(lms))
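For anyone who wants the readable version without the surrounding `Tracker` class, a standalone sketch of the same per-landmark loop follows. It assumes a single raw output of shape (198, 28, 28) and a model input size of 224 (the original reads this from `self.res`); `decode_landmarks` is a hypothetical name, and the variables are renamed `row`/`col` to make explicit that the first returned coordinate comes from the heatmap row.

```python
import math

import numpy as np


def logit(p, factor=16.0):
    # Same clamping and scaling as tracker.py's logit().
    if p >= 1.0:
        p = 0.9999999
    if p <= 0.0:
        p = 0.0000001
    p = p / (1 - p)
    return float(np.log(p)) / float(factor)


def decode_landmarks(tensor, crop_info, res=224):
    """Standalone version of Tracker.landmarks() for one (198, 28, 28) output.

    crop_info is (crop_x1, crop_y1, scale_x, scale_y, unused), as in tracker.py.
    res=224 is an assumed model input size (the original uses self.res).
    """
    crop_x1, crop_y1, scale_x, scale_y, _ = crop_info
    res = res - 1
    lms = []
    for i in range(66):
        m = int(tensor[i].argmax())
        row, col = m // 28, m % 28
        conf = float(tensor[i][row, col])
        off_r = math.floor(res * logit(tensor[66 + i][row, col]) + 0.5)
        off_c = math.floor(res * logit(tensor[66 * 2 + i][row, col]) + 0.5)
        # Row-derived coordinate uses the y crop/scale, exactly as in the original.
        lm_r = crop_y1 + scale_y * (res * (row / 27.) + off_r)
        lm_c = crop_x1 + scale_x * (res * (col / 27.) + off_c)
        lms.append((lm_r, lm_c, conf))
    lms = np.array(lms)
    return float(lms[:, 2].mean()), lms
```

This should give the same numbers as `landmarks()` for the same tensor and `crop_info`; the loop form is probably the easier one to port line by line to C#.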

@TheMasterofBlubb How did you get on with converting the model to OpenVINO and generating OpenSeeFace-compatible packets?
