Google Meet background segmentation model

Question

Google Meet background segmentation model

jameshfisher opened this issue 4 years ago · comments

System information

TensorFlow.js version (you are using): 2
Are you willing to contribute it (Yes/No): No, it's not mine

Describe the feature and the current behavior/state.
This Google AI blog post describes the background segmentation model used in Google Meet. This model would be an excellent complement to the models in the tfjs-models collection. (The existing BodyPix model can be (ab)used for background segmentation, but has quality and performance issues for this use-case. I expect the Google Meet model improves on this.)

Will this change the current api? How?
No, it would be an addition to tfjs-models.

Who will benefit with this feature?
Apps consuming and/or displaying a user-facing camera feed. WebRTC video chat apps are the most obvious, where background blur/replacement is becoming expected. I also expect it could be a useful preprocessing step before applying e.g. PoseNet. It can also be used creatively on images as a pre-processing step -- for example, this recent app to enhance profile pictures integrates a background segmentation solution.

w-okada commented 3 years ago

Wow!!

RajeshT · Answer 1 · Wed Nov 04 2020 01:34:45 GMT+0800 (China Standard Time)

cc @annxingyuan @tafsiri

simon-lanf · Answer 2 · Sat Nov 07 2020 02:56:49 GMT+0800 (China Standard Time)

this would be useful for us.

Yannick Assogba · Answer 3 · Sat Nov 07 2020 03:52:36 GMT+0800 (China Standard Time)

I'll pass this on to our PM.

Jim Fisher · Answer 4 · Wed Nov 11 2020 16:58:35 GMT+0800 (China Standard Time)

Note: I'd also be happy if just the raw model (https://meet.google.com/_/rtcvidproc/release/336842817/segm_lite_v509.tflite) was released under a permissive license - I can figure out the model structure and JavaScript wiring :-)

Jason Mayes · Answer 5 · Thu Nov 12 2020 02:42:05 GMT+0800 (China Standard Time)

+1 to this! Would love to see this as part of the model repos for TFJS - a lot of people making Chrome Extensions to do great things in video calls etc and this would just make those experiences even more efficient when running to get higher FPS etc.

Alvaro Schipper · Answer 6 · Wed Nov 25 2020 23:56:35 GMT+0800 (China Standard Time)

+1 to this, would be a great, faster alternative to body-pix, really impressed by the performance in Google Meet :)

Kirawi · Answer 7 · Thu Dec 17 2020 01:14:02 GMT+0800 (China Standard Time)

Very desirable to have! Though I did just link to this issue from the Jitsi Meets repository, I think it would be very cool to have for other projects that need this functionality but don't have the capabilities to develop an in-house model.

Jim Fisher · Answer 8 · Thu Dec 17 2020 01:57:38 GMT+0800 (China Standard Time)

The blog post about this model links to this Model Card describing the model, which reads

LICENSED UNDER Apache License, Version 2.0

The Model Card also links to this paper describing Model Cards in general, which says that Model Cards can describe a license that the model is released under. So I believe the above license applies to the described model itself (e.g. rather than to the Model Card document).

So it seems like the raw .tflite model here is already Apache-licensed! @jasonmayes would you agree with this / is this Google's position?

(Thanks to @blaueente for originally noting this license in the Model Card!)

stanhrivnak · Answer 9 · Wed Dec 23 2020 07:49:00 GMT+0800 (China Standard Time)

Note: I'd also be happy if just the raw model (https://meet.google.com/_/rtcvidproc/release/336842817/segm_lite_v509.tflite) was released under a permissive license - I can figure out the model structure and JavaScript wiring :-)

@jameshfisher I have successfully deployed the raw tflite model (BTW. many thanks for the link!) within a desktop app using MediaPipe. But I failed to do so for web app, since MediaPipe doesn't have any documentation for it yet (just some JS API's for specific examples, but not for custom models). But it looks like you're saying that you did it. How? Have you extracted the layers of the model + weights and "manually" created the same TF model and then converted it to TFJS? Or have you managed to compile the tflite to wasm and use MediaPipe?
Many thanks!

Kirawi · Answer 10 · Wed Dec 23 2020 07:52:07 GMT+0800 (China Standard Time)

@stanhrivnak I found this while looking into it myself: https://gist.github.com/tworuler/bd7bd4c6cd9a8fbbeb060e7b64cfa008 Unfortunately, I'm not familiar with tensorflow (sad Amd gpu gang), so I have no idea how it works or how to modify it. PINTO0309 uses modified versions of that script for his tflite -> pb scripts.

Katsuya Hyodo · Answer 11 · Thu Dec 24 2020 23:58:56 GMT+0800 (China Standard Time)

I have generated and committed models for .pb, .tflite float32/float16, INT8, EdgeTPU, TFJS, TF-TRT, CoreML, and OpenVINO IR for testing. However, I was so exhausted that I did not create a test program to test it. I would be very happy if you could test it with your help. 😃
https://github.com/PINTO0309/PINTO_model_zoo/tree/master/082_MediaPipe_Meet_Segmentation

If there are any licensing issues, I'm going to delete it.

Kirawi · Answer 12 · Fri Dec 25 2020 02:02:20 GMT+0800 (China Standard Time)

I have generated and committed models for .pb, .tflite float32/float16, INT8, EdgeTPU, TFJS, TF-TRT, CoreML, and OpenVINO IR for testing. However, I was so exhausted that I did not create a test program to test it. I would be very happy if you could test it with your help. 😃
https://github.com/PINTO0309/PINTO_model_zoo/tree/master/082_MediaPipe_Meet_Segmentation

If there are any licensing issues, remove it.

Amazing work!

Katsuya Hyodo · Answer 13 · Fri Dec 25 2020 06:56:00 GMT+0800 (China Standard Time)

There was a Japanese engineer who implemented it in TFJS. There still seems to be a little problem with the conversion. It gets shifted to the left. Also, there is no smoothing post-processing called "light wrapping", so the border is jagged.

EqCOpUxU8AA9G2Z.mp4

Kirawi · Answer 14 · Fri Dec 25 2020 07:04:15 GMT+0800 (China Standard Time)

Is the shifting fixable?

Katsuya Hyodo · Answer 15 · Fri Dec 25 2020 07:07:18 GMT+0800 (China Standard Time)

I'm using my own tricks in the optimization phase, so that may be affecting the results. Please give me some time so I can try this out.

Katsuya Hyodo · Answer 16 · Fri Dec 25 2020 21:23:26 GMT+0800 (China Standard Time)

Is the shifting fixable?

It worked. However, the model resolution of 128x128 does not seem to be very accurate.

Kirawi · Answer 17 · Sat Dec 26 2020 02:28:26 GMT+0800 (China Standard Time)

That's unfortunate, but nonetheless amazing work man!

Kirawi · Answer 18 · Sat Dec 26 2020 08:14:25 GMT+0800 (China Standard Time)

Ah wait, I think that is intentional to reduce the computational requirements of the model. The bilateral filter mentioned in the blog further refines the mask, and it might be the case that the model works best with bright colours. I think all things considered, the model does its job fairly well. By the way, mind sharing the test setup you have for the model?

Katsuya Hyodo · Answer 19 · Sat Dec 26 2020 08:46:15 GMT+0800 (China Standard Time)

@kirawi
I did not use bilateral filter and just binarized the image, so the result may not be good.

### Download test.jpg
$ sudo gdown --id 1Tyv6P2zshOCqTgYBLoa0aC3Co8W-9JPG

### Download segm_lite_v509_128x128_float32.tflite
$ sudo gdown --id 1qOlcK8iKki_aAi_OrxE2YLaw5EZvQn1S

import numpy as np
from PIL import Image
try:
    from tflite_runtime.interpreter import Interpreter
except:
    from tensorflow.lite.python.interpreter import Interpreter

img = Image.open('test.jpg')
h = img.size[1]
w = img.size[0]
img = img.resize((128, 128))
img = np.asarray(img)
img = img / 255.
img = img.astype(np.float32)
img = img[np.newaxis,:,:,:]

# Tensorflow Lite
interpreter = Interpreter(model_path='segm_lite_v509_128x128_float32.tflite', num_threads=4)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]['index']
output_details = interpreter.get_output_details()[0]['index']

interpreter.set_tensor(input_details, img)
interpreter.invoke()
output = interpreter.get_tensor(output_details)

print(output.shape)
out1 = output[0][:, :, 0]
out2 = output[0][:, :, 1]

out1 = (out1 > 0.5) * 255
out2 = (out2 > 0.5) * 255

print('out1:', out1.shape)
print('out2:', out2.shape)

out1 = Image.fromarray(np.uint8(out1)).resize((w, h))
out2 = Image.fromarray(np.uint8(out2)).resize((w, h))

out1.save('out1.jpg')
out2.save('out2.jpg')

w-okada · Answer 20 · Sat Dec 26 2020 09:01:20 GMT+0800 (China Standard Time)

I create the demo page to use PINTO's model converted to tensorflowjs.

https://flect-lab-web.s3-us-west-2.amazonaws.com/P01_wokers/t11_googlemeet-segmentation/index.html

You can change input device with control panel at right side. If you want to use your camera device, please try.

And at default this page use new version of PINTO's model, but it seems shift to left a little yet...

You can change the model to old version of PINTO's model with the control panel at right side too.
Select modelPath and click reload model button.

Katsuya Hyodo · Answer 21 · Sat Dec 26 2020 09:10:04 GMT+0800 (China Standard Time)

I overlaid the image with the tflite implementation at hand. Does it shift when I apply the filter?

Screencast.2020-12-26.10.03.33.mp4

Kirawi · Answer 22 · Sat Dec 26 2020 09:59:10 GMT+0800 (China Standard Time)

I don't think it's shifting, it looks more like the one with the white background is capturing more of the background than the other one.

Katsuya Hyodo · Answer 23 · Sat Dec 26 2020 10:01:16 GMT+0800 (China Standard Time)

@kirawi
I am currently investigating this issue in collaboration with @w-okada on twitter.

w-okada · Answer 24 · Sun Dec 27 2020 03:12:53 GMT+0800 (China Standard Time)

mmmm, I spent a lot of time to solve the "shifting" problem yesterday. However, I couldn't.
Can anybody help me?
This is my simple test code with nodejs.

const tf = require('@tensorflow/tfjs-node');
const fs = require('fs');
const jpeg = require('jpeg-js');
const { createCanvas, loadImage } = require('canvas')

const readImage = path => {
    const buf = fs.readFileSync(path)
    const pixels = jpeg.decode(buf, true)
    return pixels
}

const imageByteArray = (image, numChannels) => {
    const pixels = image.data
    const numPixels = image.width * image.height;
    const values = new Int32Array(numPixels * numChannels);
  
    for (let i = 0; i < numPixels; i++) {
      for (let channel = 0; channel < numChannels; ++channel) {
        values[i * numChannels + channel] = pixels[i * 4 + channel];
      }
    }  
    return values
}
  

const main = async()=>{
    const image = readImage("test.jpg")
    const handler = tf.io.fileSystem("./model/model.json");
    const model = await tf.loadGraphModel(handler)
    const numChannels=3
    const values = imageByteArray(image, numChannels)
    const outShape = [image.width, image.height, numChannels];
    let input = tf.tensor3d(values, outShape, 'float32');


    input = tf.image.resizeBilinear(input,[128, 128])
    input = input.expandDims(0)
    input = tf.cast(input, 'float32')
    input = input.div(tf.max(input))

    let predict = await model.predict(input)
    predict = predict.softmax()
    const res = await predict.arraySync()
    const bm = res[0]
    const width = bm[0].length
    const height = bm.length
    const canvas = createCanvas(width, height)
    const imageData = canvas.getContext("2d").getImageData(0, 0, canvas.width, canvas.height)
    for (let rowIndex = 0; rowIndex < canvas.height; rowIndex++) {
        for (let colIndex = 0; colIndex < canvas.width; colIndex++) {
            const pix_offset = ((rowIndex * canvas.width) + colIndex) * 4
            if(bm[rowIndex][colIndex][0]>0.5){
                imageData.data[pix_offset + 0] = 255
                imageData.data[pix_offset + 1] = 0
                imageData.data[pix_offset + 2] = 0
                imageData.data[pix_offset + 3] = 128
            }else{
                imageData.data[pix_offset + 0] = 0
                imageData.data[pix_offset + 1] = 0
                imageData.data[pix_offset + 2] = 0
                imageData.data[pix_offset + 3] = 128
            }
        }
    }
    // const imageDataTransparent = new NodeCanvasImageData(data, this.canvas.width, this.canvas.height);
    canvas.getContext("2d").putImageData(imageData, 0, 0)

    const tmpCanvas = createCanvas(image.width, image.height)
    tmpCanvas.getContext("2d").drawImage(canvas, 0, 0, tmpCanvas.width, tmpCanvas.height)
    const buf = tmpCanvas.toBuffer('image/png')
    fs.writeFileSync('./res.png', buf)
}

main()

stanhrivnak · Answer 25 · Mon Dec 28 2020 09:20:18 GMT+0800 (China Standard Time)

Hi guys, first of all, many thanks to @PINTO0309, @w-okada, and others for putting your effort on this! Great work so far! I would really love to have this great model from google in my web app (currently I have bodypix with custom improvements, but still it sucks). Here are my 2 cents.
I have deployed the discussed original tflite model (https://meet.google.com/_/rtcvidproc/release/336842817/segm_lite_v509.tflite) within a desktop app using MediaPipe and it performs amazingly (see the attached video) even under not optimal light conditions. What you see is the raw model performance without any post-processing (with it, it looks even better), resolution 128 x 128.
https://user-images.githubusercontent.com/64148065/103182841-d2053c80-48ae-11eb-8ba1-1a1518c9defb.mov

The implications are:

There is hope - the model is already good enough, the resolution 128 x 128 is high enough to have nice results when upsampling to SD/HD. Also, it's super-fast, inferences running well above 25 FPS.
There has to be a flaw in the manual conversion to h5/TFJS.

I think the best would be to compare the outputs of the original tflite model and the created TFJS model (or h5/tflite), layer after layer to see where it deviates and focus to fix that part.
The problem is that the original tflite model uses some custom ops, so it can't be read in python directly. But we know the definitions of these ops, here they are: (not sure if it uses all 3, but at least "Convolution2DTransposeBias", because that is the error it gives me in python)
https://github.com/google/mediapipe/tree/master/mediapipe/util/tflite/operations
The problem is that it's in C++, so it has to be rewritten to python or we need to go with Tensorflow C++. Also, as stated here:
google-ai-edge/mediapipe#35 (comment)
these custom ops are just merged existing operations, so it should be straight-forward.

So this is my plan. I can work on it only ~ 2 hours a day, so if you're faster, go for it and let me know! :) Or if you have any other ideas, share it please!

Katsuya Hyodo · Answer 26 · Mon Dec 28 2020 09:43:28 GMT+0800 (China Standard Time)

@stanhrivnak
I have already succeeded in replacing custom operations. You're right, it would be quicker to check the results of the output for each layer, but I don't have enough time to do that since I'm also working on converting other models at the same time.

https://github.com/PINTO0309/PINTO_model_zoo/blob/32f1a821bc3c8a04a53ba3e18a45921a136de889/082_MediaPipe_Meet_Segmentation/01_segm_lite_tflite2h5_weight_int_fullint_float16_quant.py#L691-L704

stanhrivnak · Answer 27 · Wed Dec 30 2020 06:52:33 GMT+0800 (China Standard Time)

@PINTO0309
Unfortunately, tflite format doesn't allow accessing intermediate results after each operation/layer, just the final output node... so we can't debug your code this way...
@jasonmayes
could you kindly provide information on when can we expect the release of the TFJS version of the model? Will it be in the order of weeks or months or "definitely not soon"? This information will greatly help us in our planning. Many thanks in advance!

simon-lanf · Answer 28 · Mon Jan 04 2021 16:29:36 GMT+0800 (China Standard Time)

@w-okada

https://flect-lab-web.s3-us-west-2.amazonaws.com/P01_wokers/t11_googlemeet-segmentation/index.html

Could you publish the code for this page please ? Thank you.

Marcel Goya · Answer 29 · Mon Jan 04 2021 16:47:15 GMT+0800 (China Standard Time)

@simon-lanf You should be able to get it by simply opening the referenced JS/TSX files. Google DevTools is your friend here ....

Florian Echtler · Answer 30 · Mon Jan 04 2021 17:39:27 GMT+0800 (China Standard Time)

@w-okada this is entirely off-topic, but I just have to ask - was the picture in your post taken in Z10, by any chance?

w-okada · Answer 31 · Mon Jan 04 2021 18:49:03 GMT+0800 (China Standard Time)

@floe
I don't know. I just used the picture PINTO provided above post.

$ sudo gdown --id 1Tyv6P2zshOCqTgYBLoa0aC3Co8W-9JPG

w-okada · Answer 32 · Mon Jan 04 2021 18:51:31 GMT+0800 (China Standard Time)

@simon-lanf

This code is in my dev-branch.
You can see at (or clone from) https://github.com/w-okada/image-analyze-workers/tree/dev/011demo_googlemeet-segmentation-worker-js-demo

Florian Echtler · Answer 33 · Mon Jan 04 2021 19:42:09 GMT+0800 (China Standard Time)

Oh, now I see, the image is from PASCAL VOC. Sorry for the noise.

Florian Echtler · Answer 34 · Mon Jan 04 2021 19:43:01 GMT+0800 (China Standard Time)

JFYI, I have a C++ TFLite implementation using the Google Meet model for background segmentation: https://github.com/floe/deepbacksub

Katsuya Hyodo · Answer 35 · Mon Jan 04 2021 19:57:20 GMT+0800 (China Standard Time)

Since I was introduced to a full-size model, I will try to quantize it, including converting custom operations.

144x256
https://meet.google.com/_/rtcvidproc/release_1wttl/345264209/segm_full_v679.tflite

simon-lanf · Answer 36 · Mon Jan 04 2021 23:31:48 GMT+0800 (China Standard Time)

Can anyone tell if this one is different from v679 ?

https://meet.google.com/_/rtcvidproc/release_1wttl/345264209/segm_lite_v681.tflite

Florian Echtler · Answer 37 · Mon Jan 04 2021 23:44:58 GMT+0800 (China Standard Time)

@simon-lanf AFAICT it's the same model, just the resolution is different.

Kirawi · Answer 38 · Mon Jan 04 2021 23:47:09 GMT+0800 (China Standard Time)

That one is 96x160, I think

jiangjianping · Answer 39 · Tue Jan 05 2021 00:14:34 GMT+0800 (China Standard Time)

@tafsiri

Is there anything about the joint bilateral filter used in Google Meet? Which is the guide image? Thanks.

Katsuya Hyodo · Answer 40 · Tue Jan 05 2021 15:15:25 GMT+0800 (China Standard Time)

I replaced the custom OPs of the full-size model with standard OPs, and further converted them with my own optimization. I have not implemented any post-processing, but I think it performs quite well. The bilateral filter is not used.

segm_full_v679_144x256_opt_float32.tflite
https://drive.google.com/file/d/1tKhwGLJ3f0GYDAWFiufv0e7DGVfW6ztS/view?usp=sharing

I have also converted as much as possible for the various frameworks. If you run a TFJS model and experience misalignment, it is a problem with the TFJS runtime.

TFJS (Float32/Flot16), TF-TRT (Float32/Float16), TFLite (Float32/Float16,INT8), OpenVINO (FP32/FP16), CoreML
https://github.com/PINTO0309/PINTO_model_zoo/tree/master/082_MediaPipe_Meet_Segmentation

### Download test.jpg
$ sudo gdown --id 1Tyv6P2zshOCqTgYBLoa0aC3Co8W-9JPG

### Download segm_full_v679_144x256_opt_float32.tflite
$ sudo gdown --id 1tKhwGLJ3f0GYDAWFiufv0e7DGVfW6ztS

import numpy as np
from PIL import Image
try:
    from tflite_runtime.interpreter import Interpreter
except:
    from tensorflow.lite.python.interpreter import Interpreter

img = Image.open('test.jpg')
h = img.size[1]
w = img.size[0]
img = img.resize((256, 144))
img = np.asarray(img)
img = img / 255.
img = img.astype(np.float32)
img = img[np.newaxis,:,:,:]

# Tensorflow Lite
interpreter = Interpreter(model_path='segm_full_v679_144x256_opt_float32.tflite', num_threads=4)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]['index']
output_details = interpreter.get_output_details()[0]['index']

interpreter.set_tensor(input_details, img)
interpreter.invoke()
output = interpreter.get_tensor(output_details)

print(output.shape)
out1 = output[0][:, :, 0]
out2 = output[0][:, :, 1]

out1 = (out1 > 0.5) * 255
out2 = (out2 > 0.5) * 255

print('out1:', out1.shape)
print('out2:', out2.shape)

out1 = Image.fromarray(np.uint8(out1)).resize((w, h))
out2 = Image.fromarray(np.uint8(out2)).resize((w, h))

out1.save('out1.jpg')
out2.save('out2.jpg')

Katsuya Hyodo · Answer 41 · Tue Jan 05 2021 16:22:33 GMT+0800 (China Standard Time)

I re-committed, revising the conversion method and also improving the accuracy of the 128x128 Lite model.

segm_lite_v509_128x128_opt_float32
https://drive.google.com/file/d/1qOlcK8iKki_aAi_OrxE2YLaw5EZvQn1S/view?usp=sharing
TFJS (Float32/Flot16), TF-TRT (Float32/Float16), TFLite (Float32/Float16,INT8), OpenVINO (FP32/FP16), CoreML, and EdgeTPU
https://github.com/PINTO0309/PINTO_model_zoo/tree/master/082_MediaPipe_Meet_Segmentation

Florian Echtler · Answer 42 · Tue Jan 05 2021 17:21:40 GMT+0800 (China Standard Time)

@PINTO0309 excellent, thank you. Can you briefly summarize what optimizations you used?

w-okada · Answer 43 · Tue Jan 05 2021 17:30:34 GMT+0800 (China Standard Time)

Wow!!!
Great. With tfjs, it completely worked!

Demo page is here. You can try it!
https://flect-lab-web.s3-us-west-2.amazonaws.com/P01_wokers/t11_googlemeet-segmentation/index.html

model.mp4

Amir · Answer 44 · Tue Jan 05 2021 17:53:46 GMT+0800 (China Standard Time)

@w-okada This is amazing!

w-okada · Answer 45 · Tue Jan 05 2021 19:00:49 GMT+0800 (China Standard Time)

With wasm, I get the image like below. Ummmm.

Katsuya Hyodo · Answer 46 · Tue Jan 05 2021 19:18:07 GMT+0800 (China Standard Time)

@floe

I used the following trick.

Fused bias, weight, and activation functions (ReLU/ReLU6) into Convolution, FullyConnected, and DepthwiseConvolution.
Since the tflite model published by Google is quantized to Float16, I dared to temporarily convert it to Float32 to support conversion to various frameworks.
In order to quantize INT8 and run it on a fast inference device called EdgeTPU, I made my own modifications to Hard-Swish.

### For TFJS, TFLite, TF-TRT, OpenVINO
hswish = x * tf.nn.relu6(x + 3) * 0.16666667
### For EdgeTPU
hswish = x * tf.nn.relu6(x + 3) * 0.16666666

Because of the problems with TensorFlow's ResizeBilinear, I did my own little trick.

jiangjianping · Answer 47 · Wed Jan 06 2021 00:54:35 GMT+0800 (China Standard Time)

@w-okada .
Excellent and beautiful! which post-process do you use?

simon-lanf · Answer 48 · Wed Jan 06 2021 15:17:32 GMT+0800 (China Standard Time)

@w-okada

Yeah I can reproduce it too, I can confirm that in WASM the results are different for the same images.

Kirawi · Answer 49 · Thu Jan 07 2021 02:06:31 GMT+0800 (China Standard Time)

Quick hacky joint bilateral filter. I know nothing about this, but it seems to work. Interestingly, out1 seems to be more accurate than out2.

import numpy as np
import cv2
try:
    from tflite_runtime.interpreter import Interpreter
except:
    from tensorflow.lite.python.interpreter import Interpreter

img = cv2.imread('Capture.png')
h = img.shape[0]
w = img.shape[1]

img = cv2.resize(img, (256, 144))
img = np.asarray(img)
img = img / 255.
img = img.astype(np.float32)
img = img[np.newaxis,:,:,:]

# Tensorflow Lite
interpreter = Interpreter(model_path='model_float16_quant.tflite', num_threads=4)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]['index']
output_details = interpreter.get_output_details()[0]['index']

interpreter.set_tensor(input_details, img)
interpreter.invoke()
output = interpreter.get_tensor(output_details)

print(output.shape)
out1 = output[0][:, :, 0]
out2 = output[0][:, :, 1]

out1 = np.invert((out1 > 0.5) * 255)
out2 = np.invert((out2 > 0.5) * 255)

print('out1:', out1.shape)
print('out2:', out2.shape)

out1 = cv2.resize(np.uint8(out1), (w, h))
out2 = cv2.resize(np.uint8(out2), (w, h))

cv2.imwrite('out1.jpg', out1)
cv2.imwrite('out2.jpg', out2)

out3 = cv2.ximgproc.jointBilateralFilter(out2, out1, 8, 75, 75)

cv2.imwrite('out3.jpg', out3)

jiangjianping · Answer 50 · Thu Jan 07 2021 08:27:43 GMT+0800 (China Standard Time)

@kirawi
Interesting. Why do you use the out2 as guide image?

Qi Wang · Answer 51 · Thu Jan 07 2021 08:28:43 GMT+0800 (China Standard Time)

I have to use tensorflow full model because I want to use DNN module in OpenCV.

But in my test, the output of segm_full_v679_opt.tflite and segm_full_v679_opt.pb looks different.
I do not apply any pre/post processing right now and threshold are both 0.5.

Any trick there?
Thanks!

Original image:

tflite output

tf output

Kirawi · Answer 52 · Thu Jan 07 2021 08:59:52 GMT+0800 (China Standard Time)

@kirawi
Interesting. Why do you use the out2 as guide image?

It's the otherway around, out2 is the joint image.

w-okada · Answer 53 · Thu Jan 07 2021 21:44:49 GMT+0800 (China Standard Time)

I made a very rough version of something JBF-like.
It's pretty smooth, but the FPS degrades significantly. I wonder if letting it go to another worker would improve it a bit?

out_trimed.13.mp4

Qi Wang · Answer 54 · Fri Jan 08 2021 06:16:23 GMT+0800 (China Standard Time)

I run the script 02_segm_full_v679_tflite_to_pb_saved_model.py using weights from weights/144x256 and the generated pb file is different from that in repo.

The result is not good as segm_full_v679_opt.tflite, but looks better than original output of segm_full_v679_opt.pb.

Katsuya Hyodo · Answer 55 · Fri Jan 08 2021 06:35:03 GMT+0800 (China Standard Time)

@jimmy7799
I seem to have committed a version of .pb that was tuned for EdgeTPU. In any case, .pb is deprecated.

Also, this is a tfjs issue, so I don't think it's really appropriate to discuss .pb or .tflite here. If necessary, you can issue an issue in my repository.

https://github.com/PINTO0309/PINTO_model_zoo/issues

Qi Wang · Answer 56 · Fri Jan 08 2021 06:40:38 GMT+0800 (China Standard Time)

@jimmy7799
I seem to have committed a version of .pb that was tuned for EdgeTPU. In any case, .pb is deprecated.

Also, this is a tfjs issue, so I don't think it's really appropriate to discuss .pb or .tflite here. If necessary, you can issue an issue in my repository.

https://github.com/PINTO0309/PINTO_model_zoo/issues

OK. Thanks!

jiangjianping · Answer 57 · Fri Jan 08 2021 18:05:01 GMT+0800 (China Standard Time)

@w-okada ,
For JBF, I wonder which is your guide image against the output mask? I test with output mask against source/last mask/the same mask using opencv JBF, It looks no obvious effect.

w-okada · Answer 58 · Fri Jan 08 2021 18:40:56 GMT+0800 (China Standard Time)

@jiangjianping
I used original image as guidimage. And note. I made JBF like filter by myself, and this is maybe not JBF.

jiangjianping · Answer 59 · Tue Jan 12 2021 07:12:41 GMT+0800 (China Standard Time)

I wonder who is knowing how to implement light wrapping effect with open source code?

w-okada · Answer 60 · Wed Jan 13 2021 17:35:39 GMT+0800 (China Standard Time)

I tried to run the model on the webworker. It reached to 100fps!? (maybe... if my performance counter is not broken...)
Note. my pc has RTX1660.

Amir · Answer 61 · Wed Jan 13 2021 17:46:32 GMT+0800 (China Standard Time)

Is there a tfjs we can run on the client side yet?

simon-lanf · Answer 62 · Wed Jan 13 2021 23:57:56 GMT+0800 (China Standard Time)

@amiregelz https://github.com/w-okada/image-analyze-workers/tree/master/011_googlemeet-segmentation-worker-js

Amir · Answer 63 · Thu Jan 14 2021 06:18:39 GMT+0800 (China Standard Time)

@simon-lanf @w-okada
In the demo page @w-okada published my CPU hits 200%, is there any way to avoid this? I thought it should be less intense in terms of computing than bodypix

w-okada · Answer 64 · Thu Jan 14 2021 07:30:17 GMT+0800 (China Standard Time)

@amiregelz
Because it uses 5 webworkers in my demo. You can use the model on one webworker or on main thread if you want.
Anyway, using 5 webworkers was overkill to achive high fps. So I fix it to 2 webworkers. Try it.

w-okada · Answer 65 · Thu Jan 14 2021 18:41:31 GMT+0800 (China Standard Time)

@amiregelz
U---nn,,, If you want to reduce the CPU usage, you can do throttling the cycle of operation.
In my demo, it loops with high tense. For instance, in my demo it reachs to 70 fps with my PC. But it is not needed in real use. You can less fps with throttling.

Kirawi · Answer 66 · Thu Jan 14 2021 22:58:39 GMT+0800 (China Standard Time)

You could look through the code for the demo.

Amir · Answer 67 · Sun Jan 17 2021 05:16:00 GMT+0800 (China Standard Time)

@w-okada @kirawi Thanks. What's the most CPU efficient way to remove the pixels of the background (or make them transparent) based on the result of manager.predict? I want to draw only the person to the destination canvas.

w-okada · Answer 68 · Sun Jan 17 2021 15:02:41 GMT+0800 (China Standard Time)

@amiregelz
your idea is one of solutions. If you think this TFJS model can run with the same performance as the original one, it is maybe not true. In google blog, they said they use the original model with tflite and XNNPACK. And you can see the fact tfjs and mediapipe is different each other in the media pipe discussion. for example google-ai-edge/mediapipe#1156 (comment)

In my feeling, this tfjs model is about only 1.2 ~ 1.4 faster the bodypix with mobilenetv1. Perhaps MobilenetV3 small and NAS make this improvement.
Regarding accuracy, probably it is more better?
(It depends on the configuration.)

Amir · Answer 69 · Sun Jan 17 2021 23:49:04 GMT+0800 (China Standard Time)

@w-okada Got it. I'm trying to achieve the highest fps possible even in the cost of accuracy or quality. Are there any optimizations I can do in terms of rendering / canvas constructions to minimize CPU load and allow higher frame rate?

w-okada · Answer 70 · Thu Jan 21 2021 15:58:48 GMT+0800 (China Standard Time)

My report.
https://dannadori.medium.com/google-meet-virtual-background-with-amazon-chime-sdk-34656a625fed

Katsuya Hyodo · Answer 71 · Sat Jan 23 2021 22:12:13 GMT+0800 (China Standard Time)

I have just started to create a tool to convert tflite to saved_model, TFJS, TF-TRT, CoreML, and EdgeTPU. Automatically replaces the custom operation Convolution2DTransposeBias with the standard operation. I plan to gradually increase the types of operations that the tool can handle.
https://github.com/PINTO0309/tflite2tensorflow.git

benbro · Answer 72 · Tue Feb 02 2021 18:29:48 GMT+0800 (China Standard Time)

The blog post about this model links to this Model Card describing the model, which reads

LICENSED UNDER Apache License, Version 2.0

The Model Card also links to this paper describing Model Cards in general, which says that Model Cards can describe a license that the model is released under. So I believe the above license applies to the described model itself (e.g. rather than to the Model Card document).

So it seems like the raw .tflite model here is already Apache-licensed! @jasonmayes would you agree with this / is this Google's position?

(Thanks to @blaueente for originally noting this license in the Model Card!)

The blog post links to this model card which says:

LICENSED UNDER Google Terms of Service

Does it means that the model isn't open source?

Jason Mayes · Answer 73 · Wed Feb 03 2021 05:37:55 GMT+0800 (China Standard Time)

Speaking from the TFJS perspective we typically release our code under Apache 2.0 which is the Google standard when open sourcing stuff on say Github. We should confirm with MediaPipe here though in case there is something I am missing or if anything different for this model specifically.

…

On Tue, Feb 2, 2021 at 2:30 AM benbro ***@***.***> wrote: The blog post about this model <https://ai.googleblog.com/2020/10/background-features-in-google-meet.html> links to this Model Card describing the model <https://drive.google.com/file/d/1WvHxUONoATFJ9JpSgAF7dU_F5V58HItS/view>, which reads LICENSED UNDER Apache License, Version 2.0 The Model Card also links to this paper describing Model Cards in general <https://arxiv.org/pdf/1810.03993.pdf>, which says that Model Cards can describe a license that the model is released under. So I believe the above license applies to the described model itself (e.g. rather than to the Model Card document). So it seems like the raw .tflite model here <https://meet.google.com/_/rtcvidproc/release/336842817/segm_lite_v509.tflite> is already Apache-licensed! @jasonmayes <https://github.com/jasonmayes> would you agree with this / is this Google's position? (Thanks to @blaueente <https://github.com/blaueente> for originally noting this license in the Model Card!) The blog post <https://ai.googleblog.com/2020/10/background-features-in-google-meet.html> links to this model card <https://drive.google.com/file/d/1lnP1bRi9CSqQQXUHa13159vLELYDgDu0/view> which says: LICENSED UNDER Google Terms of Service Does it means that the model isn't open source? — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#4177 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ABF6DRNDNSWZKBMVR34UC4TS47H27ANCNFSM4TJALO2A> .

-- * • **Jason Mayes* * • *Senior Developer Advocate, TensorFlow.js <https://www.tensorflow.org/js/> *• *Google • *Follow me <http://linkedin.com/in/CreativeTech> or get inspired with #MadeWithTFJS demos <https://www.youtube.com/playlist?list=PLQY2H8rRoyvzSZZuF0qJpoJxZR1NgzcZw>* This email may be confidential and privileged. If you received this communication by mistake, please don't forward it to anyone else, please erase all copies and attachments, and please let me know that it has gone to the wrong person.

kooler · Answer 74 · Wed Feb 03 2021 05:42:50 GMT+0800 (China Standard Time)

@jasonmayes but just to clarify, from Google's point of view are there any licensing issues with others using the .tflite model?

simon-lanf · Answer 75 · Wed Feb 03 2021 05:45:44 GMT+0800 (China Standard Time)

@benbro that's new. It used to say Apache 2.0

Ashik Salim · Answer 76 · Fri Feb 05 2021 09:20:39 GMT+0800 (China Standard Time)

@benbro that's new. It used to say Apache 2.0

As a general question, does open source work that way? As in can you re-license something that was already released under an open source license?

simon-lanf · Answer 77 · Fri Feb 05 2021 22:18:31 GMT+0800 (China Standard Time)

@ashikns yes. You can always change the license, if everyone who wrote any code accepts, or if the license allows it without everyone's approval. Apache 2.0 requires that everyone approves, but since everyone who wrote the code is a Google employee, Google decides in the end. I also think that their work contract might state that Google can relicense their code written on the job anytime, for any reason, without any approval

I must add that this is not retroactive. Older versions keep their license and new versions use the new license.

Also, I am not a lawyer, i might have some details wrong.

https://en.wikipedia.org/wiki/Software_relicensing

Vedanta Jha · Answer 78 · Tue Feb 09 2021 23:56:36 GMT+0800 (China Standard Time)

@jasonmayes Could you please let us know if the meet's tflite model can be used for commercial uses or not? Thanks

Jason Mayes · Answer 79 · Wed Feb 10 2021 06:02:05 GMT+0800 (China Standard Time)

I'm afraid I can not answer on behalf of the meet team / TFLite - if the original model is from one of these teams then I would advise asking someone from one of those teams as I was not involved in the development of this model and its release process. If I find out anything on my side though I will of course update the thread.

mirrorlink · Answer 80 · Tue Feb 23 2021 02:27:51 GMT+0800 (China Standard Time)

@PINTO0309 hey, can I use this model 082_MediaPipe_Meet_Segmentation without mediapipe? using a normal tflite implementation?

OH YES LORD I FOUND IT

https://drive.google.com/file/d/1tKhwGLJ3f0GYDAWFiufv0e7DGVfW6ztS/view?usp=sharing

ITS THIS ONE RIGHT? OH YES YOU ARE THE BEST JAPANESE JAPAN HAS EVER SEEN, ITS WORKING

ALSO I STUDIED JAPANESE IN COLLEGE, NIHONGO WO SUKOCHI HANASHIMASU

BESIDES, PINTO MEANS DICK IN PORTUGUESE AND THATS AWESOME

I LOVE YOU

SOOOO

MUCH

benbro · Answer 81 · Sun Mar 14 2021 00:08:27 GMT+0800 (China Standard Time)

Anyone tried to run BackgroundMattingV2 in a web browser?

Saúl Ibarra Corretgé · Answer 82 · Wed Apr 07 2021 18:52:56 GMT+0800 (China Standard Time)

Does anyone here have a copy of the initial (Apache 2 licensed) model card?

Florian Echtler · Answer 83 · Thu Apr 15 2021 18:30:33 GMT+0800 (China Standard Time)

Does anyone here have a copy of the initial (Apache 2 licensed) model card?

Tried to get it via Wayback Machine, but no luck so far...

w-okada · Answer 84 · Wed Apr 28 2021 20:14:41 GMT+0800 (China Standard Time)

@benbro told me the new model is released under Apache2.0
The input size is lager then previous one, so performance(response time) is a little bit worse. But enough fast.

This is demo. select model 256x256
https://flect-lab-web.s3-us-west-2.amazonaws.com/P01_wokers/tfl001_google-meet-segmentation/index.html

BTW, keep model card in your storage!!!! :P
https://developers.google.com/ml-kit/images/vision/selfie-segmentation/selfie-model-card.pdf

Saúl Ibarra Corretgé · Answer 85 · Fri Apr 30 2021 16:47:26 GMT+0800 (China Standard Time)

BTW, keep model card in your storage!!!! :P
https://developers.google.com/ml-kit/images/vision/selfie-segmentation/selfie-model-card.pdf

That is not the meet model model card, alas.

w-okada · Answer 86 · Fri Apr 30 2021 20:28:27 GMT+0800 (China Standard Time)

@saghul
Oh, sorry, yes. That is not of original google meet segment model's.

Euan Smith · Answer 87 · Wed May 12 2021 20:20:45 GMT+0800 (China Standard Time)

The original model card, for anybody still looking: Model Card-Meet Segmentation.pdf

Saúl Ibarra Corretgé · Answer 88 · Wed May 12 2021 20:56:43 GMT+0800 (China Standard Time)

That's the one! Cheers!

Euan Smith · Answer 89 · Thu May 13 2021 01:40:08 GMT+0800 (China Standard Time)

Note that although Google did release the Meet model under the Apache 2.0 licence with that model card pasted above, they no longer have it available for download and there is now a different card with a different licence.

Ashik Salim · Answer 90 · Thu May 13 2021 07:31:15 GMT+0800 (China Standard Time)

Yep. The new model is called "Xeno" meet segmentation or something. This is the apache released model: OneDrive link.

Also if you tinker around a bit with google meet webpage you can still download the models directly from Google, you just need to find the right url from the js script. At least that was still working as of February.

Ming Yong · Answer 91 · Thu Jun 10 2021 06:16:10 GMT+0800 (China Standard Time)

Hi, I am product manager for MediaPipe. Please note that only the MediaPipe Selfie Segmentation Model is open sourced and licensed under Apache 2 for external use. Other versions, including those used in the Google Meet product, are licensed under Google Terms and Conditions and are not intended for open source use.

Saúl Ibarra Corretgé · Answer 92 · Thu Jun 10 2021 15:39:25 GMT+0800 (China Standard Time)

@jasonmayes Why was this closed?

Jason Mayes · Answer 93 · Fri Jun 11 2021 02:28:22 GMT+0800 (China Standard Time)

Closed as the folk from MediaPipe clarified the T&C for the models they released.

Na Li · Answer 94 · Sat Jun 12 2021 01:34:35 GMT+0800 (China Standard Time)

Reopen to track the segmentation model release through tfjs API

Qi Wang · Answer 95 · Sat Jul 17 2021 05:17:52 GMT+0800 (China Standard Time)

From https://meet.google.com/,

https://meet.google.com/_/rtcvidproc/release/hashed/segm_full_sparse_v1008_0bda82336d236e21e52f2b74129b9883.dat
https://meet.google.com/_/rtcvidproc/release/hashed/segm_lite_v1082_c59fbb2b8451df2c2752e562c6523bcc.dat

Looks like the latest model is hashed and can not be downloaded anymore.

Nikhil Mehta · Answer 96 · Sat Oct 02 2021 19:51:21 GMT+0800 (China Standard Time)

From https://meet.google.com/,

https://meet.google.com/_/rtcvidproc/release/hashed/segm_full_sparse_v1008_0bda82336d236e21e52f2b74129b9883.dat https://meet.google.com/_/rtcvidproc/release/hashed/segm_lite_v1082_c59fbb2b8451df2c2752e562c6523bcc.dat

Looks like the latest model is hashed and can not be downloaded anymore.

@jimmy7799 We are doomed then? or have we found some way to get the model?

Saúl Ibarra Corretgé · Answer 97 · Sat Oct 02 2021 20:46:57 GMT+0800 (China Standard Time)

Even if you can get it, you are not allowed to use it.

Nikhil Mehta · Answer 98 · Sat Oct 02 2021 22:07:42 GMT+0800 (China Standard Time)

@saghul I know I just want to try that out on local. No intention to use it in open source or commercial project.

Florian Echtler · Answer 99 · Sun Oct 03 2021 23:54:37 GMT+0800 (China Standard Time)

JFYI, the MediaPipe Selfie Segmentation model is a) properly Apache licensed and b) can just be downloaded as an Android AAR archive. See https://drive.google.com/file/d/1dCfozqknMa068vVsO2j_1FgZkW_e3VWv/preview .