PINTO0309 / PINTO_model_zoo

A repository for storing models that have been inter-converted between various frameworks. Supported frameworks are TensorFlow, PyTorch, ONNX, OpenVINO, TFJS, TFTRT, TensorFlowLite (Float32/16/INT8), EdgeTPU, CoreML.

Home Page: https://qiita.com/PINTO

RobustVideoMatting model conversion

livingbeams opened this issue

Issue Type

Support

OS

Windows

OS architecture

x86_64

Programming Language

Python

Framework

ONNX

Model name and Weights/Checkpoints URL

rvm_mobilenetv3_HxW.onnx
https://github.com/PINTO0309/PINTO_model_zoo

rvm_mobilenetv3_fp16.onnx
rvm_mobilenetv3_fp32.onnx
rvm_mobilenetv3.pth
https://github.com/PeterL1n/RobustVideoMatting

Description

First of all thanks for maintaining this amazing model conversion work.

I'm stuck trying to convert the RobustVideoMatting model and I would like to know if you could guide me.

Inside the file:
https://s3.ap-northeast-2.wasabisys.com/pinto-model-zoo/242_RobustVideoMatting/resources_mbnv3.tar.gz

I can find some fixed-input versions (rvm_mobilenetv3_192x320, rvm_mobilenetv3_240x320, etc.).

I would like to generate such a model, but with different sizes for the "src" input and for the r1i, r2i, r3i, and r4i state tensors.

The original model has these input sizes (the snippet after the list is how I am reading them):
src [batch_size,3,height,width]
r1i [batch_size,channels,height,width]
r2i [batch_size,channels,height,width]
r3i [batch_size,channels,height,width]
r4i [batch_size,channels,height,width]
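
For reference, this is how I am inspecting those shapes with the plain onnx Python API (I am loading the fp32 file, but I assume any variant prints the same structure):

```python
import onnx

model = onnx.load("rvm_mobilenetv3_fp32.onnx")
for inp in model.graph.input:
    # Each dim is either a symbolic name (dim_param) or a fixed int (dim_value)
    dims = [d.dim_param or d.dim_value for d in inp.type.tensor_type.shape.dim]
    print(inp.name, dims)
```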

I see that with the script "batchsize_clear.py" it is possible to change 'batch_size' to 'N' (I don't know whether it should be 'N' or '1').
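
This is the minimal sketch I have in mind for that step (my own rewrite, not batchsize_clear.py itself; I am assuming only the 4-D tensors should be touched):

```python
import onnx

model = onnx.load("rvm_mobilenetv3_fp32.onnx")
for tensor in list(model.graph.input) + list(model.graph.output):
    dims = tensor.type.tensor_type.shape.dim
    if len(dims) != 4:          # skip non-4D inputs such as downsample_ratio
        continue
    dims[0].dim_param = "N"     # or: dims[0].dim_value = 1 for a fixed batch
onnx.save(model, "rvm_mobilenetv3_fp32_N.onnx")
```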

I cannot find out how to fix the "downsample_ratio" hyperparameter of the original model.
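
The closest I have come is trying to freeze it myself. This is only a sketch and assumes "downsample_ratio" is an ordinary float32 graph input of shape [1] (0.25 is just an example value):

```python
import numpy as np
import onnx
from onnx import numpy_helper

model = onnx.load("rvm_mobilenetv3_fp32.onnx")
graph = model.graph

# Drop downsample_ratio from the graph inputs...
keep = [i for i in graph.input if i.name != "downsample_ratio"]
del graph.input[:]
graph.input.extend(keep)

# ...and re-add it as a constant initializer with the desired value
ratio = numpy_helper.from_array(
    np.array([0.25], dtype=np.float32), name="downsample_ratio")
graph.initializer.append(ratio)

onnx.save(model, "rvm_mobilenetv3_frozen_ratio.onnx")
```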

I see a file "rvm_mobilenetv3_HxW.onnx" that eliminates the "downsample_ratio" input and sets generic input shapes.

I also see that with the script "set_static_shape.py" it is possible to change the input shapes, but it changes all the inputs ("src" and also the state tensors "rxi").

For example with W=1920 and H=1080 I get these input sizes:
src [1,3,1080,1920]
r1i [1,16,1080,1920]
r2i [1,20,1080,1920]
r3i [1,40,1080,1920]
r4i [1,64,1080,1920]

I am expecting the rxi inputs to keep their own, smaller dimensions, for example (see the sketch after this list for how I am trying to set per-input shapes):
r1i_dims = { 1, 16, 192, 320 };
r2i_dims = { 1, 20, 96, 160 };
r3i_dims = { 1, 40, 48, 80 };
r4i_dims = { 1, 64, 24, 40 };
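
This is the per-input variant I have been attempting (the rxi sizes above are my guess, derived from the fixed-resolution models, so they may well be wrong for 1080x1920):

```python
import onnx

# Hypothetical per-input shapes; only names present here are modified
shapes = {
    "src": [1, 3, 1080, 1920],
    "r1i": [1, 16, 192, 320],
    "r2i": [1, 20, 96, 160],
    "r3i": [1, 40, 48, 80],
    "r4i": [1, 64, 24, 40],
}

model = onnx.load("rvm_mobilenetv3_HxW.onnx")
for inp in model.graph.input:
    if inp.name not in shapes:
        continue
    for dim, value in zip(inp.type.tensor_type.shape.dim, shapes[inp.name]):
        dim.dim_value = value   # assigning dim_value clears any symbolic name
onnx.save(model, "rvm_mobilenetv3_1080x1920_custom.onnx")
```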

I also see that the sizes of the outputs "fgr" and "pha" are recalculated, but the rxo outputs keep symbolic height/width (the sketch after this list is how I am trying to have them recomputed):
fgr [1,3,1080,1920]
pha [1,1,1080,1920]
r1o [1,16,height,width]
r2o [1,20,height,width]
r3o [1,40,height,width]
r4o [1,64,height,width]
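
This is how I am trying to get the rxo shapes recomputed, after writing the static input shapes as above. I clear the declared output shapes first so that inference is free to fill them in; if the stock inference cannot see through all the ops, perhaps onnxruntime's symbolic_shape_infer.py would do better:

```python
import onnx
from onnx import shape_inference

model = onnx.load("rvm_mobilenetv3_1080x1920_custom.onnx")

# Drop the declared (symbolic) output shapes so inference can replace them
for out in model.graph.output:
    out.type.tensor_type.ClearField("shape")

inferred = shape_inference.infer_shapes(model)
for out in inferred.graph.output:
    dims = [d.dim_param or d.dim_value for d in out.type.tensor_type.shape.dim]
    print(out.name, dims)
onnx.save(inferred, "rvm_mobilenetv3_1080x1920_inferred.onnx")
```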

Could you please give me some clue as to how I could generate an ONNX model with a given "src" size that also sets the correct sizes for the "rxi" inputs and the outputs?

Best regards

Relevant Log Output

No response

URL or source code for simple inference testing code

No response