timctho / convolutional-pose-machines-tensorflow


input and output tensors.

slimcdk opened this issue · comments

I'm in the process of converting the model to TensorFlow Lite, but I'm not very experienced with TensorFlow yet.

For the conversion I need the input and output tensor shapes. Where can I find those?

Will the input be the image size and color channels, e.g. [None, FLAGS.input_size, FLAGS.input_size, 3]?
And for the output, would that just be the num_of_joints number?

To clarify my question, I'm using the second code snippet provided by Pannag Sanketi: https://stackoverflow.com/questions/50632152/tensorflow-convert-pb-file-to-tflite-using-python
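For reference, here is a minimal sketch of the shapes involved, assuming a typical CPM hand setup: input_size of 368, 21 joints plus one background channel, and a network stride of 8. All three values are assumptions; check the model source and FLAGS for the actual numbers.

```python
# Assumed values -- verify against the model's FLAGS / source code.
input_size = 368        # hypothetical FLAGS.input_size
num_joints = 21
stride = 8              # assumed downsampling factor of the network

# Input: a batch of square RGB images, NHWC layout.
input_shape = (1, input_size, input_size, 3)

# Output: not just num_of_joints scalars, but a stack of heatmaps --
# one (input_size/stride, input_size/stride) map per joint,
# plus an assumed background channel.
heatmap_size = input_size // stride
output_shape = (1, heatmap_size, heatmap_size, num_joints + 1)
```

The key point is that the output is spatial: each joint gets its own 2D heatmap, so the output tensor is four-dimensional rather than a flat vector of joint indices.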

Hello @slimcdk, did you find out how to convert it to TensorFlow Lite yet?
I'm searching for a way to do that but don't know where to start.

Hi

Yes, take a look at my fork of the repo: https://github.com/slimcdk/convolutional-pose-machines-tensorflow
I found that the first tensor name has a misspelling in the provided weights, which is corrected in the model source code.

I also managed to run inference with the model, but processing time was between 2 and 3 seconds on a Galaxy S10. I still need to implement the Kalman filter and possibly the tracker module, or you could just feed the model a fixed resolution.
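For anyone implementing the smoothing step mentioned above, here is a minimal sketch of a constant-velocity Kalman filter for one joint's (x, y) position. This is not the repo's tracker module; the noise parameters are illustrative defaults, not tuned values.

```python
import numpy as np

class JointKalmanFilter:
    """Constant-velocity Kalman filter for one joint's (x, y) position.

    State vector: [x, y, vx, vy], with dt fixed at one frame.
    """

    def __init__(self, process_noise=1e-2, measurement_noise=1.0):
        self.x = np.zeros(4)               # state estimate
        self.P = np.eye(4) * 100.0         # covariance (very uncertain start)
        self.F = np.eye(4)                 # state transition: pos += vel
        self.F[0, 2] = self.F[1, 3] = 1.0
        self.H = np.zeros((2, 4))          # we only observe position
        self.H[0, 0] = self.H[1, 1] = 1.0
        self.Q = np.eye(4) * process_noise
        self.R = np.eye(2) * measurement_noise

    def update(self, measurement):
        # Predict one frame ahead.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Correct with the measured (x, y) from the heatmap.
        z = np.asarray(measurement, dtype=float)
        innovation = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]                          # smoothed (x, y)
```

Feeding each frame's raw heatmap coordinates through `update` damps the frame-to-frame jitter at the cost of a small lag on fast motion.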

Hello @slimcdk , really thank you so much for your help finally I managed to convert it to a tflite file thanks to your comment

I also managed to make an inference on iOS device and processing time is better there

But there is a problem with the output,
How can we get the labels from the output there, forgive me Im completely new to that field and would appreciate any help

This is an example of the output. Every row prints the same values, so only the start of the tensor is shown; the full dump repeats this pattern:

[[[[ 4.3078829e-03  7.9320744e-05 -1.2679170e-03 ... -1.5761635e-03
    -2.9271552e-03 -8.5114062e-02]
   [ 4.3078829e-03  7.9320744e-05 -1.2679170e-03 ... -1.5761635e-03
    -2.9271552e-03 -8.5114062e-02]
   ...
   [ 4.3078829e-03  7.9320744e-05 -1.2679170e-03 ... -1.5761635e-03
    -2.9271552e-03 -8.5114062e-02]]

  ...]]


Think of how the three color channels (red, green, blue) in a regular image form a stack of layers.

The output of this model is similar, but instead of three color channels you get 21 channels (heatmaps), one for each joint. Each heatmap is a 2D array that is close to zero (black pixels) everywhere except where a joint has been recognized; those spots are close to one (white), hence the name heatmap.

Each layer/heatmap is a 2D array, which can be read as an x/y coordinate system. What you do is first find the highest value in the heatmap, then find the x and y indices of that value.

That calculation is done right here: https://github.com/timctho/convolutional-pose-machines-tensorflow/blob/master/run_demo_hand_with_tracker.py#L298-L299
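The argmax step described above can be sketched in NumPy like this. The tensor layout (batch, height, width, channels) and the 46x46 heatmap size are assumptions based on the dump above; adjust them to the actual model output.

```python
import numpy as np

# Hypothetical heatmap tensor with the layout shown above:
# (batch, height, width, channels), one channel per joint.
heatmaps = np.zeros((1, 46, 46, 21), dtype=np.float32)
heatmaps[0, 30, 12, 5] = 0.9          # pretend joint 5 peaks at row 30, col 12

joint_coords = []
for j in range(heatmaps.shape[-1]):
    # Flat index of the strongest response, unraveled to (row, col).
    row, col = np.unravel_index(np.argmax(heatmaps[0, :, :, j]),
                                heatmaps.shape[1:3])
    joint_coords.append((col, row))   # (x, y) in heatmap pixels
```

To map the coordinates back onto the original image, scale each (x, y) by input_size / heatmap_size.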

The image below visualizes all 21 heatmaps as a single layer, but behind the scenes each heatmap is its own layer.
[image: all 21 hand-joint heatmaps overlaid in one visualization]


Hi, do you know the number of values per line in the label txt? Is it the name of the jpg + 4 coordinates of the hand bbox + 21 coordinates of the hand keypoints?