timctho / convolutional-pose-machines-tensorflow


input and output tensors.

slimcdk opened this issue · comments

I'm in the process of converting the model to TensorFlow Lite, but I'm not very experienced with TensorFlow yet.

For the conversion I need the input and output tensor shapes. Where can I find those?

Will the input be the image size and color channels, e.g. [None, FLAGS.input_size, FLAGS.input_size, 3]?
And for the output, would that just be the num_of_joints number?

To clarify my question, I'm using the second code snippet provided by Pannag Sanketi: https://stackoverflow.com/questions/50632152/tensorflow-convert-pb-file-to-tflite-using-python
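For reference, here is a minimal sketch of the shapes involved, assuming a typical CPM hand setup: input_size of 368, 21 joints plus one background channel, and a network stride of 8. All three values are assumptions; check the model source and FLAGS for the actual numbers.

```python
# Assumed values -- verify against the model's FLAGS / source code.
input_size = 368        # hypothetical FLAGS.input_size
num_joints = 21
stride = 8              # assumed downsampling factor of the network

# Input: a batch of square RGB images, NHWC layout.
input_shape = (1, input_size, input_size, 3)

# Output: not just num_of_joints scalars, but a stack of heatmaps --
# one (input_size/stride, input_size/stride) map per joint,
# plus an assumed background channel.
heatmap_size = input_size // stride
output_shape = (1, heatmap_size, heatmap_size, num_joints + 1)
```

The key point is that the output is spatial: each joint gets its own 2D heatmap, so the output tensor is four-dimensional rather than a flat vector of joint indices.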

Hello @slimcdk, did you find out how to convert it to TensorFlow Lite yet?
I'm searching for a way to do that but don't know where to start.

Hi

Yes, take a look at my fork of the repo: https://github.com/slimcdk/convolutional-pose-machines-tensorflow
I found that the first tensor name has a misspelling in the provided weights, which is corrected in the model source code.

I also managed to run inference with the model, but processing time was between 2 and 3 seconds on a Galaxy S10. I still need to implement the Kalman filter and possibly the tracker module, or you could just feed the model a fixed resolution.
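For anyone implementing the smoothing step mentioned above, here is a minimal sketch of a constant-velocity Kalman filter for one joint's (x, y) position. This is not the repo's tracker module; the noise parameters are illustrative defaults, not tuned values.

```python
import numpy as np

class JointKalmanFilter:
    """Constant-velocity Kalman filter for one joint's (x, y) position.

    State vector: [x, y, vx, vy], with dt fixed at one frame.
    """

    def __init__(self, process_noise=1e-2, measurement_noise=1.0):
        self.x = np.zeros(4)               # state estimate
        self.P = np.eye(4) * 100.0         # covariance (very uncertain start)
        self.F = np.eye(4)                 # state transition: pos += vel
        self.F[0, 2] = self.F[1, 3] = 1.0
        self.H = np.zeros((2, 4))          # we only observe position
        self.H[0, 0] = self.H[1, 1] = 1.0
        self.Q = np.eye(4) * process_noise
        self.R = np.eye(2) * measurement_noise

    def update(self, measurement):
        # Predict one frame ahead.
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # Correct with the measured (x, y) from the heatmap.
        z = np.asarray(measurement, dtype=float)
        innovation = z - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
        self.x = self.x + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]                          # smoothed (x, y)
```

Feeding each frame's raw heatmap coordinates through `update` damps the frame-to-frame jitter at the cost of a small lag on fast motion.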

Hello @slimcdk , really thank you so much for your help finally I managed to convert it to a tflite file thanks to your comment

I also managed to make an inference on iOS device and processing time is better there

But there is a problem with the output,
How can we get the labels from the output there, forgive me Im completely new to that field and would appreciate any help

This is an example of the output. Every row prints the same values, so only the start of the tensor is shown; the full dump repeats this pattern:

[[[[ 4.3078829e-03  7.9320744e-05 -1.2679170e-03 ... -1.5761635e-03
    -2.9271552e-03 -8.5114062e-02]
   [ 4.3078829e-03  7.9320744e-05 -1.2679170e-03 ... -1.5761635e-03
    -2.9271552e-03 -8.5114062e-02]
   ...
   [ 4.3078829e-03  7.9320744e-05 -1.2679170e-03 ... -1.5761635e-03
    -2.9271552e-03 -8.5114062e-02]]

  ...]]


Think of how the three color channels (red, green, blue) in a regular image form a stack of layers.

The output of this model is similar, but instead of three color channels you get 21 channels (heatmaps), one for each joint. Each heatmap is a 2D array that is close to zero (black pixels) everywhere except where a joint has been recognized; those spots are close to one (white), hence the name heatmap.

Each layer/heatmap is a 2D array, which can be read as an x/y coordinate system. What you do is first find the highest value in the heatmap, then find the x and y indices of that value.

That calculation is done right here: https://github.com/timctho/convolutional-pose-machines-tensorflow/blob/master/run_demo_hand_with_tracker.py#L298-L299
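The argmax step described above can be sketched in NumPy like this. The tensor layout (batch, height, width, channels) and the 46x46 heatmap size are assumptions based on the dump above; adjust them to the actual model output.

```python
import numpy as np

# Hypothetical heatmap tensor with the layout shown above:
# (batch, height, width, channels), one channel per joint.
heatmaps = np.zeros((1, 46, 46, 21), dtype=np.float32)
heatmaps[0, 30, 12, 5] = 0.9          # pretend joint 5 peaks at row 30, col 12

joint_coords = []
for j in range(heatmaps.shape[-1]):
    # Flat index of the strongest response, unraveled to (row, col).
    row, col = np.unravel_index(np.argmax(heatmaps[0, :, :, j]),
                                heatmaps.shape[1:3])
    joint_coords.append((col, row))   # (x, y) in heatmap pixels
```

To map the coordinates back onto the original image, scale each (x, y) by input_size / heatmap_size.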

The image below visualizes all 21 heatmaps as a single layer, but behind the scenes each heatmap is its own layer.
[image: all 21 hand-joint heatmaps overlaid in one visualization]


Hi, do you know the number of values per line in the label txt? Is it the name of the jpg + 4 coordinates of the hand bbox + 21 coordinates of the hand keypoints?