google / CFU-Playground

Want a faster ML processor? Do it yourself! A framework for playing with custom opcodes to accelerate TensorFlow Lite for Microcontrollers (TFLM). Online tutorial: https://google.github.io/CFU-Playground/ For reference docs, see the home page below.

Home Page: http://cfu-playground.rtfd.io/


Trying different input for mnv2 model

noorassan opened this issue

Hi!

I'm hoping to test out running classification on some of my own images using the CFU-Playground framework with the mnv2 model. I wanted to ask exactly what kind of preprocessing was applied to the mnv2 input files provided.

From taking a look at them, it seems that they're 3 channel 160 x 160 pixel images with uint8 data (correct me if I'm wrong), but I'm unsure if any extra processing was applied to the image data. Any info would be helpful.

So far, I tried using something like this to get the model input data from a 160x160 jpeg, but I didn't get a good classification:

    img_jpeg = tf.io.read_file(filename)
    img = tf.image.decode_image(img_jpeg, dtype=tf.dtypes.uint8)

    img.numpy().tofile("out.dat")

Also, I was wondering if you have any info on interpreting the output of the model. Does it directly correspond to the mnv2 label?

akioolin commented

Hi, @noorassan :

You could trace the code to find the input image format in the framebuffer display code that I provide in the section guarded by "CSR_VIDEO_FRAMEBUFFER_BASE" in
https://github.com/google/CFU-Playground/blob/8dddce38085becf9480121ee173ef7d90f81ef70/common/src/models/mnv2/mnv2.cc

BR, Akio

Thanks -- that was helpful. Could you give me some guidance on how to interpret the output? All of the trained MobileNet models that I've worked with return an output whose length equals the number of labels in the model, with each value proportional to the likelihood of that label. The MobileNet model used in CFU-Playground seems to return an output of length two. How should this output be interpreted?

akioolin commented

Hi, @noorassan :

The output is a relative value. The simplest way is to find the largest value; its index is the corresponding class. The index of the second-largest value is the second class.

For a confidence ratio, you could use an approximation of softmax, or just a simple percentage ratio to evaluate the confidence value.

For me, I go the simple way.
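
As a rough sketch of what I mean (assuming the raw scores are already in a NumPy array; the function name is just illustrative):

    import numpy as np

    def interpret(scores):
        # Class indices sorted by raw score, best first
        order = np.argsort(scores)[::-1]
        top, second = order[0], order[1]
        # Simple percentage ratio as a rough confidence value
        shifted = scores.astype(np.int32) - scores.min() + 1
        confidence = shifted[top] / shifted.sum()
        return top, second, confidence

    print(interpret(np.array([48, -48], dtype=np.int8)))  # -> (0, 1, ~0.99)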

BR, Akio

Sorry -- not sure if you misinterpreted my question or if I'm misinterpreting you, but I appreciate the response nonetheless. You said you can find the max value in the output and the index will be the respective class. However, the output of the MobileNetV2 model included in CFU-Playground only has 2 values while there are far more than 2 classes, correct? How should this output be interpreted?

For example, I ran an image of a Persian cat through the model and received this output -- note that I didn't use CFU-Playground for this, I just took the tflite file from the repo and ran the model on my local machine. How can I use this output to get a label?

[ 48 -48]

Again, thanks so much for the response -- hopefully I'm not missing something obvious...
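
For reference, this is roughly how I ran it (a minimal sketch using the TFLite Python interpreter; the model path and the int8 input shift are my assumptions, not something from the repo):

    import numpy as np
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path="mnv2.tflite")  # illustrative path
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    # Raw RGB24 pixels from the .dat file; if the input tensor is int8,
    # shift the uint8 pixels by -128 (zero point of -128)
    data = np.fromfile("out.dat", dtype=np.uint8).reshape(inp["shape"])
    if inp["dtype"] == np.int8:
        data = (data.astype(np.int32) - 128).astype(np.int8)
    interpreter.set_tensor(inp["index"], data)
    interpreter.invoke()
    print(interpreter.get_tensor(out["index"])[0])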

tcal-x commented

Hi @noorassan ! Sorry for my delay helping out. I just found a very useful site for viewing the raw data files: https://rawpixels.net/. The parameters for the .dat files in the repository are width: 160, height: 160, offset: 0, Predefined format: RGB24. I didn't need to change anything else, but verify that you have: Pixel Format: RGBA, Ignore Alpha (yes, checked), bpp1..4: 8,8,8,0.
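
If you'd rather check a .dat file locally, here's a quick sanity check (a sketch assuming NumPy and Pillow; the file name is illustrative):

    import numpy as np
    from PIL import Image

    # Raw RGB24: 160*160 pixels, 3 bytes each, no header
    raw = np.fromfile("out.dat", dtype=np.uint8)
    assert raw.size == 160 * 160 * 3, "unexpected file size"
    Image.fromarray(raw.reshape(160, 160, 3), mode="RGB").show()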

I don't know where we got this particular trained model, but I'm pretty sure it is only trying to recognize person / no person. @alanvgreen , do you remember where you got the MNV2 model?

If you get a different MNV2 model or train your own, you might need to adjust some of the C++ code to account for different image format and different number of classes in the output vector.

akioolin commented

Hi, @noorassan :

The output interpretation method depends on the network output. If the last layer is softmax, the interpretation is as I described. If the network is like YOLO v1/v2/v3, you have to know the structure to do the interpretation.
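
For example, a numerically stable softmax looks like this (a sketch; for a quantized int8 output you would first dequantize the scores with the output tensor's scale and zero point):

    import numpy as np

    def softmax(x):
        # Subtract the max for numerical stability before exponentiating
        e = np.exp(x - np.max(x))
        return e / e.sum()

    print(softmax(np.array([48.0, -48.0])))  # -> [~1.0, ~0.0]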

Since I have no FPGA board on hand, I could not get the output of MNV2 myself.

Would you like to share the output of MNV2?

BR, Akio


@akioolin Sure, I included the output in the post that you quoted. I think that @tcal-x has cleared things up for me though. Nonetheless, thanks for your responses!

@tcal-x Thanks for the explanation -- it makes sense that the model would be recognizing person/no-person based on the output.
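
(To make that concrete: with a two-class person/no-person model, the interpretation reduces to an argmax. The label order below is a guess on my part and should be verified against the model code in the repo.)

    import numpy as np

    scores = np.array([48, -48])           # output for the Persian cat image
    labels = ["no person", "person"]       # hypothetical order -- verify in mnv2.cc
    print(labels[int(np.argmax(scores))])  # -> "no person"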

Hi @noorassan , what did your code for converting from .jpg to .dat end up looking like? I have the same issue. Thank you.

Hi -- sorry for the late response. I ended up using the following script. Hope it helps!

import sys

import tensorflow as tf
from tensorflow.python.ops.numpy_ops import np_config


if __name__ == "__main__":
    filename = sys.argv[1]
    np_config.enable_numpy_behavior()

    # Read and decode the (already 160x160) jpeg into an HxWx3 uint8 tensor
    img_jpeg = tf.io.read_file(filename)
    img = tf.image.decode_image(img_jpeg, dtype=tf.dtypes.uint8)

    # Write the raw RGB24 pixel bytes with no header -- the same layout as
    # the repo's .dat input files
    img.numpy().tofile("out.dat")
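
If your image isn't already 160x160, a variant with a resize step works too (just a sketch -- the bilinear resize and the cast back to uint8 are assumptions on my part):

    import sys

    import tensorflow as tf

    img_jpeg = tf.io.read_file(sys.argv[1])
    img = tf.image.decode_image(img_jpeg, channels=3, expand_animations=False)
    img = tf.image.resize(img, (160, 160))  # returns float32, bilinear by default
    img = tf.cast(img, tf.uint8)
    img.numpy().tofile("out.dat")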