uTensor / utensor_cgen

C++ code generator for uTensor https://utensor-cgen.readthedocs.io/en/latest/


Allow inline optimizer to spit out a single variable

janjongboom opened this issue

Weight updates are much easier if a single block of memory holds all the weights. One way of doing this would be to concatenate all the variables that the inline optimizer spits out, and reference that memory directly from the trained.cpp file. Bonus points if you can pass a block of memory into trained.cpp, so you can place the weights in memory-mapped flash (on many ST boards the QSPI flash is mapped at 0x90000000).
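The concatenation idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tensor names, the float32 little-endian encoding, the 4-byte alignment, and the helper names pack_weights and read_tensor are all hypothetical and are not part of utensor_cgen.

```python
import struct


def pack_weights(tensors, align=4):
    """Concatenate float32 tensors into one blob and record where each one starts.

    Returns (blob, offsets), where offsets maps name -> (byte_offset, element_count).
    """
    blob = bytearray()
    offsets = {}
    for name, values in tensors.items():
        # Pad with zeros so every tensor starts on an aligned boundary.
        blob.extend(b"\x00" * (-len(blob) % align))
        offsets[name] = (len(blob), len(values))
        blob.extend(struct.pack("<%df" % len(values), *values))
    return bytes(blob), offsets


def read_tensor(blob, offsets, name):
    """Read one tensor back out of the blob using only the offset table."""
    start, count = offsets[name]
    return list(struct.unpack_from("<%df" % count, blob, start))


# Hypothetical weights standing in for what the inline optimizer emits.
weights = {"fc1_w": [0.5, -1.0, 2.0], "fc1_b": [0.25], "fc2_w": [1.5, 3.0]}
blob, offsets = pack_weights(weights)
```

With such a layout the generated code would only need the base address of the blob plus the offset table, so the base could just as well point into memory-mapped flash.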

I figured at first that using the existing file system functions would help with the latter, but LittleFS on (at least my) QSPI boards is extremely slow: inference time goes from 320 ms to 2000 ms on the DISCO-L475VG-IOT01A because of the slow fopen() calls (see ARMmbed/mbed-os#11085).

@janjongboom This is interesting.
Is the intention to optimize the firmware for delta updates?

Please help me envision how the latter should be done. Currently, the weights are compiled into .text and flashed as part of the application.
Declaring a tensor that points to a specific memory region should be straightforward. But how are you planning to inject the weights into the QSPI flash?

tagging @Knight-X

@neil-tan It's in .text but split over many variables. So if I want to update them through some sort of delta update process I need to get the location in flash of every variable and overwrite that portion. If it's just a single blob and uTensor knows the offset of each file in that blob you only need to do one update and don't have to do any book-keeping yourself.
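To illustrate why a single blob simplifies delta updates, here is a minimal Python sketch that diffs two versions of one weight blob and produces a list of (offset, bytes) patches. The function names diff_ranges and apply_patches are hypothetical, the blobs are assumed to have the same length, and the sketch ignores flash erase-block granularity, which a real updater would have to respect.

```python
def diff_ranges(old, new):
    """Return a list of (offset, data) patches that turn `old` into `new`."""
    if len(old) != len(new):
        raise ValueError("blobs must have the same length")
    patches = []
    start = None  # start index of the current run of differing bytes
    for i in range(len(old)):
        if old[i] != new[i]:
            if start is None:
                start = i
        elif start is not None:
            patches.append((start, new[start:i]))
            start = None
    if start is not None:
        patches.append((start, new[start:]))
    return patches


def apply_patches(old, patches):
    """Apply (offset, data) patches to a copy of `old` and return the result."""
    out = bytearray(old)
    for offset, data in patches:
        out[offset:offset + len(data)] = data
    return bytes(out)


# Toy blobs; real ones would be the concatenated weights.
old_blob = bytes([1, 2, 3, 4, 5, 6, 7, 8])
new_blob = bytes([1, 2, 9, 9, 5, 6, 7, 0])
patches = diff_ranges(old_blob, new_blob)
```

Every patch offset is relative to the start of the one blob, so the sender never needs to know where individual variables ended up in flash.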

@janjongboom What's the benefit of doing this over just forcing the individual constants into a specified region of ROM? Given they are in the same translation unit they probably end up contiguous anyways.

@mbartling Because then I need to keep track of 1) all variables, 2) the location of these variables in ROM. Which means I need a compiler to figure this out. If the location of the weights is abstracted away by uTensor I don't need a compiler. I just need some Python code to generate the new weights definition and send it to the board.

"Given they are in the same translation unit they probably end up contiguous anyways."

Yes, but no guarantees here.
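The "just some Python code" step described above could look roughly like the following sketch, which builds a self-describing weights image: a small header, an index of tensor names with offsets, and the float32 payload. The layout, the UTW0 magic tag, the 16-byte name field, and the CRC check are all assumptions made for illustration; none of this is an existing uTensor or utensor_cgen format.

```python
import struct
import zlib

MAGIC = b"UTW0"      # hypothetical format tag
HEADER = "<4sII"     # magic, tensor count, CRC32 of everything after the header
ENTRY = "<16sII"     # name (NUL padded), payload offset in bytes, element count


def build_image(tensors):
    """Serialize float32 tensors into header + index + payload."""
    index = bytearray()
    payload = bytearray()
    for name, values in tensors.items():
        raw_name = name.encode("ascii")
        if len(raw_name) > 16:
            raise ValueError("tensor name too long: %s" % name)
        index += struct.pack(ENTRY, raw_name, len(payload), len(values))
        payload += struct.pack("<%df" % len(values), *values)
    body = bytes(index) + bytes(payload)
    return struct.pack(HEADER, MAGIC, len(tensors), zlib.crc32(body)) + body


def parse_image(image):
    """Check the image and recover the tensors, as a receiver might."""
    header_size = struct.calcsize(HEADER)
    entry_size = struct.calcsize(ENTRY)
    magic, count, crc = struct.unpack_from(HEADER, image, 0)
    if magic != MAGIC:
        raise ValueError("bad magic")
    if zlib.crc32(image[header_size:]) != crc:
        raise ValueError("checksum mismatch")
    payload_start = header_size + count * entry_size
    tensors = {}
    for i in range(count):
        raw_name, offset, n = struct.unpack_from(ENTRY, image, header_size + i * entry_size)
        name = raw_name.rstrip(b"\x00").decode("ascii")
        tensors[name] = list(struct.unpack_from("<%df" % n, image, payload_start + offset))
    return tensors


# Hypothetical retrained weights to be sent to the board.
image = build_image({"w": [1.0, 2.0], "b": [0.5]})
```

Because the index travels with the data, the device side only needs a fixed base address and a lookup by name; no linker map or compiler is involved in producing an update.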