xboot / libonnx

A lightweight, portable pure C99 onnx inference engine for embedded devices with hardware acceleration support.

Hello model RAM size required

noomio opened this issue

Hi,

I'm trying to run the hello example on a small embedded system, but I'm unsure how much memory is required to allocate this model (when running onnx_context_alloc).

I have roughly 2MB, is that enough?
Is there a smaller model that I can test with the model defined as a const char array?
Like the static const unsigned char mnist_onnx[] = { ... }

Add this function to show memory information.

`#include <malloc.h>
#include <stdio.h>

/* Print the heap statistics reported by mallinfo(); call it before and
   after the step you want to measure. */
static void display_mallinfo(void)
{
    struct mallinfo mi = mallinfo();

    printf("Total non-mmapped bytes (arena):       %d\n", mi.arena);
    printf("# of free chunks (ordblks):            %d\n", mi.ordblks);
    printf("# of free fastbin blocks (smblks):     %d\n", mi.smblks);
    printf("# of mapped regions (hblks):           %d\n", mi.hblks);
    printf("Bytes in mapped regions (hblkhd):      %d\n", mi.hblkhd);
    printf("Max. total allocated space (usmblks):  %d\n", mi.usmblks);
    printf("Free bytes held in fastbins (fsmblks): %d\n", mi.fsmblks);
    printf("Total allocated space (uordblks):      %d\n", mi.uordblks);
    printf("Total free space (fordblks):           %d\n", mi.fordblks);
    printf("Topmost releasable block (keepcost):   %d\n", mi.keepcost);
}`

============== Before alloc context ==============
Total non-mmapped bytes (arena): 138816
# of free chunks (ordblks): 1
# of free fastbin blocks (smblks): 0
# of mapped regions (hblks): 0
Bytes in mapped regions (hblkhd): 0
Max. total allocated space (usmblks): 0
Free bytes held in fastbins (fsmblks): 0
Total allocated space (uordblks): 3536
Total free space (fordblks): 135280
Topmost releasable block (keepcost): 135280

============== After alloc context ==============
Total non-mmapped bytes (arena): 286272
# of free chunks (ordblks): 1
# of free fastbin blocks (smblks): 0
# of mapped regions (hblks): 0
Bytes in mapped regions (hblkhd): 0
Max. total allocated space (usmblks): 0
Free bytes held in fastbins (fsmblks): 0
Total allocated space (uordblks): 232736
Total free space (fordblks): 53536
Topmost releasable block (keepcost): 53536

============== Before onnx run ==============
Total non-mmapped bytes (arena): 286272
# of free chunks (ordblks): 1
# of free fastbin blocks (smblks): 0
# of mapped regions (hblks): 0
Bytes in mapped regions (hblkhd): 0
Max. total allocated space (usmblks): 0
Free bytes held in fastbins (fsmblks): 0
Total allocated space (uordblks): 232736
Total free space (fordblks): 53536
Topmost releasable block (keepcost): 53536

============== After onnx run ==============
Total non-mmapped bytes (arena): 450112
# of free chunks (ordblks): 3
# of free fastbin blocks (smblks): 0
# of mapped regions (hblks): 0
Bytes in mapped regions (hblkhd): 0
Max. total allocated space (usmblks): 0
Free bytes held in fastbins (fsmblks): 0
Total allocated space (uordblks): 235552
Total free space (fordblks): 214560
Topmost releasable block (keepcost): 133728
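
To make the numbers above easier to compare with a 2 MB budget, here is a small sketch that does the arithmetic on the logged values. The constant and function names are made up for illustration; only the byte counts come from the output above. Note that mallinfo samples the heap at four points, so any short-lived peak inside onnx_run would not show up in these figures.

```c
#include <stdio.h>

/* Byte counts copied from the mallinfo output above. */
enum {
    UORDBLKS_BEFORE_ALLOC = 3536,    /* allocated before onnx_context_alloc */
    UORDBLKS_AFTER_ALLOC  = 232736,  /* allocated after onnx_context_alloc  */
    UORDBLKS_AFTER_RUN    = 235552,  /* allocated after onnx_run            */
    ARENA_AFTER_RUN       = 450112   /* arena size after onnx_run           */
};

/* Bytes allocated at the later sample that were not allocated at the earlier one. */
static long heap_growth(long before, long after)
{
    return after - before;
}

/* Print the differences between the logged samples. */
static void print_heap_report(void)
{
    printf("held by the context:      %ld bytes\n",
           heap_growth(UORDBLKS_BEFORE_ALLOC, UORDBLKS_AFTER_ALLOC));
    printf("still held after the run: %ld bytes more\n",
           heap_growth(UORDBLKS_AFTER_ALLOC, UORDBLKS_AFTER_RUN));
    printf("arena after the run:      %.1f KiB\n", ARENA_AFTER_RUN / 1024.0);
}
```

With these samples the context accounts for 229200 bytes, and the arena ends at roughly 440 KiB, well under 2 MB.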

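2 MB of memory is enough: in the output above the arena grows to 450112 bytes (about 440 KiB) after the run. mnist is the smallest model; you can use xxd -i to convert other models to a C array.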

Thanks.

Unfortunately I'm not running on Linux.

It's a Cortex-A7 with ThreadX, and debugging is very limited (no JTAG).

I'm unable to run much at the moment, as it fails and I can't trace it easily.

Hi,

I traced the fault down to memalign. I had to add my own implementation, as I have done for malloc, free and realloc.

It runs the benchmark, but freeing some objects doesn't work properly, probably because of my memalign.

Thanks!

Just use malloc instead of memalign; 512-byte alignment is not necessary.

So leaving the alignment at 4 and allocating len bytes should be sufficient?

It worked ;)

You must ensure 8-byte alignment because of the double type. On 32-bit systems malloc is usually 8-byte aligned, and on 64-bit systems it is usually 16-byte aligned, i.e. twice the size of void *. Confirm your malloc's alignment.
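
One way to confirm the alignment, as a rough sketch: allocate a handful of blocks and check the returned addresses. The function name and the list of sizes are arbitrary choices for illustration. A passing result is only empirical evidence for the sizes tried, not a guarantee from the allocator.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if every test allocation is aligned to `align` bytes,
   0 otherwise (including when an allocation fails). */
static int malloc_is_aligned(size_t align)
{
    static const size_t sizes[] = { 1, 3, 8, 24, 100, 512, 4096 };
    enum { COUNT = sizeof sizes / sizeof sizes[0] };
    void *blocks[COUNT];
    int ok = 1;

    /* Keep all blocks live at once so the addresses differ. */
    for (size_t i = 0; i < COUNT; i++) {
        blocks[i] = malloc(sizes[i]);
        if (blocks[i] == NULL || (uintptr_t)blocks[i] % align != 0)
            ok = 0;
    }
    for (size_t i = 0; i < COUNT; i++)
        free(blocks[i]);  /* free(NULL) is a no-op */

    return ok;
}
```

Calling `malloc_is_aligned(8)` once at startup, against the custom malloc, would show whether the 8-byte requirement holds for these sizes.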

I have added this:

UCHAR mem_heap[MALLOC_BYTE_POOL_SIZE] __attribute__ ((aligned (8)));

Writing a customized malloc should be OK. Use 8-byte alignment for onnx_tensor_t's data.
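
Aligning the pool itself is only half of it: every block handed out also has to start on an 8-byte boundary. The sketch below shows the usual way to get that, by rounding each request up to a multiple of 8. It is a hypothetical bump allocator with no free, written only to illustrate the rounding; `pool`, `POOL_SIZE`, `align_up` and `pool_alloc` are invented names, not part of libonnx or ThreadX, and it assumes a GCC-compatible compiler for the `aligned` attribute.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE  4096u   /* hypothetical pool size, a multiple of 8 */
#define POOL_ALIGN 8u      /* must be a power of two */

/* The pool starts on an 8-byte boundary, like mem_heap in the thread. */
static unsigned char pool[POOL_SIZE] __attribute__((aligned(8)));
static size_t pool_used;   /* always a multiple of POOL_ALIGN */

/* Round n up to the next multiple of POOL_ALIGN. */
static size_t align_up(size_t n)
{
    return (n + (POOL_ALIGN - 1u)) & ~(size_t)(POOL_ALIGN - 1u);
}

/* Return an 8-byte-aligned block of at least len bytes, or NULL if the
   pool is exhausted. Blocks are never freed in this sketch. */
static void *pool_alloc(size_t len)
{
    if (len > POOL_SIZE - pool_used)
        return NULL;

    /* The remaining space is a multiple of 8, so the rounded size fits too. */
    size_t need = align_up(len);
    void *p = &pool[pool_used];
    pool_used += need;
    return p;
}
```

Because both the pool start and every offset are multiples of 8, each returned pointer is suitable for double. A real replacement would also need free and realloc counterparts.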

It seems to work. I'm also able to run the mnist model.


Just the debug output isn't quite right. I need to figure out the printf implementation.

I just need to append LF on every CR, as I'm on Windows.
So far so good.
Thanks for your help. The library is great!