Hello model RAM size required
noomio opened this issue · comments
Hi,
I'm trying to run the hello example on a small embedded system, but I'm unsure how much memory is required to allocate this model (when running onnx_context_alloc).
I have roughly 2 MB; is that enough?
Is there a smaller model that I can test with, defined as a const char array like static const unsigned char mnist_onnx[] = { ... }?
Add this function to show memory information.
`#include <malloc.h>   /* mallinfo(), a glibc extension */
#include <stdio.h>
/* Print the heap statistics reported by mallinfo(). */
static void display_mallinfo(void)
{
    struct mallinfo mi = mallinfo();
    printf("Total non-mmapped bytes (arena): %d\n", mi.arena);
    printf("of free chunks (ordblks): %d\n", mi.ordblks);
    printf("of free fastbin blocks (smblks): %d\n", mi.smblks);
    printf("of mapped regions (hblks): %d\n", mi.hblks);
    printf("Bytes in mapped regions (hblkhd): %d\n", mi.hblkhd);
    printf("Max. total allocated space (usmblks): %d\n", mi.usmblks);
    printf("Free bytes held in fastbins (fsmblks): %d\n", mi.fsmblks);
    printf("Total allocated space (uordblks): %d\n", mi.uordblks);
    printf("Total free space (fordblks): %d\n", mi.fordblks);
    printf("Topmost releasable block (keepcost): %d\n", mi.keepcost);
}`
============== Before alloc context ==============
Total non-mmapped bytes (arena): 138816
of free chunks (ordblks): 1
of free fastbin blocks (smblks): 0
of mapped regions (hblks): 0
Bytes in mapped regions (hblkhd): 0
Max. total allocated space (usmblks): 0
Free bytes held in fastbins (fsmblks): 0
Total allocated space (uordblks): 3536
Total free space (fordblks): 135280
Topmost releasable block (keepcost): 135280
============== After alloc context ==============
Total non-mmapped bytes (arena): 286272
of free chunks (ordblks): 1
of free fastbin blocks (smblks): 0
of mapped regions (hblks): 0
Bytes in mapped regions (hblkhd): 0
Max. total allocated space (usmblks): 0
Free bytes held in fastbins (fsmblks): 0
Total allocated space (uordblks): 232736
Total free space (fordblks): 53536
Topmost releasable block (keepcost): 53536
============== Before onnx run ==============
Total non-mmapped bytes (arena): 286272
of free chunks (ordblks): 1
of free fastbin blocks (smblks): 0
of mapped regions (hblks): 0
Bytes in mapped regions (hblkhd): 0
Max. total allocated space (usmblks): 0
Free bytes held in fastbins (fsmblks): 0
Total allocated space (uordblks): 232736
Total free space (fordblks): 53536
Topmost releasable block (keepcost): 53536
============== After onnx run ==============
Total non-mmapped bytes (arena): 450112
of free chunks (ordblks): 3
of free fastbin blocks (smblks): 0
of mapped regions (hblks): 0
Bytes in mapped regions (hblkhd): 0
Max. total allocated space (usmblks): 0
Free bytes held in fastbins (fsmblks): 0
Total allocated space (uordblks): 235552
Total free space (fordblks): 214560
Topmost releasable block (keepcost): 133728
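The four snapshots above can be compared directly. The following is a minimal sketch that copies the arena and uordblks values from the dumps and computes how much each step added. The struct and function names are made up for illustration and are not part of the library.

```c
#include <stdio.h>

/* One mallinfo snapshot, reduced to the two fields that matter here. */
struct snapshot {
    const char *label;
    long arena;     /* total non-mmapped bytes obtained from the system */
    long uordblks;  /* bytes currently allocated */
};

/* Values copied from the four dumps above. */
static const struct snapshot snapshots[] = {
    { "before alloc context", 138816,   3536 },
    { "after alloc context",  286272, 232736 },
    { "before onnx run",      286272, 232736 },
    { "after onnx run",       450112, 235552 },
};

enum { SNAPSHOT_COUNT = sizeof(snapshots) / sizeof(snapshots[0]) };

/* Growth in allocated bytes between snapshot i-1 and snapshot i (i >= 1). */
static long allocated_delta(int i)
{
    return snapshots[i].uordblks - snapshots[i - 1].uordblks;
}

/* Largest arena size seen in any snapshot. */
static long peak_arena(void)
{
    long peak = 0;
    for (int i = 0; i < SNAPSHOT_COUNT; i++)
        if (snapshots[i].arena > peak)
            peak = snapshots[i].arena;
    return peak;
}

/* Print the per-step growth and the largest arena. */
static void print_summary(void)
{
    for (int i = 1; i < SNAPSHOT_COUNT; i++)
        printf("%-22s +%ld bytes allocated\n",
               snapshots[i].label, allocated_delta(i));
    printf("peak arena: %ld bytes\n", peak_arena());
}
```

Allocating the context accounts for 229200 bytes, the run leaves another 2816 bytes allocated, and the arena grows to 450112 bytes. Temporary buffers used during the run are freed again before the last snapshot, so the arena figure is the safer number to budget against. These values come from the build that produced the dumps; a 32-bit target may differ somewhat.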
2 MB of memory is enough: in the output above, the arena reaches 450112 bytes after the run. mnist is the smallest model; you can use xxd -i to convert other models into C arrays.
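For reference, xxd -i model.onnx emits an unsigned char array named after the file plus an unsigned int length. A rough sketch of what that looks like is below, with static const added by hand so that a typical embedded toolchain can keep the data in read-only memory. The bytes are placeholders, not a valid model, and load_model is a hypothetical stand-in: in the real program the buffer and length would go to onnx_context_alloc, whose exact signature is not shown in this thread.

```c
#include <stddef.h>

/*
 * Shape of the output of `xxd -i model.onnx`, with `static const` added
 * by hand. The bytes are placeholders, NOT a valid ONNX model.
 */
static const unsigned char model_onnx[] = {
    0x08, 0x03, 0x12, 0x04, 0x64, 0x65, 0x6d, 0x6f
};
static const unsigned int model_onnx_len = 8;

/*
 * Hypothetical stand-in for the loader: it only reports how many bytes
 * it was handed, or 0 for a NULL buffer.
 */
static size_t load_model(const void *buf, size_t len)
{
    return buf != NULL ? len : 0;
}
```

A call such as load_model(model_onnx, model_onnx_len) shows the pattern: the array decays to a pointer and the generated length variable travels with it.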
Thanks.
Unfortunately I'm not running on Linux.
It's a Cortex-A7 with ThreadX, and debugging is very limited (no JTAG).
I'm unable to run much at the moment, as it fails and I can't trace it easily.
Hi,
I traced the fault to memalign. I had to add my own implementation, as I had done for malloc, free, and realloc.
It now runs the benchmark, but freeing some objects doesn't work well, probably because of memalign.
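For anyone in the same spot, here is a minimal sketch of one common way to build a memalign replacement on top of plain malloc. It is not the implementation used in this thread, and the names memalign_compat and aligned_free_compat are hypothetical. It over-allocates and stores the original malloc pointer just below the aligned block.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical memalign replacement built on malloc.
 * `align` must be a power of two. The pointer returned by malloc is
 * stored just below the aligned block so it can be recovered on free.
 */
static void *memalign_compat(size_t align, size_t size)
{
    if (align == 0 || (align & (align - 1)) != 0)
        return NULL;                    /* not a power of two */
    if (align < sizeof(void *))
        align = sizeof(void *);         /* keep the saved pointer aligned */
    if (size > SIZE_MAX - align - sizeof(void *))
        return NULL;                    /* request would overflow */

    /* Room for the payload, worst-case padding, and the saved pointer. */
    unsigned char *raw = malloc(size + align - 1 + sizeof(void *));
    if (raw == NULL)
        return NULL;

    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);

    ((void **)aligned)[-1] = raw;       /* remember what to free */
    return (void *)aligned;
}

/* Release a block obtained from memalign_compat. */
static void aligned_free_compat(void *ptr)
{
    if (ptr != NULL)
        free(((void **)ptr)[-1]);
}
```

The catch with this scheme is that the returned pointer is not the one malloc handed out, so passing it to plain free corrupts the heap. If the calling code frees aligned blocks with free, that is one possible cause of the kind of freeing problem described above.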
Thanks!
Just use malloc instead of memalign; 512-byte alignment is not necessary.
So leaving the alignment at 4 and allocating len bytes should be sufficient?
It worked ;)
You must ensure 8-byte alignment because of the double type. On 32-bit systems malloc usually returns 8-byte-aligned memory, and on 64-bit systems usually 16-byte-aligned memory, that is, twice the size of void *. Confirm your malloc's alignment.
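One quick way to confirm this on a target is to allocate a handful of odd-sized blocks and inspect the addresses. The sketch below does that; the helper names are made up for illustration. A passing result is only evidence, not a guarantee, so the allocator's documentation or source is still the authority.

```c
#include <stdint.h>
#include <stdlib.h>

/* Odd sizes make it more likely that a misaligning allocator shows up. */
static const size_t probe_sizes[] = { 1, 3, 7, 13, 24, 100, 1000 };
#define PROBE_COUNT (sizeof(probe_sizes) / sizeof(probe_sizes[0]))

/* Return 1 if ptr is a multiple of `align` bytes, else 0. */
static int is_aligned(const void *ptr, size_t align)
{
    return ((uintptr_t)ptr % align) == 0;
}

/*
 * Allocate every probe size, check each returned address, then free
 * everything. Returns 1 only if all blocks were aligned to `align`.
 */
static int malloc_is_aligned(size_t align)
{
    void *blocks[PROBE_COUNT];
    int ok = 1;

    for (size_t i = 0; i < PROBE_COUNT; i++) {
        blocks[i] = malloc(probe_sizes[i]);
        if (blocks[i] == NULL || !is_aligned(blocks[i], align))
            ok = 0;
    }
    for (size_t i = 0; i < PROBE_COUNT; i++)
        free(blocks[i]);                /* free(NULL) is harmless */

    return ok;
}
```

Calling malloc_is_aligned(8) once at startup and printing the result is enough to catch an allocator that only aligns to 4 bytes.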
I have added this:
UCHAR mem_heap[MALLOC_BYTE_POOL_SIZE] __attribute__((aligned(8)));
Writing a customized malloc should be OK. Use 8-byte alignment for onnx_tensor_t's data.
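Aligning the pool itself only covers the first block; each request size also has to be rounded up so that the next block starts on an 8-byte boundary. Here is a minimal sketch of that rounding using a bump allocator over a static pool. The names pool, pool_alloc, and round_up and the pool size are hypothetical, and the allocator never frees, so it illustrates the alignment arithmetic rather than replacing a real malloc.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE  4096u  /* hypothetical; the thread uses MALLOC_BYTE_POOL_SIZE */
#define POOL_ALIGN 8u     /* must be a power of two */

/* Backing storage, 8-byte aligned as in the declaration above (GCC/Clang syntax). */
static unsigned char pool[POOL_SIZE] __attribute__((aligned(8)));
static size_t pool_used;

/* Round n up to the next multiple of POOL_ALIGN. */
static size_t round_up(size_t n)
{
    return (n + POOL_ALIGN - 1) & ~(size_t)(POOL_ALIGN - 1);
}

/*
 * Hand out 8-byte-aligned blocks from the pool.
 * Returns NULL when the request does not fit. Nothing is ever freed.
 */
static void *pool_alloc(size_t size)
{
    size_t need = round_up(size);

    /* need < size catches wrap-around in round_up for huge requests. */
    if (need < size || need > POOL_SIZE - pool_used)
        return NULL;

    void *p = &pool[pool_used];
    pool_used += need;
    return p;
}
```

Because every block size is a multiple of 8 and the pool starts on an 8-byte boundary, every returned address is 8-byte aligned, which is what the double data in the tensors needs.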
I just need to append an LF to every CR, as I'm on Windows.
So far so good.
Thanks for your help. The library is great!