d99kris / stackusage

Measure stack usage in Linux and macOS applications

su_use_stack redundant?

iangehc opened this issue

Hi,

su_use_stack() appears to be redundant: the call to memset() is followed by filling the stack byte by byte, and in both cases, if a page is not mapped, the OS will catch the page fault and map the requested page.

Was there a particular OS/kernel version that you had in mind when creating su_use_stack()?

Would you accept a patch to remove this function?
Thanks for making this very useful tool!

System

i.MX53 (ARM Cortex-A8, ARMv7-A) running custom Yocto (Dunfell, gcc 9.3.0, kernel 5.15) with stack size 100000 and a main-thread stack of 8 MiB.

Debug Patch

diff --git a/src/sumain.c b/src/sumain.c
index 6df2477..9c64fa4 100644
--- a/src/sumain.c
+++ b/src/sumain.c
@@ -288,6 +288,7 @@ static void su_use_stack(char *base, long size)
   memset(arr, rand() % 255, SU_DUMMY_USE);
   if ((labs(&here - base) + SU_DUMMY_USE) < size)
   {
+    printf("labs(%ld - %ld)=%ld + %ld < %ld\n", (long)(intptr_t)&here, (long)(intptr_t)base, labs(&here - base), (long)SU_DUMMY_USE, size);
     su_use_stack(base, size);
   }
   else

Debug Output

With each recursive call, the address of here moves by only 40 bytes.
(I assume this is due to a compiler optimisation, since arr is 16 KiB.)

Given a main-thread stack size of 8386960 bytes, this implies 8386960/40, or more than 200K, function calls.
(On my system I stopped waiting after many minutes.)
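
To make the arithmetic above concrete, here is a rough sketch of the estimate. The helper name calls_needed is hypothetical and not part of stackusage, and it ignores the SU_DUMMY_USE margin in the real loop condition, so treat the result as an approximation only.

```c
/* Rough estimate of how many recursive calls are needed to walk a stack of
 * stack_size bytes when each call advances the stack by frame_bytes.
 * calls_needed is a hypothetical helper, not a stackusage function. */
static long calls_needed(long stack_size, long frame_bytes)
{
    /* Round up so that a partially covered final frame still counts. */
    return (stack_size + frame_bytes - 1) / frame_bytes;
}
```

With the values from the debug output, calls_needed(8386960, 40) gives 209674 calls, whereas a frame that really held the 16 KiB array would need only calls_needed(8386960, 16384) = 512.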

labs(2125405027 - 2125404560)=467 + 16384 < 8386960 
labs(2125404987 - 2125404560)=427 + 16384 < 8386960 
labs(2125404947 - 2125404560)=387 + 16384 < 8386960 
labs(2125404907 - 2125404560)=347 + 16384 < 8386960 
labs(2125404867 - 2125404560)=307 + 16384 < 8386960 
labs(2125404827 - 2125404560)=267 + 16384 < 8386960 
labs(2125404787 - 2125404560)=227 + 16384 < 8386960 
labs(2125404747 - 2125404560)=187 + 16384 < 8386960 
labs(2125404707 - 2125404560)=147 + 16384 < 8386960 
labs(2125404667 - 2125404560)=107 + 16384 < 8386960 
labs(2125404627 - 2125404560)=67 + 16384 < 8386960  
labs(2125404587 - 2125404560)=27 + 16384 < 8386960  
labs(2125404547 - 2125404560)=13 + 16384 < 8386960  
labs(2125404507 - 2125404560)=53 + 16384 < 8386960  
labs(2125404467 - 2125404560)=93 + 16384 < 8386960  
labs(2125404427 - 2125404560)=133 + 16384 < 8386960 
labs(2125404387 - 2125404560)=173 + 16384 < 8386960 
labs(2125404347 - 2125404560)=213 + 16384 < 8386960 
:

Hi @iangehc - thanks for suggesting improvements!
The function su_use_stack() was added in 79a0cc1 to fix #1, so that the stack is properly used in the main thread under Linux. That was around 2017 (I don't remember the exact Linux kernel version, but it was Ubuntu 16.04 on Intel). If you have a proposed fix that works under current Linux kernels, I'd be happy to review and accept a patch.

Thanks for the reference to #1 -- that was useful to re-create the issue on an Ubuntu VM.

The pull request differs from what I initially planned, as the issue seems to be related to optimization of the dummy array.
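
For readers wondering what "optimization of the dummy array" can look like in practice, here is a minimal sketch of one common way to stop a compiler from shrinking such an array: declaring it volatile and reading it after the recursive call. This is an assumption for illustration only and is not necessarily what the pull request does; touch_stack and DUMMY_USE are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DUMMY_USE 16384 /* hypothetical; mirrors the 16 KiB array in this issue */

/* Recurse depth times, writing to a volatile dummy array in every frame.
 * Returns the address distance between the outermost and innermost arrays.
 * Pass NULL for outer on the first call. */
static long touch_stack(int depth, const volatile char *outer)
{
    volatile char arr[DUMMY_USE];
    size_t i;
    long dist;

    /* Stores to a volatile object may not be elided by the compiler. */
    for (i = 0; i < DUMMY_USE; i++)
        arr[i] = (char)i;

    if (outer == NULL)
        outer = arr; /* remember the outermost frame's array */

    if (depth > 1)
        dist = touch_stack(depth - 1, outer);
    else
        dist = labs((long)((intptr_t)arr - (intptr_t)outer));

    /* Reading arr after the call keeps it live across the recursion, so the
     * call cannot become a tail call that reuses this frame. arr[0] is 0. */
    return dist + (arr[0] & 0);
}
```

Calling touch_stack(4, NULL) should report a distance of at least 3 * DUMMY_USE bytes on typical compilers, rather than the 40 bytes per call seen in the debug output above.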