magao-x / MagAOX

The MagAO-X Software System

Home Page: https://magao-x.org/

we have a slow leak

jaredmales opened this issue

Now that we have a fairly stable system and have our software framework up and running for weeks to months at a time, it's clear we have a slow memory leak somewhere.

There is no reason at all that koolanceCtrl should be holding 5 GB of RAM.

For reference, that process has been running for a little over 20 days:

[jrmales@exao3 config]$ ps -o etime= -p 1341779 
20-06:27:05
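To put a number on "slow", here is a rough sketch that converts the `ps` elapsed time (formatted `[[dd-]hh:]mm:ss`) to seconds and turns it into a growth rate. The function names `etimeToSeconds` and `growthPerWeek` are made up for this example, and it assumes the full 5 GB is leaked memory that accumulated linearly, which is only an approximation.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Convert a ps "etime" string, formatted [[dd-]hh:]mm:ss, to seconds.
// (Hypothetical helper, not part of MagAOX.)
long etimeToSeconds(const std::string &etime)
{
    long days = 0;
    std::string clock = etime;

    // Optional leading "dd-" day count
    std::size_t dash = etime.find('-');
    if(dash != std::string::npos)
    {
        days = std::stol(etime.substr(0, dash));
        clock = etime.substr(dash + 1);
    }

    // Split the remaining hh:mm:ss (or mm:ss) on ':'
    std::vector<long> fields;
    std::stringstream ss(clock);
    std::string tok;
    while(std::getline(ss, tok, ':'))
    {
        fields.push_back(std::stol(tok));
    }

    if(fields.size() < 2 || fields.size() > 3)
    {
        throw std::invalid_argument("unexpected etime format: " + etime);
    }

    // Accumulate base-60 fields from most to least significant
    long secs = 0;
    for(long f : fields)
    {
        secs = secs * 60 + f;
    }

    return days * 86400 + secs;
}

// Average growth per week, in whatever unit `amount` is given in,
// assuming linear growth over `seconds` of uptime.
double growthPerWeek(double amount, long seconds)
{
    return amount * 604800.0 / static_cast<double>(seconds);
}
```

With `20-06:27:05` and 5 GB, `growthPerWeek(5.0, etimeToSeconds("20-06:27:05"))` comes out to roughly 1.7 GB per week.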

The biggest suspect is of course the INDI subsystem. The other subsystem that is constantly running is the logger/telemetry system.

With valgrind I found one problem in IndiConnection, caused by my fix to the output code.

472 bytes in 1 blocks are still reachable in loss record 6 of 12
==160426==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==160426==    by 0x4F2CF62: fdopen@@GLIBC_2.2.5 (iofdopen.c:122)
==160426==    by 0x1637E8: pcf::IndiConnection::setOutputFd(int const&) (IndiConnection.cpp:419)

See d054988

Huh, I'm confused how that leaks. I guess glibc allocates for the fdopen and needs an fdclose to deallocate? Sneaky

Oh it's allocating the FILE struct I got it

yeah something in FILE is getting a malloc, and if you do fdopen again without fclose, that hangs around.
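A minimal sketch of the pattern being described, for anyone following along. This is not the code from d054988 (I have not reproduced that commit here); `OutputStream` and its members are hypothetical names. The point is only that each `fdopen` allocates a new `FILE`, so the previous one has to be released with `fclose` before the pointer is overwritten.

```cpp
#include <cassert>
#include <stdio.h>  // FILE, fclose, and the POSIX fdopen
#include <unistd.h>

// Hypothetical stand-in for a class that owns a FILE stream built on a file descriptor.
class OutputStream
{
  public:
    OutputStream() = default;
    OutputStream(const OutputStream &) = delete;
    OutputStream &operator=(const OutputStream &) = delete;

    ~OutputStream()
    {
        closeStream();
    }

    // Attach to a new descriptor, releasing any FILE allocated by an earlier call.
    bool setOutputFd(int fd)
    {
        assert(fd >= 0);

        closeStream(); // without this, the old FILE allocation is never freed

        m_stream = fdopen(fd, "w");
        return m_stream != nullptr;
    }

    FILE *stream() const
    {
        return m_stream;
    }

  private:
    void closeStream()
    {
        if(m_stream != nullptr)
        {
            fclose(m_stream); // frees the FILE and also closes its descriptor
            m_stream = nullptr;
        }
    }

    FILE *m_stream{nullptr};
};
```

One caveat with this approach: `fclose` also closes the underlying descriptor, so the caller must pass a fresh (or `dup`'d) descriptor on each call rather than re-passing the old one.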

To be clear, I don't think this is causing the GB/week

Further valgrind testing with magaoxMaths isn't showing anything interesting. There is some stuff happening with pthreads that leaves "still reachable" blocks at exit, but no actual leaks.

the maths demos are not telemeters but they do log.

The main offenders are all tty users. That needs to be checked!

Found it! Or at least some of it. Testing on koolanceCtrl, a user of tty::usb -> udev, we get

==3640043== LEAK SUMMARY:
==3640043==    definitely lost: 2,560 bytes in 20 blocks
==3640043==    indirectly lost: 102,384 bytes in 1,912 blocks
==3640043==      possibly lost: 0 bytes in 0 blocks
==3640043==    still reachable: 8 bytes in 1 blocks
==3640043==         suppressed: 0 bytes in 0 blocks

udev issues fixed by 49dcd5d

valgrind branch merged with dev, now being tested on ICC

After 5 days it looks like the problem is fixed.