we have a slow leak
jaredmales opened this issue
Now that we have a fairly stable system and have our software framework up and running for weeks to months at a time, it's clear we have a slow memory leak somewhere.
There is no reason at all that koolanceCtrl should be holding 5 GB of RAM.
For reference, that process has been up for a bit over 20 days:
[jrmales@exao3 config]$ ps -o etime= -p 1341779
20-06:27:05
Biggest suspect is of course the INDI subsystem. The other subsystem that is constantly running is the logger/telemetry system.
With valgrind I found one problem in IndiConnection, caused by my fix to the output code.
472 bytes in 1 blocks are still reachable in loss record 6 of 12
==160426== at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==160426== by 0x4F2CF62: fdopen@@GLIBC_2.2.5 (iofdopen.c:122)
==160426== by 0x1637E8: pcf::IndiConnection::setOutputFd(int const&) (IndiConnection.cpp:419)
See d054988
Huh, I'm confused how that leaks. I guess glibc allocates for the fdopen and needs an fclose to deallocate? Sneaky
Oh, it's allocating the FILE struct. Got it.
yeah something in FILE is getting a malloc, and if you do fdopen again without fclose, that hangs around.
To be clear, I don't think this is causing the GB/week
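For anyone following along, here is a minimal sketch of the fdopen pattern discussed above. It is not the actual change in d054988. `OutputStream` and `rebindTwice` are made-up names for illustration, and the `dup` call is my own assumption so that `fclose` does not close the caller's descriptor. The point is only that each `fdopen` allocates a `FILE`, so the previous one has to be released with `fclose` before the pointer is overwritten.

```cpp
#include <stdio.h>    // fdopen, fclose, FILE
#include <unistd.h>   // dup, pipe, close (POSIX)

// Hypothetical stand-in for the connection class; NOT the real pcf::IndiConnection.
class OutputStream
{
public:
    OutputStream() = default;
    OutputStream(const OutputStream &) = delete;            // avoid double fclose
    OutputStream & operator=(const OutputStream &) = delete;

    ~OutputStream() { reset(); }

    // Wrap fd in a FILE stream, first releasing any stream from an earlier call.
    bool setOutputFd(int fd)
    {
        reset(); // without this, the previous FILE allocation is never freed

        int own = dup(fd); // assumption: keep the caller's fd open after fclose
        if(own < 0) return false;

        m_out = fdopen(own, "w");
        if(m_out == nullptr)
        {
            close(own);
            return false;
        }
        return true;
    }

    FILE * stream() const { return m_out; }

private:
    void reset()
    {
        if(m_out != nullptr)
        {
            fclose(m_out); // frees the FILE and closes the dup'ed descriptor
            m_out = nullptr;
        }
    }

    FILE * m_out {nullptr};
};

// Usage sketch: bind the output twice; the first FILE is freed by the second call.
bool rebindTwice()
{
    int fds[2];
    if(pipe(fds) != 0) return false;

    bool ok = false;
    {
        OutputStream out;
        ok = out.setOutputFd(fds[1]) && out.setOutputFd(fds[1]) && out.stream() != nullptr;
    } // destructor closes the stream here

    close(fds[0]);
    close(fds[1]);
    return ok;
}
```

Running something like `rebindTwice()` under valgrind with the `reset()` call removed from `setOutputFd` should show the first `FILE` left behind, in the same spirit as the loss record above.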
Further valgrind testing with magaoxMaths isn't showing anything interesting. Some stuff happening with pthreads is causing still-reachable blocks at exit, but no actual leaks.
the maths demos are not telemeters but they do log.
The main offenders are all tty users. That needs to be checked!
Found it! Or at least some of it. Testing on koolanceCtrl, a user of tty::usb -> udev, we get
==3640043== LEAK SUMMARY:
==3640043== definitely lost: 2,560 bytes in 20 blocks
==3640043== indirectly lost: 102,384 bytes in 1,912 blocks
==3640043== possibly lost: 0 bytes in 0 blocks
==3640043== still reachable: 8 bytes in 1 blocks
==3640043== suppressed: 0 bytes in 0 blocks
udev issues fixed by 49dcd5d
valgrind branch merged with dev, now being tested on ICC
After 5 days it looks like the problem is fixed.