bblfsh / bblfshd

A self-hosted server for source code parsing

Home Page: https://doc.bblf.sh


Memory leakage detection test

lwsanty opened this issue

Summary

From time to time, different teams experience memory leaks while working with elements of our architecture such as bblfshd, the separate drivers, the clients, etc. This is an umbrella issue to discuss methods for detecting memory leaks at an early stage, before a damaged version reaches users. It's worth mentioning that by this task I mean detecting the fact of a leak, not pinpointing where it occurs.

API clients

Since API clients are obviously written in different languages, there is no single low-level pattern we can apply. One of the simplest solutions is a utility (sketched below) that will:

  1. pre-compile a program, based on a client, that processes some data
  2. start this program and measure its RAM usage per PID over time
  3. analyze the dynamics of the RAM usage
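
A minimal sketch of step 2, assuming a Linux host where per-process memory can be read from /proc/<pid>/status. The ./client-workload binary, the 5-second sampling interval, and the plain-text output are all placeholders:

```go
// rss_monitor.go: start a client-based test program and sample its RSS
// over time, printing one measurement per interval for later analysis.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
	"strings"
	"time"
)

// readRSSKB returns the VmRSS of a process in kilobytes,
// parsed from /proc/<pid>/status (Linux only).
func readRSSKB(pid int) (int, error) {
	data, err := os.ReadFile(fmt.Sprintf("/proc/%d/status", pid))
	if err != nil {
		return 0, err
	}
	for _, line := range strings.Split(string(data), "\n") {
		if strings.HasPrefix(line, "VmRSS:") {
			fields := strings.Fields(line)
			if len(fields) >= 2 {
				return strconv.Atoi(fields[1])
			}
		}
	}
	return 0, fmt.Errorf("VmRSS not found for pid %d", pid)
}

func main() {
	// Hypothetical pre-compiled test binary from step 1 of the list above.
	cmd := exec.Command("./client-workload")
	if err := cmd.Start(); err != nil {
		fmt.Fprintln(os.Stderr, "start:", err)
		os.Exit(1)
	}
	done := make(chan struct{})
	go func() {
		_ = cmd.Wait()
		close(done)
	}()

	ticker := time.NewTicker(5 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-done:
			return
		case t := <-ticker.C:
			rss, err := readRSSKB(cmd.Process.Pid)
			if err != nil {
				fmt.Fprintln(os.Stderr, "sample:", err)
				return
			}
			// Emit timestamped samples for the trend analysis in step 3.
			fmt.Printf("%s\tVmRSS=%d kB\n", t.Format(time.RFC3339), rss)
		}
	}
}
```

The emitted samples can then be fed into whatever trend analysis we pick for step 3; a sustained upward slope over a long run is the signal to investigate.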

Depending on how long they take, we can trigger these tests either in Travis or in Jenkins CI.

@dennwc @bzz @kuba-- @creachadair @smola @ncordon wdyt?

I think the idea is good, but the implementation will be complicated because of our cross-language stack. For the low-level components (e.g., the C extensions) we can probably use valgrind/memcheck or stub in a malloc shim like libefence to induce a failure on a test workload.

For code in the native client language, however, solutions will vary, and detecting leaks in the presence of a GC is tricky. Running "for a while" only works if we know something about the GC settings of the underlying process and can push a workload sufficient to drive GC cycles. I've had some luck in the past repeating an artificial workload with manual GC invocations in between, and verifying that rusage does not diverge. This is very language- and runtime-specific, though. C++ and Go have excellent memory profiling tools. I know that some heap-tracing tools also exist for the JVM, although I do not know the current recommended practice there.
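
For Go-based components, a rough sketch of that repeat-and-force-GC approach might look like the following; runWorkload is a hypothetical stand-in for the real client workload, and the warmup/slack numbers are arbitrary and would need tuning per workload:

```go
// leakcheck_test.go: repeat an artificial workload with manual GC
// invocations in between and verify the live heap does not diverge.
package leakcheck

import (
	"runtime"
	"testing"
)

// runWorkload is a hypothetical stand-in for the client code under test,
// e.g. parsing a fixture file through the bblfsh client.
func runWorkload() {
	buf := make([]byte, 1<<20)
	_ = buf
}

func TestNoHeapGrowth(t *testing.T) {
	const (
		warmup     = 10      // iterations before taking the baseline
		iterations = 100     // measured iterations
		slackBytes = 1 << 20 // tolerated drift; tune per workload
	)
	// heapAfterGC forces a collection, then reports live heap bytes.
	heapAfterGC := func() uint64 {
		runtime.GC()
		var ms runtime.MemStats
		runtime.ReadMemStats(&ms)
		return ms.HeapAlloc
	}

	for i := 0; i < warmup; i++ {
		runWorkload()
	}
	base := heapAfterGC()

	for i := 0; i < iterations; i++ {
		runWorkload()
		runtime.GC() // force a cycle between iterations, per the approach above
	}
	if after := heapAfterGC(); after > base+slackBytes {
		t.Errorf("live heap grew from %d to %d bytes over %d iterations",
			base, after, iterations)
	}
}
```

The warmup loop lets caches and pools reach a steady state before the baseline is taken; without it, legitimate one-time allocations would register as growth.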

So: I think in the short term we should probably confine ourselves to the places where we've already had issues, around the C extensions. End-to-end profiling is likely to be a much bigger effort, and it's not yet clear the impact would be worth it.