zenomt / rtmfp-cpp

Secure Real-Time Media Flow Protocol (RTMFP) Library in C++

Home Page: https://www.rfc-editor.org/rfc/rfc7016.html


Memory leaking while testing tcserver

doanquocnam opened this issue · comments

Run tcserver -B 0.0.0.0:1935 -b 0.0.0.0:1936; at startup tcserver uses 1.2MB of memory.
Connect 100 or more RTMFP clients to tcserver to play a stream, then close all of these streams from each client. The memory of the tcserver process increases to 3.9MB ...

After thousands of RTMFP connections and disconnections, tcserver's memory has grown to 2GB.

Edit: I've found that when a client is killed or crashes (so the shutdown function is never called on the client side), the server leaks memory even though the onException function is called by RTMFPClient.

Please fix!! Thanks!!

thanks for the report and for tracking down where the memory is leaking. i'll take a look when i can after my Day Job™. :)

just for clarification: are you saying that the flow->onException callback is being called (which then calls RTMFPClient::close(), which should tear everything down), or that it is not being called at all when you abruptly kill the client or it crashes?

i believe what you're seeing is not actually a memory leak, but an unfortunate interplay of dynamically-resized data structures and memory fragmentation.

when you abruptly kill (or crash) a client that's actively playing a stream, the client can't send the "session close" message to the server, so the server thinks the clients are still there until it detects they're dead (after the "session retransmit limit"). while the server thinks the clients are still there and playing a stream, frame messages will continue to be queued for them. the queued frames will expire, but the expiration runs less and less often as the retransmissions get further apart. after the session retransmit limit, the clients will all be closed, which is when the onException callback is called on the RTMFPClient's control flow, all the flows in the RTMFP session are aborted, and all the queued data should be released.

unfortunately, at this point, potentially a lot of memory was used to queue those frame messages, and other memory is allocated, or data structures resized, all the time. plus, a bunch of state for all those sessions still exists for the next 90 seconds while RTMFP tries to send session close messages to all of the dead clients. at the end of the NEARCLOSE_PERIOD (90 seconds) timer, the sessions are finally deallocated; however, there could still be legitimate objects that RTMFP allocated or resized that are in the fragmented memory regions, which could keep them from being completely freed by the C/C++ runtime or the OS.

i did a test publishing a 1 Mbps stream into a tcserver, and then starting players (tcconn) to play that stream, and then killing them with killall -9 tcconn, over and over. usually i'd start 50 of them; toward the end of the test i was starting 100 at a time. after killing them, i'd wait 90 seconds after tcserver detected the clients were dead and closed them (which usually took about 20-25 seconds). during the 20-25 seconds memory use would climb up to about 103MB. during the 90 second NEARCLOSE_PERIOD memory use would stay about the same. afterward, memory use would drop down to about 60MB. i did that over and over until i'd had over 1000 connections, and right now the memory use is still just 60.6MB.

{"@pid":6915,"@server":"bb5328697c","@timestamp":1715651458.108155,"@type":"stats","accepts":{"@all":1024,"rtmfp":1024,"rtmp":0,"rtws":0},"apps":1,"broadcasts":0,"clients":1,"connects":1024,"intros":0,"lookups":0,"playedDuration":35327,"playing":0,"plays":1023,"publishedDuration":5447,"publishes":1,"publishing":1,"relaysIn":0,"relaysOut":0}

given the persistent memory used by the process appears related to how many dead clients were still getting messages queued for them at once, and not how many dead clients had messages queued for them overall, i don't think memory is being leaked. i suspect that if you retry your test with 1000 dead clients subscribed to a stream, wait for tcserver to notice they're dead (logging a "close" for each of them), wait 90 seconds, and then start 1000 more clients and kill those, you won't see the memory use double.

there is a possibility that queued messages in the SendFlows aren't released while in the NEARCLOSE_PERIOD, and continue using memory until the session is fully destroyed. i'll double-check for that case. queued messages should be released as soon as the SendFlows are aborted when the session moves to the S_NEARCLOSE state.

i see that queued messages in a SendFlow belonging to a closed session will stick around until the session itself is destroyed. i'll fix that.
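to illustrate the idea of that fix (with hypothetical names, not the actual com::zenomt classes): the point is just that a flow's queued messages get dropped as soon as the flow is aborted, instead of waiting for the owning session to be destroyed.

```cpp
// sketch only: hypothetical names, not the real library classes.
#include <cstdint>
#include <cstdio>
#include <deque>
#include <memory>
#include <vector>

struct QueuedMessage {
	std::vector<uint8_t> payload;
};

class SketchSendFlow {
public:
	void queueMessage(std::vector<uint8_t> bytes)
	{
		m_queue.push_back(std::make_shared<QueuedMessage>(QueuedMessage{ std::move(bytes) }));
	}

	// release queued payloads now, rather than at destruction time
	void abort()
	{
		m_queue.clear();
	}

	size_t queuedBytes() const
	{
		size_t total = 0;
		for (auto &each : m_queue)
			total += each->payload.size();
		return total;
	}

protected:
	std::deque<std::shared_ptr<QueuedMessage>> m_queue;
};

struct SketchSession {
	std::vector<std::shared_ptr<SketchSendFlow>> sendFlows;

	// when the session goes to its near-close state, abort every send flow
	// so the queued data is freed ~90 seconds earlier than if we waited for
	// the session object itself to go away.
	void onNearClose()
	{
		for (auto &flow : sendFlows)
			flow->abort();
	}
};

int main()
{
	SketchSession session;
	auto flow = std::make_shared<SketchSendFlow>();
	session.sendFlows.push_back(flow);

	flow->queueMessage(std::vector<uint8_t>(1400, 0)); // pretend these are video frames
	flow->queueMessage(std::vector<uint8_t>(1400, 0));
	printf("queued before close: %zu bytes\n", flow->queuedBytes());

	session.onNearClose();
	printf("queued after close:  %zu bytes\n", flow->queuedBytes());
}
```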

Thanks for reviewing!

I run tcserver, and it always keeps the largest amount of memory that was allocated for tcconn connections. I made many, many connections to tcserver, and it never deallocates the memory after these clients disconnect.

And there is one more issue: the server's single thread has a problem when I run it on a CentOS server with an Intel Xeon CPU where each core runs at 2.3GHz. When I make more than 3000 connections playing the same single published channel, there is latency once CPU usage reaches 100%. On my desktop a single core runs at 4.8GHz, so it takes about 6000 connections playing one channel for the server to reach 100% of a single core. Then it seems to freeze, and the clients stutter while playing the RTMFP stream (I wrote some code to decode the video & audio tags from onAudio & onVideo on the tcconn client side).

i would expect you to hit 100% CPU well before 3000 or 6000 connections that are playing streams. tcserver is indeed single-threaded. in extreme overload it won't be able to keep up, and it might not be able to service all sessions in a timely manner. try adding clients slowly until you get to 100% CPU. tcserver should still be working just as you reach 100%, and should probably work ok with an extra 10-20% more clients after that, depending on your operating system.

note that tcserver is not intended to be a sophisticated use-all-your-cores server. it's just intended to be a pretty-good demonstration server. :) however, for some use cases (like publishing one stream to many many subscribers) you could get some fan-out and horizontal scaling by running several instances of tcserver, registering each to redirector (an RTMFP load balancer), and then publishing the same stream simultaneously to all of the tcserver instances. clients would start out connecting to the redirector, which would then direct the clients to connect to one of the tcservers. the default settings of redirector should do a pretty good job of load leveling by doing round-robin and having the client try two tcservers in parallel. clients will select the first tcserver to reply. see https://www.rfc-editor.org/rfc/rfc7016.html#section-3.5.1.7 and https://www.rfc-editor.org/rfc/rfc7016.html#section-3.5.1.4 .

46c0e73 should release memory for queued messages earlier now when a session closes, so you shouldn't need to wait for the NEARCLOSE_PERIOD.

Thanks!

I can handle this case by running tcserver as multi-instance, multi-process, or multi-threaded. But I think the CPU usage reaches 100% just because of the fragmenting in the function basicWrite(...).

As for the relay worker, I want it to work in a different way. E.g.: 2 tcconn connections connect to the relay, and both of these tcconn clients play the same live stream. In that case the relay should create only one connection to tcserver, rather than creating one connection to the server for each client connection.
So if I have N tcconn connections to one relay worker and all of them play the same live channel (e.g. rtmfp://exam.live.com/live#test0), the relay should create only one connection to the server and forward all the packets to the N tcconn clients.
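Something like this is what I mean (just a rough illustration with made-up names, not real rtmfp-cpp classes): the relay keeps one upstream connection per stream name and fans incoming packets out to all of the subscribed downstream clients.

```cpp
// rough illustration: one upstream connection per stream name, shared by
// every downstream client that plays that stream. made-up names only.
#include <cstdint>
#include <cstdio>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <vector>

struct DownstreamClient {
	void forward(const std::vector<uint8_t> &packet) { /* send to this tcconn client */ }
};

struct UpstreamConnection {
	std::string streamURI; // in a real relay: one RTMFP connection + play request to tcserver
};

class SketchEdgeRelay {
public:
	// a downstream client asks to play streamURI: reuse the upstream
	// connection if one already exists, otherwise create exactly one.
	void play(const std::string &streamURI, std::shared_ptr<DownstreamClient> client)
	{
		auto &entry = m_streams[streamURI];
		if (!entry.upstream)
		{
			entry.upstream = std::make_shared<UpstreamConnection>();
			entry.upstream->streamURI = streamURI;
			printf("opening ONE upstream connection for %s\n", streamURI.c_str());
		}
		entry.subscribers.insert(client);
	}

	// a packet arrives from upstream: fan it out to every subscribed client.
	void onUpstreamPacket(const std::string &streamURI, const std::vector<uint8_t> &packet)
	{
		auto it = m_streams.find(streamURI);
		if (it == m_streams.end())
			return;
		for (auto &client : it->second.subscribers)
			client->forward(packet);
	}

protected:
	struct StreamEntry {
		std::shared_ptr<UpstreamConnection> upstream;
		std::set<std::shared_ptr<DownstreamClient>> subscribers;
	};
	std::map<std::string, StreamEntry> m_streams;
};

int main()
{
	SketchEdgeRelay relay;
	auto c1 = std::make_shared<DownstreamClient>();
	auto c2 = std::make_shared<DownstreamClient>();
	relay.play("rtmfp://exam.live.com/live#test0", c1);
	relay.play("rtmfp://exam.live.com/live#test0", c2); // reuses the same upstream connection
	relay.onUpstreamPacket("rtmfp://exam.live.com/live#test0", { 0x01, 0x02 });
}
```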

46c0e73 should release memory for queued messages earlier now when a session closes, so you shouldn't need to wait for the NEARCLOSE_PERIOD.

Thanks!!

  1. git pull to update the code
  2. Build src & test and run the server: tcserver -B 0.0.0.0:1935 -b 0.0.0.0:1936
  3. Open the system monitor on Ubuntu; the memory of tcserver is now 1.2MB
  4. Make 120 tcconn clients connect to tcserver; the memory of tcserver increases to 3.7MB
  5. Disconnect all 120 connections and wait for 5 minutes; the memory stays at 3.7MB
  6. Make 550 tcconn clients connect to tcserver; the memory of tcserver increases to 17MB
  7. Disconnect all 550 connections and wait for 10 minutes; the memory stays at 17MB

...
It doesn't deallocate the memory it allocated before.

(screenshots of the system monitor showing tcserver's memory usage)

It doesn't deallocate the memory it allocated before.

as i explained above, this is not surprising given how memory allocation, reallocation, and freeing work in C/C++ and Unix/Linux environments. in the examples you have here, no media was being published to the subscribers, no extra data was getting queued up in any SendFlows, so the change i just pushed wouldn't make any difference.

i believe that if, after disconnecting the 550 tcconn connections, you wait 90 seconds after tcserver says "close" (so the sessions themselves are deallocated), then connecting another 550 tcconn connections and disconnecting them shouldn't leave you using substantially more memory. the problem is that objects allocated in memory don't get moved around to compact/defrag the memory map. that 17MB of memory allocated to the process likely has tons of holes in the in-process malloc space that were free()ed, and that freed memory can be allocated to new objects without growing the process's memory usage.
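here's a small standalone Linux demo (nothing to do with this library) of what i mean: after freeing half of the blocks, the resident size stays high because the freed chunks are just holes between in-use chunks, but a second round of allocations is satisfied from those holes without growing the process.

```cpp
// standalone demo of malloc fragmentation on Linux.
// build: g++ -O2 frag_demo.cpp -o frag_demo
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <vector>

// read VmRSS (resident set size) from /proc/self/status, in kB
static long residentKB()
{
	FILE *f = fopen("/proc/self/status", "r");
	if (!f)
		return -1;
	char line[256];
	long kb = -1;
	while (fgets(line, sizeof(line), f))
		if (1 == sscanf(line, "VmRSS: %ld", &kb))
			break;
	fclose(f);
	return kb;
}

int main()
{
	const size_t count = 100000;
	const size_t blockSize = 1024;
	std::vector<char *> blocks(count, nullptr);

	for (size_t i = 0; i < count; i++)
	{
		blocks[i] = (char *)malloc(blockSize);
		memset(blocks[i], 1, blockSize); // touch it so it's actually resident
	}
	printf("after allocating %zu MB:       RSS %ld kB\n", count * blockSize / 1048576, residentKB());

	// free every other block: each freed chunk is surrounded by in-use
	// chunks, so it can't be coalesced or returned to the OS.
	for (size_t i = 0; i < count; i += 2)
	{
		free(blocks[i]);
		blocks[i] = nullptr;
	}
	printf("after freeing half (holes):    RSS %ld kB\n", residentKB());

	// allocate the same sizes again: the allocator reuses the holes, so the
	// resident size barely grows even though we "allocated" another ~50 MB.
	for (size_t i = 0; i < count; i += 2)
	{
		blocks[i] = (char *)malloc(blockSize);
		memset(blocks[i], 2, blockSize);
	}
	printf("after reallocating into holes: RSS %ld kB\n", residentKB());
}
```

that's the same shape as the 17MB staying put in your test and then being reused by the next batch of connections.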

I disconnected all 550 connections and waited for 10 minutes, and the memory still stays at 17MB.

And that is only from my test. I have run a tcserver process on my server for a long time, with one published stream and many connections playing it, and made many connections before and since; all of them have been disconnected for a long time, but the memory is still being held.

It keeps the highest amount of memory it has ever allocated, and if there are more connections than last time, it allocates more and keeps that too.

E.g.: if I make 660 connections, the memory is 17MB; if I disconnect all 660 connections and then make fewer connections than 660, e.g. 550, the memory still stays at 17MB; if I make more than 660, e.g. 1000 connections, tcserver allocates more and keeps it. Maybe it does not leak, but it does not free. It looks like a memory pool.

This picture was captured from my server: RES reaches 1020MB and it is never freed. The number of playing clients is now 0.

(screenshot showing tcserver's resident memory on the server)

It was reported to me that tcserver leaks memory. I just tested with no media publishing: in this case I only make many tcconn connections and then disconnect them. After disconnecting all these tcconn clients (with no media publishing), the memory allocated for these tcconn connections in tcserver is not freed. Even when, as you explained above, I wait 90 seconds, 10 minutes, or more (on my server all connections have been disconnected for a day), the memory is not freed.

Even on a server with media publishing and tcconn connections playing the stream, tcserver doesn't free the memory after the clients disconnect.

Many thanks!! :)

Maybe it does not leak, but it does not free. It looks like a memory pool.

neither tcserver nor the com::zenomt foundation explicitly make memory pools. however, the C/C++ runtime might implement malloc/free and new/delete with memory pools and never release the memory back to the OS.

MacOS's C/C++ runtime does release completely freed memory regions back to the OS, so the process size can shrink. however, since some persistent objects in both tcserver and RTMFP itself are dynamically resized or reallocated as they grow or are used, some of those persistent objects might end up scattered around different memory regions, keeping those regions from being completely empty and releasable back to the OS. however, all of the "free" space in those memory regions can be allocated again (for example, when new clients connect to tcserver) without requiring new memory regions from the OS.
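on Linux/glibc you can poke at that distinction between "bytes the program is using" and "bytes the allocator is holding onto" with the glibc-specific malloc_stats() and malloc_trim() calls. this is just for experimenting with the allocator; it's not something tcserver does.

```cpp
// glibc-specific experiment: compare "system bytes" (held by the allocator)
// with "in use bytes" (actually allocated), and ask glibc to give free pages
// back to the OS with malloc_trim().
// build: g++ -O2 trim_demo.cpp -o trim_demo   (Linux/glibc only)
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <malloc.h>
#include <vector>

int main()
{
	std::vector<char *> blocks;
	for (int i = 0; i < 50000; i++)
	{
		blocks.push_back((char *)malloc(2048));
		memset(blocks.back(), 1, 2048);
	}

	printf("--- after allocating ~100MB ---\n");
	malloc_stats(); // prints per-arena "system bytes" vs "in use bytes" to stderr

	for (char *each : blocks)
		free(each);
	blocks.clear();

	printf("--- after freeing everything (allocator may still hold it) ---\n");
	malloc_stats();

	malloc_trim(0); // ask glibc to return whatever free memory it can to the OS

	printf("--- after malloc_trim(0) ---\n");
	malloc_stats();
}
```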

As for the relay worker, I want it to work in a different way. E.g.: 2 tcconn connections connect to the relay, and both of these tcconn clients play the same live stream. In that case the relay should create only one connection to tcserver.

Adobe Media Server called this an "Edge Server". this repo doesn't have such an "edge relay". i've taken note of your interest in one, but i don't foresee having time for that any time soon.

@zenomt this edge relay would be fantastic to have in rtmfp-cpp

your second is noted. one of the things i've been thinking about regarding an edge server is how to handle all the other functions tcserver has (relay/broadcast and watching), as well as its permission model. an edge server that's just for fan-out of a stream would be straightforward, but an edge server that could also handle publish-through, relay/broadcast/onRelay, and watch/onDisconnected is much bigger and would require accommodation in tcserver as well to support the notion of "downstream clients".

@doanquocnam regarding the "memory leak" for this Issue #15: since i don't believe there is actually a leak, that what you're seeing is an artifact of how memory allocation in C/C++ runtimes works, and i don't think there's anything else to do, i intend to close this issue.

@doanquocnam @ROBERT-MCDOWELL regarding an "edge relay": i invite you to create a new Discussion (i'd prefer that over an Issue, since it's not a bug that there's no edge relay server in this repo :) ).