Pegs CPU at 100%
JLSchuler99 opened this issue
Love this container and run it in Kubernetes.
The only issue is that every few hours or days the container starts using 100% of CPU across all cores on the node running it, requiring a restart.
If this issue was fixed it would be great software!
I also run it for extended periods and have noticed similar behavior. I'm actually watching it with Go's profiler to see if I can catch the cause of this.
...I built a new image with the latest version of Go, so we'll see if that helps too.
The tagged image is itzg/mc-router:1.4.4-1
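For anyone who wants to watch a Go service the same way, here is a minimal sketch of exposing Go's built-in pprof endpoint on a side port. The port and wiring here are illustrative assumptions, not necessarily how mc-router sets it up:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// Serve the profiler on a side port, away from the real traffic port.
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```

With that in place, `go tool pprof http://localhost:6060/debug/pprof/profile` samples CPU usage and produces output like the listing below.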
Just making a note of recent profiling I captured after mc-router had been running for over a week:
```
(pprof) top10
Showing nodes accounting for 419.95hrs, 89.91% of 467.09hrs total
Dropped 578 nodes (cum <= 2.34hrs)
Showing top 10 nodes out of 29
      flat  flat%   sum%        cum   cum%
 352.35hrs 75.43% 75.43%  386.32hrs 82.71%  syscall.Syscall
  10.75hrs  2.30% 77.74%   11.86hrs  2.54%  runtime.ifaceeq
   9.56hrs  2.05% 79.78%   15.30hrs  3.28%  runtime.reentersyscall
   8.19hrs  1.75% 81.54%  466.49hrs 99.87%  github.com/itzg/mc-router/mcproto.ReadFrame
   8.15hrs  1.75% 83.28%    8.15hrs  1.75%  runtime.casgstatus
   7.32hrs  1.57% 84.85%   10.69hrs  2.29%  runtime.deferreturn
   6.66hrs  1.43% 86.28%  449.01hrs 96.13%  net.(*conn).Read
   5.88hrs  1.26% 87.53%   17.71hrs  3.79%  runtime.exitsyscall
   5.87hrs  1.26% 88.79%    6.04hrs  1.29%  runtime.newdefer
   5.21hrs  1.12% 89.91%    8.10hrs  1.73%  runtime.exitsyscallfast
```
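With syscall.Syscall dominating under net.(*conn).Read and mcproto.ReadFrame at 99.87% cumulative, the shape of the profile suggests a read loop that keeps calling Read without ever making progress. A minimal illustration of that anti-pattern, as an assumption about the failure mode rather than mc-router's actual code:

```go
package readloop

import "net"

// pump shows the busy-loop anti-pattern: treating a read error as
// retryable means that once the connection fails (or delivers a frame
// the parser cannot recognize), Read returns instantly on every
// iteration and the goroutine spins at 100% CPU inside syscall.Syscall.
func pump(conn net.Conn, process func([]byte)) {
	buf := make([]byte, 4096)
	for {
		n, err := conn.Read(buf)
		if err != nil {
			continue // BUG: should return and close conn instead of retrying
		}
		process(buf[:n])
	}
}
```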
Looks like an strace of the pegged process might reveal why most of the time is spent in syscall.Syscall.
Release 1.6.0 now includes a --debug command line argument to help diagnose the initial frame/packet reading, which seems to be where the high CPU time is spent.
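To give a sense of what that kind of diagnostic looks like, here is a rough sketch of hex-dumping the first bytes of each new connection so malformed or unexpected initial frames become visible. This is illustrative only; the actual --debug output format is mc-router's own:

```go
package debugread

import (
	"log"
	"net"
)

// logFirstBytes dumps the opening bytes of a connection in hex, which
// makes it possible to spot an unexpected leading byte before the
// normal frame parsing even starts.
func logFirstBytes(conn net.Conn) {
	buf := make([]byte, 16)
	n, err := conn.Read(buf)
	log.Printf("first %d bytes from %s: % x (err=%v)",
		n, conn.RemoteAddr(), buf[:n], err)
}
```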
@JLSchuler99 the latest release https://github.com/itzg/mc-router/releases/tag/1.7.0 includes several fixes that add connection rate limiting and time out slow/stalled handshakes. I had finally discovered I could recreate the issue by rapidly refreshing the server list in the Minecraft client, so I was able to greatly shorten the fix-and-test cycle.
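For reference, those two mitigations can be sketched roughly as a token-bucket limiter on accepts plus a read deadline on the handshake. The limits and timeout below are assumed values, and this is not mc-router's actual code:

```go
package main

import (
	"log"
	"net"
	"time"

	"golang.org/x/time/rate"
)

func main() {
	ln, err := net.Listen("tcp", ":25565")
	if err != nil {
		log.Fatal(err)
	}
	// Assumed limits: 50 new connections/sec with a burst of 100.
	limiter := rate.NewLimiter(rate.Limit(50), 100)
	for {
		conn, err := ln.Accept()
		if err != nil {
			continue
		}
		if !limiter.Allow() {
			conn.Close() // over the rate limit: drop immediately
			continue
		}
		go handle(conn)
	}
}

func handle(conn net.Conn) {
	defer conn.Close()
	// Give the client a bounded window to complete the handshake, so a
	// stalled or malicious client cannot hold (or spin) the router.
	conn.SetReadDeadline(time.Now().Add(5 * time.Second))
	buf := make([]byte, 512)
	n, err := conn.Read(buf)
	if err != nil {
		return // timeout or disconnect: stop instead of looping
	}
	_ = buf[:n] // handshake parsing would happen here
}
```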
Fantastic. I just deployed the new image; I'll let you know how it works out. Thanks for taking the time to look into this issue.
@JLSchuler99, I finally found the mystery packet that was getting the router into a tight loop. There is a legacy message type that even a modern client seems to send sporadically. Release 1.8.0 includes handling for that.
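For context, pre-Netty clients probe servers with a legacy server list ping whose first byte is 0xFE rather than a length-prefixed frame. A sketch of recognizing it up front, as an illustration of the idea rather than the actual 1.8.0 change:

```go
package mcframe

import (
	"bufio"
	"errors"
)

const legacyServerListPing = 0xFE // first byte of the pre-Netty ping

var ErrLegacyPing = errors.New("legacy server list ping received")

// readFrame peeks at the first byte so the legacy ping is consumed and
// rejected up front, instead of repeatedly failing (and retrying) the
// normal varint length-prefixed frame parse.
func readFrame(r *bufio.Reader) ([]byte, error) {
	b, err := r.Peek(1)
	if err != nil {
		return nil, err
	}
	if b[0] == legacyServerListPing {
		r.Discard(1) // consume it so the stream cannot stall here
		return nil, ErrLegacyPing
	}
	// ... normal length-prefixed frame parsing would continue here
	return nil, errors.New("not implemented in this sketch")
}
```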
I haven't had this problem in a few months. Seems like that fix did the trick. Closing this issue, thanks again.