l42111996 / java-Kcp

A reliable UDP networking library for Java built on Netty (the KCP protocol), with an FEC implementation; suitable for games, video, network acceleration, and similar workloads

CPU hits 100% in production

minerva78 opened this issue · comments

After running in production for a while (the timing varies), CPU usage can spike to 100%; I have not yet been able to track down what triggers it.
Below is the stack of one of the affected threads at the time:
"epollEventLoopGroup-5-3" #28 prio=10 os_prio=0 cpu=14498125.28ms elapsed=61602.15s tid=0x00007f88917db800 nid=0x7262 runnable [0x00007f8818dcc000]
java.lang.Thread.State: TIMED_WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@11.0.10/Native Method)
at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.10/LockSupport.java:357)
at com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:136)
at com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:105)
at com.lmax.disruptor.RingBuffer.next(RingBuffer.java:263)
at com.ath.io.kcp.base.threadPool.thread.DisruptorSingleExecutor.execute(DisruptorSingleExecutor.java:132)
at com.ath.io.kcp.base.baseKcp.ServerChannelHandler.channelRead(ServerChannelHandler.java:87)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.epoll.EpollDatagramChannel.read(EpollDatagramChannel.java:679)
at io.netty.channel.epoll.EpollDatagramChannel.access$100(EpollDatagramChannel.java:58)
at io.netty.channel.epoll.EpollDatagramChannel$EpollDatagramChannelUnsafe.epollInReady(EpollDatagramChannel.java:497)
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:475)
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(java.base@11.0.10/Thread.java:834)

Removing the disruptor library and using the default event loop instead fixed it.
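For illustration, here is a minimal sketch of what that workaround can look like, assuming a channelRead handler like the ServerChannelHandler.channelRead frame in the stack above; the class name and handleMessage are hypothetical placeholders, not code from this repo:

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

public class EventLoopHandler extends ChannelInboundHandlerAdapter {

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        // Rather than handing the datagram to a Disruptor-backed executor
        // (DisruptorSingleExecutor.execute in the stack above), schedule the
        // work on the channel's own event loop: there is no ring buffer to
        // fill up, and per-channel ordering is still preserved.
        ctx.channel().eventLoop().execute(() -> handleMessage(ctx, msg));
    }

    // Hypothetical placeholder for the actual KCP input handling.
    private void handleMessage(ChannelHandlerContext ctx, Object msg) {
        // ... decode and feed the KCP state machine ...
    }
}
```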

The root cause here is that the Disruptor wait strategy in use is YieldingWaitStrategy rather than the default BlockingWaitStrategy. YieldingWaitStrategy is only appropriate when the number of consumer threads is far smaller than the number of physical CPU cores: it busy-spins and calls Thread.yield(), so high CPU usage is to be expected.
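For reference, a minimal sketch of constructing a Disruptor with the wait strategy passed explicitly, assuming the LMAX Disruptor 3.x DSL; TaskEvent and the handler body are illustrative placeholders, not names from this repo:

```java
import java.util.concurrent.ThreadFactory;

import com.lmax.disruptor.BlockingWaitStrategy;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.dsl.ProducerType;

public class WaitStrategyExample {

    // Illustrative event holder; not a class from this repo.
    static final class TaskEvent {
        Runnable task;
    }

    public static void main(String[] args) {
        ThreadFactory threadFactory = Thread::new;

        // Pass the wait strategy explicitly. BlockingWaitStrategy parks idle
        // consumers on a lock/condition, so they use almost no CPU while
        // waiting; YieldingWaitStrategy would busy-spin + Thread.yield() instead.
        Disruptor<TaskEvent> disruptor = new Disruptor<>(
                TaskEvent::new,        // event factory
                1024,                  // ring buffer size (must be a power of two)
                threadFactory,
                ProducerType.MULTI,    // several producer threads, as in the stack above
                new BlockingWaitStrategy());

        disruptor.handleEventsWith((event, sequence, endOfBatch) -> {
            Runnable t = event.task;
            event.task = null;         // clear the slot so the payload can be GC'd
            if (t != null) {
                t.run();
            }
        });

        disruptor.start();

        // Producer side: claim a slot, fill it, publish.
        disruptor.getRingBuffer().publishEvent(
                (event, sequence) -> event.task = () -> System.out.println("hello"));
    }
}
```

Worth noting: a producer blocked in RingBuffer.next() on a full multi-producer ring buffer spins with short 1 ns parks, which matches the LockSupport.parkNanos / MultiProducerSequencer.next frames in the stack above.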

What is used by default is BlockingWaitStrategy.