Under certain disconnection circumstances, the copycat-client-io thread takes 100% CPU
JPWatson opened this issue
James Watson commented
jstack output alternates between these two stack traces:
"copycat-client-io-1" #17 prio=5 os_prio=0 tid=0x00007f5e60755800 nid=0x7412 runnable [0x00007f5e4cd57000]
java.lang.Thread.State: RUNNABLE
at java.lang.Throwable.fillInStackTrace(Native Method)
at java.lang.Throwable.fillInStackTrace(Throwable.java:783)
- locked <0x00000000f10b3008> (a io.atomix.copycat.session.ClosedSessionException)
at java.lang.Throwable.<init>(Throwable.java:265)
at java.lang.Exception.<init>(Exception.java:66)
at java.lang.RuntimeException.<init>(RuntimeException.java:62)
at java.lang.IllegalStateException.<init>(IllegalStateException.java:55)
at io.atomix.copycat.session.ClosedSessionException.<init>(ClosedSessionException.java:29)
at io.atomix.copycat.client.session.ClientSessionSubmitter.submit(ClientSessionSubmitter.java:144)
at io.atomix.copycat.client.session.ClientSessionSubmitter.access$300(ClientSessionSubmitter.java:51)
at io.atomix.copycat.client.session.ClientSessionSubmitter$CommandAttempt.lambda$fail$0(ClientSessionSubmitter.java:370)
at io.atomix.copycat.client.session.ClientSessionSubmitter$CommandAttempt$$Lambda$109/2008009084.run(Unknown Source)
at io.atomix.catalyst.concurrent.Runnables.lambda$logFailure$0(Runnables.java:20)
at io.atomix.catalyst.concurrent.Runnables$$Lambda$31/1944702768.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
"copycat-client-io-1" #17 prio=5 os_prio=0 tid=0x00007f5e60755800 nid=0x7412 runnable [0x00007f5e4cd57000]
java.lang.Thread.State: RUNNABLE
at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:328)
at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:533)
at java.util.concurrent.ScheduledThreadPoolExecutor.execute(ScheduledThreadPoolExecutor.java:622)
at io.atomix.catalyst.concurrent.SingleThreadContext$1.execute(SingleThreadContext.java:29)
at io.atomix.copycat.client.session.ClientSessionSubmitter$CommandAttempt.fail(ClientSessionSubmitter.java:370)
at io.atomix.copycat.client.session.ClientSessionSubmitter.submit(ClientSessionSubmitter.java:144)
at io.atomix.copycat.client.session.ClientSessionSubmitter.access$300(ClientSessionSubmitter.java:51)
at io.atomix.copycat.client.session.ClientSessionSubmitter$CommandAttempt.lambda$fail$0(ClientSessionSubmitter.java:370)
at io.atomix.copycat.client.session.ClientSessionSubmitter$CommandAttempt$$Lambda$109/2008009084.run(Unknown Source)
at io.atomix.catalyst.concurrent.Runnables.lambda$logFailure$0(Runnables.java:20)
at io.atomix.catalyst.concurrent.Runnables$$Lambda$31/1944702768.run(Unknown Source)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
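The two traces suggest a tight loop: submit() sees the closed session and throws ClosedSessionException, fail() reschedules the attempt with no delay and no terminal check, and the rescheduled attempt calls submit() again. A minimal, hypothetical sketch of that failure mode (illustrative only, not Copycat's actual code; the class and names below are invented):

```java
import java.util.concurrent.CompletableFuture;

// Sketch of a retry loop with no terminal-failure check: once the session
// is closed, submit -> fail -> submit repeats forever, pinning the thread.
public class RetrySpinSketch {
    static final int MAX_DEMO_ATTEMPTS = 1_000; // cap so this demo terminates
    static int attempts = 0;

    // Stand-in for ClientSessionSubmitter.submit(): always fails once the
    // session is closed.
    static void submit(CompletableFuture<Void> future, boolean sessionClosed) {
        if (sessionClosed) {
            fail(future, sessionClosed);
            return;
        }
        future.complete(null);
    }

    // Stand-in for CommandAttempt.fail(): resubmits immediately instead of
    // completing the future with the terminal exception.
    static void fail(CompletableFuture<Void> future, boolean sessionClosed) {
        attempts++;
        if (attempts >= MAX_DEMO_ATTEMPTS) {
            // The real client has no such cap, so the loop never exits.
            future.completeExceptionally(new IllegalStateException("session closed"));
            return;
        }
        submit(future, sessionClosed); // tight loop: submit -> fail -> submit ...
    }

    public static void main(String[] args) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        submit(future, true);
        System.out.println("attempts before giving up: " + attempts);
    }
}
```

Each iteration also constructs a new ClosedSessionException, and fillInStackTrace() is comparatively expensive, which matches the first trace spending its time inside Throwable construction.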
Jonathan Hart commented
I can reproduce this pretty easily using ONOS.
I have a 3-node ONOS cluster. If I partition one node away and then bring it back into the cluster, that node shows persistently high CPU usage from the constant creation of ClosedSessionExceptions, as in the first stack trace.
James Watson commented
Interesting - the behaviour I saw was on the client.
Jordan Halterman commented
According to our tests, this is fixed by #336.
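For context on the kind of change that stops such a spin (this is an assumption, not necessarily what #336 actually does; the names below are invented): a closed session is unrecoverable, so the attempt should propagate the terminal exception rather than reschedule itself.

```java
import java.util.concurrent.CompletableFuture;

// Sketch of a terminal-failure check: on a closed session the attempt's
// future is completed exceptionally exactly once, with no retry.
public class TerminalFailureSketch {
    static int attempts = 0;

    static void submit(CompletableFuture<Void> future, boolean sessionClosed) {
        attempts++;
        if (sessionClosed) {
            // Terminal condition: do not retry, just surface the error.
            future.completeExceptionally(new IllegalStateException("session closed"));
            return;
        }
        future.complete(null);
    }

    public static void main(String[] args) {
        CompletableFuture<Void> future = new CompletableFuture<>();
        submit(future, true);
        System.out.println("attempts: " + attempts
                + ", failed: " + future.isCompletedExceptionally());
    }
}
```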