atomix / atomix

A Kubernetes toolkit for building distributed applications using cloud native principles

Home Page: https://atomix.io

Requests from ONOS to Atomix time out

xinchengwuxian opened this issue

Expected behavior
Atomix and ONOS run on the same machine. I compiled onos-2.2.4 against atomix-3.1.8 (openjdk-11).
When I test with 2000 devices, ONOS queries Atomix for stats messages, and many of these requests time out.
The timeouts eventually cause ONOS to run out of memory (OOM).
Actual behavior
During the test, I get this exception:

2020-10-27T19:47:46,355 | DEBUG | raft-partition-group-raft-6 | RaftSessionConnection            | 129 - io.atomix.utils - 3.1.8 | SessionClient{29}{type=AtomicCounterType{name=atomic-counter}, name=sys-clock-counter} - CommandRequest{session=29, sequence=1106859, operation=PrimitiveOperation{id=DefaultOperationId{id=incrementAndGet, type=COMMAND}, value=null}} failed! Reason: {}
java.util.concurrent.TimeoutException: Request type raft-partition-1-command timed out in 5000 milliseconds
        at io.atomix.cluster.messaging.impl.AbstractClientConnection$Callback.timeout(AbstractClientConnection.java:159) ~[?:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:834) [?:?]
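
For context, the trace shows the timeout being raised from a task run by a `ScheduledThreadPoolExecutor`, which suggests a scheduled task that fails the pending request when no reply has arrived in time. A minimal sketch of that general pattern is below. The names `TimeoutSketch` and `sendWithTimeout` are hypothetical, and this is an assumption about the mechanism, not Atomix's actual implementation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutSketch {

  // Hypothetical helper: returns a future that fails with a TimeoutException
  // unless someone completes it (simulating a reply) within timeoutMillis.
  static CompletableFuture<byte[]> sendWithTimeout(
      ScheduledExecutorService scheduler, String type, long timeoutMillis) {
    CompletableFuture<byte[]> reply = new CompletableFuture<>();
    scheduler.schedule(() -> {
      // No effect if the reply has already completed the future.
      reply.completeExceptionally(new TimeoutException(
          "Request type " + type + " timed out in " + timeoutMillis + " milliseconds"));
    }, timeoutMillis, TimeUnit.MILLISECONDS);
    return reply;
  }

  public static void main(String[] args) throws InterruptedException {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    try {
      // Nobody completes this future, so the scheduled task fails it.
      CompletableFuture<byte[]> pending =
          sendWithTimeout(scheduler, "raft-partition-1-command", 100);
      pending.get();
    } catch (ExecutionException e) {
      System.out.println(e.getCause().getMessage());
    } finally {
      scheduler.shutdownNow();
    }
  }
}
```

In this sketch, a reply that arrives after the deadline is simply dropped, so a server that is slow to respond under load looks the same to the client as one that never responds.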

To try to find the problem, I added a log statement, like this:

final class RemoteServerConnection extends AbstractServerConnection {
  private static final byte[] EMPTY_PAYLOAD = new byte[0];
  private final Logger log = LoggerFactory.getLogger(getClass());

  private final Channel channel;

  RemoteServerConnection(HandlerRegistry handlers, Channel channel) {
    super(handlers);
    this.channel = channel;
  }

  @Override
  public void reply(ProtocolRequest message, ProtocolReply.Status status, Optional<byte[]> payload) {
    ProtocolReply response = new ProtocolReply(
        message.id(),
        payload.orElse(EMPTY_PAYLOAD),
        status);
    // Log statement added while investigating the timeouts.
    log.info("RemoteServerConnection reply, message subject {} message type {} message id {} message status {}",
        message.subject(), message.type(), message.id(), status.name());
    channel.writeAndFlush(response, channel.voidPromise());
  }
}

Then I noticed something strange: when this log statement is at info level, there are fewer timeouts, but when it is at a level other than info, or is not added at all, there are more. So I have two questions:

  1. Why do the requests time out?
  2. What is the impact of adding the log statement?

Environment

  • Atomix: 3.1.8
  • OS: ubuntu-18.04
  • JVM: openjdk-11