atomix / copycat

A novel implementation of the Raft consensus algorithm

Home Page: http://atomix.io/copycat

StackOverflowError when large number of requests sent

ccleve opened this issue

I'm load testing Copycat by sending a large number of small, asynchronous requests. After a certain number of requests accumulate, the system throws a StackOverflowError. I've pasted the stack trace below. If anyone wants to work on this, I'll generate a test case.

2016-12-14 17:35:50 DESKTOP-KVRE8GD io.atomix.catalyst.concurrent.SingleThreadContext 31283 ERROR An uncaught exception occurred
java.lang.StackOverflowError: null
at java.io.ObjectOutputStream.writeNonProxyDesc(ObjectOutputStream.java:1286)
at java.io.ObjectOutputStream.writeClassDesc(ObjectOutputStream.java:1231)
at java.io.ObjectOutputStream.writeNonProxyDesc(ObjectOutputStream.java:1294)
at java.io.ObjectOutputStream.writeClassDesc(ObjectOutputStream.java:1231)
at java.io.ObjectOutputStream.writeNonProxyDesc(ObjectOutputStream.java:1294)
at java.io.ObjectOutputStream.writeClassDesc(ObjectOutputStream.java:1231)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1427)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at io.atomix.catalyst.serializer.util.JavaSerializableSerializer.write(JavaSerializableSerializer.java:36)
at io.atomix.catalyst.serializer.Serializer.writeById(Serializer.java:896)
at io.atomix.catalyst.serializer.Serializer.writeObject(Serializer.java:885)
at io.atomix.copycat.server.storage.entry.CommandEntry.writeObject(CommandEntry.java:79)
at io.atomix.catalyst.serializer.util.CatalystSerializableSerializer.write(CatalystSerializableSerializer.java:51)
at io.atomix.catalyst.serializer.util.CatalystSerializableSerializer.write(CatalystSerializableSerializer.java:45)
at io.atomix.catalyst.serializer.Serializer.writeById(Serializer.java:896)
at io.atomix.catalyst.serializer.Serializer.writeObject(Serializer.java:885)
at io.atomix.catalyst.serializer.Serializer.writeObject(Serializer.java:820)
at io.atomix.copycat.server.storage.Segment.append(Segment.java:311)
at io.atomix.copycat.server.storage.Log.append(Log.java:294)
at io.atomix.copycat.server.state.LeaderState.applyCommand(LeaderState.java:516)
at io.atomix.copycat.server.state.LeaderState.lambda$sequenceCommand$169(LeaderState.java:493)
at io.atomix.copycat.server.state.ServerSessionContext.setRequestSequence(ServerSessionContext.java:267)
at io.atomix.copycat.server.state.LeaderState.applyCommand(LeaderState.java:524)
at io.atomix.copycat.server.state.LeaderState.lambda$sequenceCommand$169(LeaderState.java:493)
at io.atomix.copycat.server.state.ServerSessionContext.setRequestSequence(ServerSessionContext.java:267)
at io.atomix.copycat.server.state.LeaderState.applyCommand(LeaderState.java:524)
at io.atomix.copycat.server.state.LeaderState.lambda$sequenceCommand$169(LeaderState.java:493)
at io.atomix.copycat.server.state.ServerSessionContext.setRequestSequence(ServerSessionContext.java:267)
at io.atomix.copycat.server.state.LeaderState.applyCommand(LeaderState.java:524)
at io.atomix.copycat.server.state.LeaderState.lambda$sequenceCommand$169(LeaderState.java:493)
at io.atomix.copycat.server.state.ServerSessionContext.setRequestSequence(ServerSessionContext.java:267)
at io.atomix.copycat.server.state.LeaderState.applyCommand(LeaderState.java:524)
at io.atomix.copycat.server.state.LeaderState.lambda$sequenceCommand$169(LeaderState.java:493)
at io.atomix.copycat.server.state.ServerSessionContext.setRequestSequence(ServerSessionContext.java:267)

(Last three lines repeat several hundred more times.)

I think the right solution here is to add some kind of backpressure. When you call copycatClient.submit(operation), it should block temporarily until the system is ready to accept more requests.
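To make the backpressure idea concrete, here is a minimal client-side sketch. It is not part of Copycat: `BoundedSubmitter` and `maxInFlight` are hypothetical names, and the request is passed in as a generic `Supplier<CompletableFuture<T>>` so the example does not depend on any particular client API. The caller blocks once the limit of in-flight requests is reached and resumes when an earlier request completes.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical helper (not a Copycat class): caps the number of
// asynchronous requests that may be outstanding at any one time.
final class BoundedSubmitter {
    private final Semaphore permits;

    BoundedSubmitter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    // Blocks the calling thread until a permit is free, then starts the request.
    <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> request) {
        permits.acquireUninterruptibly();
        CompletableFuture<T> future;
        try {
            future = request.get();
        } catch (RuntimeException e) {
            // The request never started, so give the permit back.
            permits.release();
            throw e;
        }
        // Release the permit when the request finishes, whether it succeeded or failed.
        return future.whenComplete((result, error) -> permits.release());
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Assuming the client's submit call returns a `CompletableFuture`, a load test could wrap each call as `submitter.submit(() -> copycatClient.submit(operation))`. The right limit depends on the workload and would need tuning.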

Hmm... so the problem here is in request sequencing on the leader. Under high load, if a request is lost and its sequence number is skipped, the queue of later requests waiting on it can grow fairly large. Once the lost request is resent with the missing sequence number, the whole queue is drained into the log recursively, which produces a huge stack. We should probably reject requests whose sequence numbers are more than some threshold x beyond the last in-sequence request, to limit memory consumption from queued requests anyway; that would also reduce the stack depth when the queue is emptied. There may also be a way to empty the queue without recursion.
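As a rough illustration of both ideas (a bounded look-ahead window and a non-recursive drain), here is a minimal sketch. It is not Copycat's `LeaderState` code and not necessarily how the fix was implemented: `CommandSequencer`, `maxAhead`, and `offer` are hypothetical names, and commands are modeled as plain `Runnable`s.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not Copycat's implementation): applies commands in
// sequence order, queues out-of-order ones, and drains the queue iteratively.
final class CommandSequencer {
    private final Map<Long, Runnable> pending = new HashMap<>();
    private final long maxAhead;  // assumed window size, the "x" in the comment above
    private long lastApplied = 0; // sequence number of the last in-sequence request

    CommandSequencer(long maxAhead) {
        this.maxAhead = maxAhead;
    }

    // Returns false if the request is rejected: already applied, or too far ahead.
    boolean offer(long sequence, Runnable command) {
        if (sequence <= lastApplied || sequence > lastApplied + maxAhead) {
            return false;
        }
        pending.put(sequence, command);
        // Drain in a loop rather than by recursive callbacks, so the stack
        // depth stays constant however many requests were queued.
        Runnable next;
        while ((next = pending.remove(lastApplied + 1)) != null) {
            lastApplied++;
            next.run();
        }
        return true;
    }
}
```

With this shape, a gap at sequence 1 followed by thousands of queued requests is flushed by a single loop once request 1 arrives, and the window bounds how many requests can be held in memory at once.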

This was fixed in #267