redis / riot

🧨 Get data in & out of Redis with RIOT

Home Page: http://redis.github.io/riot

[RIOT-Redis] Error during live replication: "io.lettuce.core.output.StatusOutput does not support set(long)"

StephanHener opened this issue

Hey,

We are facing an error when attempting a live replication. During the migration, the logs show this error:

Encountered an error executing step snapshot-replication in job live-replication: io.lettuce.core.output.StatusOutput does not support set(long)

This error occurs a couple of times and ultimately causes the connection to close and RIOT to shut down.

Here is the command we use for replication:

riot-redis --info -h <source_host> -p <source_port> -a <source_password> -n <source_database> --timeout 60 --metrics replicate -h <target_host> -p <target_port> -a <target_password> --tls --tls-verify NONE -n 0 --metrics --mode live --batch 100 --scan-count 2000 --reader-threads 1 --reader-batch 100 --reader-queue 2000 --scan-match "*" --threads 1 --no-verify

The source is a read-only replica running Redis 6.2.8; the target is running 6.2.6.
We use riot-redis 2.18.5 in a Docker container via fieldengineering/riot-redis:v2.18.5, but we get the same error when running it directly on a local machine.

Here is an example info log:

Job: [FlowJob: [name=live-replication]] launched with the following parameters: [{}]
Executing step: [snapshot-replication]
Executing step: [live-replication]
Listening ? % │█ │ 0/? (0:00:00 / ?) ?/s
Job: [SimpleJob: [name=live-reader]] launched with the following parameters: [{}]
Executing step: [live-reader]
Scanning 0% │ │ 0/41558 (0:00:00 / ?) ?/s
Job: [SimpleJob: [name=scan-reader]] launched with the following parameters: [{}]
Executing step: [scan-reader]
Listening ? % │ █ │ 9/? (0:00:00 / ?) ?/s
Encountered an error executing step live-replication in job live-replication: io.lettuce.core.output.StatusOutput does not support set(long)
Step: [live-replication] executed in 845ms
Listening ? % │ █ │ 27/? (0:00:00 / ?) ?/s
Scanning 1% │▎ │ 700/41558 (0:00:00 / 0:00:35) ?/s
Scanning 3% │▋ │ 1500/41558 (0:00:00 / 0:00:24) ?/s
Scanning 5% │▊ │ 2200/41558 (0:00:01 / 0:00:21) 2200.0/s
Scanning 5% │▊ │ 2400/41558 (0:00:01 / 0:00:24) 2400.0/s
Scanning 7% │█▏ │ 3200/41558 (0:00:01 / 0:00:21) 3200.0/s
Scanning 9% │█▍ │ 4000/41558 (0:00:02 / 0:00:19) 2000.0/s
Scanning 11% │█▊ │ 4900/41558 (0:00:02 / 0:00:17) 2450.0/s
Scanning 13% │██ │ 5700/41558 (0:00:02 / 0:00:16) 2850.0/s
Scanning 15% │██▎ │ 6400/41558 (0:00:03 / 0:00:16) 2133.3/s
Scanning 17% │██▋ │ 7300/41558 (0:00:03 / 0:00:15) 2433.3/s
Scanning 19% │██▉ │ 8000/41558 (0:00:03 / 0:00:15) 2666.7/s
Scanning 21% │███▏ │ 8900/41558 (0:00:03 / 0:00:14) 2966.7/s
Scanning 23% │███▌ │ 9800/41558 (0:00:04 / 0:00:13) 2450.0/s
Scanning 25% │███▊ │ 10500/41558 (0:00:04 / 0:00:13) 2625.0/s
Scanning 27% │████ │ 11400/41558 (0:00:04 / 0:00:12) 2850.0/s
Scanning 29% │████▎ │ 12100/41558 (0:00:05 / 0:00:12) 2420.0/s
Scanning 30% │████▌ │ 12500/41558 (0:00:05 / 0:00:12) 2500.0/s
Exception while closing step execution resources in step live-replication in job live-replication
Scanning 32% │████▉ │ 13700/41558 (0:00:07 / 0:00:15) 1957.1/s
Scanning 34% │█████▏ │ 14400/41558 (0:00:07 / 0:00:14) 2057.1/s
Scanning 35% │█████▎ │ 14800/41558 (0:00:08 / 0:00:14) 1850.0/s
Encountered an error executing step snapshot-replication in job live-replication: io.lettuce.core.output.StatusOutput does not support set(long)
Step: [snapshot-replication] executed in 8s695ms
Scanning 37% │█████▌ │ 15500/41558 (0:00:08 / 0:00:14) 1937.5/s
Exception while closing step execution resources in step snapshot-replication in job live-replication
Job: [FlowJob: [name=live-replication]] completed with the following parameters: [{}] and the following status: [FAILED] in 13s724ms
Encountered an error executing step live-reader in job live-reader: Connection closed
Step: [live-reader] executed in 13s755ms
Closing with items still in queue
Exception while closing step execution resources in step live-reader in job live-reader

Running the tool with the debug flag produces the following stacktrace for the error:

io.lettuce.core.RedisException: java.lang.UnsupportedOperationException: io.lettuce.core.output.StatusOutput does not support set(long)
at io.lettuce.core.internal.Exceptions.fromSynchronization(Exceptions.java:106)
at io.lettuce.core.internal.Futures.awaitAll(Futures.java:226)
at io.lettuce.core.LettuceFutures.awaitAll(LettuceFutures.java:59)
at com.redis.spring.batch.RedisItemWriter.write(RedisItemWriter.java:44)
at org.springframework.batch.core.step.item.SimpleChunkProcessor.writeItems(SimpleChunkProcessor.java:193)
at org.springframework.batch.core.step.item.SimpleChunkProcessor.doWrite(SimpleChunkProcessor.java:159)
at org.springframework.batch.core.step.item.FaultTolerantChunkProcessor$3.doWithRetry(FaultTolerantChunkProcessor.java:348)
at org.springframework.retry.support.RetryTemplate.doExecute(RetryTemplate.java:329)
at org.springframework.retry.support.RetryTemplate.execute(RetryTemplate.java:255)
at org.springframework.batch.core.step.item.BatchRetryTemplate.execute(BatchRetryTemplate.java:217)
at org.springframework.batch.core.step.item.FaultTolerantChunkProcessor.write(FaultTolerantChunkProcessor.java:444)
at org.springframework.batch.core.step.item.SimpleChunkProcessor.process(SimpleChunkProcessor.java:217)
at org.springframework.batch.core.step.item.ChunkOrientedTasklet.execute(ChunkOrientedTasklet.java:77)
at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:407)
at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:331)
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:140)
at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:273)
at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:82)
at org.springframework.batch.repeat.support.TaskExecutorRepeatTemplate$ExecutingRunnable.run(TaskExecutorRepeatTemplate.java:262)
at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:50)
at org.springframework.batch.repeat.support.TaskExecutorRepeatTemplate.getNextResult(TaskExecutorRepeatTemplate.java:125)
at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:215)
at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:145)
at org.springframework.batch.core.step.tasklet.TaskletStep.doExecute(TaskletStep.java:258)
at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:208)
at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:152)
at org.springframework.batch.core.job.flow.JobFlowExecutor.executeStep(JobFlowExecutor.java:68)
at org.springframework.batch.core.job.flow.support.state.StepState.handle(StepState.java:68)
at org.springframework.batch.core.job.flow.support.SimpleFlow.resume(SimpleFlow.java:169)
at org.springframework.batch.core.job.flow.support.SimpleFlow.start(SimpleFlow.java:144)
at org.springframework.batch.core.job.flow.support.state.SplitState$1.call(SplitState.java:94)
at org.springframework.batch.core.job.flow.support.state.SplitState$1.call(SplitState.java:91)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.UnsupportedOperationException: io.lettuce.core.output.StatusOutput does not support set(long)
at io.lettuce.core.output.CommandOutput.set(CommandOutput.java:107)
at io.lettuce.core.protocol.RedisStateMachine.safeSet(RedisStateMachine.java:778)
at io.lettuce.core.protocol.RedisStateMachine.handleInteger(RedisStateMachine.java:404)
at io.lettuce.core.protocol.RedisStateMachine$State$Type.handle(RedisStateMachine.java:206)
at io.lettuce.core.protocol.RedisStateMachine.doDecode(RedisStateMachine.java:334)
at io.lettuce.core.protocol.RedisStateMachine.decode(RedisStateMachine.java:295)
at io.lettuce.core.protocol.CommandHandler.decode(CommandHandler.java:842)
at io.lettuce.core.protocol.CommandHandler.decode0(CommandHandler.java:793)
at io.lettuce.core.protocol.CommandHandler.decode(CommandHandler.java:767)
at io.lettuce.core.protocol.CommandHandler.decode(CommandHandler.java:659)
at io.lettuce.core.protocol.CommandHandler.channelRead(CommandHandler.java:599)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1373)
at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1236)
at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1285)
at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:519)
at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:458)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:280)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
... 1 more
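
The Caused by section shows Lettuce's decoder (RedisStateMachine.handleInteger) reading an integer reply and passing it to CommandOutput.set(long) on a StatusOutput, an output type that expects a simple-string status reply such as +OK. The following is a simplified, hypothetical Java model of that routing, not Lettuce's actual code: the Output, StatusOutput and IntegerOutput classes, the decode and run helpers, and the RESTORE-then-PEXPIRE pipeline are illustrative assumptions. It also assumes replies are matched to pending commands in the order the commands were sent, so an integer reply that reaches a command expecting a status reply fails with the same message as in the log.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified model of a RESP client routing replies into typed outputs.
// Class names mirror the stack trace for readability; this is NOT Lettuce's implementation.
public class StatusOutputDemo {

    // Base output: each setter rejects its reply type unless a subclass overrides it.
    abstract static class Output {
        void set(String status) {
            throw new UnsupportedOperationException(
                    getClass().getSimpleName() + " does not support set(String)");
        }

        void set(long integer) {
            throw new UnsupportedOperationException(
                    getClass().getSimpleName() + " does not support set(long)");
        }
    }

    // Accepts only simple-string replies such as "+OK".
    static class StatusOutput extends Output {
        String value;

        @Override
        void set(String status) {
            value = status;
        }
    }

    // Accepts only integer replies such as ":1".
    static class IntegerOutput extends Output {
        long value;

        @Override
        void set(long integer) {
            value = integer;
        }
    }

    // Decodes one RESP line and hands it to the oldest pending command's output (FIFO).
    static void decode(String line, Deque<Output> pending) {
        Output out = pending.poll();
        if (out == null) {
            throw new IllegalStateException("reply received with no pending command");
        }
        char type = line.charAt(0);
        String payload = line.substring(1);
        switch (type) {
            case '+': // simple string, e.g. +OK
                out.set(payload);
                break;
            case ':': // integer, e.g. :1
                out.set(Long.parseLong(payload));
                break;
            default:
                throw new IllegalArgumentException("unsupported RESP type: " + type);
        }
    }

    // Queues a hypothetical RESTORE (status reply) then PEXPIRE (integer reply),
    // feeds the given replies in order, and returns "ok" or the decoding error message.
    static String run(List<String> replies) {
        Deque<Output> pending = new ArrayDeque<>();
        pending.add(new StatusOutput());
        pending.add(new IntegerOutput());
        try {
            for (String reply : replies) {
                decode(reply, pending);
            }
            return "ok";
        } catch (UnsupportedOperationException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Replies arrive in the order the commands were sent.
        System.out.println(run(List.of("+OK", ":1")));
        // The integer reply reaches the StatusOutput first and is rejected.
        System.out.println(run(List.of(":1", "+OK")));
    }
}
```

Running main prints ok for the in-order replies and then StatusOutput does not support set(long) for the swapped ones. The model only shows how this exception can arise; it does not identify which command or reply ordering triggered it in this replication.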

Some other notes:

  • running snapshot mode instead of live mode produces the same error
    • after aborting, the target contains significantly fewer keys than the source, so it looks like the process dies before the ongoing migration even starts
  • using the --dry-run option works without error, so this seems to happen only when writing to the target cluster
  • in terms of key data types, we have roughly 60k keys in Redis:
    • roughly 60% are strings that are usually just created, updated once or twice, and deleted
    • another roughly 40% are zsets with very few entries (<10) that are rarely updated
    • the rest are
      • static keys that don't change
      • zsets that are updated frequently and have more entries (most of them under 500), plus one with ~20k entries; these represent job queues, which according to the documentation is a pattern riot-redis might struggle with. We are not sure this is relevant yet, since we already seem to fail during the initial replication

Unfortunately, we didn't find much about this error online. Any idea what could cause it?

Does the error still happen in RIOT 3.x?

Hey,

We have already migrated our Redis instances, and we decided to use a different approach without riot-redis.

I am closing the issue.