alibaba / MongoShake

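MongoShake is a universal data replication platform based on MongoDB's oplog. Redundant replication and active-active replication are its two most important functions. It is a cluster replication tool built on the MongoDB oplog that covers migration and synchronization needs and, on top of those, enables disaster recovery and active-active deployments.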

After full sync completes, an invalid checkpoint is written: [9223372036854775807[2147483647, 4294967295]]

zhangxianaa opened this issue

MongoShake version: v2.8.1
Source and destination MongoDB version: v4.2.15, sharded cluster
Selected configuration items:
sync_mode = all
mongo_urls =
mongo_cs_url =
mongo_s_url =mongodb://root:xxxxxx@192.169.7.81:40000/admin?connect=direct
tunnel = direct
tunnel.address = mongodb://root:xxxxxx@192.169.7.84:40000;
tunnel.message = raw
mongo_connect_mode = secondaryPreferred
filter.namespace.white=test
filter.ddl_enable = false
checkpoint.storage.db = mongoshake
checkpoint.storage.collection = ckpt_test
checkpoint.start_position = 1970-01-01T00:00:00Z
incr_sync.mongo_fetch_method = change_stream
Log:
[2023/07/06 17:28:55 CST] [INFO] metric[name[mongos] stage[full]] exit
[2023/07/06 17:28:55 CST] [INFO] try to set checkpoint with map[map[mongos:{0 9223372036854775807}]]
[2023/07/06 17:28:55 CST] [INFO] New session to mongodb://root:@192.169.7.81:40000/admin?connect=direct successfully
[2023/07/06 17:28:55 CST] [INFO] mongos Regenerate checkpoint but won't persist. content: {"name":"mongos","ckpt":1,"version":2,"fetch_method":"","oplog_disk_queue":"","oplog_disk_queue_apply_finish_ts":1}
[2023/07/06 17:28:55 CST] [INFO] mongos Record new checkpoint in MongoDB success [2147483647]
[2023/07/06 17:28:55 CST] [INFO] document syncer sync end
[2023/07/06 17:28:55 CST] [INFO] Close client with mongodb://root:@192.169.7.84:40000
[2023/07/06 17:28:55 CST] [INFO] ------------------------full sync done!------------------------
[2023/07/06 17:28:55 CST] [INFO] finish full sync, start incr sync with timestamp: fullBeginTs[{"_data": "8264A6894E000000022B0229296E04"}], fullFinishTs[9223372036854775807[2147483647, 4294967295]]
[2023/07/06 17:28:55 CST] [INFO] start incr replication
[2023/07/06 17:28:55 CST] [INFO] RealSourceIncrSync[0]: url[mongodb://root:@192.169.7.81:40000/admin?connect=direct], name[mongos], startTimestamp[{"_data": "8264A6894E000000022B0229296E04"}]
[2023/07/06 17:28:55 CST] [INFO] New session to mongodb://root:@192.169.7.84:40000 successfully
[2023/07/06 17:28:55 CST] [INFO] Collector-worker-0 start working with jobs batch queue. buffer capacity 64
[2023/07/06 17:28:55 CST] [INFO] New session to mongodb://root:@192.169.7.84:40000 successfully
[2023/07/06 17:28:55 CST] [INFO] Collector-worker-1 start working with jobs batch queue. buffer capacity 64
[2023/07/06 17:28:55 CST] [INFO] New session to mongodb://root:@192.169.7.84:40000 successfully
[2023/07/06 17:28:55 CST] [INFO] Collector-worker-2 start working with jobs batch queue. buffer capacity 64
[2023/07/06 17:28:55 CST] [INFO] New session to mongodb://root:@192.169.7.84:40000 successfully
[2023/07/06 17:28:55 CST] [INFO] Collector-worker-3 start working with jobs batch queue. buffer capacity 64
[2023/07/06 17:28:55 CST] [INFO] New session to mongodb://root:@192.169.7.84:40000 successfully
[2023/07/06 17:28:55 CST] [INFO] Collector-worker-4 start working with jobs batch queue. buffer capacity 64
[2023/07/06 17:28:55 CST] [INFO] New session to mongodb://root:@192.169.7.84:40000 successfully
[2023/07/06 17:28:55 CST] [INFO] Collector-worker-5 start working with jobs batch queue. buffer capacity 64
[2023/07/06 17:28:55 CST] [INFO] New session to mongodb://root:@192.169.7.84:40000 successfully
[2023/07/06 17:28:55 CST] [INFO] Collector-worker-6 start working with jobs batch queue. buffer capacity 64
[2023/07/06 17:28:55 CST] [INFO] New session to mongodb://root:@192.169.7.84:40000 successfully
[2023/07/06 17:28:55 CST] [INFO] Collector-worker-7 start working with jobs batch queue. buffer capacity 64
[2023/07/06 17:28:55 CST] [INFO] Syncer[mongos] poll oplog syncer start. ckpt_interval[5000ms], gid[[]], shard_key[collection]
[2023/07/06 17:28:55 CST] [INFO] Oplog sync[mongos] create checkpoint manager with url[mongodb://root:@192.169.7.81:40000/admin?connect=direct] table[mongoshake.ckpt_test] start-position[{"_data": "8264A6894E000000022B0229296E04"}[-1, -1]]
[2023/07/06 17:28:55 CST] [INFO] set query timestamp: {"_data": "8264A6894E000000022B0229296E04"}[-1, -1]
[2023/07/06 17:28:55 CST] [INFO] New session to mongodb://root:@192.169.7.81:40000/admin?connect=direct successfully
[2023/07/06 17:28:55 CST] [INFO] mongos Load exist checkpoint. content {"name":"mongos","ckpt":9223372036854775807,"version":2,"fetch_method":"","oplog_disk_queue":"","oplog_disk_queue_apply_finish_ts":1}
[2023/07/06 17:28:55 CST] [INFO] load checkpoint value: {"name":"mongos","ckpt":9223372036854775807,"version":2,"fetch_method":"","oplog_disk_queue":"","oplog_disk_queue_apply_finish_ts":1}
[2023/07/06 17:28:55 CST] [INFO] persister replset[mongos] update fetch status to: store memory and apply
[2023/07/06 17:28:55 CST] [INFO] mongos Load exist checkpoint. content {"name":"mongos","ckpt":9223372036854775807,"version":2,"fetch_method":"","oplog_disk_queue":"","oplog_disk_queue_apply_finish_ts":1}
[2023/07/06 17:28:55 CST] [INFO] start EventReader[src:mongodb://root:@192.169.7.81:40000/admin?connect=direct replset:mongos] fetcher with src[mongodb://root:@192.169.7.81:40000/admin?connect=direct] replica-name[mongos] query-ts[{"_data": "8264A6894E000000022B0229296E04"}[-1, -1]]
[2023/07/06 17:28:55 CST] [INFO] EventReader[src:mongodb://root:@192.169.7.81:40000/admin?connect=direct replset:mongos] ensure network
[2023/07/06 17:28:55 CST] [INFO] New session to mongodb://root:@192.169.7.81:40000/admin?connect=direct successfully
[2023/07/06 17:28:55 CST] [INFO] new change stream with options: BatchSize[0] MaxAwaitTime[24h0m0s] StartAfter[{"_data": "8264A6894E000000022B0229296E04"}]
[2023/07/06 17:29:00 CST] [INFO] [name=mongos, stage=incr, get=1, filter=1, write_success=0, tps=0, ckpt_times=0, lsn_ckpt={0[0, 0], 1970-01-01 08:00:00}, lsn_ack={0[0, 0], 1970-01-01 08:00:00}]]
[2023/07/06 17:29:05 CST] [INFO] [name=mongos, stage=incr, get=1, filter=1, write_success=0, tps=0, ckpt_times=0, lsn_ckpt={0[0, 0], 1970-01-01 08:00:00}, lsn_ack={0[0, 0], 1970-01-01 08:00:00}]]
[2023/07/06 17:29:10 CST] [INFO] [name=mongos, stage=incr, get=1, filter=1, write_success=0, tps=0, ckpt_times=0, lsn_ckpt={0[0, 0], 1970-01-01 08:00:00}, lsn_ack={0[0, 0], 1970-01-01 08:00:00}]]
[2023/07/06 17:29:15 CST] [INFO] [name=mongos, stage=incr, get=1, filter=1, write_success=0, tps=0, ckpt_times=0, lsn_ckpt={0[0, 0], 1970-01-01 08:00:00}, lsn_ack={0[0, 0], 1970-01-01 08:00:00}]]
[2023/07/06 17:29:20 CST] [INFO] [name=mongos, stage=incr, get=1, filter=1, write_success=0, tps=0, ckpt_times=0, lsn_ckpt={0[0, 0], 1970-01-01 08:00:00}, lsn_ack={0[0, 0], 1970-01-01 08:00:00}]]
[2023/07/06 17:29:25 CST] [INFO] [name=mongos, stage=incr, get=1, filter=1, write_success=0, tps=0, ckpt_times=0, lsn_ckpt={0[0, 0], 1970-01-01 08:00:00}, lsn_ack={0[0, 0], 1970-01-01 08:00:00}]]
[2023/07/06 17:29:30 CST] [INFO] [name=mongos, stage=incr, get=1, filter=1, write_success=0, tps=0, ckpt_times=0, lsn_ckpt={0[0, 0], 1970-01-01 08:00:00}, lsn_ack={0[0, 0], 1970-01-01 08:00:00}]]
/ part of the log omitted ..... /
[2023/07/06 17:44:45 CST] [WARN] CheckpointOperation updated is not suitable. lowest [0]. current [9223372036854775807[2147483647, 4294967295]]. inputTs [0]. reason : smallest candidates is zero
[2023/07/06 17:44:45 CST] [INFO] New session to mongodb://root:@192.169.7.84:40000 successfully
[2023/07/06 17:44:45 CST] [INFO] Replayer-7 Executor-7 doSync oplogRecords received[1] merged[1]. merge to 100.00% chunks
[2023/07/06 17:44:45 CST] [INFO] Collector-worker-7 transfer retransmit:false send [1] logs. reply_acked [7252639332605886465[1688636684, 1]], list_unack [0]
[2023/07/06 17:44:45 CST] [INFO] [name=mongos, stage=incr, get=2, filter=1, write_success=1, tps=1, ckpt_times=0, lsn_ckpt={0[0, 0], 1970-01-01 08:00:00}, lsn_ack={7252639332605886465[1688636684, 1], 2023-07-06 17:44:44}]]
[2023/07/06 17:44:50 CST] [INFO] [name=mongos, stage=incr, get=2, filter=1, write_success=1, tps=0, ckpt_times=0, lsn_ckpt={0[0, 0], 1970-01-01 08:00:00}, lsn_ack={7252639332605886465[1688636684, 1], 2023-07-06 17:44:44}]]
[2023/07/06 17:44:55 CST] [INFO] CheckpointOperation calculated[7252639332605886465[1688636684, 1]] is smaller than value in memory[9223372036854775807[2147483647, 4294967295]]
[2023/07/06 17:44:55 CST] [INFO] [name=mongos, stage=incr, get=2, filter=1, write_success=1, tps=0, ckpt_times=0, lsn_ckpt={0[0, 0], 1970-01-01 08:00:00}, lsn_ack={7252639332605886465[1688636684, 1], 2023-07-06 17:44:44}]]
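For anyone reading the log above: the bracketed pair printed next to each checkpoint looks like the 64-bit value split into a high 32-bit seconds field and a low 32-bit increment, the same layout as a BSON timestamp. The sketch below is only an illustration under that assumption. `split_checkpoint` and `would_advance` are hypothetical helper names, and `would_advance` is a guess at the comparison implied by the "is smaller than value in memory" message, not MongoShake's actual code.

```python
from datetime import datetime, timedelta, timezone

MAX_INT64 = 2**63 - 1  # largest signed 64-bit integer


def split_checkpoint(ckpt):
    """Split a 64-bit checkpoint into (seconds, increment), assuming a
    BSON-timestamp-like layout: high 32 bits seconds, low 32 bits counter."""
    return ckpt >> 32, ckpt & 0xFFFFFFFF


def would_advance(candidate, in_memory):
    """Hypothetical rule inferred from the log: a newly calculated
    checkpoint is only accepted if it is larger than the one in memory."""
    return candidate > in_memory


bad = 9223372036854775807    # checkpoint written after full sync
acked = 7252639332605886465  # reply_acked / lsn_ack value from the log

print(bad == MAX_INT64)         # True
print(split_checkpoint(bad))    # (2147483647, 4294967295)
print(split_checkpoint(acked))  # (1688636684, 1)

# The seconds field of the acked value, shown in UTC+8 like the log.
cst = timezone(timedelta(hours=8))
print(datetime.fromtimestamp(split_checkpoint(acked)[0], cst))

# Under the assumed rule, no real timestamp can ever replace MAX_INT64.
print(would_advance(acked, bad))  # False
```

If that reading is right, it matches the symptom in the log: lsn_ack moves forward, while ckpt_times stays at 0 and lsn_ckpt never changes.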

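I ran into the same problem:
Right after the full sync finished, a checkpoint like the one below was written, and the ckpt value never updates afterwards, so resuming from the checkpoint no longer works.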
{
  _id: ObjectId("64ec60c696a719ead730186d"),
  name: 'mongos',
  ckpt: 9223372036854776000,
  fetch_method: '',
  oplog_disk_queue: '',
  oplog_disk_queue_apply_finish_ts: 1,
  version: 2
}
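The ckpt shown here (9223372036854776000) differs in its last digits from the 9223372036854775807 in the original report. A likely explanation, which is an assumption and not something confirmed in this thread, is that the client printed the number as an IEEE 754 double: the nearest double to 2^63 - 1 is 2^63, and its shortest decimal form ends in 776000. A quick check of that arithmetic:

```python
reported = 9223372036854775807   # value in the MongoShake log (2**63 - 1)
displayed = 9223372036854776000  # value shown in the document above

# Converting to float rounds to the nearest representable double.
as_double = float(reported)

print(int(as_double) == 2**63)         # True: rounds up to 2**63
print(float(displayed) == as_double)   # True: both map to the same double
print(repr(as_double))                 # 9.223372036854776e+18
```

So both reports most likely describe the same stored value, the maximum signed 64-bit integer.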