alibaba / MongoShake

MongoShake is a universal data replication platform based on MongoDB's oplog. Redundant replication and active-active replication are its two most important functions. Built on the MongoDB oplog, it covers migration and synchronization needs and further enables disaster recovery and active-active deployments.

After syncing the cluster, the sensitive_event collection is missing on the target, but the cause is unclear

1029542886 opened this issue

1. mongo-shake-v2.8.4
2. Collections on the target (the sensitive_event collection is missing):
replSet2001:PRIMARY> show tables;

3. Collections on the source:
replSet2001:PRIMARY> show tables;
sensitive_event
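
A minimal way to cross-check this (a sketch only; the namespace argus_antifraud.sensitive_event is taken from the logs below) is to count the documents on the source, and, since the configuration below uses mongo_connect_mode = secondaryPreferred, to repeat the same count on the secondary that MongoShake may actually read from:

replSet2001:PRIMARY> use argus_antifraud
replSet2001:PRIMARY> db.sensitive_event.countDocuments({})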

4. Log excerpts:

collector.log.1:561:[2024/04/30 14:37:38 CST] [INFO] DBSyncer id[0] source[mongodb://admin:@xxxxx:2001] target[mongodb://admin:@xxxxx:2001,XXXXXX:2001,XXXXX:2001] startTime[2024-04-30 14:37:21.304367154 +0800 CST m=+0.371448341] collExecutor-0 sync ns {argus_antifraud sensitive_event} to {argus_antifraud sensitive_event} begin
collector.log.1:564:[2024/04/30 14:37:38 CST] [INFO] NewDocumentSplitter db[argus_antifraud] col[sensitive_event] res[{0 0 4096}], pieceByteSize[0]
collector.log.1:565:[2024/04/30 14:37:38 CST] [INFO] splitter[DocumentSplitter src[mongodb://admin:@1Xxxxxx:2001] ns[{argus_antifraud sensitive_event}] count[0] pieceByteSize[0 MB] pieceNumber[0]] disable split or no need
collector.log.1:566:[2024/04/30 14:37:38 CST] [INFO] splitter[DocumentSplitter src[mongodb://admin:@1Xxxxxx:2001] ns[{argus_antifraud sensitive_event}] count[0] pieceByteSize[0 MB] pieceNumber[0]] exits
collector.log.1:567:[2024/04/30 14:37:38 CST] [INFO] reader[DocumentReader id[0], src[mongodb://admin:@1Xxxxxx:2001] ns[{argus_antifraud sensitive_event}] query[map[]]] client is empty, create one
collector.log.1:569:[2024/04/30 14:37:38 CST] [INFO] reader[DocumentReader id[0], src[mongodb://admin:@1Xxxxxx:2001] ns[{argus_antifraud sensitive_event}] query[map[]] docCursorId[0]] generates new cursor
collector.log.1:570:[2024/04/30 14:37:38 CST] [INFO] reader[DocumentReader id[0], src[mongodb://admin:@1Xxxxxx:2001] ns[{argus_antifraud sensitive_event}] query[map[]] docCursorId[0]] finish
collector.log.1:571:[2024/04/30 14:37:38 CST] [INFO] splitter reader finishes: DocumentReader id[0], src[mongodb://admin:@1Xxxxxx:2001] ns[{argus_antifraud sensitive_event}] query[map[]] docCursorId[0]
collector.log.1:572:[2024/04/30 14:37:38 CST] [INFO] reader[DocumentReader id[0], src[mongodb://admin:@1Xxxxxx:2001] ns[{argus_antifraud sensitive_event}] query[map[]] docCursorId[0]] close
collector.log.1:577:[2024/04/30 14:37:38 CST] [INFO] DBSyncer id[0] source[mongodb://admin:@1Xxxxxx:2001] target[mongodb://admin:@xxxxxx:2001,xxxxxx:2001,xxxxxx:2001] startTime[2024-04-30 14:37:21.304367154 +0800 CST m=+0.371448341] collExecutor-0 sync ns {argus_antifraud sensitive_event} to {argus_antifraud sensitive_event} successful. db syncer-0 progress 96%
collector.log.2024-04-29.001:36:[2024/04/29 20:52:35 CST] [INFO] all namespace: map[{argus_antifraud afs_feature_offline_calculate_result}:{} {argus_antifraud area_district}:{} {argus_antifraud area_district_polyline}:{} {argus_antifraud areas_info}:{} {argus_antifraud banned_record}:{} {argus_antifraud banned_rule_config}:{} {argus_antifraud credit_loan_interval}:{} {argus_antifraud custom_feature_config}:{} {argus_antifraud event_config}:{} {argus_antifraud event_field_code}:{} {argus_antifraud event_info_history}:{} {argus_antifraud event_info_last7d}:{} {argus_antifraud event_info_loading}:{} {argus_antifraud feature_history_result}:{} {argus_antifraud feature_info}:{} {argus_antifraud home_screen_config}:{} {argus_antifraud mapping_config}:{} {argus_antifraud naming_list_config}:{} {argus_antifraud naming_list_info}:{} {argus_antifraud operate_record}:{} {argus_antifraud param_config}:{} {argus_antifraud phone_info_base}:{} {argus_antifraud rule_library_config}:{} {argus_antifraud sensitive_event}:{} {argus_antifraud system_enum_config}:{} {argus_antifraud tag_config_org}:{} {argus_antifraud tigger_rule_list}:{} {argus_antifraud users}:{}]
collector.log.2024-04-29.001:47:[2024/04/29 20:52:35 CST] [INFO] index namespace list: [{argus_antifraud event_info_loading} {argus_antifraud param_config} {argus_antifraud phone_info_base} {argus_antifraud system_enum_config} {argus_antifraud feature_info} {argus_antifraud naming_list_config} {argus_antifraud naming_list_info} {argus_antifraud event_info_history} {argus_antifraud area_district_polyline} {argus_antifraud mapping_config} {argus_antifraud event_config} {argus_antifraud banned_rule_config} {argus_antifraud feature_history_result} {argus_antifraud tag_config_org} {argus_antifraud credit_loan_interval} {argus_antifraud event_info_last7d} {argus_antifraud users} {argus_antifraud banned_record} {argus_antifraud event_field_code} {argus_antifraud rule_library_config} {argus_antifraud afs_feature_offline_calculate_result} {argus_antifraud operate_record} {argus_antifraud areas_info} {argus_antifraud area_district} {argus_antifraud tigger_rule_list} {argus_antifraud custom_feature_config} {argus_antifraud sensitive_event} {argus_antifraud home_screen_config}]
collector.log.2024-04-29.001:64:[2024/04/29 20:52:35 CST] [INFO] collection[{argus_antifraud sensitive_event}] -> [[{v 2} {key [{_id 1}]} {name id} {ns argus_antifraud.sensitive_event}]]
collector.log.2024-04-29.001:110:[2024/04/29 20:52:35 CST] [INFO] Create indexes for ns {argus_antifraud sensitive_event} of dest mongodb finish
collector.log.2024-04-29.001:561:[2024/04/29 20:52:52 CST] [INFO] DBSyncer id[0] source[mongodb://admin:@1Xxxxxx:2001] target[mongodb://admin:@xxxxxx:2001,xxxxxx:2001,xxxxxx:2001] startTime[2024-04-29 20:52:35.468433857 +0800 CST m=+0.433298839] collExecutor-0 sync ns {argus_antifraud sensitive_event} to {argus_antifraud sensitive_event} begin
collector.log.2024-04-29.001:564:[2024/04/29 20:52:52 CST] [INFO] NewDocumentSplitter db[argus_antifraud] col[sensitive_event] res[{0 0 4096}], pieceByteSize[0]
collector.log.2024-04-29.001:565:[2024/04/29 20:52:52 CST] [INFO] splitter[DocumentSplitter src[mongodb://admin:@1Xxxxxx:2001] ns[{argus_antifraud sensitive_event}] count[0] pieceByteSize[0 MB] pieceNumber[0]] disable split or no need
collector.log.2024-04-29.001:566:[2024/04/29 20:52:52 CST] [INFO] splitter[DocumentSplitter src[mongodb://admin:@1Xxxxxx:2001] ns[{argus_antifraud sensitive_event}] count[0] pieceByteSize[0 MB] pieceNumber[0]] exits
collector.log.2024-04-29.001:567:[2024/04/29 20:52:52 CST] [INFO] reader[DocumentReader id[0], src[mongodb://admin:@1Xxxxxx:2001] ns[{argus_antifraud sensitive_event}] query[map[]]] client is empty, create one
collector.log.2024-04-29.001:569:[2024/04/29 20:52:52 CST] [INFO] reader[DocumentReader id[0], src[mongodb://admin:@1Xxxxxx:2001] ns[{argus_antifraud sensitive_event}] query[map[]] docCursorId[0]] generates new cursor
collector.log.2024-04-29.001:570:[2024/04/29 20:52:52 CST] [INFO] reader[DocumentReader id[0], src[mongodb://admin:@1Xxxxxx:2001] ns[{argus_antifraud sensitive_event}] query[map[]] docCursorId[0]] finish
collector.log.2024-04-29.001:571:[2024/04/29 20:52:52 CST] [INFO] splitter reader finishes: DocumentReader id[0], src[mongodb://admin:@1Xxxxxx:2001] ns[{argus_antifraud sensitive_event}] query[map[]] docCursorId[0]
collector.log.2024-04-29.001:572:[2024/04/29 20:52:52 CST] [INFO] reader[DocumentReader id[0], src[mongodb://admin:@1Xxxxxx:2001] ns[{argus_antifraud sensitive_event}] query[map[]] docCursorId[0]] close
collector.log.2024-04-29.001:577:[2024/04/29 20:52:52 CST] [INFO] DBSyncer id[0] source[mongodb://admin:@1Xxxxxx:2001] target[mongodb://admin:@xxxxxx:2001,xxxxxx:2001,xxxxxx:2001] startTime[2024-04-29 20:52:35.468433857 +0800 CST m=+0.433298839] collExecutor-0 sync ns {argus_antifraud sensitive_event} to {argus_antifraud sensitive_event} successful. db syncer-0 progress 96%

5. Monitoring output:
curl -s http://localhost/progress | python -m json.tool
{
"collection_metric": {
"argus_antifraud.afs_feature_offline_calculate_result": "100.00% (5520/5520)",
"argus_antifraud.area_district": "100.00% (3210/3210)",
"argus_antifraud.area_district_polyline": "100.00% (118/118)",
"argus_antifraud.areas_info": "100.00% (359/359)",
"argus_antifraud.banned_record": "100.00% (477/477)",
"argus_antifraud.banned_rule_config": "100.00% (804/804)",
"argus_antifraud.credit_loan_interval": "100.00% (22/22)",
"argus_antifraud.custom_feature_config": "100.00% (50/50)",
"argus_antifraud.event_config": "100.00% (18/18)",
"argus_antifraud.event_field_code": "100.00% (73/73)",
"argus_antifraud.event_info_history": "100.00% (11292/11292)",
"argus_antifraud.event_info_last7d": "100.00% (3/3)",
"argus_antifraud.event_info_loading": "100.00% (14365/14365)",
"argus_antifraud.feature_history_result": "100.00% (90191/90191)",
"argus_antifraud.feature_info": "100.00% (472/472)",
"argus_antifraud.home_screen_config": "100.00% (1/1)",
"argus_antifraud.mapping_config": "100.00% (990/990)",
"argus_antifraud.naming_list_config": "100.00% (74/74)",
"argus_antifraud.naming_list_info": "100.00% (2033/2033)",
"argus_antifraud.operate_record": "100.00% (30/30)",
"argus_antifraud.param_config": "100.00% (41/41)",
"argus_antifraud.phone_info_base": "100.00% (503912/503912)",
"argus_antifraud.rule_library_config": "100.00% (7/7)",
"argus_antifraud.sensitive_event": "100% (0/0)",
"argus_antifraud.system_enum_config": "100.00% (11/11)",
"argus_antifraud.tag_config_org": "100.00% (4/4)",
"argus_antifraud.tigger_rule_list": "100.00% (125/125)",
"argus_antifraud.users": "100% (0/0)"
},
"finished_collection_number": 28,
"processing_collection_number": 0,
"progress": "100.00%",
"total_collection_number": 28,
"wait_collection_number": 0
}
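
The entry worth noting above is "argus_antifraud.sensitive_event": "100% (0/0)", which matches the count[0] reported by the splitter in the logs. A quick filter over the same /progress output (a sketch assuming the endpoint shown above) surfaces every collection that reported zero documents:

curl -s http://localhost/progress | python -m json.tool | grep "(0/0)"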

6. Configuration file

# if you have any problem, please check the FAQ document and the wiki first: https://github.com/alibaba/MongoShake/wiki/FAQ
# for the detailed explanation of each parameter, please refer to: xxxx

# current configuration version, do not modify.
conf.version = 10

# --------------------------- global configuration ---------------------------
# collector name. the id is used for the pid file name and similar outputs.
id = mongoshake

# high availability option.
# enable master election if set to true. only one mongoshake can become master
# and do the sync; the others wait, and at most one of them becomes master once
# the previous master dies. The master information is stored in the mongoshake db
# of the source database by default.
# enable this if a primary/standby pair of mongoshake instances pulls from the same source;
# the option is useless when only one mongoshake is running.
master_quorum = false

# http api interface. Users can use this api to monitor mongoshake:
# curl 127.0.0.1:9100.
# We also provide a restful tool named "mongoshake-stat" to
# print ack, lsn, checkpoint and qps information based on this api.
# usage: ./mongoshake-stat --port=9100
# restful monitoring ports for the full and incremental stages; internal metrics can be
# inspected with curl. See the wiki for details.
full_sync.http_port = 9101
incr_sync.http_port = 9100

# profiling via net/http/pprof, used to inspect the internal Go stacks.
system_profile_port = 9200

# global log level: debug, info, warning, error. lower-level messages will be filtered.
log.level = info

# log directory. the log and pid files are stored in this directory.
# if not set, the default is "./logs/" under the current path.
log.dir =

# log file name.
log.file = collector.log

# log flush enable. If set to false, logs are buffered and may not all be printed on exit.
# If set to true, every log line is flushed immediately, which hurts performance severely;
# true is recommended while debugging.
log.flush = false

# sync mode: all/full/incr. default is incr.
# all means full synchronization + incremental synchronization.
# full means full synchronization only.
# incr means incremental synchronization only.
sync_mode = all

# connection string of the source mongodb; set username and password if auth is enabled.
# Please note: the password must not contain '@'. If auth is disabled, "username:password@" can be omitted.
# split nodes of the same replica set by comma(,). E.g., mongodb://username1:password1@primaryA,secondaryB,secondaryC
# split sharding instances by semicolon(;). E.g., mongodb://username1:password1@primaryA,secondaryB,secondaryC;mongodb://username2:password2@primaryX,secondaryY,secondaryZ
mongo_urls = mongodb://admin:SNunPBENqNaTgqr@xxxxxx:2001

# please fill in the source config server url if the source mongodb is sharding.
mongo_cs_url =

# please give at least one mongos address if the source is sharding.
# this is required when pulling via change stream; split multiple mongos addresses by comma(,).
mongo_s_url =

# enable source ssl
mongo_ssl_root_ca_file =

# tunnel pipeline type. currently rpc, file, kafka, mock and direct are supported.
tunnel = direct

# tunnel target resource url.
# for rpc: the remote receiver socket address.
# for tcp: the remote receiver socket address.
# for file: the file path, for instance "data".
# for kafka: the topic and broker addresses split by comma, for
# instance: topic@brokers1,brokers2; the default topic is "mongoshake".
# for mock: this is useless.
# for direct: the target mongodb address, in the same format as mongo_urls. If
# the target is sharding, this should be the mongos address.
# direct mode writes straight into MongoDB; the other modes are for analytics or
# long-distance transfer scenarios and need a receiver to parse the data, see the FAQ document.
tunnel.address = mongodb://admin:SNunPBENqNaTgqr@xxxxxx:2001,xxxxxx:2001,xxxxxx:2001

# the message format in the tunnel, used only for the kafka and file tunnel types.
# "raw": batched raw data format (default) which performs well but is encoded, so users
# have to parse it with the receiver.
# "json": single oplogs written as json, easy to read directly.
# "bson": single oplogs written as binary bson.
tunnel.message = raw

# how many kafka partitions will be written to, distributed by the hash function chosen in
# "incr_sync.shard_key"; must not exceed "incr_sync.worker". default 1.
tunnel.kafka.partition_number = 1

# tunnel json format. it only takes effect when tunnel.message = json and tunnel == kafka.
# Set canonical_extended_json if you want to use the "Canonical Extended JSON Format", see #559.
tunnel.json.format =

# set this if tunnel == direct or kafka and ssl is enabled.
tunnel.mongo_ssl_root_ca_file =

# connect mode:
# primary: fetch data from the primary.
# secondaryPreferred: fetch data from a secondary if one exists, otherwise from the primary (default, recommended).
# standalone: fetch data from the single given node, no matter whether it is a primary,
# secondary or hidden member. This is only supported when the tunnel type is direct.
mongo_connect_mode = secondaryPreferred

# filter db or collection namespaces. at most one of these two parameters can be given;
# regular expressions are not supported.
# if filter.namespace.black is not empty, the given namespaces are filtered out while all others pass.
# if filter.namespace.white is not empty, the given namespaces pass while all others are filtered out.
# all namespaces pass if no condition is given.
# db and collection are joined by a dot(.), different namespaces are split by a semicolon(;),
# and each namespace can be a db or a db.collection.
# example: filterDbName1.filterCollectionName1;filterDbName2
filter.namespace.black =
filter.namespace.white = argus_antifraud
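# note: following the db.collection format described above, a single collection could also be
# white-listed, e.g. filter.namespace.white = argus_antifraud.sensitive_event (shown for illustration only).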

# some databases like "admin", "local", "mongoshake", "config", "system.views" are
# filtered by default; users can enable syncing of these databases for special needs.
# different databases are split by a semicolon(;), e.g., admin;mongoshake.
# pay attention: collections such as "admin.xxx" are not supported, except "system.views".
# normally this parameter should be left empty.
filter.pass.special.db =

# whether to sync DDL. if set to false, only plain oplog commands (oplog.op "i", "d", "u")
# are transferred; if set to true, DDL such as create index, drop database and
# MongoDB 4.0 transactions are transferred as well.
# enabling DDL is not supported yet when the source is sharding; if the target is sharding,
# applyOps commands (including transactions) are not supported yet.
filter.ddl_enable = true

# filter the oplog gid field if enabled.
# if the source MongoDB uses gid but the target MongoDB does not support it, the sync would fail;
# enabling this strips the gid field. enable with caution, as it hurts mongoshake's performance a lot.
filter.oplog.gids = false

# checkpoint info, used for resuming from a break point.
# checkpoint.storage.url marks the MongoDB address the checkpoint is stored in. E.g., mongodb://127.0.0.1:20070
# if not set, the checkpoint is written into the source mongodb (db=mongoshake) for both
# replica sets and sharded clusters. since version 2.4 this no longer needs to point to the source config server.
checkpoint.storage.url = mongodb://admin:SNunPBENqNaTgqr@xxxxxx:2001,xxxxxx:2001,xxxxxx:2001

# checkpoint db name.
checkpoint.storage.db = mongoshake

# checkpoint collection name. if several mongoshake instances pull from the same source,
# change this collection name to avoid conflicts.
checkpoint.storage.collection = ckpt_default

# set if ssl is enabled.
checkpoint.storage.url.mongo_ssl_root_ca_file =

# the start position of oplog fetching (the real checkpoint).
# pay attention: this is UTC time, which is 8 hours behind CST. this
# variable is only used when no checkpoint exists (at the storage location above);
# to force fetching from this position, delete the existing checkpoint first, see the FAQ.
# if no checkpoint exists and this value is 1970-01-01T00:00:00Z, all existing oplogs on the source are fetched.
# if no checkpoint exists and this value is not 1970-01-01T00:00:00Z, mongoshake first checks whether the oldest
# oplog on the source is newer than the given time, and exits with an error if it is.
checkpoint.start_position = 1970-01-01T00:00:00Z
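# a minimal sketch of forcing a restart from checkpoint.start_position, assuming the checkpoint
# is stored at the location configured above (db "mongoshake", collection "ckpt_default"):
#   use mongoshake
#   db.ckpt_default.find()          // inspect the stored checkpoint
#   db.ckpt_default.deleteMany({})  // remove it, then restart the collector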

# transform a source db or collection namespace into a different dest db or collection namespace,
# e.g., a.b becomes c.d after syncing.
# format: fromDbName1.fromCollectionName1:toDbName1.toCollectionName1;fromDbName2:toDbName2
# use with caution, as it is fairly expensive.
transform.namespace =

# --------------------------- full sync configuration ---------------------------
# the maximum number of collections fetched concurrently, e.g., 6 means at most 6
# collections are pulled at the same time.
# full_sync.reader.collection_parallel = 6
full_sync.reader.collection_parallel = 1

# the number of document writer threads per collection, e.g., 8 means 8 threads
# write one collection concurrently.
# full_sync.reader.write_document_parallel = 8
full_sync.reader.write_document_parallel = 1

# number of documents in one batch insert on the target, e.g., 128 means a thread
# aggregates 128 documents and writes them at once.
full_sync.reader.document_batch_size = 128

# maximum number of documents per batch fetched from the source db.
full_sync.reader.fetch_batch_size = 1000

# max number of fetching threads per collection. default is 1 (single-threaded fetching).
# the splitVector privilege is required. note: within a single collection, the values of the
# chosen index must all be of the same type; do not enable this option otherwise!
full_sync.reader.parallel_thread = 1

# the index used for parallel fetching when full_sync.reader.parallel_thread is set;
# the index may only have 1 field and its values must be of the same type.
# _id is recommended for replica sets, and the shard key for sharded clusters.
full_sync.reader.parallel_index = _id

# whether to drop a collection with the same name on the dest mongodb before full synchronization;
# true means drop first and then sync, false means do not drop.
full_sync.collection_exist_drop = true

# create index option.
# none: do not create indexes.
# foreground: create foreground indexes after data sync finishes in the full sync stage.
# background: create background indexes when starting.
full_sync.create_index = background

# convert an insert to an update when a duplicate _id is found on the dest.
full_sync.executor.insert_on_dup_update = true

# whether to filter orphan documents when the source is sharding.
full_sync.executor.filter.orphan_document = false

# enable majority write on the dest during full sync. performance degrades if enabled.
full_sync.executor.majority_enable = false

# --------------------------- incremental sync configuration ---------------------------
# fetch method:
# oplog: fetch the oplog from the source mongodb (default).
# change_stream: use change streams to receive change events from the source mongodb, supported for MongoDB >= 4.0.
# we recommend using change_stream if possible.
incr_sync.mongo_fetch_method = oplog

# after a document is updated, set this to false if only the updated fields are needed,
# or to true if the full document content is needed.
# it only takes effect when incr_sync.mongo_fetch_method = change_stream, and lowers performance somewhat.
incr_sync.change_stream.watch_full_document = false

# global id, used in active-active replication to prevent replication loops.
# this parameter is not supported in the current open-source version and is only used for
# Alibaba Cloud MongoDB; to enable gid between cloud instances, contact Alibaba Cloud support.
# for sharding, separate multiple gids with a semicolon(;).
incr_sync.oplog.gids =

# distribute data to different workers by a hash key to run in parallel.
# [auto] decided by whether the collection has a unique index:
#        use collection if it has a unique index, otherwise use id.
# [id] shard by ObjectId; oplogs with the same _id are handled in sequence.
# [collection] shard by ns; oplogs with the same ns are handled in sequence.
# if there are no unique indexes, id is recommended for very high sync performance; otherwise use collection.
incr_sync.shard_key = collection

# if shard_key is collection, collections that have no unique index can be listed here to be
# hashed by _id instead, improving concurrency. users must make sure these collections will
# never get a unique index; if one is detected, mongoshake crashes immediately.
# e.g., db1.collection1;db2.collection2. specifying only a db is not supported.
incr_sync.shard_by_object_id_whitelist =

# number of concurrent oplog transmit workers (writers to the dest DB); it can be raised if the
# machine has spare capacity. if the source is sharding, the worker number must equal the number of shards.
incr_sync.worker = 8

# how many writing (serialization) threads are used per worker for non-direct tunnels such as kafka;
# must be a multiple of "incr_sync.worker". defaults to the value of "incr_sync.worker".
incr_sync.tunnel.write_thread = 8

# delay applying changes on the target, similar to MongoDB's secondary slaveDelay parameter,
# e.g., lag 20 minutes behind the source. unit: seconds; 0 disables the delay.
incr_sync.target_delay = 0

# internal memory queue configuration, please see the FAQ document for more details.
# do not modify these variables if the current performance and resource usage meet your needs.
# batch_queue_size: queue length per worker thread; workers take tasks from this queue.
# batching_max_size: maximum number of documents dispatched to a worker in one batch.
# buffer_capacity: minimum number of documents one buffer in the PendingQueue holds before serialization.
incr_sync.worker.batch_queue_size = 64
incr_sync.adaptive.batching_max_size = 1024
incr_sync.fetcher.buffer_capacity = 256
incr_sync.reader.fetch_batch_size = 8192

# --- direct tunnel only begin ---
# the variables below only apply when the tunnel type is direct.

# convert an update to an insert when the document (_id or unique index) does not exist on the dest.
incr_sync.executor.upsert = false

# convert an insert to an update when a duplicate key (_id or unique index) is found on the dest.
incr_sync.executor.insert_on_dup_update = true

# options: db, none.
# db: if a write conflicts, record the conflicting document in the mongoshake_conflict db on the dest.
incr_sync.conflict_write_to = none

# enable majority write on the dest during incremental sync. performance degrades if enabled.
incr_sync.executor.majority_enable = false

# --- direct tunnel only end ---

# special field identifying the source type; empty by default. set aliyun_serverless for
# Alibaba Cloud MongoDB serverless clusters.
special.source.db.flag =