hpgrahsl / kafka-connect-mongodb

**Unofficial / Community** Kafka Connect MongoDB Sink Connector -> integrated in 2019 into the official MongoDB Kafka Connector here: https://www.mongodb.com/kafka-connector

Fail to write to DB 'The $v update field is only recognized internally'

mottish opened this issue · comments

Hi,

I'm currently testing the sink connector, trying to sync two MongoDB clusters where the source cluster uses Debezium CDC and the destination cluster uses your MongoDB sink connector.

My config is:

{
  "name": "mongodb-dst-sink",
  "config": {
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "true",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "true",
    "connector.class": "at.grahsl.kafka.connect.mongodb.MongoDbSinkConnector",
    "topics": "dbserver1.inventory.customers",
    "mongodb.connection.uri": "mongodb://debezium:dbz@mongodb-dst:27017/inventory",
    "mongodb.change.data.capture.handler": "at.grahsl.kafka.connect.mongodb.cdc.debezium.mongodb.MongoDbHandler",
    "mongodb.delete.on.null.values": "false",
    "mongodb.collections": "customers",
    "mongodb.collection.dbserver1.inventory.customers": "customers"
  }
}

Initially I was able to sync the clusters; however, after updating a document in the source DB, I started getting the exception below.

The exception I get:

2019-07-09 15:04:36,175 ERROR  ||  error on mongodb operation   [at.grahsl.kafka.connect.mongodb.MongoDbSinkTask]
com.mongodb.MongoBulkWriteException: Bulk write operation error on server mongodb-dst:27017. Write errors: [BulkWriteError{index=0, code=9, message='The $v update field is only recognized internally', details={}}]. 
	at com.mongodb.connection.BulkWriteBatchCombiner.getError(BulkWriteBatchCombiner.java:177)
	at com.mongodb.connection.BulkWriteBatchCombiner.throwOnError(BulkWriteBatchCombiner.java:206)
	at com.mongodb.connection.BulkWriteBatchCombiner.getResult(BulkWriteBatchCombiner.java:147)
	at com.mongodb.operation.BulkWriteBatch.getResult(BulkWriteBatch.java:227)
	at com.mongodb.operation.MixedBulkWriteOperation.executeBulkWriteBatch(MixedBulkWriteOperation.java:280)
	at com.mongodb.operation.MixedBulkWriteOperation.access$700(MixedBulkWriteOperation.java:70)
	at com.mongodb.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:203)
	at com.mongodb.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:194)
	at com.mongodb.operation.OperationHelper.withReleasableConnection(OperationHelper.java:424)
	at com.mongodb.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:194)
	at com.mongodb.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:69)
	at com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.execute(MongoClientDelegate.java:193)
	at com.mongodb.client.internal.MongoCollectionImpl.executeBulkWrite(MongoCollectionImpl.java:468)
	at com.mongodb.client.internal.MongoCollectionImpl.bulkWrite(MongoCollectionImpl.java:448)
	at at.grahsl.kafka.connect.mongodb.MongoDbSinkTask.processSinkRecords(MongoDbSinkTask.java:148)
	at at.grahsl.kafka.connect.mongodb.MongoDbSinkTask.lambda$null$0(MongoDbSinkTask.java:118)
	at java.util.ArrayList.forEach(ArrayList.java:1257)
	at at.grahsl.kafka.connect.mongodb.MongoDbSinkTask.lambda$put$1(MongoDbSinkTask.java:117)
	at java.util.HashMap.forEach(HashMap.java:1289)
	at at.grahsl.kafka.connect.mongodb.MongoDbSinkTask.put(MongoDbSinkTask.java:112)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:538)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:321)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
	at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
	at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
2019-07-09 15:04:36,177 ERROR  ||  writing 2 document(s) into collection [inventory.customers] failed -> remaining retries (2)   [at.grahsl.kafka.connect.mongodb.MongoDbSinkTask]
2019-07-09 15:04:36,177 ERROR  ||  WorkerSinkTask{id=mongodb-dst-sink-0} RetriableException from SinkTask:   [org.apache.kafka.connect.runtime.WorkerSinkTask]
org.apache.kafka.connect.errors.RetriableException: Bulk write operation error on server mongodb-dst:27017. Write errors: [BulkWriteError{index=0, code=9, message='The $v update field is only recognized internally', details={}}]. 
	at at.grahsl.kafka.connect.mongodb.MongoDbSinkTask.processSinkRecords(MongoDbSinkTask.java:170)
	at at.grahsl.kafka.connect.mongodb.MongoDbSinkTask.lambda$null$0(MongoDbSinkTask.java:118)
	at java.util.ArrayList.forEach(ArrayList.java:1257)
	at at.grahsl.kafka.connect.mongodb.MongoDbSinkTask.lambda$put$1(MongoDbSinkTask.java:117)
	at java.util.HashMap.forEach(HashMap.java:1289)
	at at.grahsl.kafka.connect.mongodb.MongoDbSinkTask.put(MongoDbSinkTask.java:112)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:538)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:321)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
	at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
	at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: com.mongodb.MongoBulkWriteException: Bulk write operation error on server mongodb-dst:27017. Write errors: [BulkWriteError{index=0, code=9, message='The $v update field is only recognized internally', details={}}]. 
	at com.mongodb.connection.BulkWriteBatchCombiner.getError(BulkWriteBatchCombiner.java:177)
	at com.mongodb.connection.BulkWriteBatchCombiner.throwOnError(BulkWriteBatchCombiner.java:206)
	at com.mongodb.connection.BulkWriteBatchCombiner.getResult(BulkWriteBatchCombiner.java:147)
	at com.mongodb.operation.BulkWriteBatch.getResult(BulkWriteBatch.java:227)
	at com.mongodb.operation.MixedBulkWriteOperation.executeBulkWriteBatch(MixedBulkWriteOperation.java:280)
	at com.mongodb.operation.MixedBulkWriteOperation.access$700(MixedBulkWriteOperation.java:70)
	at com.mongodb.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:203)
	at com.mongodb.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:194)
	at com.mongodb.operation.OperationHelper.withReleasableConnection(OperationHelper.java:424)
	at com.mongodb.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:194)
	at com.mongodb.operation.MixedBulkWriteOperation.execute(MixedBulkWriteOperation.java:69)
	at com.mongodb.client.internal.MongoClientDelegate$DelegateOperationExecutor.execute(MongoClientDelegate.java:193)
	at com.mongodb.client.internal.MongoCollectionImpl.executeBulkWrite(MongoCollectionImpl.java:468)
	at com.mongodb.client.internal.MongoCollectionImpl.bulkWrite(MongoCollectionImpl.java:448)
	at at.grahsl.kafka.connect.mongodb.MongoDbSinkTask.processSinkRecords(MongoDbSinkTask.java:148)
	... 16 more
2019-07-09 15:04:41,181 ERROR  ||  error on mongodb operation   [at.grahsl.kafka.connect.mongodb.MongoDbSinkTask]
com.mongodb.MongoBulkWriteException: Bulk write operation error on server mongodb-dst:27017. Write errors: [BulkWriteError{index=0, code=9, message='The $v update field is only recognized internally', details={}}]. 
	(same stack trace as above; the error repeats on every retry)

What is the problem and how can I overcome it?

Some more input: the following is the event structure, with the $v field in the patch:

  "payload": {
    "after": null,
    "patch": "{\"$v\" : 1,\"$set\" : {\"last_name\" : \"MyNewName\"}}",
    "source": {
      "version": "0.9.5.Final",
      "connector": "mongodb",
      "name": "dbserver5",
      "rs": "rs0",
      "ns": "inventory.customers",
      "sec": 1562769763,
      "ord": 1,
      "h": 4519476723350356000,
      "initsync": false
    },
    "op": "u",
    "ts_ms": 1562769763351
  }
}

Hi @mottish!

Thx for reporting your issue. It seems that this is indeed a bug, at least for MongoDB versions 3.6+. Apparently this $v field has been added to the oplog format and is not expected by the current implementation. When you look at the documentation of the update event format in the Debezium docs (https://debezium.io/docs/connectors/mongodb/#change-events-value) you can see that the $v field is not mentioned there. However, there is the following note:

The content of the patch field is provided by MongoDB itself and its exact format depends on the version. You can thus expect that the messages will not be the same for MongoDB 3.4 and 3.6, and you should be careful while upgrading the MongoDB instance to a new version. All examples in this document were obtained from MongoDB 3.4 and might differ if you use a different one.

So from what I've quickly seen, I'm afraid you need to wait until a fix is provided.

A PR is of course welcome :)
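
For anyone who wants to attempt that PR: the essence of the fix should be small. The patch string can be parsed into a BsonDocument and the $v entry dropped before the update statement is built. A rough sketch under that assumption (names are illustrative only, not the connector's actual code):

import org.bson.BsonDocument;

public class PatchSanitizer {

    // Illustrative helper: parses the Debezium "patch" string and drops the
    // oplog-internal "$v" field, which MongoDB 3.6+ rejects when a client
    // sends it back as part of an update.
    public static BsonDocument sanitize(String patchString) {
        BsonDocument update = BsonDocument.parse(patchString);
        update.remove("$v"); // no-op for pre-3.6 patches that lack the field
        return update;
    }

    public static void main(String[] args) {
        String patch = "{\"$v\" : 1, \"$set\" : {\"last_name\" : \"MyNewName\"}}";
        // prints: {"$set": {"last_name": "MyNewName"}}
        System.out.println(sanitize(patch).toJson());
    }
}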

Hans,
I am also stuck on this error. I am testing insert, delete, and update operations, and it fails for the update operation.

mongos> db.demo_collA.updateOne({"_id" : "10129-2015-CMPL", "certificate_number" : 5346909},{ $set: {"business_name" : "RJ A&C CHIMNEY CORP."}})
{ "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 }
mongos> db.demo_collA.find({"_id" : "10129-2015-CMPL", "certificate_number" : 5346909})
{ "_id" : "10129-2015-CMPL", "certificate_number" : 5346909, "business_name" : "RJ A&C CHIMNEY CORP.", "date" : "Apr 22 2015", "result" : "Violation Issued", "sector" : "Home Improvement Contractor - 100", "address" : { "city" : "QUEENS VLG", "zip" : 11428, "street" : "210TH ST", "number" : 9440 } }
mongos> db.demo_collB.find({"_id" : "10129-2015-CMPL", "certificate_number" : 5346909})
{ "_id" : "10129-2015-CMPL", "certificate_number" : 5346909, "business_name" : "A&C CHIMNEY CORP.", "date" : "Apr 22 2015", "result" : "Violation Issued", "sector" : "Home Improvement Contractor - 100", "address" : { "city" : "QUEENS VLG", "zip" : 11428, "street" : "210TH ST", "number" : 9440 } }

The above updateOne() triggers the exception below. Note that demo_collB (the sink-side copy) still shows the old business_name.
Caused by: com.mongodb.MongoBulkWriteException: Bulk write operation error on server localhost:27017. Write errors: [BulkWriteError{index=0, code=65, message='multiple errors for op : The $v update field is only recognized internally :: and :: The $v update field is only recognized internally :: and :: The $v update field is only recognized internally', details={"causedBy": [{"index": 0, "code": 9, "errmsg": "The $v update field is only recognized internally"}, {"index": 0, "code": 9, "errmsg": "The $v update field is only recognized internally"}, {"index": 0, "code": 9, "errmsg": "The $v update field is only recognized internally"}]}}].

My current source and target DB version is 3.6.8.

As per your explanation, it seems this issue is related to versions 3.6 and above. Do you have a plan to fix this bug in upcoming releases? I am assuming upgrading to 4.0 may not help; correct me if I am wrong. Also, downgrading to 3.4 is not an option in my case.

THX for checking this! Of course a fix is planned for the next patch release, 1.3.2. I cannot tell right now when there will be an official release, but as soon as there is a fix on master I'll let you know :) Ideally someone from the community steps up and sends a PR - would you like to?

Thanks, Hans, for the reply. I wish I had the Java skills to fix this issue; I am an operations guy with little coding knowledge :).
Quick question: I know MongoDB officially announced a Kafka connector which is developed on top of your code. Do you think that connector will also have the same issue?

I'm pretty sure the official connector is also affected. AFAIK there were no considerable functional changes on the sink side - at least none that I'm aware of. Additionally, the bug is related to processing events coming from another 3rd-party project, namely Debezium, so I think it would currently not have a very high priority there. What I'll do, however, is provide the fix in the official repo as well, after seeing and knowing that it works here.

Thanks, Hans. Appreciate it.

I already looked into it. The fix itself is a trivial one for this very issue, but it will still take me a bit to adapt unit tests, documentation, etc. and do a hotfix release. I'll try to find some time at the beginning of next week.

Hi @hpgrahsl, thank you for looking into it quickly!
I'll wait for your fix. I was looking into the code to resolve it myself but haven't found a proper fix yet, other than a workaround that drops this field using a regex.
I also tried to use the Kafka Connect ReplaceField SMT to omit this field; however, the "patch" value is not structured JSON but is treated as text, so that wasn't possible.
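
For completeness, the regex workaround I mentioned operates on the raw patch string itself, roughly like this (illustrative only, and brittle compared to a proper fix in the CDC handler):

public class PatchRegexWorkaround {

    // Illustrative, brittle workaround: strip a leading "$v" : <n> entry from
    // the serialized patch JSON before it gets parsed, for cases where only
    // the raw string is accessible.
    public static String stripVersionField(String patch) {
        return patch.replaceFirst("\\{\\s*\"\\$v\"\\s*:\\s*\\d+\\s*,\\s*", "{");
    }

    public static void main(String[] args) {
        String patch = "{\"$v\" : 1,\"$set\" : {\"last_name\" : \"MyNewName\"}}";
        // prints: {"$set" : {"last_name" : "MyNewName"}}
        System.out.println(stripVersionField(patch));
    }
}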

Thanks again for your great work with this connector!

Hi @hpgrahsl , I continued testing with a large dataset (34 million records) in a live test environment and got another bulk write operation error, shown below. Thought I'd share it with you.

[2019-07-12 07:20:13,606] ERROR writing 472 document(s) into collection [oz_next_new.assets] failed -> remaining retries (3) (at.grahsl.kafka.connect.mongodb.MongoDbSinkTask:161)
[2019-07-12 07:20:13,606] ERROR WorkerSinkTask{id=oz_next_assets_sink_01-0} RetriableException from SinkTask: (org.apache.kafka.connect.runtime.WorkerSinkTask:551)
org.apache.kafka.connect.errors.RetriableException: Bulk write operation error on server localhost:27017. Write errors: [BulkWriteError{index=135, code=28, message='Cannot create field 'renditions' in element {2: null}', details={}}].
	at at.grahsl.kafka.connect.mongodb.MongoDbSinkTask.processSinkRecords(MongoDbSinkTask.java:170)
	at at.grahsl.kafka.connect.mongodb.MongoDbSinkTask.lambda$null$0(MongoDbSinkTask.java:118)
	at java.util.ArrayList.forEach(ArrayList.java:1257)
	at at.grahsl.kafka.connect.mongodb.MongoDbSinkTask.lambda$put$1(MongoDbSinkTask.java:117)
	at java.util.HashMap.forEach(HashMap.java:1289)
	at at.grahsl.kafka.connect.mongodb.MongoDbSinkTask.put(MongoDbSinkTask.java:112)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:538)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:321)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:224)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:192)
	at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:175)
	at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:219)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

@rajarameshyv thx for sharing. honestly I haven't seen this error before... never :)

[BulkWriteError{index=135, code=28, message='Cannot create field 'renditions' in element {2: null}', details={}}]

I'll try to find out what it even means, because from this rather cryptic error message I have no idea what exactly went wrong or why the write didn't succeed. Will keep you updated.

@mottish @rajarameshyv good news, see above :-) Beware though that this is with regard to the $v field issue, not the other one.

Thank you, it works!

@rajarameshyv yes, but in fact the download from Confluent Hub will only include this once the next release is done and published there. Until then you have to use the latest build from the master branch.

@rajarameshyv ah, and what about the other issue you posted above? Did you find anything on why it might happen? I think it would be a good idea if you extract that part and open a separate issue for it so that we don't lose track :-) THX in advance.

@hpgrahsl So far, from searching Google, it looks like an issue related to updating an array that contains a NULL value; setting a field through such an element seems to trigger this error. But I am not 100% sure. I am planning to reproduce this from the mongo shell.
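
If that suspicion is right, this would be MongoDB's PathNotViable error (code 28), raised when $set targets a sub-field "through" an array element that is null. A hedged sketch of such a reproduction with the Java driver (connection string, database, and document shape are assumptions; the real shape of oz_next_new.assets is unknown):

import com.mongodb.MongoWriteException;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

import java.util.Arrays;

public class PathNotViableRepro {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> coll =
                    client.getDatabase("test").getCollection("repro");
            coll.drop();
            // element at index 2 of the array is null
            coll.insertOne(new Document("_id", 1)
                    .append("assets", Arrays.asList("a", "b", null)));
            try {
                // expected to fail with code 28:
                // "Cannot create field 'renditions' in element {2: null}"
                coll.updateOne(new Document("_id", 1),
                        new Document("$set", new Document("assets.2.renditions", "thumb")));
            } catch (MongoWriteException e) {
                System.out.println(e.getError().getCode() + ": " + e.getError().getMessage());
            }
        }
    }
}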

And Hans, I can't find a procedure for how to build the jar file from master. I'd appreciate it if you could post a link or the steps. Sorry if this is any trouble for you.

OK, I see. Directly in the project root you simply run

mvn clean package

This builds the different artifacts into the target/ folder. What you then take is the uber-jar containing all dependencies, which after a successful build can be found at target/kafka-connect-mongodb/kafka-connect-mongodb-1.3.2-SNAPSHOT-jar-with-dependencies.jar. This is what you deploy to your Kafka Connect installation.

Thank you @hpgrahsl, I will try this. Appreciate you sharing the steps.

You are welcome! Good luck :)

@hpgrahsl The fix for the $v update field issue is working. Thank you. I will create a separate issue for the other errors I am getting.