The MongoDB sink connector is a tool for scalably and reliably streaming data between Apache Kafka and MongoDB. It exports Avro data from Kafka topics into MongoDB.
Currently it supports only two types of data.
The current version demonstrates how to extract data coming from an Empatica E4 device using the RADAR-CNS Android application and analysed by the RADAR-CNS Kafka backend.
The following assumes you have Kafka and the Confluent Schema Registry running.
- Build the project. Inside the project folder, run
  ```shell
  ./gradlew clean build
  ```
- Modify the `sink.properties` file according to your cluster. The following properties are supported; an illustrative example follows the table:
Name | Description | Type | Default | Valid Values | Importance
---|---|---|---|---|---
mongo.database | MongoDB database name | string | | | high
mongo.host | MongoDB host name to write data to | string | | | high
topics | List of topics to be streamed. | list | | | high
collection.format | A format string for the destination collection name, which may contain `${topic}` as a placeholder for the originating topic name. For example, `kafka_${topic}` for the topic `orders` will map to the collection name `kafka_orders`. | string | `${topic}` | | medium
mongo.password | Password to connect to the MongoDB database. If not set, no credentials are used. | string | null | | medium
mongo.username | Username to connect to the MongoDB database. If not set, no credentials are used. | string | null | | medium
record.converter.class | RecordConverterFactory that returns classes to convert Kafka SinkRecords to BSON documents. | class | org.radarcns.serialization.RecordConverterFactory | | medium
buffer.capacity | Maximum number of items in a MongoDB writer buffer. Once the buffer becomes full, the task fails. | int | 20000 | [1,...] | low
mongo.port | MongoDB port | int | 27017 | [1,...] | low
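For reference, a minimal `sink.properties` using these settings might look like the snippet below. All values are placeholders for illustration; a real file also needs the standard Kafka Connect entries (such as `name`, `connector.class`, and `tasks.max`), which are omitted here because they depend on your deployment.
```properties
# Illustrative values only; adjust to your deployment
mongo.host=localhost
mongo.port=27017
mongo.database=mydatabase
mongo.username=mongodb-user
mongo.password=mongodb-password
# Placeholder topic names
topics=topic1,topic2
collection.format=kafka_${topic}
buffer.capacity=20000
```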
- (optional) Modify the `standalone.properties` and `cluster.properties` files according to your cluster instances. You may need to update the bootstrap servers and Schema Registry locations:
  ```properties
  bootstrap.servers=
  key.converter.schema.registry.url=
  ```
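For a local single-node setup, these would typically point at the default Kafka broker and Schema Registry addresses; the values below are illustrative.
```properties
# Typical local defaults (illustrative)
bootstrap.servers=localhost:9092
key.converter.schema.registry.url=http://localhost:8081
```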
- Copy your JAR file to your Kafka server.
- Copy all configuration files to your Kafka server:
  - sink.properties
  - standalone.properties (optional)
  - cluster.properties (optional)
- Put the connector `build/libs/kafka-connect-mongodb-sink-*.jar` in the folder `share/java`.
- Run your connector:
  - standalone mode
    ```shell
    /bin/connect-standalone standalone.properties sink.properties
    ```
  - distributed mode
    ```shell
    /bin/connect-distributed cluster.properties sink.properties
    ```
- Stop your connector using `CTRL-C`.
To use further data types, extend `org.radarcns.serialization.RecordConverterFactory` and set the new class name in the `record.converter.class` property.
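As a rough illustration, a custom factory might look like the sketch below. The class names `MyRecordConverterFactory` and `MyRecordConverter`, and the `genericConverters()` extension point, are assumptions for illustration only; the factory's actual API may differ.
```java
package com.example;

import java.util.List;

import org.radarcns.serialization.RecordConverter;
import org.radarcns.serialization.RecordConverterFactory;

// Hypothetical sketch: assumes the factory exposes an overridable list
// of converters; the actual extension point may differ.
public class MyRecordConverterFactory extends RecordConverterFactory {
    @Override
    protected List<RecordConverter> genericConverters() {
        // Keep the converters provided by the base factory...
        List<RecordConverter> converters = super.genericConverters();
        // ...and add one that maps the new data type's SinkRecords
        // to BSON documents.
        converters.add(new MyRecordConverter());
        return converters;
    }
}
```
The new factory would then be referenced in `sink.properties`:
```properties
record.converter.class=com.example.MyRecordConverterFactory
```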
The only available setting is the number of records returned in a single call to `poll()` (i.e. the `consumer.max.poll.records` parameter inside `standalone.properties`).
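For example, to cap each fetch (the value here is illustrative):
```properties
# Return at most 500 records per poll() call
consumer.max.poll.records=500
```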
Connectors can run on any machine where Kafka is installed, so you can also start them on a machine that does not host a Kafka broker.
To reset a connector running in standalone mode, stop it and then modify `name` and `offset.storage.file.filename` inside `sink.properties` and `standalone.properties`, respectively.
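For instance, after stopping the connector you might change (values are placeholders):
```properties
# sink.properties: pick a fresh connector name
name=mongodb-sink-connector-2
```
```properties
# standalone.properties: point offsets at a fresh file
offset.storage.file.filename=/tmp/mongodb-sink-2.offsets
```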