onyx-kafka

Onyx plugin providing read and write facilities for Kafka 0.8. This plugin automatically discovers broker locations from ZooKeeper and updates its consumers when a broker fails over.

Installation

In your project file:

[org.onyxplatform/onyx-kafka "0.9.6.0"]

In your peer boot-up namespace:

(:require [onyx.plugin.kafka])
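
The :require clause belongs in the ns form of whatever namespace boots your peers; requiring onyx.plugin.kafka loads the plugin so peers can resolve its functions. A hypothetical example (the my.peer-boot-up namespace name is an assumption):

(ns my.peer-boot-up
  ;; Loading the plugin namespace makes :onyx.plugin.kafka/* vars resolvable by peers.
  (:require [onyx.plugin.kafka]))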

Functions

read-messages

Reads segments from a Kafka topic. Peers will automatically be assigned to each of the topic's partitions, unless :kafka/partition is supplied, in which case only that partition will be read from. :onyx/min-peers and :onyx/max-peers must be used to fix the number of peers for the task to the number of partitions the task reads.

NOTE: The :done sentinel (i.e. batch processing) is not supported if more than one partition is auto-assigned, i.e. when the topic has more than one partition and :kafka/partition is not fixed. An exception will be thrown if a :done is read under these circumstances.

Catalog entry:

{:onyx/name :read-messages
 :onyx/plugin :onyx.plugin.kafka/read-messages
 :onyx/type :input
 :onyx/medium :kafka
 :kafka/topic "my-topic"
 :kafka/group-id "onyx-consumer"
 :kafka/fetch-size 307200
 :kafka/chan-capacity 1000
 :kafka/zookeeper "127.0.0.1:2181"
 :kafka/offset-reset :smallest
 :kafka/force-reset? true
 :kafka/empty-read-back-off 500
 :kafka/commit-interval 500
 :kafka/deserializer-fn :my.ns/deserializer-fn
 :kafka/wrap-with-metadata? false
 :onyx/min-peers <<NUMBER-OF-PARTITIONS>>
 :onyx/max-peers <<NUMBER-OF-PARTITIONS>>
 :onyx/batch-size 100
 :onyx/doc "Reads messages from a Kafka topic"}

Lifecycle entry:

{:lifecycle/task :read-messages
 :lifecycle/calls :onyx.plugin.kafka/read-messages-calls}
Attributes

| key | type | default | description |
|---|---|---|---|
| :kafka/topic | string | | The topic name to connect to |
| :kafka/partition | string | | Optional: the partition to read from, if auto-assignment is not used |
| :kafka/group-id | string | | The consumer identity to store in ZooKeeper |
| :kafka/zookeeper | string | | The ZooKeeper connection string |
| :kafka/offset-reset | keyword | | The offset bound to seek to when a stored offset is not found: :smallest or :largest |
| :kafka/force-reset? | boolean | | Force reading from the beginning or end of the log, as specified by :kafka/offset-reset. If false, reads from the last acknowledged message if one exists |
| :kafka/chan-capacity | integer | 1000 | The buffer size of the Kafka reading channel |
| :kafka/fetch-size | integer | 307200 | The size in bytes to request from Kafka per fetch request |
| :kafka/empty-read-back-off | integer | 500 | The amount of time to back off between reads when nothing was fetched from a consumer |
| :kafka/commit-interval | integer | 2000 | The interval in milliseconds at which to commit the latest acknowledged offset to ZooKeeper |
| :kafka/deserializer-fn | keyword | | A keyword naming a fully qualified namespaced function that deserializes a message. Takes one argument, a byte array (see the sketch below) |
| :kafka/wrap-with-metadata? | boolean | false | Wraps each message in a map with the keys :offset, :partitions, :topic and the :message itself |

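As an illustration, a minimal :kafka/deserializer-fn sketch, assuming messages arrive as UTF-8 encoded EDN strings (the my.ns namespace matches the catalog entry above; the encoding is an assumption):

(ns my.ns
  (:require [clojure.edn :as edn]))

;; Receives the raw Kafka message as a byte array and returns the segment.
;; Assumes the payload is a UTF-8 encoded EDN string.
(defn deserializer-fn [^bytes bs]
  (edn/read-string (String. bs "UTF-8")))
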
write-messages

Writes segments to a Kafka topic using the Kafka "new" producer.

Catalog entry:

{:onyx/name :write-messages
 :onyx/plugin :onyx.plugin.kafka/write-messages
 :onyx/type :output
 :onyx/medium :kafka
 :kafka/topic "topic"
 :kafka/zookeeper "127.0.0.1:2181"
 :kafka/serializer-fn :my.ns/serializer-fn
 :kafka/request-size 307200
 :onyx/batch-size batch-size
 :onyx/doc "Writes messages to a Kafka topic"}

Lifecycle entry:

{:lifecycle/task :write-messages
 :lifecycle/calls :onyx.plugin.kafka/write-messages-calls}
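
To show where these entries fit, here is a minimal sketch of a job that reads from one topic and writes to another. read-messages-catalog-entry, write-messages-catalog-entry and peer-config are placeholders for the catalog entries shown above and your peer configuration:

(require '[onyx.api])

(def job
  {:workflow [[:read-messages :write-messages]]
   ;; placeholders for the two catalog entries shown above
   :catalog [read-messages-catalog-entry
             write-messages-catalog-entry]
   :lifecycles [{:lifecycle/task :read-messages
                 :lifecycle/calls :onyx.plugin.kafka/read-messages-calls}
                {:lifecycle/task :write-messages
                 :lifecycle/calls :onyx.plugin.kafka/write-messages-calls}]
   :task-scheduler :onyx.task-scheduler/balanced})

(onyx.api/submit-job peer-config job)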

Segments supplied to a write-messages task should take the following form: {:message message-body}, with optional :partition and :key values, e.g. {:message message-body :key optional-key :partition optional-partition}.
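
For instance, a hypothetical segment that pins a message to partition 0 with an explicit key (the payload and key values are illustrative):

{:message {:event :signup :user-id 1234} ; the payload passed to :kafka/serializer-fn
 :key "user-1234"                        ; optional; typically used by Kafka to choose a partition
 :partition 0}                           ; optional; typically overrides partition selection entirely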

Attributes

| key | type | default | description |
|---|---|---|---|
| :kafka/topic | string | | The topic name to connect to |
| :kafka/zookeeper | string | | The ZooKeeper connection string |
| :kafka/serializer-fn | keyword | | A keyword naming a fully qualified namespaced function that serializes a message. Takes one argument, the segment (see the sketch below) |
| :kafka/request-size | number | | The maximum size of request messages. Maps to the max.request.size setting of the internal Kafka producer |
| :kafka/no-seal? | boolean | false | When true, the task does not write :done to the topic when it receives the sentinel signal (end of a batch job) |
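
A matching :kafka/serializer-fn sketch, again assuming UTF-8 encoded EDN payloads (my.ns matches the catalog entry above):

(ns my.ns)

;; Receives the segment and returns the byte array to write to Kafka.
(defn serializer-fn [segment]
  (.getBytes (pr-str segment) "UTF-8"))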

Test Utilities

A take-segments utility function is provided for use when testing the results of jobs with Kafka output tasks. take-segments reads from a topic until a :done is reached and then returns the results. Note that if a :done is never written to the topic, this function will block forever, as there is no timeout.

(ns your-ns.a-test
  (:require [onyx.kafka.utils :as kpu]))

;; insert code to run a job here

;; retrieve the segments on the topic
(def results
  (kpu/take-segments (:zookeeper/addr peer-config) "yourtopic" your-decompress-fn))

(last results)
; :done

Embedded Kafka Server

An embedded Kafka server is included for use in test cases where jobs write to Kafka output tasks. Note that stopping the server will not perform a graceful shutdown; please do not use this embedded server for anything other than tests.

This can be used like so:

(ns your-ns.a-test
  (:require [onyx.kafka.embedded-server :as ke]
            [com.stuartsierra.component :as component]))

(def kafka-server
  (component/start
    (ke/map->EmbeddedKafka {:hostname "127.0.0.1"
                            :port 9092
                            :broker-id 0
                            :num-partitions 1
                            ;; optional log dir name - a randomized dir will be created if none is supplied
                            ;; :log-dir "/tmp/embedded-kafka"
                            :zookeeper-addr "127.0.0.1:2188"})))

;; insert code to run a test here

;; stop the embedded server
(component/stop kafka-server)

Contributing

Pull requests into the master branch are welcome.

License

Copyright © 2015 Michael Drogalis

Distributed under the Eclipse Public License, the same as Clojure.
