- This lab is based on the first part of Lab02Part1 (https://github.com/scs-edpo/lab02Part1-kafka-producer-consumer), which is also a prerequisite for this lab
- The procedure to run the code is similar to Lab02. We recommend importing the project into IntelliJ and letting the IDE handle everything
- Note that only the new procedures and concepts are described in this lab
- This lab simulates a system that streams user clicks and eye-tracking data from two eye-trackers
- The eye-tracking data captures the gazes of two developers doing pair programming
- We use Kafka producers and consumers to simulate this system
- Experimenting with several producers and consumers using different configurations for topics and partitions
- Getting hands-on experience with a custom serializer and a custom partitioner
- Experimenting with Kafka rebalancing and how it affects the distribution of partitions among consumers
- Experimenting with offsets and manual offset commits
This lab consists of two parts.
In the first part, we create two producers (ClickStream-Producer and EyeTrackers-Producer)
- ClickStream-Producer Module: produces click stream data
- EyeTrackers-Producer Module: produces gaze data
In the second part, we consume the messages of the producers using consumers with different configurations. All the consumers are available in the "consumer" Module, within the Package "com.examples"
- ConsumerForAllEvents: consumes the events coming from both ClickStream-Producer and EyeTrackers-Producer
- ConsumerForGazeEventsForSingleEyeTracker: consumes the events coming from a single eye-tracker
- ConsumerCustomOffset: consumes the events coming from ClickStream-Producer starting from a specific user-defined offset
- rebalancingExample.*: two classes that demonstrate how Kafka does rebalancing when a new consumer is added (within the same group)
- singleAcessToPartitionAndRebalancingExample.*: two classes demonstrating how Kafka allows only one consumer to read from a partition and how rebalancing occurs when a running consumer goes down
- customCommit.singleAcessToPartitionAndRebalancingExample.*: two classes demonstrating the use of manual offset commits (occurring after every n consumer polls) and the impact of rebalancing on event duplication and loss
- customCommit.commitLargerOffset.*: two classes demonstrating event loss when a manually committed offset is larger than the offset of the latest processed event
- Open a terminal in the directory: docker/.
- Start the Kafka and Zookeeper processes using Docker Compose: `docker-compose up`
- Main Class: com.examples.ClicksProducer
- Overview: This producer produces click events and sends them through the "click-events" topic.
- Procedure (#P1):
- Specify the topic
```
// Specify topic
String topic = "click-events";
```
- Read the Kafka properties file
```
// Read Kafka properties file
Properties properties;
try (InputStream props = Resources.getResource("producer.properties").openStream()) {
    properties = new Properties();
    properties.load(props);
}
```
- The following is the content of the used properties file producer.properties
```
acks=all
retries=0
bootstrap.servers=localhost:9092
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=com.utils.JavaSerializer
```
- Notice that we use a custom (value) serializer (see com.utils.JavaSerializer) to serialize Java Objects before sending them (a sketch of a possible implementation follows below)
- The custom serializer is specified in producer.properties with: value.serializer=com.utils.JavaSerializer
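The serializer class itself ships with the lab repo. For reference, here is a minimal sketch of what a serializer like com.utils.JavaSerializer might look like, under the assumption that it emits JSON via Jackson (an assumption consistent with the note in P3 that consumed values arrive as LinkedHashMaps); the actual implementation may differ:
```
// Hypothetical sketch of com.utils.JavaSerializer (assumed Jackson-based JSON serialization)
package com.utils;

import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Serializer;

public class JavaSerializer implements Serializer<Object> {

    private final ObjectMapper mapper = new ObjectMapper();

    @Override
    public byte[] serialize(String topic, Object data) {
        if (data == null) {
            return null;
        }
        try {
            // turn the Java object (e.g., a Clicks or Gaze instance) into JSON bytes
            return mapper.writeValueAsBytes(data);
        } catch (JsonProcessingException e) {
            throw new SerializationException("Error serializing value for topic " + topic, e);
        }
    }
}
```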
- Create a Kafka producer with the loaded properties
```
// Create Kafka producer
KafkaProducer<String, Clicks> producer = new KafkaProducer<>(properties);
```
- For the sake of simulation, delete any existing topic with the same name (i.e., click-events) and create a new topic with 1 partition. Note that we use a single partition so that all the events are stored in that unique partition of the "click-events" topic (a sketch of how these helpers might be implemented follows the fragment below)
```
// delete existing topic with the same name
deleteTopic(topic, properties);
// create new topic with 1 partition
createTopic(topic, 1, properties);
```
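The helpers deleteTopic and createTopic are part of the lab code; the following is a minimal sketch of how they could be implemented with Kafka's AdminClient (an illustration, not necessarily the lab's exact implementation):
```
// Sketch (assumption): topic management helpers built on Kafka's AdminClient
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class TopicHelpers {

    // delete a topic and wait for the deletion to complete
    public static void deleteTopic(String topic, Properties properties) throws Exception {
        try (AdminClient admin = AdminClient.create(properties)) {
            admin.deleteTopics(Collections.singletonList(topic)).all().get();
        }
    }

    // create a topic with the given number of partitions;
    // replication factor 1 matches the single-broker Docker setup
    public static void createTopic(String topic, int numPartitions, Properties properties) throws Exception {
        try (AdminClient admin = AdminClient.create(properties)) {
            admin.createTopics(Collections.singletonList(new NewTopic(topic, numPartitions, (short) 1))).all().get();
        }
    }
}
```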
- Define a counter which will be used as an eventID
```
// define a counter which will be used as an eventID
int counter = 0;
```
- At each random time interval in the range [500 ms, 5000 ms]:
- Generate a random click event using the constructor Clicks(int eventID, long timestamp, int xPosition, int yPosition, String clickedElement) (see com.data.Clicks, and the sketch after this step). Note that the counter is used as the eventID
```
// generate a random click event
Clicks clickEvent = new Clicks(counter, System.nanoTime(),
        getRandomNumber(0, 1920), getRandomNumber(0, 1080),
        "EL" + getRandomNumber(1, 20));
```
- Send the click event and print it to the producer console
```
// send the click event
producer.send(new ProducerRecord<String, Clicks>(
        topic,      // topic
        clickEvent  // value
));
// print to console
System.out.println("clickEvent sent: " + clickEvent.toString());
```
- Increment the counter (i.e., the eventID) for future use
```
// increment counter i.e., eventID
counter++;
```
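For reference, the event class in com.data presumably looks roughly like the following sketch; the field and getter names are inferred from the constructor above and are therefore assumptions, not the lab's verified code:
```
// Hypothetical sketch of com.data.Clicks, inferred from the constructor used above
package com.data;

import java.io.Serializable;

public class Clicks implements Serializable {

    private final int eventID;
    private final long timestamp;
    private final int xPosition;
    private final int yPosition;
    private final String clickedElement;

    public Clicks(int eventID, long timestamp, int xPosition, int yPosition, String clickedElement) {
        this.eventID = eventID;
        this.timestamp = timestamp;
        this.xPosition = xPosition;
        this.yPosition = yPosition;
        this.clickedElement = clickedElement;
    }

    public int getEventID() { return eventID; }
    public long getTimestamp() { return timestamp; }
    public int getXPosition() { return xPosition; }
    public int getYPosition() { return yPosition; }
    public String getClickedElement() { return clickedElement; }

    @Override
    public String toString() {
        return "Clicks{eventID=" + eventID + ", timestamp=" + timestamp
                + ", x=" + xPosition + ", y=" + yPosition
                + ", clickedElement='" + clickedElement + "'}";
    }
}
```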
- Explore the different classes within ClickStream-Producer and examine the control-flow within the class "com.examples.ClicksProducer"
- Main Class: com.examples.EyeTrackersProducer
- Overview: This producer produces gaze events and sends them through the "gaze-events" topic.
- Procedure (#P2):
- Specify the topic
```
// Specify topic
String topic = "gaze-events";
```
- Read the Kafka properties file (similar to Procedure P1)
- The following is the content of the used properties file producer.properties
```
acks=all
retries=0
bootstrap.servers=localhost:9092
key.serializer=org.apache.kafka.common.serialization.StringSerializer
value.serializer=com.utils.JavaSerializer
partitioner.class=com.utils.CustomPartitioner
```
- Remember that in our use-case, we have 2 eye-trackers, and we would like to store the data from each eye-tracker in a distinct partition. Therefore, we use a custom partitioner (see com.utils.CustomPartitioner) to ensure that the events coming from each eye-tracker are always stored in the same distinct partition
- Reason: with the default partitioner, Kafka guarantees that events with the same key go to the same partition, but not the other way around, i.e., it does not guarantee that events with different keys go to different partitions. Events are assigned to partitions as partitionID = hash(key) % num_partitions, so with a low partition count (e.g., num_partitions = 2), it is very likely that 2 events with different keys still end up in the same partition.
- The custom partitioner is specified in resources/producer.properties with: partitioner.class=com.utils.CustomPartitioner (a sketch of a possible implementation is shown after this list)
- Similar to P1, we use a custom (value) serializer (see com.utils.JavaSerializer) to serialize Java Objects before sending them
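A minimal sketch of what a partitioner like com.utils.CustomPartitioner might look like, assuming the producer sends the deviceID ("0" or "1") as the record key and the partition number simply mirrors it (the lab's actual implementation may differ):
```
// Hypothetical sketch of com.utils.CustomPartitioner: map deviceID key "0"/"1"
// directly to partition 0/1, so each eye-tracker always lands in its own partition
package com.utils;

import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

public class CustomPartitioner implements Partitioner {

    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        // the key is the deviceID sent as a String (see the send() call later in P2)
        return Integer.parseInt((String) key);
    }

    @Override
    public void close() { }

    @Override
    public void configure(Map<String, ?> configs) { }
}
```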
- Create a Kafka producer with the loaded properties (similar to P1)
- For the sake of simulation, delete any existing topic with the same name (i.e., gaze-events) and create a new topic with 2 partitions (corresponding to the two eye-trackers)
```
// delete existing topic with the same name
deleteTopic(topic, properties);
// create new topic with 2 partitions
createTopic(topic, 2, properties);
```
- Define a counter which will be used as an eventID
```
// define a counter which will be used as an eventID
int counter = 0;
```
- Every 8 ms:
- Select a deviceID corresponding to a random eye-tracker (among the two available eye-trackers)
```
// select random device
int deviceID = getRandomNumber(0, deviceIDs.length);
```
- Generate a random gaze event using the constructor Gaze(int eventID, long timestamp, int xPosition, int yPosition, int pupilSize) (see com.data.Gaze). Note that the counter is used as the eventID
```
// generate a random gaze event
Gaze gazeEvent = new Gaze(counter, System.nanoTime(),
        getRandomNumber(0, 1920), getRandomNumber(0, 1080),
        getRandomNumber(3, 4));
```
- Send the gaze event and print it to the producer console. Notice that we use the deviceID as the key in the send method; the deviceID is mapped to the corresponding partition of the "gaze-events" topic.
```
// send the gaze event
producer.send(new ProducerRecord<String, Gaze>(
        topic,                     // topic
        String.valueOf(deviceID),  // key
        gazeEvent                  // value
));
// print to console
System.out.println("gazeEvent sent: " + gazeEvent.toString() + " from deviceID: " + deviceID);
```
- Increment the counter (i.e., the eventID) for future use (similar to P1)
- Explore the different classes within EyeTrackers-Producer and examine the control-flow within the class com.examples.EyeTrackersProducer
- Prerequisite for running ConsumerForAllEvents:
- Stop all previously running producers and consumers
- (Re)Run EyeTrackers-Producer (Main Class com.examples.EyeTrackersProducer) and ClickStream-Producer (Main Class com.examples.ClicksProducer)
- Main Class: com.examples.ConsumerForAllEvents
- Overview: this consumer consumes the events coming from both ClickStream-Producer and EyeTrackers-Producer
- Procedure (#P3):
- Read the Kafka properties file and create a Kafka consumer with the given properties
```
// Read Kafka properties file and create Kafka consumer with the given properties
KafkaConsumer<String, Object> consumer;
try (InputStream props = Resources.getResource("consumer.properties").openStream()) {
    Properties properties = new Properties();
    properties.load(props);
    consumer = new KafkaConsumer<>(properties);
}
```
- The following is the content of the used properties file consumer.properties
```
bootstrap.servers=localhost:9092
key.deserializer=org.apache.kafka.common.serialization.StringDeserializer
value.deserializer=com.utils.JavaDeserializer
group.id=grp1
auto.offset.reset=earliest
```
- Notice that we use a custom (value) deserializer (see com.utils.JavaDeserializer) to deserialize Java Objects (a sketch of a possible implementation follows below)
- The custom deserializer is specified in consumer.properties with: value.deserializer=com.utils.JavaDeserializer
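As a point of reference, a minimal sketch of what a deserializer like com.utils.JavaDeserializer might look like, assuming it is Jackson-based JSON deserialization, which would explain why record.value() shows up as a LinkedHashMap in the consumer code below (the lab's actual implementation may differ):
```
// Hypothetical sketch of com.utils.JavaDeserializer (assumed Jackson-based)
package com.utils;

import com.fasterxml.jackson.databind.ObjectMapper;
import java.io.IOException;
import org.apache.kafka.common.errors.SerializationException;
import org.apache.kafka.common.serialization.Deserializer;

public class JavaDeserializer implements Deserializer<Object> {

    private final ObjectMapper mapper = new ObjectMapper();

    @Override
    public Object deserialize(String topic, byte[] data) {
        if (data == null) {
            return null;
        }
        try {
            // without a target type, Jackson maps a JSON object to a LinkedHashMap
            return mapper.readValue(data, Object.class);
        } catch (IOException e) {
            throw new SerializationException("Error deserializing value from topic " + topic, e);
        }
    }
}
```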
- Subscribe to two topics: "gaze-events" and "click-events". The events in the "gaze-events" topic come from two partitions, while the events in "click-events" come from one partition only
```
// Subscribe to relevant topics
consumer.subscribe(Arrays.asList("gaze-events", "click-events"));
```
- Poll new events at a specific rate and process the consumer records
```
// poll new data
ConsumerRecords<String, Object> records = consumer.poll(Duration.ofMillis(8));
// process consumer records depending on record.topic() and record.value()
for (ConsumerRecord<String, Object> record : records) {
    // switch/case
    switch (record.topic()) {
        // note: record.value() is a LinkedHashMap (see com.utils.JavaDeserializer); you can
        // access specific attributes with ((LinkedHashMap) record.value()).get("ATTRIBUTENAME").toString();
        // the object can also be reconstructed as a Gaze object (see the sketch below)
        case "gaze-events":
            String value = record.value().toString();
            System.out.println("Received gaze-events - key: " + record.key()
                    + " - value: " + value + " - partition: " + record.partition());
            break;
        case "click-events":
            System.out.println("Received click-events - value: " + record.value()
                    + " - partition: " + record.partition());
            break;
        default:
            throw new IllegalStateException("Shouldn't be possible to get message on topic " + record.topic());
    }
}
```
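As noted in the comment above, record.value() arrives as a LinkedHashMap. If a typed object is preferred, one option (a sketch assuming Jackson is on the classpath; the getter name is hypothetical) is to convert the map back into a Gaze instance:
```
// sketch: reconstructing a typed Gaze object from the LinkedHashMap value
ObjectMapper mapper = new ObjectMapper();
Gaze gaze = mapper.convertValue(record.value(), Gaze.class);
System.out.println("pupil size: " + gaze.getPupilSize()); // hypothetical getter
```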
- Examine the control-flow within the class ConsumerForAllEvents
- Prerequisite for running ConsumerForGazeEventsForSingleEyeTracker:
- Stop all previously running producers and consumers
- (Re)Run EyeTrackers-Producer (Main Class com.examples.EyeTrackersProducer)
- Main Class: com.examples.ConsumerForGazeEventsForSingleEyeTracker
- Overview: consumes the events coming from a single eye-tracker
- Procedure (#P4):
- The procedure is similar to Procedure P3, with the following difference:
- This consumer consumes the events coming from a single eye-tracker (deviceID: 0); these events are stored in partition "0" of the "gaze-events" topic
- This is specified using the following code fragment
```
// Read from a specific topic and partition
TopicPartition topicPartition = new TopicPartition("gaze-events", 0);
consumer.assign(Arrays.asList(topicPartition));
```
- Examine the control-flow within the class com.examples.ConsumerForGazeEventsForSingleEyeTracker
- Prerequisite for running ConsumerCustomOffset:
- Stop all previously running producers and consumers
- (Re)Run ClickStream-Producer (Main Class com.examples.ClicksProducer)
- Main Class: com.examples.ConsumerCustomOffset
- Overview: consumes the events coming from ClickStream-Producer starting from a specific user-defined offset
- Procedure (#P5):
- The procedure is similar to Procedure P4, with the following difference:
- The consumer is subscribed to the topic "click-events"
- The consumer starts reading events from a specific user-defined offset (i.e., int offsetToReadFrom)
- This is specified using the following code fragment (a combined context sketch follows)
```
// read from a specific user-defined offset
int offsetToReadFrom = 5;
consumer.seek(topicPartition, offsetToReadFrom);
```
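For context, seek() only takes effect on partitions the consumer has explicitly been assigned (which P5 inherits from P4); putting the two fragments together gives roughly the following sketch:
```
// assign the partition first (as in P4), then seek to the desired offset
TopicPartition topicPartition = new TopicPartition("click-events", 0);
consumer.assign(Arrays.asList(topicPartition));
int offsetToReadFrom = 5;
consumer.seek(topicPartition, offsetToReadFrom);
```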
- Examine the control-flow within the class com.examples.ConsumerCustomOffset.
- Stop all previously running producers and consumers and (re)run ClickStream-Producer (Main Class com.examples.ClicksProducer) (prerequisite)
- Wait for the producer to produce more than 5 events (e.g., 10 events)
- Run ConsumerCustomOffset
- Check the console of ConsumerCustomOffset, what do you notice?
- Prerequisite for running the classes within the package rebalancingExample:
- Stop all previously running producers and consumers
- (Re)Run EyeTrackers-Producer (Main Class com.examples.EyeTrackersProducer)
- Overview: The package contains two (main) classes ConsumerForGazeEventsForEyeTrackerParitionsRebalancing1 and ConsumerForGazeEventsForEyeTrackerParitionsRebalancing2. These classes demonstrate how Kafka does rebalancing when a new consumer is added (within the same consumer group)
- For the sake of simulation, the classes ConsumerForGazeEventsForEyeTrackerParitionsRebalancing1 and ConsumerForGazeEventsForEyeTrackerParitionsRebalancing2 have duplicate code.
- Procedure (#P6): In both classes, a Kafka consumer is created with the properties in consumer.properties, subscribes to the topic "gaze-events", and prints the received events to the console
- Remember that the topic "gaze-events" has two partitions (referring to the two eye-trackers)
- Stop all previously running producers and consumers and (re)run EyeTrackers-Producer (Main Class com.examples.EyeTrackersProducer) (prerequisite)
- Run ConsumerForGazeEventsForEyeTrackerParitionsRebalancing1
- Check that the consumer consumes and prints to the console the events belonging to both Partition 0 and Partition 1 of the "gaze-events" topic
- Run ConsumerForGazeEventsForEyeTrackerParitionsRebalancing2
- Check that each consumer will start consuming and printing to the console the events of a single partition only (an optional sketch for observing rebalancing directly follows below)
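To make the rebalancing visible explicitly, rather than inferring it from which partitions' events are printed, one could optionally subscribe with a ConsumerRebalanceListener. This is a sketch, not part of the lab code:
```
// optional sketch: log partition (re)assignments to observe rebalancing directly
consumer.subscribe(Arrays.asList("gaze-events"), new ConsumerRebalanceListener() {
    @Override
    public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
        System.out.println("Partitions revoked: " + partitions);
    }

    @Override
    public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
        System.out.println("Partitions assigned: " + partitions);
    }
});
```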
- Prerequisite for running the classes within the package singleAcessToPartitionAndRebalancingExample:
- Stop all previously running producers and consumers
- (Re)Run ClickStream-Producer (Main Class com.examples.ClicksProducer)
- Overview: The package contains two (main) classes singleAcessToPartitionAndRebalancingExample.ConsumerForClickEventsOnly1 and singleAcessToPartitionAndRebalancingExample.ConsumerForClickEventsOnly2. These classes demonstrate how Kafka allows only one consumer (within a consumer group) to read from a partition and how rebalancing occurs when a running consumer goes down
- For the sake of simulation, the classes singleAcessToPartitionAndRebalancingExample.ConsumerForClickEventsOnly1 and singleAcessToPartitionAndRebalancingExample.ConsumerForClickEventsOnly2 have duplicate code.
- Procedure (#P7): In both classes, a Kafka consumer is created, subscribes to the topic "click-events", and prints the received events to the console
- Remember that the topic "click-events" has 1 partition only, so this partition can be read by only one consumer within a consumer group
- Stop all previously running producers and consumers and (re)run ClickStream-Producer (Main Class com.examples.ClicksProducer) (prerequisite)
- Run singleAcessToPartitionAndRebalancingExample.ConsumerForClickEventsOnly1
- Check that the consumer consumes and prints to the console the events of the "click-events" topic.
- Run singleAcessToPartitionAndRebalancingExample.ConsumerForClickEventsOnly2
- Check that only one of the two consumers will keep consuming the events while the other consumer will stay idle
- Stop the non-idle consumer i.e., the one still consuming events
- Check that the other consumer will take over and start consuming events after a while
- Assuming that the click events have an incremental eventID, by comparing the output consoles of singleAcessToPartitionAndRebalancingExample.ConsumerForClickEventsOnly1 and singleAcessToPartitionAndRebalancingExample.ConsumerForClickEventsOnly2, check whether any of the events are duplicated or lost
- Prerequisite for running the classes within the package customCommit.singleAcessToPartitionAndRebalancingExample:
- Stop all previously running producers and consumers
- (Re)Run ClickStream-Producer (Main Class com.examples.ClicksProducer)
- Overview: The package contains two (main) classes customCommit.singleAcessToPartitionAndRebalancingExample.ConsumerForClickEventsOnly1 and customCommit.singleAcessToPartitionAndRebalancingExample.ConsumerForClickEventsOnly2. These classes demonstrate, in the context of manual offset commits (occurring after every n consumer polls), the impact of rebalancing on event duplication and loss
- For the sake of simulation, the classes customCommit.singleAcessToPartitionAndRebalancingExample.ConsumerForClickEventsOnly1 and customCommit.singleAcessToPartitionAndRebalancingExample.ConsumerForClickEventsOnly2 have duplicate code.
- Procedure (#P8):
- In this example, we use a different configuration for the consumer properties, available in resources/consumerCustomCommit.properties
- What is different in this configuration is that we set "enable.auto.commit=false" to disable automatic offset commits, which allows trying out manual offset commits
- In both classes, a Kafka consumer is created, subscribes to the topic "click-events", and prints the received events to the console
- In both classes, every time consumer.poll() is called and the records are processed, a manual synchronous commit is performed using the following code fragment
```
try {
    consumer.commitSync();
    System.out.println("commit sync done");
} catch (CommitFailedException e) {
    System.err.println("commit failed" + e);
}
```
- Remember that the topic "click-events" has only 1 partition, so this partition can be read by only one consumer within a consumer group
- Stop all previously running producers and consumers and (re)run ClickStream-Producer (Main Class com.examples.ClicksProducer) (prerequisite)
- Run customCommit.singleAcessToPartitionAndRebalancingExample.ConsumerForClickEventsOnly1
- Check that the consumer consumes and prints to the console the events of the "click-events" topic.
- Run customCommit.singleAcessToPartitionAndRebalancingExample.ConsumerForClickEventsOnly2
- Check that only one of the two consumers will keep consuming the events while the other consumer will stay idle
- Stop the non-idle consumer i.e., the one still consuming events
- Check that the other consumer will take over and start consuming events after a while
- Assuming that the click events have an incremental eventID, by comparing the output consoles of customCommit.singleAcessToPartitionAndRebalancingExample.ConsumerForClickEventsOnly1 and customCommit.singleAcessToPartitionAndRebalancingExample.ConsumerForClickEventsOnly2, check whether any of the events are duplicated or lost
- Edit the code in both customCommit.singleAcessToPartitionAndRebalancingExample.ConsumerForClickEventsOnly1 and customCommit.singleAcessToPartitionAndRebalancingExample.ConsumerForClickEventsOnly2 so that the manual offset commit is executed after every 40 consumer polls (see the sketch below), repeat the previous instructions, and check whether any events are duplicated or lost
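One possible way to implement the every-40-polls commit, as a sketch under the assumption that the poll loop resembles the fragments above (variable names are illustrative):
```
// sketch: commit manually only after every 40 polls
int pollCount = 0;
while (true) {
    ConsumerRecords<String, Object> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, Object> record : records) {
        System.out.println("Received click-events - value: " + record.value());
    }
    pollCount++;
    if (pollCount % 40 == 0) {
        try {
            consumer.commitSync();
            System.out.println("commit sync done after " + pollCount + " polls");
        } catch (CommitFailedException e) {
            System.err.println("commit failed" + e);
        }
    }
}
```
With a larger commit interval, more already-processed events are not yet committed when a rebalance happens, so the take-over consumer re-reads them: expect more duplicates, not losses.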
- Prerequisite for running the classes within the package customCommit.commitLargerOffset:
- Stop all previously running producers and consumers
- (Re)Run ClickStream-Producer (Main Class com.examples.ClicksProducer)
- Overview: The package contains two (main) classes customCommit.commitLargerOffset.ConsumerForClickEventsOnly1 and customCommit.commitLargerOffset.ConsumerForClickEventsOnly2. These classes demonstrate event loss when a manually committed offset is larger than the offset of the latest processed event
- For the sake of simulation, the classes customCommit.commitLargerOffset.ConsumerForClickEventsOnly1 and customCommit.commitLargerOffset.ConsumerForClickEventsOnly2 have duplicate code.
- Procedure (#P9):
- Similar to Procedure P8, in this example we use the configuration in resources/consumerCustomCommit.properties
- This configuration sets "enable.auto.commit=false" to disable automatic offset commits, which allows trying out manual offset commits
- In both classes, a Kafka consumer is created, subscribes to the topic "click-events", and prints the received events to the console
- In both classes, every time a record is processed, a manual synchronous commit is performed using the following code fragment, which commits an offset corresponding to the current record.offset() + 10
```
currentOffsets.put(
        new TopicPartition(record.topic(), record.partition()),
        new OffsetAndMetadata(record.offset() + 10, "no metadata"));
consumer.commitSync(currentOffsets);
```
- Remember that the topic "click-events" has only 1 partition, so this partition can be read by only one consumer within a consumer group
- Stop all previously running producers and consumers and (re)run ClickStream-Producer (Main Class com.examples.ClicksProducer) (prerequisite)
- Run customCommit.commitLargerOffset.ConsumerForClickEventsOnly1
- Check that the consumer consumes and prints to the console the events of the "click-events" topic.
- Run customCommit.commitLargerOffset.ConsumerForClickEventsOnly2
- Check that only one of the two consumers will keep consuming the events while the other consumer will stay idle
- Stop the non-idle consumer i.e., the one still consuming events
- Check that the other consumer will take over and start consuming events after a while
- Assuming that the click events have an incremental eventID, by comparing the output consoles of customCommit.commitLargerOffset.ConsumerForClickEventsOnly1 and customCommit.commitLargerOffset.ConsumerForClickEventsOnly2, check whether some events are lost. What do you notice?