mardambey / mypipe

MySQL binary log consumer with the ability to act on changed rows and publish changes to different systems with emphasis on Apache Kafka.

Home Page: http://mardambey.github.io/mypipe

How do you retrieve a map from the generic Kafka structure?

rhamnett opened this issue · comments

Hello, I wondered if you could kindly help. I'm trying to retrieve a generic insert mutation message from a Kafka queue, which was posted to the queue using the generic producer mypipe.producer.KafkaMutationGenericAvroProducer.

The code I have below is able to return an object, but the object is just a string and does not have any structure to it, e.g. a hashmap or similar where I can say result.get("myfield").

I am struggling to understand how to iterate through the data. Any help greatly appreciated.

public void run() {
        ConsumerIterator<byte[], byte[]> it = m_stream.iterator();
        while (it.hasNext()) {
            try {
                byte[] received_message = it.next().message();
                System.out.println(received_message);

                Schema schema = new Schema.Parser().parse(new File("mutations.avsc"));
                DatumReader<GenericData.Fixed> reader = new GenericDatumReader<GenericData.Fixed>(schema);
                Decoder decoder = DecoderFactory.get().binaryDecoder(received_message, null);
                GenericData.Fixed payload = reader.read(null, decoder);
                System.out.println("Message received : " + payload + " schema " + payload.getSchema());
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        System.out.println("Shutting down Thread: " + m_threadNumber);
    }

I am using the schema from your latest repo https://github.com/mardambey/mypipe/blob/70c692ddcc97deb4d95f914529d9c4380307bca9/mypipe-avro/src/main/avro/mutations.avsc

Thanks

To add to this, I did try to use a GenericRecord, but the type published onto the queue appears to be GenericData.Fixed:

java.lang.ClassCastException: org.apache.avro.generic.GenericData$Fixed cannot be cast to org.apache.avro.generic.GenericRecord

@rhamnett take a look at:

https://github.com/mardambey/mypipe/blob/bb7acf47f9e80b8792b4ed517fa16d82763bd3c9/mypipe-kafka/src/main/scala/mypipe/kafka/consumer/GenericConsoleConsumer.scala

The Kafka messages do not just consist of the Avro data. They actually have a specific framing format, which is documented in the mypipe docs.

You can use KafkaGenericMutationAvroConsumer by extending it, or you can read the bytes yourself to extract the Avro data before passing it to the GenericDatumReader.

Appreciate the reply. I'm still struggling to decode the messages coming from individual topics such as database_table_generic. Do you have a Java example of using the binary decoder on a Kafka message with a generic record and an existing schema?

Many thanks

Richard

@rhamnett Are you trying to do this totally on your own or making use of KafkaGenericMutationAvroConsumer?

If you do not use KafkaGenericMutationAvroConsumer, you have to decode the bytes yourself based on the docs I pointed to earlier, which I'll add here:

 -----------------
| MAGIC | 1 byte  |
|-----------------|
| MTYPE | 1 byte  |
|-----------------|
| SCMID | N bytes |
|-----------------|
| DATA  | N bytes |
 -----------------

The SCMID (schema id) is usually a short by default. So, you have to read 1 byte + 1 byte + 1 short, after which you'll have a byte array containing the Avro data you want. If you use KafkaGenericMutationAvroConsumer, all of this is done for you.
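To make that concrete, here is a minimal Java sketch of reading a raw message yourself, under the assumptions above: the schema id is the default 2-byte short, and the mutation type bytes 1/2/3 map to insert/update/delete. The class and method names are just for illustration; it uses the plain Avro API and the union schema from mutations.avsc.

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

import java.io.File;
import java.io.IOException;
import java.util.Arrays;

public class GenericMutationDecoder {

    // MAGIC (1 byte) + MTYPE (1 byte) + SCMID (2-byte short by default)
    private static final int HEADER_SIZE = 1 + 1 + 2;

    private final Schema unionSchema;

    public GenericMutationDecoder(File mutationsAvsc) throws IOException {
        // mutations.avsc is a union of the mutation record types
        this.unionSchema = new Schema.Parser().parse(mutationsAvsc);
    }

    public GenericRecord decode(byte[] message) throws IOException {
        byte mutationType = message[1]; // assuming 1 = insert, 2 = update, 3 = delete

        // drop MAGIC, MTYPE and the schema id; what remains is the Avro-encoded record
        byte[] avroBytes = Arrays.copyOfRange(message, HEADER_SIZE, message.length);

        // pick the concrete record schema out of the union
        String recordName = mutationType == 1 ? "InsertMutation"
                          : mutationType == 2 ? "UpdateMutation"
                          : "DeleteMutation";
        Schema recordSchema = null;
        for (Schema s : unionSchema.getTypes()) {
            if (s.getName().equals(recordName)) {
                recordSchema = s;
                break;
            }
        }

        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(recordSchema);
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(avroBytes, null);
        return reader.read(null, decoder);
    }
}

Something like result.get("myfield") from the original question should then work on the returned GenericRecord.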

Hello, thanks again for the reply.

I am not sure the problem is the byte offsets; if I use the binary decoder and a GenericDatumReader parameterized with GenericData.Fixed, I am able to parse the payload.

However, if I replace GenericData.Fixed with GenericRecord, then I get a cast exception, so I cannot get access to any of the fields that have been serialized.

I am possibly doing something wrong at an earlier stage, e.g. setting up the schema in the original example I posted. I can definitely see all the data I would expect to be in the record if I dump the Kafka message bytes.

Richard

I've created a full example class so it is easy for you to see what I am referring to:

http://pastebin.ca/3385426

Any help appreciated

InsertMutation.avsc is here http://pastebin.ca/3385431

mypipe config used is here http://pastebin.ca/3385495

@rhamnett cheers - I'll take a look at this as soon as I can.

@mardambey thanks, would really like to get to the bottom of this!

Any update on this? I haven't managed to get any further. Thanks in advance @mardambey

@rhamnett I tried taking a look at your link:

http://pastebin.ca/3385426

but it has expired. Reckon you can toss the code up somewhere else, perhaps in a Gist, so I can take a look?

    public static final byte PROTO_MAGIC_V0 = Byte.parseByte("0");
    public static final byte UNKNOWNBYTE = Byte.parseByte("0");
    public static final byte INSERTBYTE = Byte.parseByte("1");
    public static final byte UPDATEBYTE = Byte.parseByte("2");
    public static final byte DELETEBYTE = Byte.parseByte("3");

    ......
    props.put("value.deserializer", ByteArrayDeserializer.class.getName());
    KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
    consumer.subscribe(Collections.singletonList("mypipe_user_generic"));
    while (true) {
        ConsumerRecords<byte[], byte[]> records = consumer.poll(10);
        for (ConsumerRecord<byte[], byte[]> record : records) {
            try {
                Schema schema = new Schema.Parser().parse(new File("mutations.avsc"));
                byte[] body = record.value();
                byte magic = body[0];
                if (magic != PROTO_MAGIC_V0) {
                    LOG.error("We have encountered an unknown magic byte! Magic Byte: {}", magic);
                } else {
                    byte mutationType = body[1];
                    byte[] data = new byte[body.length - 4];
                    System.arraycopy(body, 4, data, 0, body.length - 4);
                    Decoder decoder = DecoderFactory.get().binaryDecoder(data, null);
                    DatumReader<GenericRecord> reader = null;
                    String recordName = null;
                    if (mutationType == INSERTBYTE) {
                        recordName = "InsertMutation";
                    } else if (mutationType == UPDATEBYTE) {
                        recordName = "UpdateMutation";
                    } else if (mutationType == DELETEBYTE) {
                        recordName = "DeleteMutation";
                    }
                    for (Schema s : schema.getTypes()) {
                        if (s.getName().equals(recordName)) {
                            reader = new GenericDatumReader<>(s);
                            break;
                        }
                    }
                    assert reader != null;
                    GenericRecord payload = reader.read(null, decoder);
                    LOG.info("Message received : " + payload);
                }

            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

The above may be helpful for you, @rhamnett.

@caojiaqing Bro, your code still throws an error for me:
Error while processing: ConsumerRecord(topic = xmall_customer_generic, partition = 0, offset = 40, CreateTime = -1, checksum = 498637550, serialized key size = -1, serialized value size = 407, key = null, value =
java.lang.IllegalArgumentException
    at java.nio.ByteBuffer.allocate(Unknown Source)
    at org.apache.avro.io.BinaryDecoder.readBytes(BinaryDecoder.java:288)
    at org.apache.avro.io.ResolvingDecoder.readBytes(ResolvingDecoder.java:237)
    at org.apache.avro.generic.GenericDatum


P.S. The Kafka consumer config must set value.deserializer=org.apache.kafka.common.serialization.ByteArrayDeserializer.
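For reference, a minimal byte-array consumer setup along those lines could look like the sketch below; the bootstrap server, group id and class name are placeholders, and the topic name is taken from the example above.

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

import java.util.Collections;
import java.util.Properties;

public class GenericConsumerSetup {
    public static KafkaConsumer<byte[], byte[]> create() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder
        props.put("group.id", "mypipe-generic-consumer");   // placeholder
        // both key and value arrive as raw bytes; decode the value yourself as shown earlier
        props.put("key.deserializer", ByteArrayDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());

        KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("mypipe_user_generic"));
        return consumer;
    }
}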