RADAR-base / RADAR-Backend

Kafka backend for processing device data

Evaluate whether the REST proxy accepts Content-Encoding: gzip

blootsvoets opened this issue · comments

As an optimisation to reduce the amount of data sent to the server, we should check whether the Kafka REST proxy accepts compressed requests. As long as the server is "close" (low latency, high bandwidth) this won't be needed, but if the service is hosted only in the cloud, this could be useful.

Positive: much smaller messages, since message contents are usually very redundant.
Negative: higher CPU load on the client.

The Confluent Platform and Apache Kafka provide out-of-the-box functionality to compress messages.

Producer configuration - ... you might pass in the compression.type option to enable site-wide compression to reduce storage and network overhead.

Batching and Compression: Kafka producers attempt to collect sent messages into batches to improve throughput. With the Java client, you can use batch.size to control the maximum size in bytes of each message batch. To give more time for batches to fill, you can use linger.ms to have the producer delay sending. Compression can be enabled with the compression.type setting. Compression covers full message batches, so larger batches will typically mean a higher compression ratio...

Furthermore, data exchange can be optimised by configuring message batching: instead of sending one message at a time, producers can be set up to dispatch batches of messages. This should reduce the per-message overhead that Kafka adds to each exchange and cut down the overall amount of data transferred.
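As an illustration, here is a minimal sketch of those settings with the Java producer client; the broker address, topic, and the chosen batch/linger values are placeholders, not recommendations:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class CompressedProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Compression covers whole batches, so larger batches compress better.
        props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "gzip");
        // Collect up to 64 KiB per batch, waiting up to 50 ms for batches to fill.
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, "65536");
        props.put(ProducerConfig.LINGER_MS_CONFIG, "50");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("test", "key", "value"));
        }
    }
}
```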

However, enabling compression adds CPU overhead to both client and server. In the case of the REST Proxy, this cost is paid twice: first when the message arrives from the client, and again when the producer inside the REST Proxy writes the message to the target topic.

I'm referring to the REST proxy in this issue, not the internal Kafka system. Once a message is inside the Kafka system I don't think it needs to be compressed: I expect latencies there to be low and bandwidth to be high, and messages already use the compact Avro binary encoding. However, the REST proxy only accepts the Avro JSON encoding, which is very redundant and benefits a lot from compression.
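To evaluate this, a small probe could POST a gzip-compressed request with Content-Encoding: gzip and inspect the response code; the proxy URL and topic are placeholders, and the JSON embedded format is used instead of Avro only to keep the sketch short:

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipProbe {
    public static void main(String[] args) throws Exception {
        byte[] body = "{\"records\":[{\"value\":{\"foo\":\"bar\"}}]}"
                .getBytes(StandardCharsets.UTF_8);

        URL url = new URL("http://localhost:8082/topics/test");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/vnd.kafka.json.v1+json");
        conn.setRequestProperty("Content-Encoding", "gzip");

        // Compress the request body on the fly.
        try (OutputStream out = new GZIPOutputStream(conn.getOutputStream())) {
            out.write(body);
        }

        // 200 OK means something along the path decompressed the body;
        // an error code suggests gzip request bodies are not accepted.
        System.out.println("HTTP " + conn.getResponseCode());
    }
}
```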

The next version of Confluent Platform (the current version at the time of writing is 3.0.1) will support HTTP compression between clients and the REST Proxy.

After some discussion, it seems that the update will not solve our issue (confluentinc/kafka-rest#249). For the time being, if we use an Apache server, we could enable the mod_deflate input filter to take care of decompression, so that the Kafka REST proxy always receives decompressed data. This would come at the small cost of an additional hop.
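As a sketch of that workaround, assuming httpd 2.4 in front of a REST proxy on localhost:8082 (path and port are placeholders):

```apache
# Decompress gzip-encoded request bodies before they reach the REST proxy.
LoadModule deflate_module modules/mod_deflate.so
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so

<Location "/">
    # mod_deflate's input filter inflates bodies sent with Content-Encoding: gzip.
    SetInputFilter DEFLATE
    ProxyPass "http://localhost:8082/"
    ProxyPassReverse "http://localhost:8082/"
</Location>
```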

There is a possible vulnerability when using compression. We should discuss it with the security team. A possible mitigation:

it seems that the update will not solve our issue (confluentinc/kafka-rest#249).

It has been confirmed by Confluent: there is no compression in the clients -> REST Proxy direction in Confluent 3.1; it has been implemented only for the REST Proxy -> clients direction.

Using an Apache server as a workaround allows us to cover two scenarios (see the sketch below):

  • compression
  • a Content-Type check to accept only Avro messages
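For the second scenario, a hypothetical addition to the httpd fragment above could reject requests that do not declare an Avro content type; the exact media-type pattern to match is an assumption:

```apache
# Return 415 Unsupported Media Type unless the request declares an
# Avro embedded-format content type. Requires mod_rewrite.
RewriteEngine On
RewriteCond %{HTTP:Content-Type} !^application/vnd\.kafka\.avro\.
RewriteRule .* - [R=415,L]
```

This way, malformed or non-Avro uploads never reach the REST proxy at all.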