The goal of this project is to:
- Explore the pros and cons of a couple of potential Kafka client architectures that deviate from those employed by the Java client and librdkafka today.
- Evaluate the feasibility/value of developing a drop-in replacement for librdkafka written in Rust.
Progress: I've spent just one weekend on this so far, most of it taken up with thinking things through and leveling up my Rust skills, so there isn't much in this repo yet. This project may or may not go anywhere (erring on the side of the latter).
This project builds on the awesome (and very convenient for me) kafka-protocol-rs by Gardner Vickers, which provides an auto-generated stub for the Kafka protocol in Rust.
What we'd like is an architecture that is as simple as possible (making maintenance, support and growth as easy as possible), whilst not compromising on performance in any practically significant way. By that I mean I'm mostly concerned with efficiency, not raw throughput potential, given that throughput is almost always dominated by the network anyway.
I'm focusing on the Consumer because it has the most complexity, and so provides the best playground for testing ideas aimed at reducing it.
Architectures to explore:
- A single background thread (in addition to the application thread) that manages practically everything, possibly farming off some CPU-intensive tasks (e.g. decompression) to threadpool threads. Cf. the Java client, where most work happens on the application thread, and librdkafka, which has a main background thread plus an additional thread per broker.
- The key thought here is that concurrency (of which there is structurally a lot, and which is the key driver of complexity) will be easier to deal with if as much logic as possible is happening linearly.
- The downside to this approach, I think, is that a lot of the coordination between tasks will end up explicit and not well encapsulated.
- Leveraging async/await syntax to express the structurally concurrent parts of the protocol as straightforward, sequential-looking code.
- A lot of the complexity in maintaining librdkafka stems from coordinating shared state across its many threads (a main thread plus one per broker) with explicit locking.
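To make the single-background-thread idea concrete, here is a minimal sketch (all names here are hypothetical, not part of any real client): the application thread talks to one background thread over channels, and everything that would otherwise be concurrent state runs linearly inside that thread's loop.

```rust
use std::sync::mpsc;
use std::thread;

// Commands the application thread sends to the background thread.
enum Command {
    Poll,
    Shutdown,
}

// Events the background thread reports back (e.g. fetched records).
#[derive(Debug, PartialEq)]
enum Event {
    Records(Vec<String>),
    Closed,
}

// Spawn the single background thread. All broker I/O and coordination
// state would live linearly inside this one loop, so no locks are needed.
fn spawn_background() -> (mpsc::Sender<Command>, mpsc::Receiver<Event>) {
    let (cmd_tx, cmd_rx) = mpsc::channel();
    let (evt_tx, evt_rx) = mpsc::channel();
    thread::spawn(move || {
        for cmd in cmd_rx {
            match cmd {
                Command::Poll => {
                    // Stand-in for fetching from brokers.
                    let _ = evt_tx.send(Event::Records(vec!["record-0".into()]));
                }
                Command::Shutdown => {
                    let _ = evt_tx.send(Event::Closed);
                    break;
                }
            }
        }
    });
    (cmd_tx, evt_rx)
}

fn main() {
    let (cmds, events) = spawn_background();
    cmds.send(Command::Poll).unwrap();
    assert_eq!(events.recv().unwrap(), Event::Records(vec!["record-0".into()]));
    cmds.send(Command::Shutdown).unwrap();
    assert_eq!(events.recv().unwrap(), Event::Closed);
}
```

The downside mentioned above shows up even here: the `Command`/`Event` vocabulary is exactly the explicit, not-very-encapsulated coordination surface that grows as features are added.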
The benefits of Rust
kafka-protocol-rs (by Gardner Vickers):
A collection of crates for working with the Kafka protocol. The Kafka protocol is a request/response oriented binary protocol. It consists of several independently versioned RPC calls. The Kafka protocol supports fixed and variably sized big-endian encoded "leaf" types.
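As a quick illustration of the wire-level encoding described above (a sketch using only the standard library, not the crate's API): fixed-size leaf types are written big-endian, and a Kafka `STRING` is an `i16` length prefix followed by UTF-8 bytes.

```rust
fn main() {
    // Fixed-size leaf types are big-endian on the wire:
    // e.g. the i32 value 1 encodes as [0, 0, 0, 1].
    assert_eq!(1i32.to_be_bytes(), [0, 0, 0, 1]);

    // A Kafka STRING is a big-endian i16 length prefix, then UTF-8 bytes.
    let s = "ok";
    let mut buf = Vec::new();
    buf.extend_from_slice(&(s.len() as i16).to_be_bytes());
    buf.extend_from_slice(s.as_bytes());
    assert_eq!(buf, vec![0, 2, b'o', b'k']);
}
```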
Traits implemented by all Kafka RPC types.
```rust
pub trait KafkaRpc: KafkaRpcType {
    fn version_added() -> i16;
    fn version_removed() -> Option<i16>;
}

pub trait KafkaRpcType: Sized {
    fn read<R: io::Read>(ctx: &mut DeserializeCtx<R>) -> Result<Self, CodecError>;
    fn size(&self, version: i16) -> usize;
    fn write<W: io::Write>(&self, ctx: &mut SerializeCtx<W>) -> Result<(), CodecError>;
}
```
Also includes `KafkaRpcType` implementations for several of the "leaf" Kafka RPC datatypes. These include `bool`, `i8`, `i16`, `i32`, `i64`, `String`, and `Vec<u8>`. In addition, each "leaf" type can also be wrapped in an `Option` or a `Vec`.
A proc-macro for deriving the `KafkaRpc` and `KafkaRpcType` traits. RPC versioning is expressed with the `added` and `removed` field attributes. An optional `default` can be supplied for each field as well. When serializing a derived `KafkaRpc`, the provided `version: i16` will determine the wire format. Fields which are not present for the provided version will be ignored. When deserializing, fields which are not expected will be set to either the default specified by the `default` field attribute, or the result of calling `Default::default()`.
```rust
use kafka_protocol_derive::KafkaRpc;

#[derive(KafkaRpc)]
#[kafka(added = 0)]
struct MetadataRequestTopic {
    #[kafka(added = 0)]
    name: String,
}

#[derive(KafkaRpc)]
struct MetadataRequest {
    topics: Option<Vec<MetadataRequestTopic>>,
    #[kafka(added = 4, default = "true")]
    allow_auto_creation: bool,
    #[kafka(added = 8)]
    include_cluster_authorized_operations: bool,
    #[kafka(added = 8)]
    include_topic_authorized_operations: bool,
}
```
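To show what the version gating means in practice, here is a hand-rolled sketch of the behaviour the `added` attribute describes; this is illustrative only and not the crate's generated code. A field marked `added = 4` is simply skipped when serializing at an earlier version.

```rust
// Illustrative stand-in for a derived type: one field that only
// exists on the wire from version 4 onwards.
struct MetadataRequest {
    allow_auto_creation: bool, // conceptually: #[kafka(added = 4, default = "true")]
}

impl MetadataRequest {
    // Fields not present for the requested version are ignored,
    // matching the serialization rule described above.
    fn write(&self, version: i16, out: &mut Vec<u8>) {
        if version >= 4 {
            // Kafka encodes a bool as a single byte.
            out.push(self.allow_auto_creation as u8);
        }
    }
}

fn main() {
    let req = MetadataRequest { allow_auto_creation: true };

    let mut v3 = Vec::new();
    req.write(3, &mut v3);
    assert!(v3.is_empty()); // field ignored below version 4

    let mut v4 = Vec::new();
    req.write(4, &mut v4);
    assert_eq!(v4, vec![1]); // field written at version 4+
}
```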
Contains Kafka RPC types and errors. Includes a `KafkaRequest` and `KafkaResponse` type which represent a combined header and body. `kafka-api` provides methods for serializing and deserializing requests/responses given a type implementing `std::io::Read` or `std::io::Write`. These types are generated by the `codegen` crate, currently including all RPC types for Kafka version 2.2.1.
`ClientTransport` and `ServerTransport` implementations supporting asynchronous RPC communication. Requires `#![feature(async_await)]`.
This is still a work-in-progress.