mhowlett / librskafka

rust implementation of librdkafka? how does that sound?


librskafka

The goal of this project is to:

  1. Explore the pros and cons of a couple of potential Kafka client architectures that deviate from those employed by the Java client and librdkafka today.
  2. Evaluate the feasibility/value of developing a drop-in replacement for librdkafka written in Rust.

Progress: I've spent just one weekend on this so far, most of that taken up with just thinking about things and leveling-up on my Rust skills - there isn't much in this repo yet. This project may or may not go anywhere (erring on the side of the latter).

This project builds on the awesome (and very convenient for me) kafka-protocol-rs by Gardner Vickers, which provides an auto-generated stub for the Kafka protocol in Rust.

Architecture

What we'd like is an architecture that is as simple as possible (making maintenance, support and growth as easy as possible), whilst not compromising on performance in any practically significant way. By that, I mean I'm mostly concerned about efficiency, not raw throughput potential, given that throughput is almost always dominated by the network anyway.

I'm focusing on the Consumer because it has the most complexity, so it provides the best playground for testing ideas aimed at reducing that complexity.

Architectures to explore:

  1. A single background thread (in addition to the application thread) that manages practically everything, possibly farming off some CPU-intensive tasks (e.g. decompression) to thread-pool threads. Compare this with the Java client, where most work happens on the application thread, and librdkafka, which has a main background thread plus an additional thread per broker.
    • The key thought here is that concurrency (of which there is structurally a lot, and which is the key driver of complexity) will be easier to deal with if as much logic as possible is happening linearly.
    • The downside to this approach, I think, is that a lot of the coordination between tasks will end up being explicit and not well encapsulated.
  2. Leveraging async/await syntax to
    • A lot of the complexity in maintaining librdkafka is that
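
As a rough illustration of option 1, here is a minimal sketch using only the standard library. The `Command` enum, `Consumer` handle and `run_background` loop are hypothetical names invented for this example and are not part of this repo. All state lives on the background thread and is mutated linearly, while the application thread only sends messages over a channel.

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread::{self, JoinHandle};

// Hypothetical commands sent from the application thread to the background thread.
enum Command {
    Subscribe(String),
    // Carries a reply channel so the caller can wait for the result.
    Poll(Sender<Vec<String>>),
    Shutdown,
}

// The single background loop: commands are handled one at a time,
// so the state below needs no locking.
fn run_background(rx: Receiver<Command>) {
    let mut subscriptions: Vec<String> = Vec::new();
    for cmd in rx {
        match cmd {
            Command::Subscribe(topic) => subscriptions.push(topic),
            // A real client would hand back fetched records; this sketch
            // just echoes the current subscription list.
            Command::Poll(reply) => {
                let _ = reply.send(subscriptions.clone());
            }
            Command::Shutdown => break,
        }
    }
}

// Handle owned by the application thread.
struct Consumer {
    tx: Sender<Command>,
    worker: Option<JoinHandle<()>>,
}

impl Consumer {
    fn new() -> Self {
        let (tx, rx) = mpsc::channel();
        let worker = thread::spawn(move || run_background(rx));
        Consumer { tx, worker: Some(worker) }
    }

    fn subscribe(&self, topic: &str) {
        self.tx
            .send(Command::Subscribe(topic.to_string()))
            .expect("background thread has stopped");
    }

    fn poll(&self) -> Vec<String> {
        let (reply_tx, reply_rx) = mpsc::channel();
        self.tx
            .send(Command::Poll(reply_tx))
            .expect("background thread has stopped");
        reply_rx.recv().expect("background thread dropped the reply")
    }

    fn shutdown(mut self) {
        let _ = self.tx.send(Command::Shutdown);
        if let Some(worker) = self.worker.take() {
            worker.join().expect("background thread panicked");
        }
    }
}
```

The sketch also shows the downside mentioned above: every interaction needs its own `Command` variant and, where a result is expected, its own reply channel, so the coordination is spelled out explicitly rather than encapsulated.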

Rust vs C

The benefits of rust

kafka-protocol-rs

(by Gardner Vickers):

Supported Kafka version 2.2.1

A collection of crates for working with the Kafka protocol. The Kafka protocol is a request/response oriented binary protocol. It consists of several independently versioned RPC calls. The Kafka protocol supports fixed and variably sized big-endian encoded "leaf" types.

kafka-protocol

Traits implemented by all Kafka RPC types.

pub trait KafkaRpc: KafkaRpcType {
    fn version_added() -> i16;
    fn version_removed() -> Option<i16>;
}

pub trait KafkaRpcType: Sized {
    fn read<R: io::Read>(ctx: &mut DeserializeCtx<R>) -> Result<Self, CodecError>;
    fn size(&self, version: i16) -> usize;
    fn write<W: io::Write>(&self, ctx: &mut SerializeCtx<W>) -> Result<(), CodecError>;
}

Also includes KafkaRpcType implementations for several of the "leaf" Kafka RPC datatypes. These include bool, i8, i16, i32, i64, String, Vec<u8>. In addition, each "leaf" type can also be wrapped in an Option or a Vec.
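
To make the "leaf" encoding concrete, here is a minimal sketch using only the standard library. The `Leaf` trait is a hypothetical stand-in written for this example; it is not the crate's `KafkaRpcType`, whose context types are not reproduced here. It assumes the classic (non-flexible) Kafka encodings: an `i16` is two big-endian bytes, a string is an `i16` length followed by UTF-8 bytes, and a null string is a length of -1.

```rust
use std::io::{self, Read, Write};

// Hypothetical stand-in for a codec trait (not the crate's KafkaRpcType).
trait Leaf: Sized {
    fn write_to<W: Write>(&self, w: &mut W) -> io::Result<()>;
    fn read_from<R: Read>(r: &mut R) -> io::Result<Self>;
}

impl Leaf for i16 {
    fn write_to<W: Write>(&self, w: &mut W) -> io::Result<()> {
        w.write_all(&self.to_be_bytes())
    }
    fn read_from<R: Read>(r: &mut R) -> io::Result<Self> {
        let mut buf = [0u8; 2];
        r.read_exact(&mut buf)?;
        Ok(i16::from_be_bytes(buf))
    }
}

// Reads `len` bytes and validates them as UTF-8.
fn read_utf8<R: Read>(r: &mut R, len: usize) -> io::Result<String> {
    let mut buf = vec![0u8; len];
    r.read_exact(&mut buf)?;
    String::from_utf8(buf).map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))
}

impl Leaf for String {
    fn write_to<W: Write>(&self, w: &mut W) -> io::Result<()> {
        if self.len() > i16::MAX as usize {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "string too long"));
        }
        (self.len() as i16).write_to(w)?;
        w.write_all(self.as_bytes())
    }
    fn read_from<R: Read>(r: &mut R) -> io::Result<Self> {
        let len = i16::read_from(r)?;
        if len < 0 {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "unexpected null string"));
        }
        read_utf8(r, len as usize)
    }
}

// Wrapping a leaf in Option: None is written as a length of -1.
impl Leaf for Option<String> {
    fn write_to<W: Write>(&self, w: &mut W) -> io::Result<()> {
        match self {
            Some(s) => s.write_to(w),
            None => (-1i16).write_to(w),
        }
    }
    fn read_from<R: Read>(r: &mut R) -> io::Result<Self> {
        let len = i16::read_from(r)?;
        if len < 0 {
            return Ok(None);
        }
        read_utf8(r, len as usize).map(Some)
    }
}
```

For example, writing `"topic".to_string()` into a `Vec<u8>` yields the length bytes `0, 5` followed by the five UTF-8 bytes of the string.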

kafka-protocol-derive

A proc-macro for deriving the KafkaRpc and KafkaRpcType traits.

RPC versioning is expressed with the added and removed field attributes. An optional default can be supplied for each field as well. When serializing a derived KafkaRpc, the provided version: i16 will determine the wire format. Fields which are not present for the provided version will be ignored.

When deserializing, fields which are not present for the provided version will be set to either the default specified by the default field attribute, or the result of calling Default::default().
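
The versioning rules above can be pictured with a hand-written sketch. This is an illustration only and is not the code the derive macro generates; `MetadataRequestSketch` and its `write`/`read` methods are hypothetical, and only two boolean fields are modelled (each encoded as a single byte, as Kafka does for booleans).

```rust
// Hand-written illustration of version gating; NOT the derive macro's output.
#[derive(Debug, PartialEq)]
struct MetadataRequestSketch {
    // Corresponds to: added = 4, default = "true"
    allow_auto_creation: bool,
    // Corresponds to: added = 8, no explicit default (falls back to Default::default())
    include_topic_authorized_operations: bool,
}

impl MetadataRequestSketch {
    // Fields that do not exist in `version` are simply not written.
    fn write(&self, version: i16, out: &mut Vec<u8>) {
        if version >= 4 {
            out.push(self.allow_auto_creation as u8);
        }
        if version >= 8 {
            out.push(self.include_topic_authorized_operations as u8);
        }
    }

    // Fields that do not exist in `version` are filled with their defaults.
    // Returns None if the input is shorter than the version requires.
    fn read(version: i16, input: &[u8]) -> Option<Self> {
        let mut bytes = input.iter().copied();
        let allow_auto_creation = if version >= 4 {
            bytes.next()? != 0
        } else {
            true // the explicit default from the field attribute
        };
        let include_topic_authorized_operations = if version >= 8 {
            bytes.next()? != 0
        } else {
            bool::default()
        };
        Some(MetadataRequestSketch {
            allow_auto_creation,
            include_topic_authorized_operations,
        })
    }
}
```

Note that a round trip through an old version loses information: a request with `allow_auto_creation: false` written at version 3 produces no bytes for that field, and reading it back at version 3 restores the default `true`.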

use kafka_protocol_derive::KafkaRpc;
#[derive(KafkaRpc)]
#[kafka(added = 0)]
struct MetadataRequestTopic {
  #[kafka(added = 0)]
  name: String
}
#[derive(KafkaRpc)]
struct MetadataRequest {
  topics: Option<Vec<MetadataRequestTopic>>,
  #[kafka(added = 4, default = "true")]
  allow_auto_creation: bool,
  #[kafka(added = 8)]
  include_cluster_authorized_operations: bool,
  #[kafka(added = 8)]
  include_topic_authorized_operations: bool,
}

kafka-api and codegen

Contains Kafka RPC types and errors. Includes a KafkaRequest and KafkaResponse type which represent a combined header and body. kafka-api provides methods for serializing and deserializing requests/responses given a type implementing std::io::Read or std::io::Write.
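
To show what "a combined header and body" looks like on the wire, here is a minimal sketch that does not use kafka-api at all. `RequestHeader` and `encode_frame` are hypothetical names for this example. It assumes the pre-flexible request header layout (api_key, api_version, correlation_id, nullable client_id), preceded by a 4-byte big-endian size that covers the header and the body.

```rust
// Hypothetical header type for illustration; kafka-api's own types are not reproduced here.
struct RequestHeader<'a> {
    api_key: i16,
    api_version: i16,
    correlation_id: i32,
    client_id: Option<&'a str>,
}

// Builds one size-prefixed request frame from a header and an already-encoded body.
// Assumes the client id is shorter than i16::MAX bytes and the payload fits in an i32.
fn encode_frame(header: &RequestHeader, body: &[u8]) -> Vec<u8> {
    let mut payload = Vec::new();
    payload.extend_from_slice(&header.api_key.to_be_bytes());
    payload.extend_from_slice(&header.api_version.to_be_bytes());
    payload.extend_from_slice(&header.correlation_id.to_be_bytes());
    match header.client_id {
        Some(id) => {
            payload.extend_from_slice(&(id.len() as i16).to_be_bytes());
            payload.extend_from_slice(id.as_bytes());
        }
        // A null client id is encoded as a length of -1.
        None => payload.extend_from_slice(&(-1i16).to_be_bytes()),
    }
    payload.extend_from_slice(body);

    // The size prefix counts everything after itself.
    let mut frame = Vec::with_capacity(4 + payload.len());
    frame.extend_from_slice(&(payload.len() as i32).to_be_bytes());
    frame.extend_from_slice(&payload);
    frame
}
```

The correlation id is what lets a client match a response to its request, since the broker echoes it back in the response header.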

These types are generated by the codegen crate, currently including all RPC types for Kafka version 2.2.1.

kafka-transport

ClientTransport and ServerTransport implementations supporting asynchronous RPC communication. Requires #![feature(async_await)].

This is still a work-in-progress.
