ArroyoSystems / arroyo

Distributed stream processing engine in Rust

Home Page: https://arroyo.dev

Add support for ClickHouse database as source and sink

Delphin1 opened this issue

It would be great if Arroyo could also work with ClickHouse.

FWIW, if your data is already on Kafka, it's trivial to sync it to ClickHouse.

That said, syncing an Arroyo stream to ClickHouse without Kafka would indeed be cool for long-term storage.
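
For reference, the Kafka route usually means a ClickHouse Kafka engine table plus a materialized view that copies rows into a regular table. A minimal sketch, assuming Arroyo writes JSON rows to a topic; the topic name `arroyo_output`, the broker address, the consumer group name, and the column layout are all made up for illustration:

```sql
-- Hypothetical queue table that consumes the Kafka topic Arroyo writes to.
CREATE TABLE events_queue (
    ts DateTime,
    user_id UInt64,
    payload String
) ENGINE = Kafka
SETTINGS kafka_broker_list = 'localhost:9092',      -- assumed broker address
         kafka_topic_list = 'arroyo_output',        -- assumed topic name
         kafka_group_name = 'clickhouse_consumer',  -- assumed consumer group
         kafka_format = 'JSONEachRow';

-- Long-term storage table with the same (assumed) columns.
CREATE TABLE events (
    ts DateTime,
    user_id UInt64,
    payload String
) ENGINE = MergeTree
ORDER BY (ts, user_id);

-- The materialized view moves each consumed batch into the storage table.
CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT ts, user_id, payload
FROM events_queue;
```

Queries then go against `events`; the queue table is only the consumer side and is not meant to be read directly.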

What would the high-level design and the testing procedure be for implementing this feature?
Looks like a cool one.

ClickHouse has various integrations for data ingestion; Kafka, as mentioned above, is just one of them.

I'm no expert, but maybe some of these (https://clickhouse.com/docs/en/integrations, search for "Data ingestion") would work well with what Arroyo already has.

Maybe try a remote select?

Something like:

SELECT * FROM remote('127.0.0.1', db.remote_engine_table) LIMIT 3;

CH Docs:
https://clickhouse.com/docs/en/sql-reference/table-functions/remote

With that in place, you can simply run a remote insert into a ClickHouse table over the TCP protocol.

This might be easier and faster to implement than a full-blown integration.
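
The remote insert could look roughly like this, using the same `remote` table function as the select. This is only a sketch: it reuses the placeholder address and table name from the select, and the two inserted values assume a hypothetical table with one String column and one integer column.

```sql
-- Insert through the remote() table function; it connects over the native TCP protocol.
-- The column layout (String, integer) is an assumption for illustration.
INSERT INTO FUNCTION remote('127.0.0.1', db.remote_engine_table)
VALUES ('test', 42);
```

An Arroyo sink built this way would still need to batch rows on its side, since ClickHouse generally handles fewer, larger inserts better than many single-row ones.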