Add support for ClickHouse database as a source and sink

Question

Delphin1 opened this issue 9 months ago

It would be great if Arroyo could also work with ClickHouse.

kzk2000 · Answer 1 · Sun Oct 08 2023 05:49:01 GMT+0800 (China Standard Time)

FWIW, if your data is already in Kafka, it's trivial to sync it with ClickHouse:

Kafka to ClickHouse: https://clickhouse.com/docs/en/integrations/kafka#clickhouse-to-kafka
ClickHouse to Kafka: https://clickhouse.com/docs/en/integrations/kafka#clickhouse-to-kafka
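For the Kafka-to-ClickHouse direction, the ClickHouse side usually consists of three objects: a table using the Kafka engine that consumes the topic, a MergeTree table for durable storage, and a materialized view that copies rows from the first into the second. The Python helper below is only a sketch that prints that DDL; the function name, table, topic, broker, consumer group and columns are made-up placeholders, and `JSONEachRow` assumes the messages on the topic are JSON objects.

```python
from typing import Dict, List


def kafka_ingest_ddl(table: str, columns: Dict[str, str], brokers: str,
                     topic: str, group: str, order_by: str) -> List[str]:
    """Hypothetical helper: DDL for the Kafka engine -> materialized view -> MergeTree pattern."""
    cols = ",\n    ".join(f"{name} {ctype}" for name, ctype in columns.items())
    col_names = ", ".join(columns)
    queue = f"{table}_queue"
    return [
        # 1. Kafka engine table: a streaming view over the topic, not durable storage.
        f"CREATE TABLE {queue}\n(\n    {cols}\n)\nENGINE = Kafka\n"
        f"SETTINGS kafka_broker_list = '{brokers}',\n"
        f"         kafka_topic_list = '{topic}',\n"
        f"         kafka_group_name = '{group}',\n"
        f"         kafka_format = 'JSONEachRow'",
        # 2. MergeTree table that actually stores the data.
        f"CREATE TABLE {table}\n(\n    {cols}\n)\nENGINE = MergeTree\nORDER BY {order_by}",
        # 3. Materialized view that copies consumed rows into the storage table.
        f"CREATE MATERIALIZED VIEW {table}_mv TO {table} AS\n"
        f"SELECT {col_names} FROM {queue}",
    ]


if __name__ == "__main__":
    # Placeholder names; adjust to your own topic and schema.
    for stmt in kafka_ingest_ddl(
        table="events",
        columns={"ts": "DateTime", "user_id": "UInt64", "event": "String"},
        brokers="localhost:9092",
        topic="arroyo_events",
        group="clickhouse_consumer",
        order_by="(ts, user_id)",
    ):
        print(stmt + ";\n")
```

The printed statements can be pasted into `clickhouse-client`; once the view exists, ClickHouse keeps consuming the topic in the background.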

That said, syncing an Arroyo stream directly to ClickHouse for long-term storage, without Kafka in between, would indeed be cool.

t · Answer 2 · Fri Nov 03 2023 05:09:59 GMT+0800 (China Standard Time)

What would the high-level design and testing procedure be for implementing this feature?
Looks like a cool one.
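One possible high-level design for a direct sink (a sketch, not how Arroyo actually structures its connectors): buffer rows, then flush them as a single `INSERT ... FORMAT JSONEachRow` POST to ClickHouse's HTTP interface (port 8123 by default) whenever the batch is full. ClickHouse generally prefers fewer, larger inserts over many small ones. The class name `ClickHouseHttpSink`, its parameters and the injectable `send` callback are hypothetical, and since Arroyo is written in Rust, Python is used here only to show the flow.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, List, Optional


class ClickHouseHttpSink:
    """Hypothetical batching sink for ClickHouse's HTTP interface."""

    def __init__(self, base_url: str, table: str, batch_size: int = 10_000,
                 send: Optional[Callable[[str, bytes], None]] = None) -> None:
        self.base_url = base_url.rstrip("/")
        self.table = table
        self.batch_size = batch_size
        self.buffer: List[Dict[str, Any]] = []
        # `send` is injectable so the batching logic can be tested without a server.
        self.send = send or self._http_post

    def request_url(self) -> str:
        # The INSERT statement goes in the `query` URL parameter; rows go in the body.
        query = f"INSERT INTO {self.table} FORMAT JSONEachRow"
        return f"{self.base_url}/?query={urllib.parse.quote(query)}"

    def write(self, row: Dict[str, Any]) -> None:
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        # A real connector would presumably also call this on each checkpoint.
        if not self.buffer:
            return
        # JSONEachRow expects one JSON object per line.
        body = "\n".join(json.dumps(r) for r in self.buffer).encode("utf-8")
        self.send(self.request_url(), body)
        # Only cleared after send() returns, so a failed send keeps the rows for a retry.
        self.buffer.clear()

    @staticmethod
    def _http_post(url: str, body: bytes) -> None:
        req = urllib.request.Request(url, data=body, method="POST")
        # urlopen raises HTTPError on non-2xx responses, so failed inserts surface.
        with urllib.request.urlopen(req) as resp:
            resp.read()
```

Injecting `send` also hints at a testing procedure: unit-test the batching against a fake sender that records requests, then run an integration test against a real ClickHouse server (for example in Docker) and compare row counts with what the pipeline emitted.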

kzk2000 · Answer 3 · Fri Nov 03 2023 06:17:39 GMT+0800 (China Standard Time)

ClickHouse has various integrations for data ingestion; Kafka, as mentioned above, is just one of them.

I'm no expert, but maybe one of the integrations listed at https://clickhouse.com/docs/en/integrations (search for "Data ingestion") would work well with what Arroyo.dev already has.

Marvin Hansen · Answer 4 · Mon Mar 04 2024 18:32:06 GMT+0800 (China Standard Time)

Maybe try a remote select?

Something like:

SELECT * FROM remote('127.0.0.1', db.remote_engine_table) LIMIT 3;

With that in place, you can simply run a remote insert into a ClickHouse table over the native TCP protocol.
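To make the remote insert concrete: `remote()` can also be the target of an insert (`INSERT INTO TABLE FUNCTION remote(...) VALUES ...`), and the server executing the statement forwards the rows to the given address over the native TCP protocol (port 9000 by default). The helper below only builds such a statement; the function names, address and table are illustrative assumptions, the literal escaping is deliberately simplified, and rows must follow the target table's column order.

```python
from typing import Any, Iterable, Sequence


def sql_literal(value: Any) -> str:
    """Hypothetical helper: render a Python value as a ClickHouse SQL literal (simplified)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check before int, since bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return repr(value)
    # Strings: escape backslashes first, then single quotes.
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def remote_insert_sql(address: str, table: str, rows: Iterable[Sequence[Any]]) -> str:
    """Build INSERT INTO TABLE FUNCTION remote(...) VALUES ... for the given rows."""
    values = ", ".join(
        "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in rows
    )
    return (f"INSERT INTO TABLE FUNCTION remote({sql_literal(address)}, {table}) "
            f"VALUES {values}")


if __name__ == "__main__":
    print(remote_insert_sql("127.0.0.1:9000", "db.remote_engine_table",
                            [(1, "click"), (2, "it's")]))
```

Actually sending the statement (for example with `clickhouse-client --query`, or a native-protocol driver from the connector) is left out; for high-volume streams, batching many rows per statement matters here just as much as with the HTTP route.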

This might be easier and faster to implement than a full-blown integration.