Add support for ClickHouse database as source and sink
Delphin1 opened this issue
It would be great if Arroyo could also work with ClickHouse.
FWIW, if your data is already in Kafka, it's trivial to sync it:
- Kafka to ClickHouse: https://clickhouse.com/docs/en/integrations/kafka#clickhouse-to-kafka
- ClickHouse to Kafka: https://clickhouse.com/docs/en/integrations/kafka#clickhouse-to-kafka
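As a rough illustration of the Kafka route, the ClickHouse side usually consists of three objects: a table using the Kafka engine that consumes the topic, a MergeTree table for storage, and a materialized view that copies rows from the former into the latter. The hypothetical helper below only renders that DDL. The broker address, topic, consumer group, table names, columns and the `JSONEachRow` format are placeholder assumptions; check the exact settings against the ClickHouse Kafka docs linked above.

```python
from typing import List


def kafka_pipeline_ddl(broker: str, topic: str, group: str, table: str, columns: str) -> List[str]:
    """Render ClickHouse DDL for a Kafka -> MergeTree pipeline (hypothetical helper)."""
    queue = f"{table}_queue"  # Kafka engine table: consumes messages from the topic
    mv = f"{table}_mv"        # materialized view: moves each consumed block into storage
    return [
        f"CREATE TABLE {queue} ({columns}) ENGINE = Kafka "
        f"SETTINGS kafka_broker_list = '{broker}', kafka_topic_list = '{topic}', "
        f"kafka_group_name = '{group}', kafka_format = 'JSONEachRow'",
        f"CREATE TABLE {table} ({columns}) ENGINE = MergeTree ORDER BY tuple()",
        f"CREATE MATERIALIZED VIEW {mv} TO {table} AS SELECT * FROM {queue}",
    ]


# Placeholder values; run the printed statements with clickhouse-client or similar.
for stmt in kafka_pipeline_ddl("localhost:9092", "arroyo_events", "clickhouse", "events", "id UInt64, payload String"):
    print(stmt + ";")
```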
That said, syncing an Arroyo stream to ClickHouse without Kafka would indeed be useful for long-term storage.
What would the high-level design and the testing procedure for this feature look like?
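One possible shape for a direct sink, purely as a sketch: ClickHouse handles fewer, larger inserts much better than many small ones, so a sink would likely buffer rows and flush them as one `INSERT ... FORMAT JSONEachRow` statement (one JSON object per line) once a batch fills up, and again on checkpoint or shutdown. The class name, batch size and flush triggers are assumptions, not Arroyo's connector API, and actually sending the statement to a server (plus retries and delivery guarantees) is left out.

```python
import json
from typing import Any, Dict, List, Optional


class ClickHouseBatchSink:
    """Hypothetical buffering sink: collects rows, emits one batched INSERT per flush."""

    def __init__(self, table: str, batch_size: int = 1000) -> None:
        self.table = table
        self.batch_size = batch_size
        self.buffer: List[Dict[str, Any]] = []

    def write(self, row: Dict[str, Any]) -> Optional[str]:
        """Buffer a row; return a rendered INSERT once the batch is full, else None."""
        self.buffer.append(row)
        if len(self.buffer) >= self.batch_size:
            return self.flush()
        return None

    def flush(self) -> Optional[str]:
        """Render all buffered rows as one INSERT and clear the buffer.

        Would also be called on checkpoint/shutdown so no rows are left behind.
        """
        if not self.buffer:
            return None
        lines = "\n".join(json.dumps(r, separators=(",", ":")) for r in self.buffer)
        self.buffer.clear()
        return f"INSERT INTO {self.table} FORMAT JSONEachRow\n{lines}"


sink = ClickHouseBatchSink("events", batch_size=2)
print(sink.write({"id": 1, "payload": "a"}))  # None: batch not full yet
print(sink.write({"id": 2, "payload": "b"}))  # full batch rendered as one INSERT
```

Keeping the formatting step free of I/O like this also means it can be unit-tested without a running ClickHouse server.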
Looks like a cool one.
ClickHouse has various integrations for data ingestion; Kafka, as mentioned above, is just one of them.
I'm no expert, but maybe some of these (https://clickhouse.com/docs/en/integrations, search for "Data ingestion") would work well with what Arroyo.dev already has.
Maybe try a remote SELECT?
Something like:
SELECT * FROM remote('127.0.0.1', db.remote_engine_table) LIMIT 3;
ClickHouse docs:
https://clickhouse.com/docs/en/sql-reference/table-functions/remote
With that, you can simply run a remote INSERT into a ClickHouse table over the native TCP protocol.
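The remote-insert idea could look roughly like the hypothetical helper below, which renders an `INSERT INTO TABLE FUNCTION remote(...)` statement for a batch of rows; the native TCP protocol listens on port 9000 by default. The address, `db.events` table and row values are placeholders, and the literal quoting is deliberately minimal (numbers and strings only), so a real connector should use a proper client library instead of string formatting. The rendered statement still has to be sent to a ClickHouse server by some client.

```python
from typing import Any, Iterable, Sequence


def sql_literal(value: Any) -> str:
    """Minimal literal rendering: numbers as-is, everything else as an escaped string."""
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, (int, float)):
        return repr(value)
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{text}'"


def remote_insert(address: str, db_table: str, rows: Iterable[Sequence[Any]]) -> str:
    """Render INSERT INTO TABLE FUNCTION remote(...) for a batch of rows (hypothetical helper).

    Values must be given in the target table's column order.
    """
    values = ", ".join("(" + ", ".join(sql_literal(v) for v in row) + ")" for row in rows)
    return f"INSERT INTO TABLE FUNCTION remote('{address}', {db_table}) VALUES {values}"


print(remote_insert("127.0.0.1:9000", "db.events", [(1, "a"), (2, "it's")]))
```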
This might be easier and faster to implement than a full-blown integration.