tabular-io / iceberg-kafka-connect

To extract partition fields from a timestamp

shift-alt-del opened this issue

Hi, I'm working on a PoC that sinks logs from Kafka into Iceberg tables. I want to partition the logs under year=YYYY/month=MM/day=DD, but the only time information in each log is a timestamp.

I didn't find any configuration for partitioning logs by a timestamp, so I'm wondering whether a workaround already exists.

I think one workaround is to use an SMT to derive year, month, and day from ts_ms as 3 separate fields, then set those fields in iceberg.tables.default-partition-by. However, that clutters the connector config and requires writing a custom SMT...

As a concrete example, my log records look like this:

{
    "ts_ms": 1588252618953,
    "data": "abcd"
}
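To make the derivation concrete, here is a minimal Python sketch of what such a transform would have to compute for each record. It is not connector code: the function names `partition_fields` and `partition_path` are hypothetical, and it assumes `ts_ms` is epoch milliseconds interpreted in UTC.

```python
from datetime import datetime, timezone


def partition_fields(record, ts_field="ts_ms"):
    """Return a copy of the record with year, month, and day fields added.

    Assumes record[ts_field] holds epoch milliseconds and that partitions
    should follow UTC calendar dates.
    """
    # Integer division drops the millisecond part, which the date does not need.
    ts = datetime.fromtimestamp(record[ts_field] // 1000, tz=timezone.utc)
    return {**record, "year": ts.year, "month": ts.month, "day": ts.day}


def partition_path(record, ts_field="ts_ms"):
    """Build the year=YYYY/month=MM/day=DD path for a record."""
    f = partition_fields(record, ts_field)
    return f"year={f['year']:04d}/month={f['month']:02d}/day={f['day']:02d}"


if __name__ == "__main__":
    log = {"ts_ms": 1588252618953, "data": "abcd"}
    print(partition_fields(log))
    print(partition_path(log))
```

For the record above, the timestamp falls on 2020-04-30 in UTC, so the path would be year=2020/month=04/day=30.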

Thanks.

Closing this issue; I found a duplicate one with a workaround: