ArroyoSystems / arroyo

Distributed stream processing engine in Rust

Home Page: https://arroyo.dev

Virtual fields are included in source struct

mwylde opened this issue

When creating a table with virtual fields (for example, to generate a computed timestamp or watermark field), the source struct should contain only the physical fields, since it is used to deserialize data off of the source. The virtual fields should then be added in a post-source filter step.

However (possibly as a consequence of the changes in #184), virtual fields are being included in the source struct.

For example:

create table stream (
  timestamp BIGINT NOT NULL,
  event_time TIMESTAMP NOT NULL GENERATED ALWAYS AS (CAST(from_unixtime(timestamp * 1000000000) as TIMESTAMP))
) WITH (
  connector = 'kafka',
  bootstrap_servers = 'localhost:9092',
  topic = 'stream',
  format = 'json',
  type = 'source',
  event_time_field = 'event_time'
);

select * from stream;

Creates a pipeline with this source struct:

pub struct generated_struct_5280044764730053267 {
    pub timestamp: i64,
    #[serde(with = "arroyo_worker::formats::timestamp_as_rfc3339")]
    pub event_time: std::time::SystemTime,
}

But because the virtual field is non-nullable, this causes deserialization errors when the events (correctly) do not include the computed event_time field.
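
For illustration, the expected split between physical and virtual fields might look roughly like the sketch below. This is not Arroyo's generated code: the names `SourceRecord`, `StreamRecord`, and `add_virtual_fields` are hypothetical, serde attributes are omitted to keep it std-only, and it assumes `timestamp` holds non-negative seconds since the Unix epoch, with the multiplication by 1000000000 converting to nanoseconds as the SQL above suggests.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Hypothetical source struct: physical fields only, so it can be
/// deserialized directly from the incoming record.
#[derive(Debug, Clone, PartialEq)]
pub struct SourceRecord {
    pub timestamp: i64,
}

/// Hypothetical struct produced by the post-source step: the physical
/// fields plus the virtual (generated) field.
#[derive(Debug, Clone, PartialEq)]
pub struct StreamRecord {
    pub timestamp: i64,
    pub event_time: SystemTime,
}

/// Post-source step that computes `event_time` from `timestamp`.
/// Assumption: `timestamp` is seconds since the Unix epoch. Returns `None`
/// for negative values or if the nanosecond conversion overflows.
pub fn add_virtual_fields(src: SourceRecord) -> Option<StreamRecord> {
    let secs = u64::try_from(src.timestamp).ok()?;
    let nanos = secs.checked_mul(1_000_000_000)?;
    let event_time = UNIX_EPOCH.checked_add(Duration::from_nanos(nanos))?;
    Some(StreamRecord {
        timestamp: src.timestamp,
        event_time,
    })
}
```

With this shape, an event such as `{"timestamp": 1700000000}` only has to match `SourceRecord`, and `event_time` is filled in afterwards rather than being required in the payload.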