spotify / dbeam

DBeam exports SQL tables into Avro files using JDBC and Apache Beam



Support for "ceiling" filtering/partitioning

rulle-io opened this issue

We dump a DB table daily and would like each dump to include only data from the partition date or earlier (nothing later), so that the dumps are reproducible/deterministic.

So, is it possible to achieve a configuration like the one below?
Parameters: "--table=some_table --partition=2027-07-31 --partitionColumn=col" + some other option (?)
=>
SQL: "SELECT * FROM some_table WHERE 1=1 AND col < '2027-08-01'"
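The requested behavior boils down to a single exclusive upper bound: the partition date plus one partition period, with no lower bound. Below is a minimal sketch of that computation, assuming ISO-formatted dates and a one-day period. `ceiling_query` is a hypothetical helper written for illustration; it is not a DBeam API or command-line flag.

```python
from datetime import date, timedelta


def ceiling_query(table: str, partition_column: str, partition: str,
                  period: timedelta = timedelta(days=1)) -> str:
    """Build a 'ceiling' query: all rows strictly before partition + period.

    Hypothetical helper for illustration only; not part of DBeam.
    Identifiers and values are interpolated directly, so they must come
    from trusted configuration, never from user input.
    """
    # Parse the partition value as an ISO date, e.g. "2027-07-31".
    start = date.fromisoformat(partition)
    # Exclusive upper bound: the first day after the partition period.
    upper = start + period
    # Only an upper bound, so the partition date and everything earlier is
    # included, while later rows are excluded.
    return (f"SELECT * FROM {table} WHERE 1=1 "
            f"AND {partition_column} < '{upper.isoformat()}'")


if __name__ == "__main__":
    print(ceiling_query("some_table", "col", "2027-07-31"))
```

Because the bound is computed by calendar arithmetic rather than string manipulation, month ends and leap days roll over correctly (for example, a partition of 2028-02-28 yields a bound of 2028-02-29).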

P.S. It is probably possible to achieve this with a user-provided SQL file plus some parsing of the partition value,
but it would be much simpler to use dedicated parameter(s).

Yes. --partitionColumn=col is the way to do it. There is no need for a user-provided SQL file. There are several examples internally at Spotify; I can share some if needed.

Yes, I actually tried all(?) the combinations of the parameters
["--partitionColumn", "--partition", "--minPartitionPeriod", "--partitionPeriod"], without success.

So, please provide an example.