getstrm / pace

Data policy IN, dynamic view OUT: PACE is the Policy As Code Engine. It helps you to programmatically create and apply a data policy to a processing platform like Databricks, Snowflake or BigQuery (or plain ol' Postgres, even!) with definitions imported from Collibra, Datahub, ODD and the like.

Home Page: https://pace.getstrm.com

[PACE-19] Add support for data retention

astronomous opened this issue

When implementing a data policy, I would like to specify a retention period, after which data should be omitted from the created view.

Details:

  • In the rulesets, add support for retention.
  • Take a specified retention period (in days), indicated by a tag.
  • When building the dynamic view, check whether each row falls inside or outside the retention period.
  • Design choice: the check should use each row's creation timestamp, not its update timestamp.
  • If a row is outside the retention period, filter it out.
  • Question: would it be possible to delete the source data altogether, instead of only filtering it out of the view?
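
To make the requested check concrete, here is a minimal Python sketch of the row-level logic described in the list above. It is not how PACE implements anything: the names `is_within_retention`, `apply_retention` and the `created_at` field are hypothetical, and treating a row that is exactly `retention_days` old as still inside the period is an assumption the issue does not settle.

```python
from datetime import datetime, timedelta, timezone


def is_within_retention(created_at, retention_days, now=None):
    """Return True if a row created at `created_at` is still inside the retention period.

    Assumption: a row exactly `retention_days` old is still kept (inclusive boundary).
    """
    now = now or datetime.now(timezone.utc)
    return created_at + timedelta(days=retention_days) >= now


def apply_retention(rows, retention_days, now=None, created_field="created_at"):
    """Filter out rows that fall outside the retention period.

    `rows` is a list of dicts; `created_field` names the creation timestamp
    (hypothetical field name), never the update timestamp.
    """
    return [
        row
        for row in rows
        if is_within_retention(row[created_field], retention_days, now)
    ]


if __name__ == "__main__":
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    rows = [
        {"id": 1, "created_at": datetime(2024, 1, 25, tzinfo=timezone.utc)},
        {"id": 2, "created_at": datetime(2023, 12, 1, tzinfo=timezone.utc)},
    ]
    # With a 30-day retention period, only the row with id 1 remains.
    print(apply_retention(rows, retention_days=30, now=now))
```

In a dynamic view the same comparison would presumably be expressed as a filter condition in the target platform's SQL dialect; the sketch only shows the intended semantics.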

PACE-19

Doing this, we also need to make sure we can correctly parse row timestamps. Many timestamp formats are in use in SQL tables, and sometimes timestamps are just stored as strings.
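
As a rough illustration of the parsing concern, the Python sketch below normalizes a few common representations (native datetimes, epoch seconds, ISO 8601 strings, and a short list of fallback string formats) to timezone-aware UTC values. Everything here is an assumption for illustration: `parse_row_timestamp` and `FALLBACK_FORMATS` are hypothetical names, the format list is far from exhaustive, and treating timezone-less values as UTC may be wrong for a given table.

```python
from datetime import datetime, timezone

# Hypothetical, non-exhaustive list of string formats to try after ISO 8601.
FALLBACK_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y/%m/%d %H:%M:%S", "%Y%m%d")


def parse_row_timestamp(value):
    """Normalize a row timestamp to a timezone-aware datetime.

    Assumption: values without timezone information are interpreted as UTC.
    """
    if isinstance(value, datetime):
        parsed = value
    elif isinstance(value, (int, float)):
        # Assumption: numeric values are seconds since the Unix epoch.
        return datetime.fromtimestamp(value, tz=timezone.utc)
    elif isinstance(value, str):
        text = value.strip()
        # Older Python versions do not accept a trailing "Z" in fromisoformat.
        iso_text = text[:-1] + "+00:00" if text.endswith("Z") else text
        parsed = None
        try:
            parsed = datetime.fromisoformat(iso_text)
        except ValueError:
            for fmt in FALLBACK_FORMATS:
                try:
                    parsed = datetime.strptime(text, fmt)
                    break
                except ValueError:
                    continue
        if parsed is None:
            raise ValueError(f"Unrecognized timestamp format: {value!r}")
    else:
        raise TypeError(f"Unsupported timestamp type: {type(value).__name__}")

    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed
```

Ambiguous formats such as day-first versus month-first dates cannot be resolved by guessing, so a real implementation would likely need the format to be declared per column rather than inferred.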

This was not an action of mine in Linear, probably something with the GitHub sync. I've seen it with other tickets too.