amundsen-io / amundsen

Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.

Home Page:https://www.amundsen.io/amundsen/

Feature Proposal: add table/column lineage for mysql backend

xuan616 opened this issue

This proposal adds table and column lineage models to amundsen-rds, along with the corresponding functionality in databuilder and metadata-service, to extend the metadata that the MySQL backend can support.

Expected Behavior or Use Case

It would work similarly to the existing lineage feature, which has long been available in the graph database backend. With the new lineage models in amundsen-rds, lineage data would be ingested into MySQL, and an API in the metadata service would be responsible for retrieving it.

Service or Ingestion ETL

Components impacted:

Possible Implementation

  • amundsen-rds: add two new models (table_lineage, column_lineage) and the corresponding schema migration script
  • databuilder: add RDS record iterators for table/column lineage so that the table_lineage model is RDS-serializable
  • metadata-service: implement an API endpoint in mysql_proxy that returns lineage results
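To make the three steps above more concrete, here is a minimal sketch of how an edge-per-row lineage table could be populated and queried. It is only an illustration under stated assumptions: it uses the standard-library `sqlite3` module in place of MySQL so that it runs anywhere, and the column names (`table_source_rk`, `table_target_rk`) and the helpers `ingest`, `get_lineage` and `build_demo` are hypothetical, not the actual amundsen-rds schema or mysql_proxy API. A column_lineage table could follow the same shape with column keys.

```python
import sqlite3
from collections import deque

# Hypothetical schema: one row per directed edge (source table -> target table).
# The column names are assumptions for illustration only.
SCHEMA = """
CREATE TABLE table_lineage (
    table_source_rk TEXT NOT NULL,
    table_target_rk TEXT NOT NULL,
    PRIMARY KEY (table_source_rk, table_target_rk)
);
"""


def ingest(conn, edges):
    """Insert (source_key, target_key) pairs, skipping duplicates.

    'INSERT OR IGNORE' is SQLite syntax; MySQL would use 'INSERT IGNORE'.
    """
    conn.executemany("INSERT OR IGNORE INTO table_lineage VALUES (?, ?)", edges)
    conn.commit()


def get_lineage(conn, table_rk, direction="downstream", depth=1):
    """Breadth-first walk over lineage edges, up to `depth` hops.

    Returns a list of (table_key, level) tuples in visiting order.
    """
    if direction == "downstream":
        sql = ("SELECT table_target_rk FROM table_lineage "
               "WHERE table_source_rk = ? ORDER BY table_target_rk")
    elif direction == "upstream":
        sql = ("SELECT table_source_rk FROM table_lineage "
               "WHERE table_target_rk = ? ORDER BY table_source_rk")
    else:
        raise ValueError("direction must be 'upstream' or 'downstream'")

    seen = {table_rk}  # guards against cycles in the lineage graph
    queue = deque([(table_rk, 0)])
    result = []
    while queue:
        key, level = queue.popleft()
        if level >= depth:
            continue
        for (neighbor,) in conn.execute(sql, (key,)).fetchall():
            if neighbor not in seen:
                seen.add(neighbor)
                result.append((neighbor, level + 1))
                queue.append((neighbor, level + 1))
    return result


def build_demo():
    """Create an in-memory database with a three-table pipeline."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    ingest(conn, [
        ("hive://gold.core/raw_events", "hive://gold.core/sessions"),
        ("hive://gold.core/sessions", "hive://gold.core/daily_report"),
    ])
    return conn
```

Calling `get_lineage(build_demo(), "hive://gold.core/raw_events", "downstream", depth=2)` walks two hops from the source table. Storing one edge per row keeps ingestion simple, at the cost of one query per visited node during traversal; a real implementation might batch these lookups.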

Example Screenshots (if appropriate):

Context

Lineage is a fundamental feature for data tracking, but it is still missing from the MySQL backend. Implementing it would help MySQL backend users analyze their data pipelines and would bring the MySQL backend in line with the graph database backend. Thanks.

cc @feng-tao

Cool!