Bilbottom / sql-lineage

Personal SQL lineage generator.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Python Poetry tests coverage GitHub last commit

code style: prettier code style: black Imports: isort pre-commit.ci status Sourcery


SQL Lineage πŸ”€

Personal SQL lineage generator.

Built on the following awesome libraries:

So... what is this? πŸ€”

Note

This project is still in development and currently only parses the CTEs of an SQL query since that's all I need for now.

I write a lot of SQL, and I often need to understand "lineage" in lots of different ways.

Things like dbt and SQLMesh are great for object lineage (tables, columns, etc.), but I often need to understand "lineage" like the CTE lineage in a query, among other things.

This project is just a personal tool to help me generate lineage diagrams (in Mermaid syntax) for SQL queries to fit that requirement.

Upcoming improvements will (hopefully) include:

  • Semantic edges (e.g. distinguish between JOIN, WHERE, UNION, etc.)
  • Support for parameterised queries (e.g. with Jinja blocks, like dbt)
  • Parsing only the SELECT part of a SQL file that includes DDL/DML commands
  • Lineage for multiple files in a single diagram
  • Column lineage to Mermaid

Installation ⬇️

Grab a copy from PyPI like usual (note the bills- prefix):

pip install bills-sql-lineage

Usage πŸ“–

Warning

This is likely to change significantly as the project evolves.

Pass the path to a SQL file to the lineage command to generate the lineage as a Mermaid diagram:

lineage path/to/file.sql

This will write a Mermaid diagram to path/to/file.mermaid. You can control the target path with the --target argument:

lineage path/to/file.sql --target path/to/output.mermaid

By default, the SQL dialect will be inferred by SQLGlot, but you can specify a dialect with the --dialect argument:

lineage path/to/file.sql --dialect snowflake

Example πŸ“

Given the following SQL query:

with

aaa as (select 1 as aa),
bbb as (select 2 as bb),
ccc as (select 3 as cc, aa from aaa),
ddd as (select 4 as dd, aa from aaa where aa not in (select bb from bbb))

select *
from ccc
    inner join ddd using (aa)

...the following Mermaid diagram will be generated:

graph TD
    aaa
    bbb
    ccc
    ddd
    final
    aaa --> ccc
    aaa --> ddd
    bbb --> ddd
    ccc --> final
    ddd --> final
Loading

Note that the final node is an alias for the final SELECT statement since the final SELECT statement is not a CTE.

About

Personal SQL lineage generator.


Languages

Language:Python 100.0%