opendatadiscovery / odd-airflow-adapter

ODD Airflow adapter

ODD Airflow adapter extracts data transformers, data transformer runs, and their metadata from Apache Airflow (versions up to 1.10.15). The adapter implements the push model (see https://github.com/opendatadiscovery/opendatadiscovery-specification/blob/main/specification/specification.md#discovery-models). After installation, Airflow pushes a new data transformer when a DAG is created, and a data transformer run on every DAG run.

Data entities:

Entity type           | Entity source
Data Transformer      | DAG
Data Transformer run  | DAG's runs

For more information about data entities see https://github.com/opendatadiscovery/opendatadiscovery-specification/blob/main/specification/specification.md#data-model-specification

Quickstart

Installation

pip3 install odd-airflow

Usage

from odd_airflow import DAG

default_args = {
    # Data catalog ingestion API URL
    "data_catalog_base_url": "https://yourcatalog.url",
    # Host of the Airflow source, or any name used for ODDRN generation
    # (in order to uniquely identify data entities)
    "unit_id": "airflow_unit_id",
}

dag = DAG(
    dag_id='your_example_dag',
    default_args=default_args,
    schedule_interval=None,
    tags=['example']
)

# Your tasks

Alternatively, you can define environment variables:

DATA_CATALOG_BASE_URL=https://yourcatalog.url
AIRFLOW_UNIT_ID=airflow_unit_id
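
The two configuration routes above (DAG default_args or environment variables) can be pictured as a simple fallback lookup. The sketch below is only an illustration: the helper name resolve_setting is hypothetical, and the precedence it assumes (default_args first, environment second) is not confirmed by this README. Check the adapter source for the actual behaviour.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    default_args: Mapping[str, str],
    key: str,
    env_name: str,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return a setting from default_args, falling back to an environment variable.

    Hypothetical helper: the precedence shown here is an assumption,
    not the adapter's documented behaviour.
    """
    # Allow a custom mapping to be injected so the lookup is easy to test.
    env = os.environ if environ is None else environ
    value = default_args.get(key)
    if value:
        return value
    return env.get(env_name)


# Example: unit_id is missing from default_args, so the environment value is used.
unit_id = resolve_setting(
    default_args={},
    key="unit_id",
    env_name="AIRFLOW_UNIT_ID",
    environ={"AIRFLOW_UNIT_ID": "airflow_unit_id"},
)
print(unit_id)
```

Passing the environment as a parameter keeps the lookup free of global state, which makes it straightforward to verify both routes in a unit test.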

Requirements

  • Python 3.8
  • Airflow <= 1.10.15
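
To check the Airflow requirement before installing the adapter, a version guard along these lines may help. This is a minimal sketch, not part of the adapter: the function names parse_version and airflow_version_supported and the constant MAX_SUPPORTED_AIRFLOW are hypothetical. It compares versions as integer tuples, because a plain string comparison would wrongly rank "1.10.9" above "1.10.15".

```python
import re
from typing import Tuple

# Upper bound taken from the requirements list above.
MAX_SUPPORTED_AIRFLOW = (1, 10, 15)


def parse_version(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric components of a version string as a tuple."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def airflow_version_supported(version: str) -> bool:
    """Return True if the given Airflow version is at most 1.10.15."""
    return parse_version(version) <= MAX_SUPPORTED_AIRFLOW


for candidate in ("1.10.9", "1.10.15", "2.0.0"):
    print(candidate, airflow_version_supported(candidate))
```

In a real environment the version string could be taken from airflow.__version__; the sketch accepts a plain string so that it runs without Airflow installed.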

Run demo

  docker-compose -f docker/docker-compose.yml up

The Airflow UI will be available at localhost:8081.

About

License: Apache License 2.0

