mistercrunch / dagster

Dagster is an opinionated system and programming model for data pipelines.

Introduction

Dagster is an opinionated system and programming model for data pipelines. Building such pipelines goes by many names, including ETL (extract-transform-load), ELT (extract-load-transform), model production, and data integration, but in essence they all describe the same activity: performing a set of computations, structured as a DAG (directed acyclic graph), that produce data assets such as tables, files, or machine-learning models.
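
To make the idea of "computations structured as a DAG" concrete, here is a minimal sketch in plain Python. It does not use Dagster's API; the step names (`extract`, `transform`, `load`), the `PIPELINE` mapping, and the `run_pipeline` helper are hypothetical and exist only for illustration. The only library call is the standard library's `graphlib.TopologicalSorter` (Python 3.9+), which orders the steps so that each one runs after the steps it depends on.

```python
from graphlib import TopologicalSorter


def extract():
    # Stand-in for reading raw rows from a source system.
    return [3, 1, 2]


def transform(rows):
    # Stand-in for a cleaning or reshaping step.
    return sorted(rows)


def load(rows):
    # Stand-in for producing a data asset (here, an in-memory "table").
    return {"table": rows, "row_count": len(rows)}


# Hypothetical pipeline description: step name -> (function, upstream step names).
PIPELINE = {
    "extract": (extract, []),
    "transform": (transform, ["extract"]),
    "load": (load, ["transform"]),
}


def run_pipeline(pipeline):
    # Map each step to the set of steps that must run before it.
    graph = {name: set(deps) for name, (_, deps) in pipeline.items()}
    results = {}
    # static_order() yields every step after all of its dependencies
    # and raises graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(graph).static_order():
        func, deps = pipeline[name]
        results[name] = func(*(results[dep] for dep in deps))
    return results


results = run_pipeline(PIPELINE)
print(results["load"])  # {'table': [1, 2, 3], 'row_count': 3}
```

The final step's return value plays the role of the data asset. A real system would also handle configuration, typing, logging, and failure recovery, none of which this sketch attempts.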

There are a few tools in this repo:

  • Dagster: The core programming model and abstraction stack; a stateless, single-node, single-process execution engine; and a CLI tool for driving that engine.
  • Dagit: A rich viewer for Dagster assets.
  • Dagster GE: A Dagster integration with Great Expectations (see https://github.com/great-expectations/great_expectations).
  • Dagstermill: An experimental prototype for integrating productionized notebooks into Dagster pipelines. Built on the papermill library (https://github.com/nteract/papermill).

Go to https://dagster.readthedocs.io/en/latest/ for documentation!

Languages

Python 80.8%, TypeScript 14.6%, Jupyter Notebook 3.1%, Makefile 1.1%, HTML 0.3%, Shell 0.0%, JavaScript 0.0%