maxboone / euridice

⚛️ Simple and scalable data quality improvement framework

Euridice (Extensible Data Cleansing Framework)

Euridice is a framework that provides tools for improving data quality through intuitive, interactive data cleansing. It was built for a thesis project at Leiden University to clean the data used in data science pipelines. It is intended for settings where a consultant helps the user of the to-be-cleaned data build the cleansing workflows.

The framework consists of four core applications and services:

  • Lens: React-based Workflow Editor
  • Lake: Dask-wrapping DataFrame storage
  • Flow: Workflow Templating & Rendering (DAG -> BFS -> IPython)
  • Engine: GraphQL API mapping for JupyterLab Server
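
The "DAG -> BFS -> IPython" pipeline listed for Flow can be illustrated with a minimal sketch. This is not Flow's actual code: it assumes a workflow is stored as a DAG whose nodes each carry a code template and parameters, and it uses Kahn's algorithm (a breadth-first topological sort) so that every step is rendered after the steps it depends on. The names `bfs_order`, `render_cells`, and the node layout are hypothetical.

```python
from collections import deque
from string import Template


def bfs_order(nodes, edges):
    """Return node ids in breadth-first topological order (Kahn's algorithm)."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1

    # Start from the nodes that have no dependencies.
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:  # all parents already emitted
                queue.append(child)

    if len(order) != len(nodes):
        raise ValueError("workflow graph contains a cycle")
    return order


def render_cells(nodes, edges):
    """Render each node's template into the source of one notebook cell."""
    return [
        Template(nodes[n]["template"]).substitute(nodes[n]["params"])
        for n in bfs_order(nodes, edges)
    ]


# Hypothetical workflow: load a file, clean it in two steps, write it back.
nodes = {
    "load": {"template": "df = pd.read_csv('$path')", "params": {"path": "raw.csv"}},
    "dedupe": {"template": "df = df.drop_duplicates(subset=['$column'])", "params": {"column": "email"}},
    "fill": {"template": "df = df.fillna($value)", "params": {"value": "0"}},
    "save": {"template": "df.to_csv('$path', index=False)", "params": {"path": "clean.csv"}},
}
edges = [("load", "dedupe"), ("load", "fill"), ("dedupe", "save"), ("fill", "save")]

cells = render_cells(nodes, edges)
for source in cells:
    print(source)
```

Each rendered string could then be submitted to an IPython kernel as one cell. How Flow actually hands cells to JupyterLab is not described here, so that step is left out of the sketch.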

Euridice exposes a complete REST API as well as a GraphQL API, which makes it straightforward to develop against and to integrate with other tools and systems.

Euridice runs as multiple APIs built on the Django framework (Python), backed by a PostgreSQL database.

About

License: Apache License 2.0