Apache Airflow (Incubating)
Amundsen is a metadata-driven application for improving the productivity of data analysts, data scientists, and engineers when interacting with data.
Mirror of Apache Bigtop
Apache Commons Daemon
The Metadata Platform for the Modern Data Stack
An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
A native Rust library for Delta Lake, with bindings into Python and Ruby.
Open Metadata and Governance
Python implementation of the Parquet columnar file format.
Python library for extracting feature models from natural language specifications of software products
Gobblin is a distributed big data integration framework (ingestion, replication, compliance, retention) for batch and streaming systems. Gobblin features integrations with Apache Hadoop, Apache Kafka, Salesforce, S3, MySQL, Google, etc.
Apache Groovy: A powerful multi-faceted programming language for the JVM platform
The Java gRPC implementation. HTTP/2-based RPC
Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.
Collect, aggregate, and visualize a data ecosystem's metadata
Mirror of Apache NiFi
An Open Standard for lineage metadata collection
Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
Scalable, redundant, and distributed object store for Apache Hadoop
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Fast Read-only Data Dictionary Attached to Each Spark Executor
Visualize data lineage in Spark SQL
Whisper is a file-based time-series database format for Graphite.
Mirror of Apache Zeppelin