Kengo Seki's repositories
airflow
Apache Airflow (Incubating)
amundsen
Amundsen is a metadata-driven application for improving the productivity of data analysts, data scientists, and engineers when interacting with data.
bigtop
Mirror of Apache Bigtop
commons-daemon
Apache Commons Daemon
datahub
The Metadata Platform for the Modern Data Stack
delta
An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.
delta-rs
A native Rust library for Delta Lake, with bindings into Python and Ruby.
egeria
Open Metadata and Governance
fastparquet
Python implementation of the Parquet columnar file format.
FeatureX
Python library for extracting feature models from natural language specifications of software products
fineract
Apache Fineract
gobblin
Gobblin is a distributed big data integration framework (ingestion, replication, compliance, retention) for batch and streaming systems. Gobblin features integrations with Apache Hadoop, Apache Kafka, Salesforce, S3, MySQL, Google, etc.
groovy
Apache Groovy: A powerful multi-faceted programming language for the JVM platform
grpc-java
The Java gRPC implementation. HTTP/2 based RPC
incubator-liminal
Apache Liminal's goal is to operationalise the machine learning process, allowing data scientists to quickly transition from a successful experiment to an automated pipeline of model training, validation, deployment and inference in production. Liminal provides a Domain Specific Language to build ML workflows on top of Apache Airflow.
marquez
Collect, aggregate, and visualize a data ecosystem's metadata
nifi
Mirror of Apache NiFi
OpenLineage
An Open Standard for lineage metadata collection
OpenMetadata
Open Standard for Metadata. A Single place to Discover, Collaborate and Get your data right.
ozone
Scalable, redundant, and distributed object store for Apache Hadoop
parquet-mr
Apache Parquet
redash
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
sdkman-db-migrations
Database migrations for the sdkman API
spark-executor-dict-plugin
Fast Read-only Data Dictionary Attached to Each Spark Executor
spark-sql-flow-plugin
Visualize data lineage in Spark SQL
whisper
Whisper is a file-based time-series database format for Graphite.
zeppelin
Mirror of Apache Zeppelin