monkey_boy's repositories
incubator-xtable
Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
gravitino
World's most powerful open data catalog for building a high-performance, geo-distributed and federated metadata lake.
unitycatalog
Open, Multi-modal Catalog for Data & AI
datafusion-comet
Apache DataFusion Comet Spark Accelerator
spark
Mirror of Apache Spark
olap-performance
OLAP Database Performance Tuning Guide
incubator-paimon
Apache Paimon (incubating) is a streaming data lake platform that supports high-speed data ingestion, change data tracking, and efficient real-time analytics.
incubator-celeborn
Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
picocli
Picocli is a modern framework for building powerful, user-friendly, GraalVM-enabled command line apps with ease. It supports colors, autocompletion, subcommands, and more. In 1 source file so apps can include as source & avoid adding a dependency. Written in Java, usable from Groovy, Kotlin, Scala, etc.
skypilot
SkyPilot is a framework for easily running machine learning workloads on any cloud through a unified interface.
zio-quill
Compile-time Language Integrated Queries for Scala
incubator-uniffle
Uniffle is a high performance, general purpose Remote Shuffle Service.
starrocks
StarRocks is a next-gen sub-second MPP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
connectors
This library allows Scala and Java-based projects (including Apache Flink, Apache Hive, Apache Beam, and PrestoDB) to read from and write to Delta Lake.
delta-rs
A native Rust library for Delta Lake, with bindings into Python and Ruby.
parquet-mr
Apache Parquet
incubator-doris
Apache Doris (incubating) is an MPP-based interactive SQL data warehouse for reporting and analysis.
arrow-datafusion
Apache Arrow DataFusion SQL Query Engine
llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Note: the repository does not accept GitHub pull requests at this moment. Please submit your patches at http://reviews.llvm.org.
gazelle_plugin
Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.
alluxio
Alluxio, data orchestration for analytics and machine learning in the cloud
Firestorm
Firestorm is a Remote Shuffle Service that lets Apache Spark applications store shuffle data on remote servers.
RemoteShuffleService
Remote shuffle service for Apache Spark to store shuffle data on remote servers.
mockito
Most popular mocking framework for unit tests written in Java
sparklens
Qubole Sparklens tool for performance tuning Apache Spark
incubator-yunikorn-core
Apache YuniKorn Core
styleguide
Style guides for Google-originated open-source projects