Optimized Analytics Package for Spark Platform (OAP) (oap-project)

Home Page: https://oap-project.github.io/

Optimized Analytics Package for Spark Platform (OAP)'s repositories

raydp

RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries.

Language: Python | License: Apache-2.0 | Stargazers: 289 | Issues: 14 | Issues: 147

gazelle_plugin

Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.

Language: Scala | License: Apache-2.0 | Stargazers: 257 | Issues: 19 | Issues: 550

Gluten-Trino

Gluten: Plugin to Boost Trino's Performance

Language: Java | License: Apache-2.0 | Stargazers: 66 | Issues: 8 | Issues: 26

sql-ds-cache

Spark* plug-in for accelerating Spark* SQL performance by using a cache and index at the SQL data source layer.

Language: Scala | License: Apache-2.0 | Stargazers: 37 | Issues: 6 | Issues: 91

cloudtik

Cloud Scale Platform for Distributed Analytics and AI

Language: Python | License: Apache-2.0 | Stargazers: 23 | Issues: 3 | Issues: 44

oap-mllib

Optimized Spark package to accelerate machine learning algorithms in Apache Spark MLlib.

Language: Scala | License: Apache-2.0 | Stargazers: 20 | Issues: 5 | Issues: 185

remote-shuffle

Spark* shuffle plugin that supports shuffling data through a remote Hadoop-compatible file system, as opposed to vanilla Spark's local disks.

Language: Scala | License: Apache-2.0 | Stargazers: 19 | Issues: 6 | Issues: 36

oap-tools

Tools for building and packaging OAP and for integrating it with public cloud services such as AWS EMR, Google Dataproc, and K8s.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 16 | Issues: 5 | Issues: 22

velox

A new C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.

Language: C++ | License: Apache-2.0 | Stargazers: 16 | Issues: 3 | Issues: 17

pmem-shuffle

Spark* shuffle plugin that supports shuffling through remote persistent memory over fabrics. It leverages the RDMA network and remote persistent memory (for reads) to provide a high-performance, low-latency shuffle solution for Spark*.

Language: C++ | License: Apache-2.0 | Stargazers: 14 | Issues: 4 | Issues: 26

pmem-spill

Spark plug-in package for accelerating Spark runtime spill functions using PMem, such as the RDD cache PMem extension.

Language: Scala | License: Apache-2.0 | Stargazers: 7 | Issues: 5 | Issues: 25

arrow

Apache Arrow is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, Java, JavaScript, Python, and Ruby.

Language: C++ | License: Apache-2.0 | Stargazers: 6 | Issues: 3 | Issues: 0

arrow-data-source

Spark DataSource plugin for reading files in various formats, such as Parquet, into Arrow-compatible columnar vectors.

Language: Scala | License: Apache-2.0 | Stargazers: 6 | Issues: 4 | Issues: 8

Language: Python | Stargazers: 5 | Issues: 0 | Issues: 0

pmem-common

Common library for accessing PMem native library functions, including memkind and vmemcache.

Language: Java | License: Apache-2.0 | Stargazers: 3 | Issues: 5 | Issues: 11

libhdfs3

HDFS file read access for ClickHouse

Language: C++ | License: Apache-2.0 | Stargazers: 2 | Issues: 1 | Issues: 0

Language: Python | License: Apache-2.0 | Stargazers: 2 | Issues: 5 | Issues: 1

oap-project.github.io

The OAP project website.

Language: HTML | License: Apache-2.0 | Stargazers: 0 | Issues: 3 | Issues: 0

solution-navigator

Example solutions or code for using OAP features.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 0 | Issues: 3 | Issues: 0

libhdfs3-downstream

A native C/C++ HDFS client (downstream fork from apache-hawq).

Language: C++ | License: Apache-2.0 | Stargazers: 0 | Issues: 2 | Issues: 0

protobuf

An Intel-customized version of Protocol Buffers, Google's data interchange format.

Language: C++ | License: NOASSERTION | Stargazers: 0 | Issues: 1 | Issues: 0

pyspark-ai

English SDK for Apache Spark

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

spark-ai-kit

Gluten: Plugin to Double SparkSQL's Performance

Language: Scala | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0