CodingCat

OpenAI

Seattle

http://codingcat.me/

Organizations

dmlc

Nan Zhu's repositories

xgboost4j-spark-scalability

A benchmark to test the scalability of xgboost4j-spark and related projects

Language: Scala, 22 7 2

Self-Learning-Notebooks

RLLearning

Language: HTML, 1 10

spark

Mirror of Apache Spark

Language: Scala, License: Apache-2.0, 1 20

analytics-zoo

Distributed Tensorflow, Keras and BigDL on Apache Spark

Language: Jupyter Notebook, License: Apache-2.0, 010

arrow-datafusion

Apache Arrow DataFusion and Ballista query engines

Language: Rust, License: Apache-2.0, 010

BigDL

BigDL: Distributed Deep Learning Library for Apache Spark

Language: Scala, License: Apache-2.0, 010

celeborn-website

Apache Celeborn Site

License: Apache-2.0, 000

cockroachdb-todo-apps

CockroachDB To-Do Apps

Language: Python, License: Apache-2.0, 010

cockroachdb_playground

Some programs for experimenting with CockroachDB

Language: Python, License: Apache-2.0, 020

delta

An open-source storage layer that brings scalable, ACID transactions to Apache Spark™ and big data workloads.

Language: Scala, License: Apache-2.0, 010

dmlc-core

A common bricks library for building scalable and portable distributed machine learning.

Language: C++, License: NOASSERTION, 010

ec2-selector-cli

A CLI tool to select EC2 instances based on filters

Language: Rust, License: Apache-2.0, 010

frameless

Expressive types for Spark.

Language: Scala, License: Apache-2.0, 000

gazelle_plugin

Native SQL Engine plugin for Spark SQL with vectorized SIMD optimizations.

Language: Scala, License: Apache-2.0, 010

github-markdown-toc

Easy TOC creation for GitHub README.md

Language: Shell, License: MIT, 010

gluten

Language: Scala, License: Apache-2.0, 000

how-query-engines-work

This is the companion repository for the book How Query Engines Work.

Language: Kotlin, License: Apache-2.0, 010

iceberg

Apache Iceberg

Language: Java, License: Apache-2.0, 010

incubator-celeborn

Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.

Language: Java, License: Apache-2.0, 000

incubator-sedona

A cluster computing framework for processing large-scale geospatial data

Language: Java, License: Apache-2.0, 010

incubator-uniffle

Uniffle is a high performance, general purpose Remote Shuffle Service.

Language: Java, License: Apache-2.0, 000

morpheus

Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.

Language: Scala, License: Apache-2.0, 010

noisepage

Self-Driving Database Management System from Carnegie Mellon University

Language: C++, License: MIT, 010

rabit

Reliable Allreduce and Broadcast Interface for distributed machine learning

Language: C++, License: BSD-3-Clause, 010

spark-lineage

Spark SQL listener to record lineage information

Language: Scala, License: Apache-2.0, 010

spark-sql-macros

Spark SQL Macros provides a mechanism similar to Spark user-defined function registration, with the key enhancement that custom code is compiled to equivalent Catalyst Expressions at macro definition time.

Language: Scala, License: Apache-2.0, 010

string_encoder

Language: Rust, License: Apache-2.0, 010

terraform-aws-eks-node-group

Terraform module to provision a fully managed AWS EKS Node Group

Language: HCL, License: Apache-2.0, 010

velox-intel

A new C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.

Language: C++, License: Apache-2.0, 000

xgboost

Large-scale and distributed gradient boosting (GBDT, GBRT or GBM) library, on a single node, Hadoop YARN and more.

Language: C++, License: Apache-2.0, 020