Holden Karau (holdenk)

Company: Open Source Big Data Dev

Location: San Francisco, CA, USA

Home Page: http://www.holdenkarau.com/resume.pdf?q=github

Twitter: @holdenkarau


Organizations
high-performance-spark
PigsCanFlyLabs
scalingpythonml
sparklingpandas

Holden Karau's repositories

spark-testing-base

Base classes to use when writing tests with Spark

Language: Scala | License: Apache-2.0 | Stargazers: 1525 | Issues: 77 | Issues: 207

spark-flowchart

Flowchart for debugging Spark applications

sparkProjectTemplate.g8

Template for Spark Projects

Language: Scala | License: Apache-2.0 | Stargazers: 101 | Issues: 9 | Issues: 7

spark-upgrade

Magic to help Spark pipelines upgrade

Language: Python | License: Apache-2.0 | Stargazers: 34 | Issues: 7 | Issues: 18

high-performance-spark-examples

Examples for High Performance Spark

Language: Scala | License: NOASSERTION | Stargazers: 15 | Issues: 5 | Issues: 0

distributedcomputing4kids

distributedcomputing4kids

Language: Jupyter Notebook | Stargazers: 7 | Issues: 6 | Issues: 0

spark

Mirror of Apache Spark

Language: Scala | License: Apache-2.0 | Stargazers: 7 | Issues: 4 | Issues: 0

resume

LaTeX resume

Language: TeX | Stargazers: 4 | Issues: 3 | Issues: 0

spark-misc-utils

Misc Utils for Spark

Language: Scala | License: Apache-2.0 | Stargazers: 4 | Issues: 3 | Issues: 2

Language: Shell | Stargazers: 2 | Issues: 3 | Issues: 0

explore-dolly

Exploring what we can do with Databricks' Dolly (and similar)

Language: Python | License: Apache-2.0 | Stargazers: 2 | Issues: 2 | Issues: 0

mydotfiles

My dotfiles. You probably don't care about this.

Language: Shell | License: GPL-2.0 | Stargazers: 2 | Issues: 3 | Issues: 0

sparklingpinkpandas

Website for Sparkling Pink Pandas (queer- and trans-focused scooter club)

Language: JavaScript | Stargazers: 2 | Issues: 4 | Issues: 0

data-validator

A tool to validate data, built around Apache Spark.

Language: Scala | License: NOASSERTION | Stargazers: 1 | Issues: 1 | Issues: 0

gluten

Gluten: Plugin to Double SparkSQL's Performance

Language: Scala | License: Apache-2.0 | Stargazers: 1 | Issues: 1 | Issues: 0

ray

A fast and simple framework for building and running distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.

Language: Python | License: Apache-2.0 | Stargazers: 1 | Issues: 2 | Issues: 0

spark-connect-rs

Apache Spark Connect Client for Rust

Language: Rust | License: Apache-2.0 | Stargazers: 1 | Issues: 0 | Issues: 0

arrow-datafusion-comet

Apache Arrow DataFusion Comet Spark Accelerator

Language: Rust | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

bitsandbytes

8-bit CUDA functions for PyTorch

Language: Python | License: MIT | Stargazers: 0 | Issues: 0 | Issues: 0

django-rest-framework-braces

Collection of utilities for working with django rest framework (DRF)

Language: Python | License: NOASSERTION | Stargazers: 0 | Issues: 0 | Issues: 0

dolly

Databricks’ Dolly, a large language model trained on the Databricks Machine Learning Platform

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

lit-parrot

Implementation of Falcon, StableLM, Pythia, INCITE language models based on nanoGPT. Supports flash attention, LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

looking-glass

Easy to deploy Looking Glass

Language: PHP | License: GPL-3.0 | Stargazers: 0 | Issues: 0 | Issues: 0

Language: HTML | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

obico-server

Obico is a community-built, open-source smart 3D printing platform used by makers, enthusiasts, and tinkerers around the world.

Language: Python | License: AGPL-3.0 | Stargazers: 0 | Issues: 1 | Issues: 0

onetable

OneTable is an omni-directional converter for table formats that facilitates interoperability across data processing systems and query engines.

Language: Java | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0

spark-expectations

A Python library to support running data quality rules while the Spark job is running ⚡

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

sparklens

Qubole Sparklens tool for performance tuning Apache Spark

Language: Scala | License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 1

uszipcode-project

USA zipcode programmable database, includes up-to-date census and geometry information.

Language: Python | License: MIT | Stargazers: 0 | Issues: 1 | Issues: 0

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 1 | Issues: 0