Holden Karau's repositories
spark-testing-base
Base classes to use when writing tests with Spark
spark-flowchart
Flowchart for debugging Spark applications
sparkProjectTemplate.g8
Template for Spark Projects
spark-upgrade
Magic to help Spark pipelines upgrade
high-performance-spark-examples
Examples for High Performance Spark
distributedcomputing4kids
distributedcomputing4kids
spark-misc-utils
Misc Utils for Spark
explore-dolly
Exploring what we can do with Databricks' Dolly (and similar)
mydotfiles
My dotfiles. You probably don't care about this.
sparklingpinkpandas
Website for Sparkling Pink Pandas (queer- and trans-focused scooter club)
data-validator
A tool to validate data, built around Apache Spark.
spark-connect-rs
Apache Spark Connect Client for Rust
arrow-datafusion-comet
Apache Arrow DataFusion Comet Spark Accelerator
bitsandbytes
8-bit CUDA functions for PyTorch
django-rest-framework-braces
Collection of utilities for working with django rest framework (DRF)
lit-parrot
Implementation of Falcon, StableLM, Pythia, INCITE language models based on nanoGPT. Supports flash attention, LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.
looking-glass
Easy to deploy Looking Glass
obico-server
Obico is a community-built, open-source smart 3D printing platform used by makers, enthusiasts, and tinkerers around the world.
spark-expectations
A Python library to support running data quality rules while the Spark job is running ⚡
uszipcode-project
USA zipcode programmable database, includes up-to-date census and geometry information.