Databricks's repositories
Spark-The-Definitive-Guide
Spark: The Definitive Guide's Code Repository
spark-sklearn
(Deprecated) Scikit-learn integration package for Apache Spark
spark-avro
Avro Data Source for Apache Spark
spark-corenlp
Stanford CoreNLP wrapper for Apache Spark
spark-perf
Performance tests for Apache Spark
spark-knowledgebase
Spark Knowledge Base
benchmarks
A place in which we publish scripts for reproducible benchmarks.
sbt-databricks
An sbt plugin for deploying code to Databricks Cloud
pig-on-spark
Proof-of-concept implementation of Pig-on-Spark integrated at the logical node level
xgb-regressor
MLflow XGBoost Regressor
databricks-accelerators
Accelerate the use of Databricks for customers [public repo]
genomics-pipelines
Secondary analysis pipelines parallelized with Apache Spark
knowledge-repo
A next-generation curated knowledge-sharing platform for data scientists and other technical professionals.
terraform-databricks-aws-workspace
Terraform module to create Databricks AWS E2 workspace
spark-salesforce
Spark data source for Salesforce
build-tooling
Databricks Education department's curriculum build toolchain
govmm-1
Virtual Machine Manager for Go (govmm) is a suite of packages that provide Go APIs for creating and managing virtual machines.
test-infra
Test infrastructure for the Kubernetes project.