Databricks (databricks)

Helping data teams solve the world’s toughest problems using data and AI

Location: United States of America

Home Page: https://databricks.com

Databricks's repositories

Spark-The-Definitive-Guide

Spark: The Definitive Guide's Code Repository

Language: Scala | License: NOASSERTION | Stargazers: 2810 | Issues: 187 | Issues: 49

spark-sklearn

(Deprecated) Scikit-learn integration package for Apache Spark

Language: Python | License: Apache-2.0 | Stargazers: 1077 | Issues: 94 | Issues: 49

spark-csv

CSV Data Source for Apache Spark 1.x

Language: Scala | License: Apache-2.0 | Stargazers: 1052 | Issues: 422 | Issues: 0

spark-avro

Avro Data Source for Apache Spark

Language: Scala | License: Apache-2.0 | Stargazers: 539 | Issues: 70 | Issues: 167

spark-corenlp

Stanford CoreNLP wrapper for Apache Spark

Language: Scala | License: GPL-3.0 | Stargazers: 422 | Issues: 51 | Issues: 32

spark-perf

Performance tests for Apache Spark

Language: Scala | License: Apache-2.0 | Stargazers: 379 | Issues: 49 | Issues: 55

spark-knowledgebase

Spark Knowledge Base

benchmarks

A place in which we publish scripts for reproducible benchmarks.

Language: Python | License: NOASSERTION | Stargazers: 107 | Issues: 316 | Issues: 5

mlflow

Open source platform for the machine learning lifecycle

Language: Python | License: Apache-2.0 | Stargazers: 96 | Issues: 13 | Issues: 0

sbt-databricks

An sbt plugin for deploying code to Databricks Cloud

Language: Scala | License: NOASSERTION | Stargazers: 71 | Issues: 352 | Issues: 23

simr

Spark In MapReduce (SIMR) - launching Spark applications on existing Hadoop MapReduce infrastructure

Language: Scala | License: Apache-2.0 | Stargazers: 37 | Issues: 289 | Issues: 8

pig-on-spark

Proof-of-concept implementation of Pig-on-Spark integrated at the logical node level

Language: Scala | License: Apache-2.0 | Stargazers: 28 | Issues: 27 | Issues: 0

xgb-regressor

MLflow XGBoost Regressor

Language: Python | Stargazers: 16 | Issues: 7 | Issues: 0

databricks-accelerators

Accelerate the use of Databricks for customers [public repo]

genomics-pipelines

Secondary analysis pipelines parallelized with Apache Spark

Language: Scala | License: Apache-2.0 | Stargazers: 15 | Issues: 6 | Issues: 0

terraform-databricks-mlops-aws-infrastructure

This module sets up multi-workspace model registry between a Databricks AWS development (dev) workspace, staging workspace, and production (prod) workspace, allowing READ access from dev/staging workspaces to staging & prod model registries.

Language: HCL | License: Apache-2.0 | Stargazers: 3 | Issues: 0 | Issues: 0

subpar

Subpar is a utility for creating self-contained Python executables. It is designed to work well with Bazel.

License: Apache-2.0 | Stargazers: 2 | Issues: 0 | Issues: 0

terraform-databricks-aws-workspace

Terraform module to create Databricks AWS E2 workspace

Language: HCL | License: Apache-2.0 | Stargazers: 2 | Issues: 0 | Issues: 0

spark-salesforce

Spark data source for Salesforce

Language: Scala | License: Apache-2.0 | Stargazers: 1 | Issues: 4 | Issues: 0

terraform-databricks-mlops-azure-infrastructure-with-sp-creation

This module sets up multi-workspace model registry between an Azure Databricks development (dev) workspace, staging workspace, and production (prod) workspace, allowing READ access from dev/staging workspaces to staging & prod model registries. It also creates the relevant Azure Active Directory (AAD) applications for the service principals.

Language: HCL | License: Apache-2.0 | Stargazers: 1 | Issues: 0 | Issues: 0

build-tooling

Databricks Education department's curriculum build tool chain

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 4 | Issues: 0

govmm-1

Virtual Machine Manager for Go (govmm) is a suite of packages that provide Go APIs for creating and managing virtual machines.

Language: Go | License: Apache-2.0 | Stargazers: 0 | Issues: 2 | Issues: 0

test-infra

Test infrastructure for the Kubernetes project.

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0