Databricks (databricks)

Helping data teams solve the world’s toughest problems using data and AI

Location: United States of America

Home Page: https://databricks.com

Databricks's repositories

Spark-The-Definitive-Guide

Spark: The Definitive Guide's Code Repository

Language: Scala | License: NOASSERTION | Stargazers: 2780 | Issues: 187 | Issues: 49

spark-sklearn

(Deprecated) Scikit-learn integration package for Apache Spark

Language: Python | License: Apache-2.0 | Stargazers: 1076 | Issues: 94 | Issues: 49

spark-csv

CSV Data Source for Apache Spark 1.x

Language: Scala | License: Apache-2.0 | Stargazers: 1051 | Issues: 414 | Issues: 0

Language: Scala | License: Apache-2.0 | Stargazers: 571 | Issues: 364 | Issues: 61

spark-avro

Avro Data Source for Apache Spark

Language: Scala | License: Apache-2.0 | Stargazers: 539 | Issues: 70 | Issues: 167

spark-corenlp

Stanford CoreNLP wrapper for Apache Spark

Language: Scala | License: GPL-3.0 | Stargazers: 423 | Issues: 52 | Issues: 32

spark-perf

Performance tests for Apache Spark

Language: Scala | License: Apache-2.0 | Stargazers: 379 | Issues: 49 | Issues: 55

spark-knowledgebase

Spark Knowledge Base

benchmarks

A place in which we publish scripts for reproducible benchmarks.

Language: Python | License: NOASSERTION | Stargazers: 108 | Issues: 311 | Issues: 5

mlflow

Open source platform for the machine learning lifecycle

Language: Python | License: Apache-2.0 | Stargazers: 96 | Issues: 13 | Issues: 0

sbt-databricks

An sbt plugin for deploying code to Databricks Cloud

Language: Scala | License: NOASSERTION | Stargazers: 71 | Issues: 348 | Issues: 23

simr

Spark In MapReduce (SIMR) - launching Spark applications on existing Hadoop MapReduce infrastructure

Language: Scala | License: Apache-2.0 | Stargazers: 37 | Issues: 287 | Issues: 8

pig-on-spark

Proof-of-concept implementation of Pig-on-Spark integrated at the logical node level

Language: Scala | License: Apache-2.0 | Stargazers: 28 | Issues: 27 | Issues: 0

xgb-regressor

MLflow XGBoost Regressor

Language: Python | Stargazers: 16 | Issues: 7 | Issues: 0

databricks-accelerators

Accelerate the use of Databricks for customers [public repo]

Language: Python | Stargazers: 15 | Issues: 0 | Issues: 0

genomics-pipelines

Secondary analysis pipelines parallelized with Apache Spark

Language: Scala | License: Apache-2.0 | Stargazers: 15 | Issues: 6 | Issues: 0

Language: Jupyter Notebook | Stargazers: 10 | Issues: 0 | Issues: 0

knowledge-repo

A next-generation curated knowledge sharing platform for data scientists and other technical professions.

Language: Python | License: Apache-2.0 | Stargazers: 3 | Issues: 0 | Issues: 0

subpar

Subpar is a utility for creating self-contained Python executables. It is designed to work well with Bazel.

License: Apache-2.0 | Stargazers: 2 | Issues: 0 | Issues: 0

terraform-databricks-aws-workspace

Terraform module to create Databricks AWS E2 workspace

Language: HCL | License: Apache-2.0 | Stargazers: 2 | Issues: 0 | Issues: 0

livegrep

Interactively grep source code. Source for http://livegrep.com/

Language: C++ | License: NOASSERTION | Stargazers: 1 | Issues: 58 | Issues: 0

spark-salesforce

Spark data source for Salesforce

Language: Scala | License: Apache-2.0 | Stargazers: 1 | Issues: 4 | Issues: 0

Stargazers: 0 | Issues: 0 | Issues: 0

build-tooling

Databricks Education department's curriculum build tool chain

Language: Python | License: Apache-2.0 | Stargazers: 0 | Issues: 4 | Issues: 0

govmm-1

Virtual Machine Manager for Go (govmm) is a suite of packages that provide Go APIs for creating and managing virtual machines.

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0

Stargazers: 0 | Issues: 0 | Issues: 0

Stargazers: 0 | Issues: 0 | Issues: 0

test-infra

Test infrastructure for the Kubernetes project.

License: Apache-2.0 | Stargazers: 0 | Issues: 0 | Issues: 0