Maksim Nikiforov (scimaksim)

Company: Databricks

Location: North Carolina

Maksim Nikiforov's starred repositories

dbsql_sme

DBSQL SME Repo contains demos, tutorials, blog code, advanced production helper functions and more!

Language: Python | Stargazers: 18 | Issues: 0

mlflow

Open source platform for the machine learning lifecycle

Language: Python | License: Apache-2.0 | Stargazers: 18147 | Issues: 0

spark

Apache Spark - A unified analytics engine for large-scale data processing

Language: Scala | License: Apache-2.0 | Stargazers: 39123 | Issues: 0

ucx

Automated migrations to Unity Catalog

Language: Python | License: NOASSERTION | Stargazers: 207 | Issues: 0
Language: Python | License: Apache-2.0 | Stargazers: 316 | Issues: 0

Mr.-Ranedeer-AI-Tutor

A GPT-4 AI Tutor Prompt for customizable personalized learning experiences.

Stargazers: 28328 | Issues: 0

jupyterlab-s3-browser

A JupyterLab extension for browsing S3-compatible object storage

Language: TypeScript | License: Apache-2.0 | Stargazers: 119 | Issues: 0
Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 8 | Issues: 0

delta-examples

Delta Lake examples

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 197 | Issues: 0

databricks

Repository of sample Databricks notebooks

Language: HTML | Stargazers: 238 | Issues: 0
Language: HTML | Stargazers: 1 | Issues: 0

mosaic

An extension to the Apache Spark framework that allows easy and fast processing of very large geospatial datasets.

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 267 | Issues: 0

geopandas

Python tools for geographic data

Language: Python | License: BSD-3-Clause | Stargazers: 4403 | Issues: 0

prql

PRQL is a modern language for transforming data — a simple, powerful, pipelined SQL replacement

Language: Rust | License: Apache-2.0 | Stargazers: 9711 | Issues: 0

ML-DataEng-Pipelines

To show the usefulness of data engineering and ML pipelines.

Language: Jupyter Notebook | Stargazers: 43 | Issues: 0

delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

Language: Scala | License: Apache-2.0 | Stargazers: 7359 | Issues: 0

terraform-databricks-lakehouse-blueprints

Set of Terraform automation templates and quickstart demos to jumpstart the design of a Lakehouse on Databricks. This project has incorporated best practices across the industries we work with to deliver composable modules to build a workspace to comply with the highest platform security and governance standards.

Language: Python | License: NOASSERTION | Stargazers: 71 | Issues: 0
Language: Python | Stargazers: 324 | Issues: 0

dbldatagen

Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for test, POCs, and other uses in Databricks environments including in Delta Live Tables pipelines

Language: Python | License: NOASSERTION | Stargazers: 297 | Issues: 0

debezium

Change data capture for a variety of databases. Please log issues at https://issues.redhat.com/browse/DBZ.

Language: Java | License: Apache-2.0 | Stargazers: 10315 | Issues: 0

FLAML

A fast library for AutoML and tuning. Join our Discord: https://discord.gg/Cppx2vSPVP.

Language: Jupyter Notebook | License: MIT | Stargazers: 3808 | Issues: 0

autogluon

Fast and Accurate ML in 3 Lines of Code

Language: Python | License: Apache-2.0 | Stargazers: 7535 | Issues: 0

system-design-primer

Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.

Language: Python | License: NOASSERTION | Stargazers: 267019 | Issues: 0

dbdemos

Demos to implement your Databricks Lakehouse

Language: HTML | License: NOASSERTION | Stargazers: 263 | Issues: 0

pyspark-style-guide

This is a guide to PySpark code style presenting common situations and the associated best practices based on the most frequent recurring topics across the PySpark repos we've encountered.

Language: Python | License: MIT | Stargazers: 1000 | Issues: 0

databricks-sync

An experimental tool to synchronize source Databricks deployment with a target Databricks deployment.

Language: Python | License: NOASSERTION | Stargazers: 46 | Issues: 0

migrate

Old scripts for one-off ST-to-E2 migrations. Use "terraform exporter" linked in the readme.

Language: Python | License: NOASSERTION | Stargazers: 180 | Issues: 0

diviner

Grouped time series forecasting engine

Language: Python | License: Apache-2.0 | Stargazers: 36 | Issues: 0

mlops-v2

Azure MLOps (v2) solution accelerators. Enterprise ready templates to deploy your machine learning models on the Azure Platform.

Language: Shell | License: MIT | Stargazers: 490 | Issues: 0

interpret

Fit interpretable models. Explain blackbox machine learning.

Language: C++ | License: MIT | Stargazers: 6166 | Issues: 0