georgehu0815's starred repositories

dbt-spark

dbt-spark contains all of the code enabling dbt to work with Apache Spark and Databricks

Language: Python | License: Apache-2.0 | Stargazers: 378 | Issues: 0

airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Language: Python | License: Apache-2.0 | Stargazers: 35410 | Issues: 0

awesome-chatgpt-prompts

This repo includes ChatGPT prompt curation to use ChatGPT better.

Language: HTML | License: CC0-1.0 | Stargazers: 107224 | Issues: 0

ebm2onnx

A tool to convert EBM models to ONNX

Language: Python | License: MIT | Stargazers: 19 | Issues: 0

gpt-2-Pytorch

Simple Text-Generator with OpenAI gpt-2 Pytorch Implementation

Language: Python | License: MIT | Stargazers: 953 | Issues: 0

lm-human-preferences

Code for the paper Fine-Tuning Language Models from Human Preferences

Language: Python | License: MIT | Stargazers: 1170 | Issues: 0

trl

Train transformer language models with reinforcement learning.

Language: Python | License: Apache-2.0 | Stargazers: 8779 | Issues: 0

transformers_tasks

⭐️ NLP Algorithms with transformers lib. Supporting Text-Classification, Text-Generation, Information-Extraction, Text-Matching, RLHF, SFT etc.

Language: Jupyter Notebook | Stargazers: 2055 | Issues: 0

batch-processing-gateway

The gateway component to make Spark on K8s much easier for Spark users.

Language: Java | License: Apache-2.0 | Stargazers: 173 | Issues: 0

feathr

Feathr – A scalable, unified data and AI engineering platform for enterprise

Language: Scala | License: Apache-2.0 | Stargazers: 1955 | Issues: 0

spark-jobserver

REST job server for Apache Spark

Language: Scala | License: NOASSERTION | Stargazers: 2841 | Issues: 0

delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs

Language: Scala | License: Apache-2.0 | Stargazers: 7284 | Issues: 0

uvicorn-gunicorn-fastapi-docker

Docker image with Uvicorn managed by Gunicorn for high-performance FastAPI web applications in Python with performance auto-tuning.

Language: Python | License: MIT | Stargazers: 2630 | Issues: 0

conduit

Conduit streams data between data stores. Kafka Connect replacement. No JVM required.

Language: Go | License: Apache-2.0 | Stargazers: 365 | Issues: 0

spark

Apache Spark - A unified analytics engine for large-scale data processing

Language: Scala | License: Apache-2.0 | Stargazers: 38950 | Issues: 0

hopsworks

Hopsworks - Data-Intensive AI platform with a Feature Store

Language: Java | License: AGPL-3.0 | Stargazers: 1110 | Issues: 0

delight

A Spark UI and Spark History Server alternative with CPU and Memory metrics! Delight is free, cross-platform, and open-source.

Language: Scala | License: NOASSERTION | Stargazers: 341 | Issues: 0

dask-jobqueue

Deploy Dask on job schedulers like PBS, SLURM, and SGE

Language: Python | License: BSD-3-Clause | Stargazers: 232 | Issues: 0

dask-kubernetes

Native Kubernetes integration for Dask

Language: Python | License: BSD-3-Clause | Stargazers: 312 | Issues: 0

yunikorn-scheduler-interface

Apache YuniKorn Scheduler Interface

Language: Makefile | License: Apache-2.0 | Stargazers: 27 | Issues: 0

yunikorn-core

Apache YuniKorn Core

Language: Go | License: Apache-2.0 | Stargazers: 775 | Issues: 0

yunikorn-web

Apache YuniKorn Web UI

Language: TypeScript | License: Apache-2.0 | Stargazers: 30 | Issues: 0

dashboard

General-purpose web UI for Kubernetes clusters

Language: Go | License: Apache-2.0 | Stargazers: 14098 | Issues: 0

winutils

winutils.exe hadoop.dll and hdfs.dll binaries for hadoop windows

Language: Shell | Stargazers: 1828 | Issues: 0

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 207 | Issues: 0

iceberg-rest-image

Simple project to expose a catalog over REST using a Java catalog backend

Language: Java | License: Apache-2.0 | Stargazers: 88 | Issues: 0

manageability-toolkits

Sample code to get quickly on-boarded to common Azure manageability tools and platforms like Azure Monitor.

Language: PowerShell | License: MIT | Stargazers: 68 | Issues: 0

recommenders

Best Practices on Recommendation Systems

Language: Python | License: MIT | Stargazers: 18468 | Issues: 0

azure-event-hubs-spark

Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs

Language: Scala | License: Apache-2.0 | Stargazers: 232 | Issues: 0

azure-machine-learning-bicep

A set of Bicep templates for Azure Machine Learning.

License: MIT | Stargazers: 1 | Issues: 0