Ravi Hindocha's repositories
dbt-core-rh
dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
ingestd-tpcdi
Data Integration via Confluent Kafka
tpc-di_benchmark
Benchmark for Airflow with BigQuery as the Data Warehouse using TPC-DI
amundsen
Amundsen is a metadata-driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.
aqueduct
The control center for ML in the cloud
awesome
😎 Awesome lists about all kinds of interesting topics
awesome-production-machine-learning
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning
build-your-own-x
Master programming by recreating your favorite technologies from scratch.
codon
A high-performance, zero-overhead, extensible Python compiler using LLVM
Computer-Science-Education-Resources
A place for programming language instructors to share educational materials
datacontract-specification
The Data Contract Specification Repository
delta
An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
free-for-dev
A list of SaaS, PaaS and IaaS offerings that have free tiers of interest to devops and infradev
free-programming-books
📚 Freely available programming books
h2o-llmstudio
H2O LLM Studio - a framework and no-code GUI for fine-tuning LLMs
hudi
Upserts, Deletes And Incremental Processing on Big Data.
iceberg
Apache Iceberg
jitsu
Jitsu is an open-source Segment alternative. Fully-scriptable data ingestion engine for modern data teams. Set up a real-time data pipeline in minutes, not days
mage-ai
🧙 The modern replacement for Airflow. Build, run, and manage data pipelines for integrating and transforming data.
mlflow
Open source platform for the machine learning lifecycle
onnx
Open standard for machine learning interoperability
presto
The official home of the Presto distributed SQL query engine for big data
python-mastery
Advanced Python Mastery (course by @dabeaz)
ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a toolkit of libraries (Ray AIR) for accelerating ML workloads.
spark
Apache Spark - A unified analytics engine for large-scale data processing
spec
The AsyncAPI specification allows you to create machine-readable definitions of your asynchronous APIs.
trino
Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
zenml
ZenML 🙏: Build portable, production-ready MLOps pipelines. https://zenml.io.