Ludovic Claude's starred repositories
data-engineer-handbook
This is a repo with links to everything you'd ever want to learn about data engineering
deepchecks
Deepchecks: Tests for Continuous Validation of ML Models & Data. Deepchecks is a holistic open-source solution for all of your AI & ML validation needs, enabling you to thoroughly test your data and models from research to production.
ydata-synthetic
Synthetic data generators for tabular and time-series data
consulting-handbook
A guide for technical professionals looking to start consulting
PruningRadixTrie
PruningRadixTrie - 1000x faster Radix trie for prefix search & auto-complete
lakehouse-engine
The Lakehouse Engine is a configuration-driven Spark framework, written in Python, serving as a scalable and distributed engine for several lakehouse algorithms, data flows, and utilities for Data Products.
scalajs-sbt-vite-laminar-chartjs-example
An example of using Scala.js with sbt, Vite, Laminar and Chart.js
OncologyWG
Oncology Working Group Repository
lakehouse-tacklebox
This repo is a collection of tools to deploy, manage, and operate a Databricks-based Lakehouse.
json2spark-schema
Converting a JSON schema to a Spark schema (struct) representation
terraform-azure-databricks-unity-catalog
How to Configure Azure Databricks Unity Catalog using Terraform
terraform-azurerm-data-landing-zone
Cloud Scale Analytics - Data Landing Zone Terraform Module
terra-dbrx-workshop
Quick start for setting up a new Databricks workspace, accompanying storage account, Key Vault, and managed identity via an access connector.
terraform-databricks-databricks-runtime-premium
Terraform module for managing Databricks Premium Workspace
stacks-terraform
Modular Terraform modules in use by Ensono Stacks and available for public use
dbx_mws_example
An example of a multiple workspace deployment with reusable modules.