Fredrik Bakken's starred repositories
mattermost
Mattermost is an open source platform for secure collaboration across the entire software development lifecycle.
data-engineering-zoomcamp
Free Data Engineering course!
data-engineer-roadmap
Roadmap to becoming a data engineer in 2021
data-engineer-handbook
This is a repo with links to everything you'd ever want to learn about data engineering
Data-Engineering-HowTo
A list of useful resources to learn Data Engineering from scratch
unitycatalog
Open, Multi-modal Catalog for Data & AI
awesome-opensource-data-engineering
An Awesome List of Open-Source Data Engineering Projects
data-engineering-practice
Data Engineering Practice Problems
pyspark-example-project
Implementing best practices for PySpark ETL jobs and applications.
data-engineering-wiki
The best place to learn data engineering. Built and maintained by the data engineering community.
polyfactory
Simple and powerful factories for mock data generation
sparkMeasure
This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.
datacontract-cli
CLI to manage your datacontract.yaml files
dbldatagen
Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for testing, POCs, and other uses in Databricks environments, including in Delta Live Tables pipelines
datacontract-specification
The Data Contract Specification Repository
spark-expectations
A Python library to support running data quality rules while the Spark job is running⚡
iceberg-rust
Rust implementation of Apache Iceberg with integration for DataFusion
data-factory-testing-framework
A stand-alone test framework that allows you to write unit tests for Data Factory pipelines on Microsoft Fabric, Azure Data Factory and Azure Synapse Analytics.
unitycatalog-rs
Open, Multi-modal Catalog for Data & AI, written in Rust
sparkdantic
✨ A Pydantic-to-PySpark schema library
nbstripout-fast
Strip metadata from Jupyter notebooks
initiatives-talk
Repo with relevant code for my talk about some tools to help you meet your company's big fancy initiatives