Fredrik Bakken (FredrikBakken)

Company: @norges-bank

Location: Oslo, Norway

Home Page: https://fredrikbakken.no/

Fredrik Bakken's starred repositories

mattermost

Mattermost is an open source platform for secure collaboration across the entire software development lifecycle.

Language: TypeScript | License: NOASSERTION | Stargazers: 30197 | Issues: 547 | Issues: 8434

OpenVoice

Instant voice cloning by MIT and MyShell.

Language: Python | License: MIT | Stargazers: 28556 | Issues: 211 | Issues: 236

data-engineering-zoomcamp

Free Data Engineering course!

Language: Jupyter Notebook | Stargazers: 24655 | Issues: 453 | Issues: 125

Bend

A massively parallel, high-level programming language

Language: Rust | License: Apache-2.0 | Stargazers: 17240 | Issues: 91 | Issues: 249

Cookbook

The Data Engineering Cookbook

data-engineer-roadmap

Roadmap to becoming a data engineer in 2021

data-engineer-handbook

This is a repo with links to everything you'd ever want to learn about data engineering

marimo

A reactive notebook for Python — run reproducible experiments, execute as a script, deploy as an app, and version with git.

Language: Python | License: Apache-2.0 | Stargazers: 6486 | Issues: 32 | Issues: 621

Data-Engineering-HowTo

A list of useful resources to learn Data Engineering from scratch

unitycatalog

Open, Multi-modal Catalog for Data & AI

Language: Java | License: Apache-2.0 | Stargazers: 2260 | Issues: 48 | Issues: 187

Daft

Distributed DataFrame for Python designed for the cloud, powered by Rust

Language: Rust | License: Apache-2.0 | Stargazers: 2103 | Issues: 16 | Issues: 622

awesome-opensource-data-engineering

An Awesome List of Open-Source Data Engineering Projects

data-engineering-practice

Data Engineering Practice Problems

pyspark-example-project

Implementing best practices for PySpark ETL jobs and applications.

data-engineering-wiki

The best place to learn data engineering. Built and maintained by the data engineering community.

Language: CSS | License: CC0-1.0 | Stargazers: 1326 | Issues: 27 | Issues: 29

polaris

The interoperable, open source catalog for Apache Iceberg

Language: Python | License: Apache-2.0 | Stargazers: 1033 | Issues: 99 | Issues: 75

polyfactory

Simple and powerful factories for mock data generation

Language: Python | License: MIT | Stargazers: 1015 | Issues: 14 | Issues: 198

nessie

Nessie: Transactional Catalog for Data Lakes with Git-like semantics

Language: Java | License: Apache-2.0 | Stargazers: 987 | Issues: 31 | Issues: 814

sparkMeasure

This is the development repository for sparkMeasure, a tool and library designed for efficient analysis and troubleshooting of Apache Spark jobs. It focuses on easing the collection and examination of Spark metrics, making it a practical choice for both developers and data engineers.

Language: Scala | License: Apache-2.0 | Stargazers: 696 | Issues: 34 | Issues: 40

datacontract-cli

CLI to manage your datacontract.yaml files

Language: Python | License: NOASSERTION | Stargazers: 435 | Issues: 15 | Issues: 178

dbldatagen

Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for tests, POCs, and other uses in Databricks environments, including in Delta Live Tables pipelines.

Language: Python | License: NOASSERTION | Stargazers: 312 | Issues: 14 | Issues: 76

sqruff

Fast SQL formatter/linter

Language: Rust | License: Apache-2.0 | Stargazers: 295 | Issues: 4 | Issues: 47

datacontract-specification

The Data Contract Specification Repository

Language: HTML | License: MIT | Stargazers: 235 | Issues: 17 | Issues: 49

spark-expectations

A Python library to support running data quality rules while the Spark job is running ⚡

Language: Python | License: Apache-2.0 | Stargazers: 161 | Issues: 15 | Issues: 61

iceberg-rust

Rust implementation of Apache Iceberg with integration for Datafusion

Language: Rust | License: Apache-2.0 | Stargazers: 86 | Issues: 8 | Issues: 22

data-factory-testing-framework

A stand-alone test framework that lets you write unit tests for Data Factory pipelines on Microsoft Fabric, Azure Data Factory and Azure Synapse Analytics.

Language: Python | License: MIT | Stargazers: 77 | Issues: 11 | Issues: 53

unitycatalog-rs

Open, Multi-modal Catalog for Data & AI, written in Rust

Language: Rust | License: Apache-2.0 | Stargazers: 72 | Issues: 9 | Issues: 6

sparkdantic

✨ A Pydantic to PySpark schema library

Language: Python | License: MIT | Stargazers: 53 | Issues: 4 | Issues: 23

nbstripout-fast

Strip metadata from Jupyter notebooks

Language: Rust | License: BSD-3-Clause | Stargazers: 18 | Issues: 5 | Issues: 4

initiatives-talk

Repo with relevant code for my talk about some tools to help you meet your company's big fancy initiatives

Language: Python | Stargazers: 3 | Issues: 1 | Issues: 0