Kyle Pierce's starred repositories

mermaid

Generation of diagrams like flowcharts or sequence diagrams from text in a similar manner as markdown

Language: JavaScript | License: MIT | Stargazers: 68798 | Issues: 636 | Issues: 2846

localstack

💻 A fully functional local AWS cloud stack. Develop and test your cloud & Serverless apps offline

Language: Python | License: NOASSERTION | Stargazers: 52973 | Issues: 512 | Issues: 5456

cli

🥧 HTTPie CLI — modern, user-friendly command-line HTTP client for the API era. JSON support, colors, sessions, downloads, plugins & more.

Language: Python | License: BSD-3-Clause | Stargazers: 32766 | Issues: 87 | Issues: 864

simdjson

Parsing gigabytes of JSON per second : used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks

Language: C++ | License: Apache-2.0 | Stargazers: 18790 | Issues: 241 | Issues: 813

luigi

Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

Language: Python | License: Apache-2.0 | Stargazers: 17523 | Issues: 474 | Issues: 983

cube

📊 Cube — The Semantic Layer for Building Data Applications

Language: Rust | License: NOASSERTION | Stargazers: 17499 | Issues: 155 | Issues: 2250

prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

Language: Python | License: Apache-2.0 | Stargazers: 15362 | Issues: 159 | Issues: 5226

airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

Language: Python | License: NOASSERTION | Stargazers: 14764 | Issues: 180 | Issues: 13860

data-engineer-roadmap

Roadmap to becoming a data engineer in 2021

dagster

An orchestration platform for the development, production, and observation of data assets.

Language: Python | License: Apache-2.0 | Stargazers: 10714 | Issues: 115 | Issues: 7075

trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Language: Java | License: Apache-2.0 | Stargazers: 9875 | Issues: 170 | Issues: 6397

gridstudio

Grid studio is a web-based application for data science with full integration of open source data science frameworks and languages.

Language: JavaScript | License: AGPL-3.0 | Stargazers: 8875 | Issues: 324 | Issues: 131

mage-ai

🧙 Build, run, and manage data pipelines for integrating and transforming data.

Language: Python | License: Apache-2.0 | Stargazers: 7420 | Issues: 61 | Issues: 725

papermill

📚 Parameterize, execute, and analyze notebooks

Language: Python | License: BSD-3-Clause | Stargazers: 5718 | Issues: 89 | Issues: 398

google-cloud-python

Google Cloud Client Library for Python

Language: Python | License: Apache-2.0 | Stargazers: 4729 | Issues: 296 | Issues: 3800

amundsen

Amundsen is a metadata driven application for improving the productivity of data analysts, data scientists and engineers when interacting with data.

Language: Python | License: Apache-2.0 | Stargazers: 4335 | Issues: 234 | Issues: 682

llm-app

Dynamic RAG for enterprise. Ready to run with Docker,⚡in sync with Sharepoint, Google Drive, S3, Kafka, PostgreSQL, real-time data APIs, and more.

texthero

Text preprocessing, representation and visualization from zero to hero.

Language: Python | License: MIT | Stargazers: 2879 | Issues: 42 | Issues: 120

dbml

Database Markup Language (DBML), designed to define and document database structures

Language: JavaScript | License: Apache-2.0 | Stargazers: 2526 | Issues: 37 | Issues: 223

DataProfiler

What's in your data? Extract schema, statistics and entities from datasets

Language: Python | License: Apache-2.0 | Stargazers: 1385 | Issues: 21 | Issues: 179

pydash

The kitchen sink of Python utility libraries for doing "stuff" in a functional way. Based on the Lo-Dash Javascript library.

Language: Python | License: MIT | Stargazers: 1282 | Issues: 19 | Issues: 135

streamz

Real-time stream processing for python

Language: Python | License: BSD-3-Clause | Stargazers: 1231 | Issues: 35 | Issues: 262

name-dataset

The Python library for names.

Language: Python | License: Apache-2.0 | Stargazers: 803 | Issues: 17 | Issues: 30

sql-metadata

Uses tokenized query returned by python-sqlparse and generates query metadata

Language: Python | License: MIT | Stargazers: 762 | Issues: 16 | Issues: 170

pdpipe

Easy pipelines for pandas DataFrames.

Language: Jupyter Notebook | License: MIT | Stargazers: 714 | Issues: 17 | Issues: 53

AttackVectors

A repository to monitor attack vectors from state-backed information operations

Language: HTML | License: MIT | Stargazers: 392 | Issues: 41 | Issues: 39

dbt-ga4

dbt Package for modeling raw data exported by Google Analytics 4. BigQuery support, only.

Language: SQL | License: MIT | Stargazers: 291 | Issues: 27 | Issues: 150

droughty

Droughty helps keep your workflow dry

Language: Python | License: MIT | Stargazers: 59 | Issues: 5 | Issues: 34